-
GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability
Authors:
Andrey Alekseenko,
Szilárd Páll,
Erik Lindahl
Abstract:
GROMACS is a widely-used molecular dynamics software package with a focus on performance, portability, and maintainability across a broad range of platforms. Thanks to its early algorithmic redesign and flexible heterogeneous parallelization, GROMACS has successfully harnessed GPU accelerators for more than a decade. With the diversification of accelerator platforms in HPC and no obvious choice for a multi-vendor programming model, the GROMACS project found itself at a crossroads. The performance and portability requirements, and a strong preference for a standards-based solution, motivated our choice to use SYCL on both new HPC GPU platforms: AMD and Intel. Since the GROMACS 2022 release, the SYCL backend has been the primary means to target AMD GPUs in preparation for exascale HPC architectures like LUMI and Frontier. SYCL is a cross-platform, royalty-free, C++17-based standard for programming hardware accelerators. It allows using the same code to target GPUs from all three major vendors with minimal specialization. While SYCL implementations build on native toolchains, performance of such an approach is not immediately evident. Biomolecular simulations have challenging performance characteristics: latency sensitivity, the need for strong scaling, and typical iteration times as short as hundreds of microseconds. Hence, obtaining good performance across the range of problem sizes and scaling regimes is particularly challenging. Here, we share the results of our work on readying GROMACS for AMD GPU platforms using SYCL, and demonstrate performance on Cray EX235a machines with MI250X accelerators. Our findings illustrate that portability is possible without major performance compromises. We provide a detailed analysis of node-level kernel and runtime performance with the aim of sharing best practices with the HPC community on using SYCL as a performance-portable GPU framework.
Submitted 2 May, 2024;
originally announced May 2024.
-
Tunable and Portable Extreme-Scale Drug Discovery Platform at Exascale: the LIGATE Approach
Authors:
Gianluca Palermo,
Gianmarco Accordi,
Davide Gadioli,
Emanuele Vitali,
Cristina Silvano,
Bruno Guindani,
Danilo Ardagna,
Andrea R. Beccari,
Domenico Bonanni,
Carmine Talarico,
Filippo Lunghini,
Jan Martinovic,
Paulo Silva,
Ada Bohm,
Jakub Beranek,
Jan Krenek,
Branislav Jansik,
Luigi Crisci,
Biagio Cosenza,
Peter Thoman,
Philip Salzmann,
Thomas Fahringer,
Leila Alexander,
Gerardo Tauriello,
et al. (10 additional authors not shown)
Abstract:
Today, the digital revolution is having a dramatic impact on the pharmaceutical industry and the entire healthcare system. The implementation of machine learning, extreme-scale computer simulations, and big data analytics in the drug design and development process offers an excellent opportunity to lower the risk of investment and reduce the time to the patient.
Within the LIGATE project, we aim to integrate, extend, and co-design best-in-class European components to design Computer-Aided Drug Design (CADD) solutions exploiting today's high-end supercomputers and tomorrow's Exascale resources, fostering European competitiveness in the field.
The proposed LIGATE solution is a fully integrated workflow that delivers the results of a virtual screening campaign for drug discovery with the highest speed along with the highest accuracy. The full automation of the solution and the possibility of running it on multiple supercomputing centers at once make it possible to run an extreme-scale in silico drug discovery campaign in a few days, for example to respond promptly to a worldwide pandemic crisis.
Submitted 19 April, 2023;
originally announced April 2023.
-
Heterogeneous Parallelization and Acceleration of Molecular Dynamics Simulations in GROMACS
Authors:
Szilárd Páll,
Artem Zhmurov,
Paul Bauer,
Mark Abraham,
Magnus Lundborg,
Alan Gray,
Berk Hess,
Erik Lindahl
Abstract:
The introduction of accelerator devices such as graphics processing units (GPUs) has had a profound impact on molecular dynamics simulations and has enabled order-of-magnitude performance advances using commodity hardware. To fully reap these benefits, it has been necessary to reformulate some of the most fundamental algorithms, including the Verlet list, pair searching and cut-offs. Here, we present the heterogeneous parallelization and acceleration design of molecular dynamics implemented in the GROMACS codebase over the last decade. The setup involves a general cluster-based approach to pair lists and non-bonded pair interactions that utilizes both GPUs and CPU SIMD acceleration efficiently, including the ability to load-balance tasks between CPUs and GPUs. The algorithm work efficiency is tuned for each type of hardware, and to use accelerators more efficiently we introduce dual pair lists with rolling pruning updates. Combined with new direct GPU-GPU communication as well as GPU integration, this enables excellent performance from single GPU simulations through strong scaling across multiple GPUs and efficient multi-node parallelization.
Submitted 7 September, 2020; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Effective degrees of freedom for surface finish defect detection and classification
Authors:
Natalya Pya Arnqvist,
Blaise Ngendangenzwa,
Eric Lindahl,
Leif Nilsson,
Jun Yu
Abstract:
One of the primary concerns of product quality control in the automotive industry is the automated detection of small defects on specular car body surfaces. A new statistical learning approach is presented for surface finish defect detection based on a spline smoothing method for feature extraction and a $k$-nearest neighbour probabilistic classifier. Since the surfaces are specular, a structured lighting reflection technique is applied for image acquisition. Reduced rank cubic regression splines are used to smooth the pixel values, while the effective degrees of freedom of the obtained smooths serve as components of the feature vector. A key advantage of the approach is that it allows reaching a near-zero misclassification error rate when applying standard learning classifiers. We also propose probability-based performance evaluation metrics as alternatives to the conventional metrics. Their use provides the means for uncertainty estimation of the predictive performance of a classifier. Experimental classification results on the images obtained from the pilot system located at the Volvo GTO Cab plant in Umeå, Sweden, show that the proposed approach is much more efficient than the compared methods.
Submitted 20 June, 2019;
originally announced June 2019.
-
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
Authors:
Páll Szilárd,
Mark James Abraham,
Carsten Kutzner,
Berk Hess,
Erik Lindahl
Abstract:
GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighbor searching, and we discuss the present and future challenges we see for exascale simulation, in particular very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.
Submitted 1 June, 2015;
originally announced June 2015.