-
Fast Truncated SVD of Sparse and Dense Matrices on Graphics Processors
Authors:
Andres E. Tomas,
Enrique S. Quintana-Orti,
Hartwig Anzt
Abstract:
We investigate the solution of low-rank matrix approximation problems using the truncated SVD. For this purpose, we develop and optimize GPU implementations for the randomized SVD and a blocked variant of the Lanczos approach. Our work takes advantage of the fact that the two methods are composed of very similar linear algebra building blocks, which can be assembled using numerical kernels from ex…
▽ More
We investigate the solution of low-rank matrix approximation problems using the truncated SVD. For this purpose, we develop and optimize GPU implementations for the randomized SVD and a blocked variant of the Lanczos approach. Our work takes advantage of the fact that the two methods are composed of very similar linear algebra building blocks, which can be assembled using numerical kernels from existing high-performance linear algebra libraries. Furthermore, the experiments with several sparse matrices arising in representative real-world applications and synthetic dense test matrices reveal a performance advantage of the block Lanczos algorithm when targeting the same approximation accuracy.
△ Less
Submitted 10 March, 2024;
originally announced March 2024.
-
Earth Virtualization Engines -- A Technical Perspective
Authors:
Torsten Hoefler,
Bjorn Stevens,
Andreas F. Prein,
Johanna Baehr,
Thomas Schulthess,
Thomas F. Stocker,
John Taylor,
Daniel Klocke,
Pekka Manninen,
Piers M. Forster,
Tobias Kölling,
Nicolas Gruber,
Hartwig Anzt,
Claudia Frauen,
Florian Ziemen,
Milan Klöwer,
Karthik Kashinath,
Christoph Schär,
Oliver Fuhrer,
Bryan N. Lawrence
Abstract:
Participants of the Berlin Summit on Earth Virtualization Engines (EVEs) discussed ideas and concepts to improve our ability to cope with climate change. EVEs aim to provide interactive and accessible climate simulations and data for a wide range of users. They combine high-resolution physics-based models with machine learning techniques to improve the fidelity, efficiency, and interpretability of…
▽ More
Participants of the Berlin Summit on Earth Virtualization Engines (EVEs) discussed ideas and concepts to improve our ability to cope with climate change. EVEs aim to provide interactive and accessible climate simulations and data for a wide range of users. They combine high-resolution physics-based models with machine learning techniques to improve the fidelity, efficiency, and interpretability of climate projections. At their core, EVEs offer a federated data layer that enables simple and fast access to exabyte-sized climate data through simple interfaces. In this article, we summarize the technical challenges and opportunities for develo** EVEs, and argue that they are essential for addressing the consequences of climate change.
△ Less
Submitted 16 September, 2023;
originally announced September 2023.
-
Porting Batched Iterative Solvers onto Intel GPUs with SYCL
Authors:
Phuong Nguyen,
Pratik Nayak,
Hartwig Anzt
Abstract:
Batched linear solvers play a vital role in computational sciences, especially in the fields of plasma physics and combustion simulations. With the imminent deployment of the Aurora Supercomputer and other upcoming systems equipped with Intel GPUs, there is a compelling demand to expand the capabilities of these solvers for Intel GPU architectures.
In this paper, we present our efforts in portin…
▽ More
Batched linear solvers play a vital role in computational sciences, especially in the fields of plasma physics and combustion simulations. With the imminent deployment of the Aurora Supercomputer and other upcoming systems equipped with Intel GPUs, there is a compelling demand to expand the capabilities of these solvers for Intel GPU architectures.
In this paper, we present our efforts in porting and optimizing the batched iterative solvers on Intel GPUs using the SYCL programming model. These new solvers achieve impressive performance on the Intel GPU Max 1550s (Ponte Vecchio GPUs) which surpass our previous CUDA implementation on NVIDIA H100 GPUs by an average of 2.4x for the PeleLM application inputs. The batched solvers are ready for production use in real-world scientific applications through the Ginkgo library, complementing the performance portability of the batched functionality of Ginkgo.
△ Less
Submitted 26 September, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
GPU-Resident Sparse Direct Linear Solvers for Alternating Current Optimal Power Flow Analysis
Authors:
Kasia Świrydowicz,
Nicholson Koukpaizan,
Tobias Ribizel,
Fritz Göbel,
Shrirang Abhyankar,
Hartwig Anzt,
Slaven Peleš
Abstract:
Integrating renewable resources within the transmission grid at a wide scale poses significant challenges for economic dispatch as it requires analysis with more optimization parameters, constraints, and sources of uncertainty. This motivates the investigation of more efficient computational methods, especially those for solving the underlying linear systems, which typically take more than half of…
▽ More
Integrating renewable resources within the transmission grid at a wide scale poses significant challenges for economic dispatch as it requires analysis with more optimization parameters, constraints, and sources of uncertainty. This motivates the investigation of more efficient computational methods, especially those for solving the underlying linear systems, which typically take more than half of the overall computation time. In this paper, we present our work on sparse linear solvers that take advantage of hardware accelerators, such as graphical processing units (GPUs), and improve the overall performance when used within economic dispatch computations. We treat the problems as sparse, which allows for faster execution but also makes the implementation of numerical methods more challenging. We present the first GPU-native sparse direct solver that can execute on both AMD and NVIDIA GPUs. We demonstrate significant performance improvements when using high-performance linear solvers within alternating current optimal power flow (ACOPF) analysis. Furthermore, we demonstrate the feasibility of getting significant performance improvements by executing the entire computation on GPU-based hardware. Finally, we identify outstanding research issues and opportunities for even better utilization of heterogeneous systems, including those equipped with GPUs.
△ Less
Submitted 15 August, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
Porting a sparse linear algebra math library to Intel GPUs
Authors:
Yuhsiang M. Tsai,
Terry Cojean,
Hartwig Anzt
Abstract:
With the announcement that the Aurora Supercomputer will be composed of general purpose Intel CPUs complemented by discrete high performance Intel GPUs, and the deployment of the oneAPI ecosystem, Intel has committed to enter the arena of discrete high performance GPUs. A central requirement for the scientific computing community is the availability of production-ready software stacks and a glimps…
▽ More
With the announcement that the Aurora Supercomputer will be composed of general purpose Intel CPUs complemented by discrete high performance Intel GPUs, and the deployment of the oneAPI ecosystem, Intel has committed to enter the arena of discrete high performance GPUs. A central requirement for the scientific computing community is the availability of production-ready software stacks and a glimpse of the performance they can expect to see on Intel high performance GPUs. In this paper, we present the first platform-portable open source math library supporting Intel GPUs via the DPC++ programming environment. We also benchmark some of the developed sparse linear algebra functionality on different Intel GPUs to assess the efficiency of the DPC++ programming ecosystem to translate raw performance into application performance. Aside from quantifying the efficiency within the hardware-specific roofline model, we also compare against routines providing the same functionality that ship with Intel's oneMKL vendor library.
△ Less
Submitted 18 March, 2021;
originally announced March 2021.
-
A Fresh Look at FAIR for Research Software
Authors:
Daniel S. Katz,
Morane Gruenpeter,
Tom Honeyman,
Lorraine Hwang,
Mark D. Wilkinson,
Vanessa Sochat,
Hartwig Anzt,
Carole Goble,
for FAIR4RS Subgroup 1
Abstract:
This document captures the discussion and deliberation of the FAIR for Research Software (FAIR4RS) subgroup that took a fresh look at the applicability of the FAIR Guiding Principles for scientific data management and stewardship for research software. We discuss the vision of research software as ideally reproducible, open, usable, recognized, sustained and robust, and then review both the charac…
▽ More
This document captures the discussion and deliberation of the FAIR for Research Software (FAIR4RS) subgroup that took a fresh look at the applicability of the FAIR Guiding Principles for scientific data management and stewardship for research software. We discuss the vision of research software as ideally reproducible, open, usable, recognized, sustained and robust, and then review both the characteristic and practiced differences of research software and data. This vision and understanding of initial conditions serves as a backdrop for an attempt at translating and interpreting the guiding principles to more fully align with research software. We have found that many of the principles remained relatively intact as written, as long as considerable interpretation was provided. This was particularly the case for the "Findable" and "Accessible" foundational principles. We found that "Interoperability" and "Reusability" are particularly prone to a broad and sometimes opposing set of interpretations as written. We propose two new principles modeled on existing ones, and provide modified guiding text for these principles to help clarify our final interpretation. A series of gaps in translation were captured during this process, and these remain to be addressed. We finish with a consideration of where these translated principles fall short of the vision laid out in the opening.
△ Less
Submitted 9 February, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Ginkgo -- A Math Library designed for Platform Portability
Authors:
Terry Cojean,
Yu-Hsiang "Mike" Tsai,
Hartwig Anzt
Abstract:
The first associations to software sustainability might be the existence of a continuous integration (CI) framework; the existence of a testing framework composed of unit tests, integration tests, and end-to-end tests; and also the existence of software documentation. However, when asking what is a common deathblow for a scientific software product, it is often the lack of platform and performance…
▽ More
The first associations to software sustainability might be the existence of a continuous integration (CI) framework; the existence of a testing framework composed of unit tests, integration tests, and end-to-end tests; and also the existence of software documentation. However, when asking what is a common deathblow for a scientific software product, it is often the lack of platform and performance portability. Against this background, we designed the Ginkgo library with the primary focus on platform portability and the ability to not only port to new hardware architectures, but also achieve good performance. In this paper we present the Ginkgo library design, radically separating algorithms from hardware-specific kernels forming the distinct hardware executors, and report our experience when adding execution backends for NVIDIA, AMD, and Intel GPUs. We also comment on the different levels of performance portability, and the performance we achieved on the distinct hardware backends.
△ Less
Submitted 17 November, 2020;
originally announced November 2020.
-
Resiliency in Numerical Algorithm Design for Extreme Scale Simulations
Authors:
Emmanuel Agullo,
Mirco Altenbernd,
Hartwig Anzt,
Leonardo Bautista-Gomez,
Tommaso Benacchio,
Luca Bonaventura,
Hans-Joachim Bungartz,
Sanjay Chatterjee,
Florina M. Ciorba,
Nathan DeBardeleben,
Daniel Drzisga,
Sebastian Eibl,
Christian Engelmann,
Wilfried N. Gansterer,
Luc Giraud,
Dominik Goeddeke,
Marco Heisig,
Fabienne Jezequel,
Nils Kohl,
Xiaoye Sherry Li,
Romain Lion,
Miriam Mehl,
Paul Mycek,
Michael Obersteiner,
Enrique S. Quintana-Orti
, et al. (11 additional authors not shown)
Abstract:
This work is based on the seminar titled ``Resiliency in Numerical Algorithm Design for Extreme Scale Simulations'' held March 1-6, 2020 at Schloss Dagstuhl, that was attended by all the authors.
Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to backgr…
▽ More
This work is based on the seminar titled ``Resiliency in Numerical Algorithm Design for Extreme Scale Simulations'' held March 1-6, 2020 at Schloss Dagstuhl, that was attended by all the authors.
Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated.
More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in the case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors.
The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge.
△ Less
Submitted 26 October, 2020;
originally announced October 2020.
-
Compressed Basis GMRES on High Performance GPUs
Authors:
José I. Aliaga,
Hartwig Anzt,
Thomas Grützmacher,
Enrique S. Quintana-Ortí,
Andrés E. Tomás
Abstract:
Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is constrained by the communication bandwidth in all current computer architectures, motivating the recent investigation of sophisticated techniques to avoid, reduce, and/or hide the mess…
▽ More
Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is constrained by the communication bandwidth in all current computer architectures, motivating the recent investigation of sophisticated techniques to avoid, reduce, and/or hide the message-passing costs (in distributed platforms) and the memory accesses (in all architectures).
This paper introduces a new communication-reduction strategy for the (Krylov) GMRES solver that advocates for decoupling the storage format (i.e., the data representation in memory) of the orthogonal basis from the arithmetic precision that is employed during the operations with that basis. Given that the execution time of the GMRES solver is largely determined by the memory access, the datatype transforms can be mostly hidden, resulting in the acceleration of the iterative step via a lower volume of bits being retrieved from memory. Together with the special properties of the orthonormal basis (whose elements are all bounded by 1), this paves the road toward the aggressive customization of the storage format, which includes some floating point as well as fixed point formats with little impact on the convergence of the iterative process.
We develop a high performance implementation of the "compressed basis GMRES" solver in the Ginkgo sparse linear algebra library and using a large set of test problems from the SuiteSparse matrix collection we demonstrate robustness and performance advantages on a modern NVIDIA V100 GPU of up to 50% over the standard GMRES solver that stores all data in IEEE double precision.
△ Less
Submitted 25 September, 2020;
originally announced September 2020.
-
Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations
Authors:
Yuhsiang Mike Tsai,
Terry Cojean,
Hartwig Anzt
Abstract:
GPU accelerators have become an important backbone for scientific high performance computing, and the performance advances obtained from adopting new GPU hardware are significant. In this paper we take a first look at NVIDIA's newest server line GPU, the A100 architecture part of the Ampere generation. Specifically, we assess its performance for sparse linear algebra operations that form the backb…
▽ More
GPU accelerators have become an important backbone for scientific high performance computing, and the performance advances obtained from adopting new GPU hardware are significant. In this paper we take a first look at NVIDIA's newest server line GPU, the A100 architecture part of the Ampere generation. Specifically, we assess its performance for sparse linear algebra operations that form the backbone of many scientific applications and assess the performance improvements over its predecessor.
△ Less
Submitted 19 August, 2020;
originally announced August 2020.
-
A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic
Authors:
Ahmad Abdelfattah,
Hartwig Anzt,
Erik G. Boman,
Erin Carson,
Terry Cojean,
Jack Dongarra,
Mark Gates,
Thomas Grützmacher,
Nicholas J. Higham,
Sherry Li,
Neil Lindquist,
Yang Liu,
Jennifer Loe,
Piotr Luszczek,
Pratik Nayak,
Sri Pranesh,
Siva Rajamanickam,
Tobias Ribizel,
Barry Smith,
Kasia Swirydowicz,
Stephen Thomas,
Stanimire Tomov,
Yaohung M. Tsai,
Ichitaro Yamazaki,
Urike Meier Yang
Abstract:
Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more t…
▽ More
Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line products are increasingly featuring low-precision special function units, such as the NVIDIA tensor cores in ORNL's Summit supercomputer providing more than an order of magnitude higher performance than what is available in IEEE double precision. At the same time, the gap between the compute power on the one hand and the memory bandwidth on the other hand keeps increasing, making data access and communication prohibitively expensive compared to arithmetic operations. To start the multiprecision focus effort, we survey the numerical linear algebra community and summarize all existing multiprecision knowledge, expertise, and software capabilities in this landscape analysis report. We also include current efforts and preliminary results that may not yet be considered "mature technology," but have the potential to grow into production quality within the multiprecision focus effort. As we expect the reader to be familiar with the basics of numerical linear algebra, we refrain from providing a detailed background on the algorithms themselves but focus on how mixed- and multiprecision technology can help improving the performance of these methods and present highlights of application significantly outperforming the traditional fixed precision methods.
△ Less
Submitted 13 July, 2020;
originally announced July 2020.
-
Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing
Authors:
Hartwig Anzt,
Terry Cojean,
Goran Flegar,
Fritz Göbel,
Thomas Grützmacher,
Pratik Nayak,
Tobias Ribizel,
Yuhsiang Mike Tsai,
Enrique S. Quintana-Ortí
Abstract:
In this paper, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo's design principle abstracts all functionality as "linear operators", motivating the notation of a "linear operator algebra library". Ginkgo's current focus is oriented towards providing sparse linear algebra functi…
▽ More
In this paper, we present Ginkgo, a modern C++ math library for scientific high performance computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo's design principle abstracts all functionality as "linear operators", motivating the notation of a "linear operator algebra library". Ginkgo's current focus is oriented towards providing sparse linear algebra functionality for high performance GPU architectures, but given the library design, this focus can be easily extended to accommodate other algorithms and hardware architectures. We introduce this sophisticated software architecture that separates core algorithms from architecture-specific back ends and provide details on extensibility and sustainability measures. We also demonstrate Ginkgo's usability by providing examples on how to use its functionality inside the MFEM and deal.ii finite element ecosystems. Finally, we offer a practical demonstration of Ginkgo's high performance on state-of-the-art GPU architectures.
△ Less
Submitted 1 July, 2020; v1 submitted 30 June, 2020;
originally announced June 2020.
-
Preparing Ginkgo for AMD GPUs -- A Testimonial on Porting CUDA Code to HIP
Authors:
Yuhsiang M. Tsai,
Terry Cojean,
Tobias Ribizel,
Hartwig Anzt
Abstract:
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for A…
▽ More
With AMD reinforcing their ambition in the scientific high performance computing ecosystem, we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP backend for AMD GPUs. In this paper, we report and discuss the porting effort from CUDA, the extension of the HIP framework to add missing features such as cooperative groups, the performance price of compiling HIP code for AMD architectures, and the design of a library providing native backends for NVIDIA and AMD GPUs while minimizing code duplication by using a shared code base.
△ Less
Submitted 25 June, 2020;
originally announced June 2020.
-
An Environment for Sustainable Research Software in Germany and Beyond: Current State, Open Challenges, and Call for Action
Authors:
Hartwig Anzt,
Felix Bach,
Stephan Druskat,
Frank Löffler,
Axel Loewe,
Bernhard Y. Renard,
Gunnar Seemann,
Alexander Struck,
Elke Achhammer,
Piush Aggarwal,
Franziska Appel,
Michael Bader,
Lutz Brusch,
Christian Busse,
Gerasimos Chourdakis,
Piotr W. Dabrowski,
Peter Ebert,
Bernd Flemisch,
Sven Friedl,
Bernadette Fritzsch,
Maximilian D. Funk,
Volker Gast,
Florian Goth,
Jean-Noël Grad,
Sibylle Hermann
, et al. (18 additional authors not shown)
Abstract:
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software…
▽ More
Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal to ensure that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.
△ Less
Submitted 5 May, 2020; v1 submitted 27 April, 2020;
originally announced May 2020.
-
Evaluating Abstract Asynchronous Schwarz solvers on GPUs
Authors:
Pratik Nayak,
Terry Cojean,
Hartwig Anzt
Abstract:
With the commencement of the exascale computing era, we realize that the majority of the leadership supercomputers are heterogeneous and massively parallel even on a single node with multiple co-processors such as GPUs and multiple cores on each node. For example, ORNLs Summit accumulates six NVIDIA Tesla V100s and 42 core IBM Power9s on each node. Synchronizing across all these compute resources…
▽ More
With the commencement of the exascale computing era, we realize that the majority of the leadership supercomputers are heterogeneous and massively parallel even on a single node with multiple co-processors such as GPUs and multiple cores on each node. For example, ORNLs Summit accumulates six NVIDIA Tesla V100s and 42 core IBM Power9s on each node. Synchronizing across all these compute resources in a single node or even across multiple nodes is prohibitively expensive. Hence it is necessary to develop and study asynchronous algorithms that circumvent this issue of bulk-synchronous computing for massive parallelism. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver where we do not explicitly synchronize, but allow for communication of the data between the sub-domains to be completely asynchronous thereby removing the bulk synchronous nature of the algorithm.
We accomplish this by using the onesided RMA functions of the MPI standard. We study the benefits of using such an asynchronous solver over its synchronous counterpart on both multi-core architectures and on multiple GPUs. We also study the communication patterns and local solvers and their effect on the global solver. Finally, we show that this concept can render attractive runtime benefits over the synchronous counterparts.
△ Less
Submitted 5 May, 2020; v1 submitted 11 March, 2020;
originally announced March 2020.