-
PETSc/TAO Developments for Early Exascale Systems
Authors:
Richard Tran Mills,
Mark Adams,
Satish Balay,
Jed Brown,
Jacob Faibussowitsch,
Toby Isaac,
Matthew Knepley,
Todd Munson,
Hansol Suh,
Stefano Zampini,
Hong Zhang,
Junchao Zhang
Abstract:
The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy's Exascale Computing Project, the PETSc team has made substantial efforts to enable efficient utilization of the massive fine-grain parallelism present within exascale compute nodes and to enable performance portability across exascale architectures. We recap some of the challenges that designers of numerical libraries face in such an endeavor, and then discuss the many developments we have made, which include the addition of new GPU backends, features supporting efficient on-device matrix assembly, better support for asynchronicity and GPU kernel concurrency, and new communication infrastructure. We evaluate the performance of these developments on some pre-exascale systems as well as the early exascale systems Frontier and Aurora, using compute kernel, communication layer, solver, and mini-application benchmark studies, and then close with a few observations drawn from our experiences on the tension between portable performance and other goals of numerical libraries.
Submitted 12 June, 2024;
originally announced June 2024.
-
Safe, Seamless, And Scalable Integration Of Asynchronous GPU Streams In PETSc
Authors:
Jacob Faibussowitsch,
Mark F. Adams,
Richard Tran Mills,
Stefano Zampini,
Junchao Zhang
Abstract:
Leveraging Graphics Processing Units (GPUs) to accelerate scientific software has proven to be highly successful, but in order to extract more performance, GPU programmers must overcome the high latency costs associated with their use. One method of reducing or hiding this latency cost is to use asynchronous streams to issue commands to the GPU. While performant, the streams model is an invasive abstraction, and has therefore proven difficult to integrate into general-purpose libraries. In this work, we enumerate the difficulties specific to library authors in adopting streams, and present recent work on addressing them. Finally, we present a unified asynchronous programming model for use in the Portable, Extensible Toolkit for Scientific Computation (PETSc) to overcome these challenges. The new model shows broad performance benefits while remaining ergonomic to the user.
Submitted 30 June, 2023;
originally announced June 2023.
-
The PETSc Community Is the Infrastructure
Authors:
Mark Adams,
Satish Balay,
Oana Marin,
Lois Curfman McInnes,
Richard Tran Mills,
Todd Munson,
Hong Zhang,
Junchao Zhang,
Jed Brown,
Victor Eijkhout,
Jacob Faibussowitsch,
Matthew Knepley,
Fande Kong,
Scott Kruger,
Patrick Sanan,
Barry F. Smith,
Hong Zhang
Abstract:
The communities who develop and support open source scientific software packages are crucial to the utility and success of such packages. Moreover, these communities form an important part of the human infrastructure that enables scientific progress. This paper discusses aspects of the PETSc (Portable Extensible Toolkit for Scientific Computation) community, its organization, and technical approaches that enable community members to help each other efficiently.
Submitted 3 January, 2022;
originally announced January 2022.
-
The PetscSF Scalable Communication Layer
Authors:
Junchao Zhang,
Jed Brown,
Satish Balay,
Jacob Faibussowitsch,
Matthew Knepley,
Oana Marin,
Richard Tran Mills,
Todd Munson,
Barry F. Smith,
Stefano Zampini
Abstract:
PetscSF, the communication component of the Portable, Extensible Toolkit for Scientific Computation (PETSc), is designed to provide PETSc's communication infrastructure suitable for exascale computers that utilize GPUs and other accelerators. PetscSF provides a simple application programming interface (API) for managing common communication patterns in scientific computations by using a star-forest graph representation. PetscSF supports several implementations based on MPI and NVSHMEM, whose selection is based on the characteristics of the application or the target architecture. An efficient and portable model for network and intra-node communication is essential for implementing large-scale applications. The Message Passing Interface, which has been the de facto standard for distributed memory systems, has developed into a large complex API that does not yet provide high performance on the emerging heterogeneous CPU-GPU-based exascale systems. In this paper, we discuss the design of PetscSF, how it can overcome some difficulties of working directly with MPI on GPUs, and we demonstrate its performance, scalability, and novel features.
Submitted 21 May, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.