Parthenon -- a performance portable block-structured adaptive mesh refinement framework
Authors:
Philipp Grete,
Joshua C. Dolence,
Jonah M. Miller,
Joshua Brown,
Ben Ryan,
Andrew Gaspar,
Forrest Glines,
Sriram Swaminarayan,
Jonas Lippuner,
Clell J. Solomon,
Galen Shipman,
Christoph Junghans,
Daniel Holladay,
James M. Stone,
Luke F. Roberts
Abstract:
On the path to exascale the landscape of computer device architectures and corresponding programming models has become much more diverse. While various low-level performance portable programming models are available, support at the application level lacks behind. To address this issue, we present the performance portable block-structured adaptive mesh refinement (AMR) framework Parthenon, derived…
▽ More
On the path to exascale the landscape of computer device architectures and corresponding programming models has become much more diverse. While various low-level performance portable programming models are available, support at the application level lacks behind. To address this issue, we present the performance portable block-structured adaptive mesh refinement (AMR) framework Parthenon, derived from the well-tested and widely used Athena++ astrophysical magnetohydrodynamics code, but generalized to serve as the foundation for a variety of downstream multi-physics codes. Parthenon adopts the Kokkos programming model, and provides various levels of abstractions from multi-dimensional variables, to packages defining and separating components, to launching of parallel compute kernels. Parthenon allocates all data in device memory to reduce data movement, supports the logical packing of variables and mesh blocks to reduce kernel launch overhead, and employs one-sided, asynchronous MPI calls to reduce communication overhead in multi-node simulations. Using a hydrodynamics miniapp, we demonstrate weak and strong scaling on various architectures including AMD and NVIDIA GPUs, Intel and AMD x86 CPUs, IBM Power9 CPUs, as well as Fujitsu A64FX CPUs. At the largest scale on Frontier (the first TOP500 exascale machine), the miniapp reaches a total of $1.7\times10^{13}$ zone-cycles/s on 9,216 nodes (73,728 logical GPUs) at ~92% weak scaling parallel efficiency (starting from a single node). In combination with being an open, collaborative project, this makes Parthenon an ideal framework to target exascale simulations in which the downstream developers can focus on their specific application rather than on the complexity of handling massively-parallel, device-accelerated AMR.
△ Less
Submitted 21 November, 2022; v1 submitted 24 February, 2022;
originally announced February 2022.
SpECTRE: A Task-based Discontinuous Galerkin Code for Relativistic Astrophysics
Authors:
Lawrence E. Kidder,
Scott E. Field,
Francois Foucart,
Erik Schnetter,
Saul A. Teukolsky,
Andy Bohn,
Nils Deppe,
Peter Diener,
François Hébert,
Jonas Lippuner,
Jonah Miller,
Christian D. Ott,
Mark A. Scheel,
Trevor Vincent
Abstract:
We introduce a new relativistic astrophysics code, SpECTRE, that combines a discontinuous Galerkin method with a task-based parallelism model. SpECTRE's goal is to achieve more accurate solutions for challenging relativistic astrophysics problems such as core-collapse supernovae and binary neutron star mergers. The robustness of the discontinuous Galerkin method allows for the use of high-resoluti…
▽ More
We introduce a new relativistic astrophysics code, SpECTRE, that combines a discontinuous Galerkin method with a task-based parallelism model. SpECTRE's goal is to achieve more accurate solutions for challenging relativistic astrophysics problems such as core-collapse supernovae and binary neutron star mergers. The robustness of the discontinuous Galerkin method allows for the use of high-resolution shock capturing methods in regions where (relativistic) shocks are found, while exploiting high-order accuracy in smooth regions. A task-based parallelism model allows efficient use of the largest supercomputers for problems with a heterogeneous workload over disparate spatial and temporal scales. We argue that the locality and algorithmic structure of discontinuous Galerkin methods will exhibit good scalability within a task-based parallelism framework. We demonstrate the code on a wide variety of challenging benchmark problems in (non)-relativistic (magneto)-hydrodynamics. We demonstrate the code's scalability including its strong scaling on the NCSA Blue Waters supercomputer up to the machine's full capacity of 22,380 nodes using 671,400 threads.
△ Less
Submitted 21 July, 2017; v1 submitted 31 August, 2016;
originally announced September 2016.