Refactoring the MPS/University of Chicago Radiative MHD (MURaM) Model for GPU/CPU Performance Portability Using OpenACC Directives
Authors:
Eric Wright,
Damien Przybylski,
Matthias Rempel,
Cena Miller,
Supreeth Suresh,
Shiquan Su,
Richard Loft,
Sunita Chandrasekaran
Abstract:
The MURaM (Max Planck University of Chicago Radiative MHD) code is a solar atmosphere radiative MHD model that has been broadly applied to solar phenomena ranging from the quiet to the active Sun, including eruptive events such as flares and coronal mass ejections. The treatment of physics is sufficiently realistic to allow for the synthesis of emission from visible light to extreme UV and X-rays, which is critical for a detailed comparison with available and future multi-wavelength observations. This capability relies critically on MURaM's radiation transport solver (RTS), the most computationally intensive component of the code. The benefits of accelerating the RTS are manifold: a faster RTS allows for the regular use of the more expensive multi-band radiation transport needed for comparison with observations, and it will pave the way for accelerating ongoing improvements in the RTS that are critical for simulations of the solar chromosphere. We present challenges and strategies for accelerating the multi-physics, multi-band MURaM code using a directive-based programming model, OpenACC, in order to maintain a single source code across CPUs and GPUs. Results for a $288^3$ test problem show that MURaM with the optimized RTS routine achieves a 1.73x speedup on a single NVIDIA V100 GPU over a fully subscribed 40-core Intel Skylake CPU node. Measured in simulation points (in millions) per second, a single NVIDIA V100 GPU is equivalent to 69 Skylake cores (consistent with 1.73 × 40 ≈ 69). We also measure parallel performance on up to 96 GPUs and present weak and strong scaling results.
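The single-source idea behind the OpenACC port can be illustrated with a minimal sketch. This is not code from MURaM: the function `radiative_heating`, its flat array layout, and the choice of the per-cell radiative heating rate Q = 4πκρ(J − S) as the example kernel are hypothetical assumptions. With an OpenACC-enabled compiler the loop nest is offloaded to the GPU; compilers without OpenACC support typically ignore `#pragma acc` (possibly with a warning) and run the same loop serially on the CPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical kernel: per-cell radiative heating rate Q = 4*pi*kappa*rho*(J - S),
// stored in flat arrays of size nx*ny*nz with x varying fastest.
// Names and layout are illustrative assumptions, not MURaM's actual API.
void radiative_heating(const double* kappa, const double* rho,
                       const double* J, const double* S, double* Q,
                       int nx, int ny, int nz) {
  assert(nx >= 0 && ny >= 0 && nz >= 0);
  const double four_pi = 4.0 * std::acos(-1.0);
  const std::size_t n = static_cast<std::size_t>(nx) * ny * nz;

  // One directive, one source: offloaded with an OpenACC compiler,
  // an ordinary serial loop nest otherwise. collapse(3) merges the three
  // tightly nested loops into one parallel iteration space.
  #pragma acc parallel loop collapse(3) copyin(kappa[0:n], rho[0:n], J[0:n], S[0:n]) copyout(Q[0:n])
  for (int k = 0; k < nz; ++k)
    for (int j = 0; j < ny; ++j)
      for (int i = 0; i < nx; ++i) {
        const std::size_t idx = (static_cast<std::size_t>(k) * ny + j) * nx + i;
        Q[idx] = four_pi * kappa[idx] * rho[idx] * (J[idx] - S[idx]);
      }
}
```

The per-loop `copyin`/`copyout` clauses keep the sketch self-contained; in a larger code one would usually hoist the data movement into a longer-lived data region so that arrays stay resident on the GPU across many kernels.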
Submitted 16 July, 2021;
originally announced July 2021.
A Parallel Time-Integrator for Solving the Linearized Shallow Water Equations on the Rotating Sphere
Authors:
Martin Schreiber,
Richard Loft
Abstract:
With the stagnation of processor core performance, further reductions in the time-to-solution for geophysical fluid problems are becoming increasingly difficult with standard time integrators. Parallel-in-time methods expose and exploit additional parallelism in the time dimension, which is inherently sequential in traditional methods. The rational approximation of exponential integrators (REXI) method allows taking arbitrarily long time steps based on a sum over a number of decoupled complex PDEs that can be solved independently and in a massively parallel fashion. Hence, REXI is assumed to be well suited to the massively parallel supercomputers that are the current trend. To date, the study and development of the REXI approach has been limited to linearized problems on the periodic 2D plane. This work extends the REXI time-stepping method to the linear shallow-water equations (SWE) on the rotating sphere, thus moving the method one step closer to solving fully nonlinear fluid problems of geophysical interest on the sphere. The rotating sphere poses particular challenges for finding an efficient solver due to the zonal dependence of the Coriolis term. Here we present an efficient REXI solver based on spherical harmonics and show the results of a geostrophic balance test, a comparison with alternative time-stepping methods, an analysis of dispersion relations indicating superior properties of REXI, and finally a performance comparison on the Cheyenne supercomputer. Our results indicate that REXI is not only able to take larger time steps but can also be used to gain higher accuracy and significantly reduced time-to-solution compared to currently existing time-stepping methods.
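The key structural idea, approximating exp(τL)u by a weighted sum of independent shifted solves (s_k I − τL)⁻¹u, can be sketched on a toy problem. This is an assumption-laden illustration, not the paper's solver: the coefficients below come from the trapezoidal rule applied to the Cauchy integral of e^s on a circle of radius R (not the REXI coefficients used in the paper), and the operator is a hypothetical 2×2 oscillator L = [[0, −ω], [ω, 0]] standing in for an oscillatory wave operator. The function name `exp_apply_contour` is likewise hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;
using Vec2 = std::array<double, 2>;

// Approximates exp(tau*L) u for the toy operator L = [[0, -omega], [omega, 0]]
// (eigenvalues +/- i*omega). Uses exp(z) ~= (1/N) sum_k e^{s_k} s_k / (s_k - z),
// s_k = R e^{2 pi i k / N}: the trapezoidal rule on a circular Cauchy contour.
// NOTE: these are not REXI's coefficients; they only share its structure,
// a sum of independent shifted linear solves. Requires |omega*tau| < R.
Vec2 exp_apply_contour(double omega, double tau, const Vec2& u, int N, double R) {
  const double w = omega * tau;
  assert(N > 0 && R > std::abs(w));
  const double pi = std::acos(-1.0);
  cplx acc0(0.0, 0.0), acc1(0.0, 0.0);
  // Every iteration is independent: in a parallel-in-time setting the N solves
  // could be distributed across cores or nodes and combined with one reduction.
  for (int k = 0; k < N; ++k) {
    const cplx s = std::polar(R, 2.0 * pi * k / N);
    // Solve (s I - tau L) x = u in closed form, where s I - tau L = [[s, w], [-w, s]].
    const cplx det = s * s + w * w;
    const cplx x0 = (s * u[0] - w * u[1]) / det;
    const cplx x1 = (w * u[0] + s * u[1]) / det;
    const cplx weight = std::exp(s) * s / static_cast<double>(N);
    acc0 += weight * x0;
    acc1 += weight * x1;
  }
  // For a real operator and real u, the imaginary parts cancel up to round-off.
  return {acc0.real(), acc1.real()};
}
```

For this operator the exact result is a rotation, so exp(τL)(1, 0) = (cos ωτ, sin ωτ); with ωτ = 3, N = 64, and R = 10 the sum reproduces it to round-off in a single "time step". The trade-off mirrors the general method: a larger R admits longer steps, but the e^R scaling of the weights amplifies round-off, and more terms N (more independent solves) are needed as ωτ approaches R.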
Submitted 22 January, 2019; v1 submitted 9 December, 2018;
originally announced December 2018.