-
Optimization and Portability of a Fusion OpenACC-based FORTRAN HPC Code from NVIDIA to AMD GPUs
Authors:
Igor Sfiligoi,
Emily A. Belli,
Jeff Candy,
Reuben D. Budiardja
Abstract:
NVIDIA has been the main provider of GPU hardware in HPC systems for over a decade. Most applications that benefit from GPUs have thus been developed and optimized for the NVIDIA software stack. Recent exascale HPC systems are, however, introducing GPUs from other vendors, e.g. with the AMD GPU-based OLCF Frontier system just becoming available. AMD GPUs cannot be directly accessed using the NVIDIA software stack, and require a porting effort by the application developers. This paper provides an overview of our experience porting and optimizing the CGYRO code, a widely-used fusion simulation tool based on FORTRAN with OpenACC-based GPU acceleration. While the porting from the NVIDIA compilers was relatively straightforward using the CRAY compilers on the AMD systems, the performance optimization required more fine-tuning. In the optimization effort, we uncovered code sections that had performed well on NVIDIA GPUs, but were unexpectedly slow on AMD GPUs. After AMD-targeted code optimizations, performance on AMD GPUs has increased to meet our expectations. Modest speed improvements were also seen on NVIDIA GPUs, which was an unexpected benefit of this exercise.
Submitted 17 May, 2023;
originally announced May 2023.
-
Flexible, integrated modeling of tokamak stability, transport, equilibrium, and pedestal physics
Authors:
B. C. Lyons,
J. McClenaghan,
T. Slendebroek,
O. Meneghini,
T. F. Neiser,
S. P. Smith,
D. B. Weisberg,
E. A. Belli,
J. Candy,
J. M. Hanson,
L. L. Lao,
N. C. Logan,
S. Saarelma,
O. Sauter,
P. B. Snyder,
G. M. Staebler,
K. E. Thome,
A. D. Turnbull
Abstract:
The STEP (Stability, Transport, Equilibrium, and Pedestal) integrated-modeling tool has been developed in OMFIT to predict stable tokamak equilibria self-consistently with core-transport and pedestal calculations. STEP couples theory-based codes to integrate a variety of physics, including MHD stability, transport, equilibrium, pedestal formation, and current-drive, heating, and fueling. The input/output of each code is interfaced with a centralized ITER-IMAS data structure, allowing codes to be run in any order and enabling open-loop, feedback, and optimization workflows. This paradigm simplifies the integration of new codes, making STEP highly extensible. STEP has been verified against a published benchmark of six different integrated models. Core-pedestal calculations with STEP have been successfully validated against individual DIII-D H-mode discharges and across more than 500 discharges of the $H_{98,y2}$ database, with a mean error in confinement time relative to experiment of less than 19%. STEP has also reproduced results in less conventional DIII-D scenarios, including negative-central-shear and negative-triangularity plasmas. Predictive STEP modeling has been used to assess performance in several tokamak reactors. Simulations of a high-field, large-aspect-ratio reactor show significantly lower fusion power than predicted by a zero-dimensional study, demonstrating the limitations of scaling-law extrapolations. STEP predictions have found promising EXCITE scenarios, including a high-pressure, 80%-bootstrap-fraction plasma. ITER modeling with STEP has shown that pellet fueling enhances fusion gain in both the baseline and advanced-inductive scenarios. Finally, STEP predictions for the SPARC baseline scenario are in good agreement with published results from the physics basis.
Submitted 12 May, 2023;
originally announced May 2023.
-
Comparing single-node and multi-node performance of an important fusion HPC code benchmark
Authors:
Emily A. Belli,
Jeff Candy,
Igor Sfiligoi,
Frank Würthwein
Abstract:
Fusion simulations have traditionally required the use of leadership-scale High Performance Computing (HPC) resources in order to produce advances in physics. The impressive improvements in compute and memory capacity of many-GPU compute nodes are now allowing for some problems that once required a multi-node setup to also be solvable on a single node. When possible, the increased interconnect bandwidth can result in an order of magnitude higher science throughput, especially for communication-heavy applications. In this paper we analyze the performance of the fusion simulation tool CGYRO, an Eulerian gyrokinetic turbulence solver designed and optimized for collisional, electromagnetic, multiscale simulation, which is widely used in the fusion research community. Due to the nature of the problem, the application has to work on a large multi-dimensional computational mesh as a whole, requiring frequent exchange of large amounts of data between the compute processes. In particular, we show that the average-scale nl03 benchmark CGYRO simulation can be run at an acceptable speed on a single Google Cloud instance with 16 A100 GPUs, outperforming 8 NERSC Perlmutter Phase1 nodes, 16 ORNL Summit nodes and 256 NERSC Cori nodes. Moving from a multi-node to a single-node GPU setup, we get comparable simulation times using less than half the number of GPUs. Larger benchmark problems, however, still require a multi-node HPC setup due to GPU memory capacity needs, since at the time of writing no vendor offers nodes with a sufficient GPU memory setup. The upcoming external NVSWITCH does, however, promise to deliver an almost equivalent solution for up to 256 NVIDIA GPUs.
Submitted 19 May, 2022;
originally announced May 2022.
-
Theoretical description of heavy impurity transport and its application to the modelling of tungsten in JET and ASDEX Upgrade
Authors:
F. J. Casson,
C. Angioni,
E. A. Belli,
R. Bilato,
P. Mantica,
T. Odstrcil,
T. Puetterich,
M. Valisa,
L. Garzotti,
C. Giroud,
J. Hobirk,
C. F. Maggi,
J. Mlynar,
M. L. Reinke,
JET EFDA contributors,
ASDEX-Upgrade team
Abstract:
Recent developments in theory-based modelling of core heavy impurity transport are presented, and shown to be necessary for quantitative description of present experiments in JET and ASDEX Upgrade. The treatment of heavy impurities is complicated by their large mass and charge, which result in a strong response to plasma rotation or any small background electrostatic field in the plasma, such as that generated by anisotropic external heating. These forces lead to strong poloidal asymmetries of impurity density, which have recently been added to numerical tools describing both neoclassical and turbulent transport. Modelling predictions of the steady-state two-dimensional tungsten impurity distribution are compared with experimental densities interpreted from soft X-ray diagnostics. The modelling identifies neoclassical transport enhanced by poloidal asymmetries as the dominant mechanism responsible for tungsten accumulation in the central core of the plasma. Depending on the bulk plasma profiles, neoclassical temperature screening can prevent accumulation, and can be enhanced by externally heated species, demonstrated here in ICRH plasmas.
Submitted 4 July, 2014;
originally announced July 2014.
-
Impurity transport in Alcator C-Mod in the presence of poloidal density variation induced by ion cyclotron resonance heating
Authors:
Albert Mollén,
István Pusztai,
Matthew L. Reinke,
Yevgen O. Kazakov,
Nathan T. Howard,
Emily A. Belli,
Tünde Fülöp,
the Alcator C-Mod Team
Abstract:
Impurity particle transport in an ion cyclotron resonance heated Alcator C-Mod discharge is studied with local gyrokinetic simulations and a theoretical model including the effect of poloidal asymmetries and elongation. In spite of the strong minority temperature anisotropy in the deep core region, the poloidal asymmetries are found to have a negligible effect on the turbulent impurity transport due to low magnetic shear in this region, in agreement with the experimental observations. According to the theoretical model, in outer core regions poloidal asymmetries may contribute to the reduction of the impurity peaking, but uncertainties in atomic physics processes prevent quantitative comparison with experiments.
Submitted 10 November, 2014; v1 submitted 3 February, 2014;
originally announced February 2014.
-
Intrinsic rotation driven by non-Maxwellian equilibria in tokamak plasmas
Authors:
M. Barnes,
F. I. Parra,
J. P. Lee,
E. A. Belli,
M. F. F. Nave,
A. E. White
Abstract:
The effect of small deviations from a Maxwellian equilibrium on turbulent momentum transport in tokamak plasmas is considered. These non-Maxwellian features, arising from diamagnetic effects, introduce a strong dependence of the radial flux of co-current toroidal angular momentum on collisionality: As the plasma goes from nearly collisionless to weakly collisional, the flux reverses direction from radially inward to outward. This indicates a collisionality-dependent transition from peaked to hollow rotation profiles, consistent with experimental observations of intrinsic rotation.
Submitted 12 April, 2013;
originally announced April 2013.
-
Simulating Gyrokinetic Microinstabilities in Stellarator Geometry with GS2
Authors:
J. A. Baumgaertel,
E. A. Belli,
W. Dorland,
W. Guttenfelder,
G. W. Hammett,
D. R. Mikkelsen,
G. Rewoldt,
W. M. Tang,
P. Xanthopoulos
Abstract:
The nonlinear gyrokinetic code GS2 has been extended to treat non-axisymmetric stellarator geometry. Electromagnetic perturbations and multiple trapped particle regions are allowed. Here, linear, collisionless, electrostatic simulations of the quasi-axisymmetric, three-field period National Compact Stellarator Experiment (NCSX) design QAS3-C82 have been successfully benchmarked against the eigenvalue code FULL. Quantitatively, the linear stability calculations of GS2 and FULL agree to within ~10%.
Submitted 21 September, 2011;
originally announced September 2011.