-
Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures
Authors:
Martin Karp,
Estela Suarez,
Jan H. Meinke,
Måns I. Andersson,
Philipp Schlatter,
Stefano Markidis,
Niclas Jansson
Abstract:
The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture (MSA) at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the MSA. We observe that, for our simulations, incorporating different computing architectures is seldom worthwhile given the communication overhead and load-balancing issues it incurs, especially when I/O is also considered; however, when the simulation at hand requires more than the combined global memory of the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important, our results give insight into when and how a monolithic application can utilize and spread out to more than one module and obtain a faster time to solution.
Submitted 9 May, 2024;
originally announced May 2024.
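The abstract mentions a simple performance model for deciding when running across modules pays off. The paper's own model is not reproduced here; the following is a minimal sketch under assumed simplifications: each module processes work at a fixed hypothetical rate, work is split in proportion to those rates, a single extra cost `t_comm` stands in for the inter-module overhead, and each module can hold only a hypothetical `capacity` of work in memory. The names `Module`, `gpu_only_time`, and `msa_time` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Module:
    rate: float      # hypothetical throughput, e.g. elements advanced per second
    capacity: float  # hypothetical memory limit, in the same work units


def gpu_only_time(work: float, booster: Module) -> float:
    """Time per step when everything runs on the GPU (Booster) module."""
    if work > booster.capacity:
        return float("inf")  # the problem does not fit in GPU memory
    return work / booster.rate


def msa_time(work: float, booster: Module, cluster: Module, t_comm: float) -> float:
    """Time per step when the domain is split between Booster and Cluster."""
    # Load balance: give each module work proportional to its throughput.
    w_b = work * booster.rate / (booster.rate + cluster.rate)
    # If the Booster share exceeds GPU memory, cap it and move the rest to the CPUs.
    w_b = min(w_b, booster.capacity)
    w_c = work - w_b
    if w_c > cluster.capacity:
        return float("inf")
    # The slower module sets the pace; t_comm models the extra inter-module cost.
    return max(w_b / booster.rate, w_c / cluster.rate) + t_comm


booster = Module(rate=100.0, capacity=1000.0)
cluster = Module(rate=10.0, capacity=1000.0)

# Fits on the GPUs: splitting only helps if t_comm is small.
small_gpu = gpu_only_time(500.0, booster)                   # 5.0
small_msa = msa_time(500.0, booster, cluster, t_comm=1.0)   # about 5.55
# Too large for GPU memory alone: the CPUs make the run possible at all.
large_gpu = gpu_only_time(1500.0, booster)                  # inf
large_msa = msa_time(1500.0, booster, cluster, t_comm=1.0)  # 51.0
```

With these made-up numbers, splitting a problem that fits on the GPUs loses to the GPU-only run once `t_comm` exceeds the time saved, while a problem that exceeds GPU memory only becomes runnable by adding the Cluster module.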
-
Supercomputers as a Continuous Medium
Authors:
Martin Karp,
Niclas Jansson,
Philipp Schlatter,
Stefano Markidis
Abstract:
As supercomputers' complexity has grown, the traditional boundaries between processor, memory, network, and accelerators have blurred, making a homogeneous computer model, in which the overall computer system is modeled as a continuous medium with homogeneously distributed computational power, memory, and data movement transfer capabilities, an intriguing and powerful abstraction. By applying a homogeneous computer model to algorithms with a given I/O complexity, we recover, from first principles, other discrete computer models, such as the roofline model, parallel computing laws, such as Amdahl's and Gustafson's laws, and phenomenological observations, such as super-linear speedup. One of the homogeneous computer model's distinctive advantages is the capability of directly linking the performance limits of an application to the physical properties of a classical computer system. Applying the homogeneous computer model to supercomputers, such as Frontier, Fugaku, and the Nvidia DGX GH200, shows that applications, such as Conjugate Gradient (CG) and Fast Fourier Transforms (FFT), are rapidly approaching the fundamental classical computational limits, where the performance of even denser systems in terms of compute and memory is fundamentally limited by the speed of light.
Submitted 9 May, 2024;
originally announced May 2024.
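The abstract states that the homogeneous computer model recovers Amdahl's and Gustafson's laws. For reference, the two classical laws themselves are shown below in their standard textbook forms (not the paper's derivation), with `p` the parallelizable fraction of the work and `n` the number of processing units.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Fixed problem size: the serial fraction (1 - p) caps the speedup at 1 / (1 - p)."""
    return 1.0 / ((1.0 - p) + p / n)


def gustafson_speedup(p: float, n: int) -> float:
    """Problem size grows with n: the scaled speedup is linear in n."""
    return (1.0 - p) + p * n


for n in (1, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2), round(gustafson_speedup(0.9, n), 1))
```

The contrast between a saturating and a linearly growing speedup is exactly the difference between strong and weak scaling that the two laws capture.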
-
Large-Scale Direct Numerical Simulations of Turbulence Using GPUs and Modern Fortran
Authors:
Martin Karp,
Daniele Massaro,
Niclas Jansson,
Alistair Hart,
Jacob Wahlgren,
Philipp Schlatter,
Stefano Markidis
Abstract:
We present our approach to performing direct numerical simulations of turbulence, with applications in sustainable shipping. We use modern Fortran and the spectral element method to leverage and scale on supercomputers powered by the Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still providing support for user software developed in Fortran. We demonstrate the efficiency of our approach by performing the world's first direct numerical simulation of the flow around a Flettner rotor at Re=30,000 and its interaction with a turbulent boundary layer. We present one of the first performance comparisons between the AMD Instinct MI250X and Nvidia A100 GPUs for scalable computational fluid dynamics. Our results show that one MI250X offers performance on par with two A100 GPUs and has a similar power efficiency.
Submitted 23 June, 2022;
originally announced July 2022.
-
Predicting the temporal dynamics of turbulent channels through deep learning
Authors:
Giuseppe Borrelli,
Luca Guastoni,
Hamidreza Eivazi,
Philipp Schlatter,
Ricardo Vinuesa
Abstract:
The success of recurrent neural networks (RNNs) has been demonstrated in many applications related to turbulence, including flow control, optimization, reproduction of turbulent features, and turbulence prediction and modeling. With this study, we aim to assess the capability of these networks to reproduce the temporal evolution of a minimal turbulent channel flow. We first obtain a data-driven model based on a modal decomposition in the Fourier domain (which we denote as FFT-POD) of the time series sampled from the flow. This particular case of turbulent flow allows us to accurately simulate the most relevant coherent structures close to the wall. Long short-term memory (LSTM) networks and a Koopman-based framework (KNF) are trained to predict the temporal dynamics of the minimal-channel-flow modes. Tests with different configurations highlight the limits of the KNF method compared to the LSTM, given the complexity of the flow under study. Long-term predictions from the LSTM show excellent agreement from a statistical point of view, with errors below 2% with respect to the reference for the best models. Furthermore, the analysis of the chaotic behaviour through the use of the Lyapunov exponents and of the dynamic behaviour through Poincaré maps emphasizes the ability of the LSTM to reproduce the temporal dynamics of turbulence. Alternative reduced-order models (ROMs), based on the identification of different turbulent structures, are explored and continue to show good potential for predicting the temporal dynamics of the minimal channel.
Submitted 2 March, 2022;
originally announced March 2022.
-
Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems
Authors:
Jonathan Vincent,
Jing Gong,
Martin Karp,
Adam Peplinski,
Niclas Jansson,
Artur Podobas,
Andreas Jocksch,
Jie Yao,
Fazle Hussain,
Stefano Markidis,
Matts Karlsson,
Dirk Pleiter,
Erwin Laure,
Philipp Schlatter
Abstract:
We present new results on the strong parallel scaling of the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered is a direct numerical simulation of fully developed turbulent flow in a straight pipe at two different Reynolds numbers, $Re_τ=360$ and $Re_τ=550$, based on friction velocity and pipe radius. The strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that speed-ups of 3-5 can be achieved with the GPU-accelerated version compared with the CPU version on these different systems. For the $Re_τ=550$ case on the JUWELS Booster system, the run-time for 20 timesteps decreases from 43.5 to 13.2 seconds when the number of GPUs is increased from 64 to 512. This illustrates the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about $2000-5000$ elements per rank, compared to about $50-100$ for a CPU rank.
Submitted 4 November, 2021; v1 submitted 8 September, 2021;
originally announced September 2021.
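As a worked example with the JUWELS Booster numbers quoted above: going from 64 to 512 GPUs is an 8x increase in resources, while the run-time drops by a factor of 43.5/13.2 ≈ 3.3, i.e. a parallel efficiency of roughly 41% relative to the 64-GPU baseline. The small helper below (a hypothetical function, not from the paper) makes the arithmetic explicit.

```python
def strong_scaling(t_base: float, t_scaled: float, p_base: int, p_scaled: int):
    """Return (speedup, parallel efficiency) relative to the baseline run."""
    speedup = t_base / t_scaled
    ideal = p_scaled / p_base  # ideal speedup if run-time fell in proportion to GPU count
    return speedup, speedup / ideal


# Re_tau = 550 on JUWELS Booster, 20 timesteps (values from the abstract)
speedup, efficiency = strong_scaling(43.5, 13.2, 64, 512)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")  # speedup 3.30x, efficiency 41%
```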
-
A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays
Authors:
Martin Karp,
Artur Podobas,
Tobias Kenter,
Niclas Jansson,
Christian Plessl,
Philipp Schlatter,
Stefano Markidis
Abstract:
The impending termination of Moore's law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand.
In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work, which often focuses on accelerating small kernels, we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data movement-reducing technique where we compute geometric factors on the fly, which yields significant single-precision performance (700+ Gflop/s) and upwards of a 2x reduction in runtime for the local evaluation of the Laplace operator.
We end the paper by discussing the challenges and opportunities of using reconfigurable architectures in the future, particularly in light of emerging (not yet available) technologies.
Submitted 2 November, 2021; v1 submitted 27 August, 2021;
originally announced August 2021.
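The data-movement argument behind computing geometric factors on the fly can be illustrated with a simple roofline-style estimate. The sketch below uses assumed, simplified counts rather than the paper's I/O model: per element with `n = N + 1` points per direction, the local Laplace operator costs roughly `12 n^4 + 17 n^3` flops (forward and transposed tensor-product derivatives plus applying six geometric factors). With precomputed factors, each point moves eight words (input, output, six factors), whereas computing the factors on the fly leaves two words per point. The extra arithmetic and per-element coordinate data needed to form the factors are ignored, so the resulting 4x traffic reduction is an idealized upper bound, not a prediction of measured speedup. Peak and bandwidth values are hypothetical.

```python
def ax_flops(n: int) -> int:
    """Simplified flop count for one element (n = N + 1 points per direction)."""
    derivatives = 6 * 2 * n**4  # 3 forward + 3 transposed passes, n multiply-adds per point
    geometry = 15 * n**3        # 3 outputs x (3 multiplies + 2 adds) per point
    summation = 2 * n**3        # summing the three transposed contributions
    return derivatives + geometry + summation


def ax_bytes(n: int, word_bytes: int, geom_on_the_fly: bool) -> int:
    """Bytes moved per element: read u, write w, and optionally read 6 geometric factors."""
    words_per_point = 2 if geom_on_the_fly else 2 + 6
    return words_per_point * word_bytes * n**3


def attainable_gflops(peak: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline: performance is capped by compute peak or by bandwidth x intensity."""
    return min(peak, bandwidth_gbs * intensity)


n, word = 8, 4  # polynomial degree N = 7 and single precision (both assumed)
for on_the_fly in (False, True):
    ai = ax_flops(n) / ax_bytes(n, word, on_the_fly)  # flops per byte
    # hypothetical device: 1000 Gflop/s peak, 80 GB/s memory bandwidth
    print(on_the_fly, round(ai, 2), round(attainable_gflops(1000.0, 80.0, ai), 1))
```

Under these assumptions the precomputed variant is bandwidth-bound, while the on-the-fly variant moves far enough up the roofline to become compute-bound.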
-
Physics-informed neural networks for solving Reynolds-averaged Navier–Stokes equations
Authors:
Hamidreza Eivazi,
Mojtaba Tahani,
Philipp Schlatter,
Ricardo Vinuesa
Abstract:
Physics-informed neural networks (PINNs) are successful machine-learning methods for the solution and identification of partial differential equations (PDEs). We employ PINNs for solving the Reynolds-averaged Navier–Stokes (RANS) equations for incompressible turbulent flows without any specific model or assumption for turbulence, and by taking only the data on the domain boundaries. We first show the applicability of PINNs for solving the Navier–Stokes equations for laminar flows by solving the Falkner–Skan boundary layer. We then apply PINNs to the simulation of four turbulent-flow cases, i.e., the zero-pressure-gradient boundary layer, the adverse-pressure-gradient boundary layer, and turbulent flows over a NACA4412 airfoil and the periodic hill. Our results show the excellent applicability of PINNs for laminar flows with strong pressure gradients, where predictions with less than 1% error can be obtained. For turbulent flows, we also obtain very good accuracy, even for the Reynolds-stress components.
Submitted 22 July, 2021;
originally announced July 2021.
-
Neko: A Modern, Portable, and Scalable Framework for High-Fidelity Computational Fluid Dynamics
Authors:
Niclas Jansson,
Martin Karp,
Artur Podobas,
Stefano Markidis,
Philipp Schlatter
Abstract:
Recent trends and advancements in including more diverse and heterogeneous hardware in High-Performance Computing are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim "software outlives hardware" may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. CFD is one of the many application domains affected. In this paper, we present Neko, a portable framework for high-order spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented approach, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors down to exotic vector processors and FPGAs. We show that Neko's performance and accuracy are comparable to those of NekRS, and thus on par with Nek5000's successor, on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.
Submitted 2 July, 2021;
originally announced July 2021.
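Neko itself is written in modern Fortran; the Python sketch below only mirrors the general object-oriented idea described in the abstract: solver code talks to an abstract interface, while device-specific backends supply the kernels. All class and function names here are hypothetical and do not correspond to Neko's API.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Abstract compute backend: solver code only sees this interface."""

    @abstractmethod
    def axpy(self, a: float, x: list, y: list) -> list:
        """Return a * x + y elementwise."""


class CPUBackend(Backend):
    def axpy(self, a, x, y):
        return [a * xi + yi for xi, yi in zip(x, y)]


# A GPU or FPGA backend would subclass Backend the same way and offload the kernel.
_BACKENDS = {"cpu": CPUBackend}


def make_backend(name: str) -> Backend:
    return _BACKENDS[name]()


def cg_update(backend: Backend, alpha: float, p: list, x: list) -> list:
    """A solver step (x <- x + alpha * p) written once, independent of the hardware."""
    return backend.axpy(alpha, p, x)


print(cg_update(make_backend("cpu"), 2.0, [1.0, 2.0], [3.0, 4.0]))  # [5.0, 8.0]
```

The design choice is that adding a new device means adding one subclass and one registry entry, while the solver layers above remain untouched.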
-
High-Performance Spectral Element Methods on Field-Programmable Gate Arrays
Authors:
Martin Karp,
Artur Podobas,
Niclas Jansson,
Tobias Kenter,
Christian Plessl,
Philipp Schlatter,
Stefano Markidis
Abstract:
Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard scaling. Today, both of these observations are coming to an end, forcing computer users, researchers, and practitioners to abandon the comforts of general-purpose architectures in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a convenient balance between complexity and performance. In this paper, we study the applicability of modern FPGAs for accelerating the Spectral Element Method (SEM), which is at the core of many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator operating in double precision, empirically evaluate it on the latest Stratix 10 GX-series FPGAs, and position its performance (and power efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM accelerator, which we use to project the performance and role of future FPGAs in accelerating CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?
Submitted 4 May, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Optimization of Tensor-product Operations in Nekbone on GPUs
Authors:
Martin Karp,
Niclas Jansson,
Artur Podobas,
Philipp Schlatter,
Stefano Markidis
Abstract:
In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and further optimize the main tensor-product operation in Nekbone. Our optimization is done in CUDA and uses a different, two-dimensional thread structure to perform the computations layer by layer. This enables us to use loop unrolling and to utilize registers and shared memory efficiently. Our implementation is then compared on both the Pascal and Volta GPU architectures to previous GPU versions of Nekbone as well as to a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77-92% of the peak performance on both Nvidia P100 and V100 GPUs for inputs with 1024-4096 elements and polynomial degree 9.
Submitted 27 May, 2020;
originally announced May 2020.
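The tensor-product operation at the heart of Nekbone is, in the commonly used formulation, the local Poisson (stiffness) operator applied element by element: derivatives along r, s, t via small dense matrix products, a pointwise multiplication by geometric factors, and the transposed derivatives. The pure-Python sketch below is a slow reference for one element, assuming the D^T G D form with six symmetric geometric factors. It does not reproduce the paper's CUDA kernel or its 2D thread structure, and the function names are hypothetical; a reference like this is mainly useful for checking optimized kernels.

```python
import random


def derivative_matrix(x):
    """Derivative matrix of the Lagrange interpolant on nodes x (barycentric form)."""
    n = len(x)
    w = []
    for j in range(n):
        prod = 1.0
        for k in range(n):
            if k != j:
                prod *= x[j] - x[k]
        w.append(1.0 / prod)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i][j] = (w[j] / w[i]) / (x[i] - x[j])
        # Each row must annihilate constants.
        D[i][i] = -sum(D[i][j] for j in range(n) if j != i)
    return D


def ax_local(u, D, g):
    """w = sum_{d,e} D_d^T G_de D_e u on one hexahedral element.

    u: flat list of n**3 nodal values, point (i, j, k) stored at i + n*(j + n*k).
    g: six flat lists (G_rr, G_rs, G_rt, G_ss, G_st, G_tt) of geometric factors.
    """
    n = len(D)

    def at(i, j, k):
        return i + n * (j + n * k)

    pts = [(i, j, k) for k in range(n) for j in range(n) for i in range(n)]
    # Tensor-product derivatives: each is a small matrix product along one direction.
    ur = [sum(D[i][m] * u[at(m, j, k)] for m in range(n)) for i, j, k in pts]
    us = [sum(D[j][m] * u[at(i, m, k)] for m in range(n)) for i, j, k in pts]
    ut = [sum(D[k][m] * u[at(i, j, m)] for m in range(n)) for i, j, k in pts]
    grr, grs, grt, gss, gst, gtt = g
    # Pointwise multiplication by the symmetric 3x3 geometric-factor tensor.
    wr = [grr[p] * ur[p] + grs[p] * us[p] + grt[p] * ut[p] for p in range(n**3)]
    ws = [grs[p] * ur[p] + gss[p] * us[p] + gst[p] * ut[p] for p in range(n**3)]
    wt = [grt[p] * ur[p] + gst[p] * us[p] + gtt[p] * ut[p] for p in range(n**3)]
    # Transposed derivatives, summed over the three directions.
    return [
        sum(D[m][i] * wr[at(m, j, k)] + D[m][j] * ws[at(i, m, k)] + D[m][k] * wt[at(i, j, m)]
            for m in range(n))
        for i, j, k in pts
    ]


# Usage: N = 2 (GLL nodes -1, 0, 1) with random positive geometric-factor entries.
random.seed(0)
x = [-1.0, 0.0, 1.0]
D = derivative_matrix(x)
npts = len(x) ** 3
g = [[random.uniform(0.5, 1.5) for _ in range(npts)] for _ in range(6)]
u = [random.random() for _ in range(npts)]
v = [random.random() for _ in range(npts)]
w_const = ax_local([1.0] * npts, D, g)  # constants lie in the null space
vAu = sum(a * b for a, b in zip(v, ax_local(u, D, g)))
uAv = sum(a * b for a, b in zip(u, ax_local(v, D, g)))  # matches vAu: the operator is symmetric
```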
-
Recurrent neural networks and Koopman-based frameworks for temporal predictions in a low-order model of turbulence
Authors:
Hamidreza Eivazi,
Luca Guastoni,
Philipp Schlatter,
Hossein Azizpour,
Ricardo Vinuesa
Abstract:
The capabilities of recurrent neural networks and Koopman-based frameworks are assessed in the prediction of temporal dynamics of the low-order model of near-wall turbulence by Moehlis et al. (New J. Phys. 6, 56, 2004). Our results show that it is possible to obtain excellent reproductions of the long-term statistics and the dynamic behavior of the chaotic system with properly trained long short-term memory (LSTM) networks, leading to relative errors in the mean and the fluctuations below $1\%$. In addition, a newly developed Koopman-based framework, called Koopman with nonlinear forcing (KNF), leads to the same level of accuracy in the statistics at a significantly lower computational expense. Furthermore, the KNF framework outperforms the LSTM network when it comes to short-term predictions. We also observe that using a loss function based only on the instantaneous predictions of the chaotic system can lead to suboptimal reproductions in terms of long-term statistics. Thus, we propose a model-selection criterion based on the computed statistics, which allows us to achieve excellent statistical reconstruction even on small datasets, with minimal loss of accuracy in the instantaneous predictions.
Submitted 14 April, 2021; v1 submitted 1 May, 2020;
originally announced May 2020.
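The point that the lowest instantaneous error does not guarantee the best statistics, and the resulting selection criterion, can be illustrated with a toy example. The series, checkpoint names, and the use of the worse of the two relative errors as the score are assumptions for illustration only; they are not data or code from the paper.

```python
from statistics import mean, pstdev


def stat_errors(pred, ref):
    """Relative errors of the mean and of the RMS fluctuation (population std)."""
    err_mean = abs(mean(pred) - mean(ref)) / abs(mean(ref))
    err_rms = abs(pstdev(pred) - pstdev(ref)) / pstdev(ref)
    return err_mean, err_rms


def mse(pred, ref):
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def select_by_statistics(candidates, ref):
    """Pick the checkpoint whose worst statistical error is smallest."""
    return min(candidates, key=lambda name: max(stat_errors(candidates[name], ref)))


# Toy reference signal and two hypothetical checkpoints.
ref = [2.0, 3.0, 2.0, 1.0]
candidates = {
    "smooth": [2.0, 2.0, 2.0, 2.0],   # low instantaneous error, but no fluctuations
    "shifted": [2.0, 1.0, 2.0, 3.0],  # phase-shifted, but correct mean and RMS
}
best_by_mse = min(candidates, key=lambda name: mse(candidates[name], ref))  # "smooth"
best_by_stats = select_by_statistics(candidates, ref)                      # "shifted"
```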
-
On the use of recurrent neural networks for predictions of turbulent flows
Authors:
Luca Guastoni,
Prem A. Srinivasan,
Hossein Azizpour,
Philipp Schlatter,
Ricardo Vinuesa
Abstract:
In this paper, the prediction capabilities of recurrent neural networks are assessed in the low-order model of near-wall turbulence by Moehlis et al. (New J. Phys. 6, 56, 2004). Our results show that it is possible to obtain excellent predictions of the turbulence statistics and the dynamic behavior of the flow with properly trained long short-term memory (LSTM) networks, leading to relative errors in the mean and the fluctuations below $1\%$. We also observe that using a loss function based only on the instantaneous predictions of the flow may not lead to the best predictions in terms of turbulence statistics, and it is necessary to define a stopping criterion based on the computed statistics. Furthermore, more sophisticated loss functions, including not only the instantaneous predictions but also the averaged behavior of the flow, may lead to much faster neural network training.
Submitted 4 February, 2020;
originally announced February 2020.
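A loss that combines the instantaneous predictions with the averaged behavior of the flow, in the spirit of the last sentence of the abstract, might be structured as below. The exact terms and the weight `lam` are assumptions; in practice such a loss would be written with the automatic-differentiation library used to train the LSTM, and plain Python is used here only to show the structure.

```python
from statistics import mean, pstdev


def combined_loss(pred, ref, lam=1.0):
    """Instantaneous MSE plus a penalty on mismatched mean and RMS fluctuation."""
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    stats = (mean(pred) - mean(ref)) ** 2 + (pstdev(pred) - pstdev(ref)) ** 2
    return mse + lam * stats


ref = [2.0, 3.0, 2.0, 1.0]
flat = [2.0, 2.0, 2.0, 2.0]  # a prediction that collapses to the mean
print(combined_loss(flat, ref, lam=0.0))  # 0.5: pure instantaneous loss
print(combined_loss(flat, ref, lam=1.0))  # about 1.0: missing fluctuations are penalized
```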
-
On the Strong Scaling of the Spectral Element Solver Nek5000 on Petascale Systems
Authors:
Nicolas Offermans,
Oana Marin,
Michel Schanen,
Jing Gong,
Paul Fischer,
Philipp Schlatter,
Aleks Obabko,
Adam Peplinski,
Maxwell Hutchinson,
Elia Merzari
Abstract:
The present work is targeted at performing a strong scaling study of the high-order spectral element fluid dynamics solver Nek5000. Prior studies indicated a recommendable metric for strong scalability from a theoretical viewpoint, which we test here extensively on three parallel machines with different performance characteristics and interconnect networks, namely Mira (IBM Blue Gene/Q), Beskow (Cray XC40) and Titan (Cray XK7). The test cases considered for the simulations correspond to a turbulent flow in a straight pipe at four different friction Reynolds numbers $Re_τ$ = 180, 360, 550 and 1000. Considering the linear model for parallel communication, we quantify the machine characteristics in order to better assess the scaling behavior of the code. Subsequently, sampling and profiling tools are used to measure the computation and communication times over a large range of compute cores. We also study the effect of the two coarse grid solvers, XXT and AMG, on the computational time. Super-linear scaling due to a reduction in cache misses is observed on each computer. The strong scaling limit is attained for roughly 5000 - 10,000 degrees of freedom per core on Mira and 30,000 - 50,000 on Beskow, with only a small impact of the problem size for both machines, and ranges between 10,000 and 220,000 depending on the problem size on Titan. This work aims to serve as a reference for Nek5000 users and as a basis for identifying potential issues to address as the community heads towards exascale supercomputers.
Submitted 9 June, 2017;
originally announced June 2017.
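The linear model for parallel communication referred to above is commonly written as t(m) = α + βm, with a latency α and an inverse bandwidth β for a message of m bytes. The sketch below fits α and β by ordinary least squares from ping-pong style timings; the message sizes and timings in the usage example are synthetic, not measurements from Mira, Beskow, or Titan, and the function name is hypothetical.

```python
def fit_linear_comm_model(sizes, times):
    """Least-squares fit of t(m) = alpha + beta * m; returns (alpha, beta)."""
    k = len(sizes)
    m_bar = sum(sizes) / k
    t_bar = sum(times) / k
    s_mm = sum((m - m_bar) ** 2 for m in sizes)
    s_mt = sum((m - m_bar) * (t - t_bar) for m, t in zip(sizes, times))
    beta = s_mt / s_mm            # seconds per byte (inverse bandwidth)
    alpha = t_bar - beta * m_bar  # seconds (latency)
    return alpha, beta


# Synthetic timings generated from alpha = 2 us and beta = 1 ns/byte.
sizes = [8, 1024, 65536, 1048576]
times = [2e-6 + 1e-9 * m for m in sizes]
alpha, beta = fit_linear_comm_model(sizes, times)
# Message size at which latency and transfer time contribute equally.
m_half = alpha / beta
```

The ratio α/β indicates how large a message must be before bandwidth, rather than latency, dominates the cost, which is one way to characterize how communication limits strong scaling on a given machine.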