-
Smoothers with localized residual computations for geometric multigrid methods
Authors:
Michał Wichrowski,
Peter Munch,
Martin Kronbichler,
Guido Kanschat
Abstract:
We improve the performance of multigrid solvers on many-core architectures with cache hierarchies by reorganizing operations in the smoothing step to minimize memory transfers. We focus on patch smoothers, which offer robust convergence rates with respect to the finite element degree for various equations, in the setting of multiplicative subspace correction for numerical efficiency. By combining the computation of local residuals with local solvers, we increase the locality of the problem and thus reduce data transfers. The thread-parallel implementation of this algorithm is based on coloring, which contradicts cache efficiency. We improve data locality by rearranging the loop into batches so that more data can be reused. The organization of consecutive batches prioritizes data locality.
Submitted 2 July, 2024;
originally announced July 2024.
-
High-performance matrix-free unfitted finite element operator evaluation
Authors:
Maximilian Bergbauer,
Peter Munch,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
Unfitted finite element methods, like CutFEM, have traditionally been implemented in a matrix-based fashion, where a sparse matrix is assembled and later applied to vectors while solving the resulting linear system. With the goal of increasing performance and enabling algorithms with polynomial spaces of higher degrees, this contribution chooses a more abstract approach by matrix-free evaluation of the operator action on vectors instead. The proposed method loops over cells and locally evaluates the cell, face, and interface integrals, including the contributions from cut cells and the different means of stabilization. The main challenge is the efficient numerical evaluation of terms in the weak form with unstructured quadrature points arising from the unfitted discretization in cells cut by the interface. We present design choices and performance optimizations for tensor-product elements and demonstrate the performance by means of benchmarks and application examples. We demonstrate a speedup of more than one order of magnitude for the operator evaluation of a discontinuous Galerkin discretization with polynomial degree three compared to a sparse matrix-vector product and develop performance models to quantify the performance properties over a wide range of polynomial degrees.
Submitted 12 April, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
A highly efficient computational approach for part-scale microstructure predictions in Ti-6Al-4V additive manufacturing
Authors:
Sebastian D. Proell,
Julian Brotz,
Martin Kronbichler,
Wolfgang A. Wall,
Christoph Meier
Abstract:
Fast and efficient simulations of metal additive manufacturing (AM) processes are highly relevant to exploring the full potential of this promising manufacturing technique. The microstructure composition plays an important role in characterizing the part quality and deriving mechanical properties. When complete parts are simulated, one often needs to resort to strong simplifications such as layer-wise heating due to the large number of simulated time steps compared to the small time step sizes. This article proposes a scan-resolved approach to the coupled thermo-microstructural problem. Building on a highly efficient thermal model, we discuss the implementation of a phenomenological microstructure model for the evolution of the three main constituents of Ti-6Al-4V: stable $α_s$-phase, martensite $α_m$-phase and $β$-phase. The implementation is tailored to modern hardware features using vectorization and fast approximations of transcendental functions. A performance model and numerical examples verify the high degree of optimization. We demonstrate the applicability and predictive power of the approach and the influence of scan strategy and geometry. Depending on the specific example, results can be obtained with moderate computational resources in a few hours to days. The numerical examples include a prediction of the microstructure on the full NIST AM Benchmark cantilever specimen.
Submitted 27 February, 2024;
originally announced February 2024.
-
Graph-based methods for hyperbolic systems of conservation laws using discontinuous space discretizations, Part I: building blocks
Authors:
Martin Kronbichler,
Matthias Maier,
Ignacio Tomas
Abstract:
We present a graph-based discretization method for solving hyperbolic systems of conservation laws using discontinuous finite elements. The method is based on the convex limiting technique introduced by Guermond et al. (SIAM J. Sci. Comput. 40, A3211--A3239, 2018). As such, these methods are mathematically guaranteed to be invariant-set preserving and to satisfy discrete pointwise entropy inequalities. In this paper we extend the theory for the specific case of discontinuous finite elements, incorporating the effect of boundary conditions into the formulation. From a practical point of view, the implementation of these methods is algebraic, meaning that they operate directly on the stencil of the spatial discretization.
This first paper in a sequence of two papers introduces and verifies essential building blocks for the convex limiting procedure using discontinuous Galerkin discretizations. In particular, we discuss a minimally stabilized high-order discontinuous Galerkin method that exhibits optimal convergence rates comparable to linear stabilization techniques for cell-based methods. In addition, we discuss a proper choice of local bounds for the convex limiting procedure. A follow-up contribution will focus on the high-performance implementation, benchmarking and verification of the method.
We verify convergence rates on a sequence of one- and two-dimensional tests with differing regularity. In particular, we obtain optimal convergence rates for single rarefaction waves. We also propose a simple test in order to verify the implementation of boundary conditions and their convergence rates.
Submitted 6 February, 2024;
originally announced February 2024.
-
Improved accuracy of continuum surface flux models for metal additive manufacturing melt pool simulations
Authors:
Nils Much,
Magdalena Schreter-Fleischhacker,
Peter Munch,
Martin Kronbichler,
Wolfgang A. Wall,
Christoph Meier
Abstract:
Computational modeling of the melt pool dynamics in laser-based powder bed fusion metal additive manufacturing (PBF-LB/M) promises to shed light on fundamental mechanisms of defect generation. These processes are accompanied by rapid evaporation so that the evaporation-induced recoil pressure and cooling arise as major driving forces for fluid dynamics and temperature evolution. The magnitude of these interface fluxes depends exponentially on the melt pool surface temperature, which, therefore, has to be predicted with high accuracy. The present work utilizes a diffuse interface finite element model based on a continuum surface flux (CSF) description of interface fluxes to study dimensionally reduced thermal two-phase problems representative for PBF-LB/M in a finite element framework. It is demonstrated that the extreme temperature gradients combined with the high ratios of material properties between metal and ambient gas lead to significant errors in the interface temperatures and fluxes when classical CSF approaches, along with typical interface thicknesses and discretizations, are applied. It is expected that this finding is also relevant for other types of diffuse interface PBF-LB/M melt pool models. A novel parameter-scaled CSF approach is proposed, which is constructed to yield a smoother temperature field in the diffuse interface region, significantly increasing the solution accuracy. The interface thickness required to predict the temperature field with a given level of accuracy is less restrictive by at least one order of magnitude for the proposed parameter-scaled approach compared to classical CSF, drastically reducing computational costs. Finally, we showcase the general applicability of the parameter-scaled CSF to a 3D simulation of stationary laser melting of PBF-LB/M considering the fully coupled thermo-hydrodynamic multi-phase problem, including phase change.
Submitted 12 July, 2024; v1 submitted 22 January, 2024;
originally announced January 2024.
-
A consistent diffuse-interface model for two-phase flow problems with rapid evaporation
Authors:
Magdalena Schreter-Fleischhacker,
Peter Munch,
Nils Much,
Martin Kronbichler,
Wolfgang A. Wall,
Christoph Meier
Abstract:
We present accurate and mathematically consistent formulations of a diffuse-interface model for two-phase flow problems involving rapid evaporation. The model addresses challenges including discontinuities in the density field by several orders of magnitude, leading to high velocity and pressure jumps across the liquid-vapor interface, along with dynamically changing interface topologies. To this end, we integrate an incompressible Navier--Stokes solver combined with a conservative level-set formulation and a regularized, i.e., diffuse, representation of discontinuities into a matrix-free adaptive finite element framework. The achievements are three-fold: First, this work proposes mathematically consistent definitions for the level-set transport velocity in the diffuse interface region by extrapolating the velocity from the liquid or gas phase, which exhibit superior prediction accuracy for the evaporated mass and the resulting interface dynamics compared to a local velocity evaluation, especially for highly curved interfaces. Second, we show that accurate prediction of the evaporation-induced pressure jump requires a consistent, namely a reciprocal, density interpolation across the interface, which satisfies local mass conservation. Third, the combination of diffuse interface models for evaporation with standard Stokes-type constitutive relations for viscous flows leads to significant pressure artifacts in the diffuse interface region. To mitigate these, we propose a modification for such constitutive model types. Through selected analytical and numerical examples, the aforementioned properties are validated. The presented model promises new insights in simulation-based prediction of melt-vapor interactions in thermal multiphase flows such as in laser-based powder bed fusion of metals.
Submitted 15 January, 2024;
originally announced January 2024.
-
A highly efficient computational framework for fast scan-resolved simulations of metal additive manufacturing processes on the scale of real parts
Authors:
Sebastian D. Proell,
Peter Munch,
Martin Kronbichler,
Wolfgang A. Wall,
Christoph Meier
Abstract:
This article proposes a novel high-performance computing approach for the prediction of the temperature field in powder bed fusion (PBF) additive manufacturing processes. In contrast to many existing approaches to part-scale simulations, the underlying computational model consistently resolves physical scan tracks without additional heat source scaling, agglomeration strategies or any other heuristic modeling assumptions. A growing, adaptively refined mesh accurately captures all details of the laser beam motion. Critically, the fine spatial resolution required for resolved scan tracks in combination with the high scan velocities underlying these processes mandates the use of comparatively small time steps to resolve the underlying physics. Explicit time integration schemes are well-suited for this setting, while unconditionally stable implicit time integration schemes are employed for the interlayer cool down phase governed by significantly larger time scales. These two schemes are combined and implemented in an efficient fast operator evaluation framework providing significant performance gains and optimization opportunities. The capabilities of the novel framework are demonstrated through realistic AM examples on the centimeter scale including the first scan-resolved simulation of the entire NIST AM Benchmark cantilever specimen, with a computation time of less than one day. Apart from physical insights gained through these simulation examples, numerical aspects are also thoroughly studied on the basis of weak and strong parallel scaling tests. As potential applications, the proposed thermal PBF simulation framework can serve as a basis for microstructure and thermo-mechanical predictions on the part-scale, but also to assess the influence of scan pattern and part geometry on melt pool shape and temperature, which are important indicators for well-known process instabilities.
Submitted 15 September, 2023; v1 submitted 10 February, 2023;
originally announced February 2023.
-
Stage-parallel fully implicit Runge-Kutta implementations with optimal multilevel preconditioners at the scaling limit
Authors:
Peter Munch,
Ivo Dravins,
Martin Kronbichler,
Maya Neytcheva
Abstract:
We present an implementation of a fully stage-parallel preconditioner for Radau IIA type fully implicit Runge--Kutta methods, which approximates the inverse of $A_Q$ from the Butcher tableau by the lower triangular matrix resulting from an LU decomposition and diagonalizes the system with as many blocks as stages. For the transformed system, we employ a block preconditioner where each block is distributed and solved by a subgroup of processes in parallel. To combine partial results, we either use a communication pattern resembling Cannon's algorithm or shared memory. A performance model and a large set of performance studies (including strong scaling runs with up to 150k processes on 3k compute nodes) conducted for a time-dependent heat problem, using matrix-free finite element methods, indicate that the stage-parallel implementation can reach higher throughputs when the block solvers operate at lower parallel efficiencies, which occurs near the scaling limit. The achievable speedup increases linearly with the number of stages and is bounded by the number of stages. Furthermore, we show that the presented stage-parallel concepts are also applicable to the case that $A_Q$ is directly diagonalized, which requires complex arithmetic or the solution of two-by-two blocks and sequentializes parts of the algorithm. As an alternative to distributing stages and assigning them to distinct processes, we discuss the possibility of batching operations from different stages together.
Submitted 14 September, 2022;
originally announced September 2022.
-
Enhancing data locality of the conjugate gradient method for high-order matrix-free finite-element implementations
Authors:
Martin Kronbichler,
Dmytro Sashko,
Peter Munch
Abstract:
This work investigates a variant of the conjugate gradient (CG) method and embeds it into the context of high-order finite-element schemes with fast matrix-free operator evaluation and cheap preconditioners like the matrix diagonal. Relying on a data-dependency analysis and appropriate enumeration of degrees of freedom, we interleave the vector updates and inner products in a CG iteration with the matrix-vector product with only minor organizational overhead. As a result, around 90% of the vector entries of the three active vectors of the CG method are transferred from slow RAM memory exactly once per iteration, with all additional access hitting fast cache memory. Node-level performance analyses and scaling studies on up to 147k cores show that the CG method with the proposed performance optimizations is around two times faster than a standard CG solver as well as optimized pipelined CG and s-step CG methods for large sizes that exceed processor caches, and provides similar performance near the strong scaling limit.
Submitted 18 May, 2022;
originally announced May 2022.
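For orientation, the baseline that the abstract above reorganizes can be sketched as a textbook CG iteration with a diagonal (Jacobi) preconditioner and a matrix-free operator. This is a minimal sketch in Python and not the authors' implementation: the 1D Laplacian stencil, the names `apply_laplacian_1d` and `jacobi_pcg`, and the tolerance are illustrative assumptions. The comments mark the separate sweeps over the vectors in each iteration, which are the kind of passes the paper interleaves with the matrix-vector product.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def apply_laplacian_1d(x: Vector) -> Vector:
    """Matrix-free action of the stencil [-1, 2, -1] with zero Dirichlet ends.

    Stand-in for a fast operator evaluation; no matrix is stored.
    """
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y[i] = 2.0 * x[i] - left - right
    return y


def dot(a: Vector, b: Vector) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def jacobi_pcg(
    apply_A: Callable[[Vector], Vector],
    diag: Vector,
    b: Vector,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[Vector, int]:
    """Standard preconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)                                   # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # diagonal preconditioner
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)                           # sweep 1: operator evaluation
        alpha = rz / dot(p, Ap)                   # sweep 2: inner product
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # sweep 3: update x
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]     # sweep 4: update r
        if dot(r, r) ** 0.5 <= tol * b_norm:      # sweep 5: residual norm
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]  # sweep 6: preconditioner
        rz_new = dot(r, z)                        # sweep 7: inner product
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]  # sweep 8: update p
        rz = rz_new
    return x, max_iter
```

In this naive form every sweep streams whole vectors through memory again once they no longer fit in cache, which is the cost the interleaving described in the abstract targets.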
-
Efficient distributed matrix-free multigrid methods on locally refined meshes for FEM computations
Authors:
Peter Munch,
Timo Heister,
Laura Prieto Saavedra,
Martin Kronbichler
Abstract:
This work studies three multigrid variants for matrix-free finite-element computations on locally refined meshes: geometric local smoothing, geometric global coarsening, and polynomial global coarsening. We have integrated the algorithms into the same framework (the open-source finite-element library deal.II), which allows us to make fair comparisons regarding their implementation complexity, computational efficiency, and parallel scalability as well as to compare the measurements with theoretically derived performance models. Serial simulations and parallel weak and strong scaling on up to 147,456 CPU cores on 3,072 compute nodes are presented. The results obtained indicate that global coarsening algorithms show a better parallel behavior for comparable smoothers due to the better load balance particularly on the expensive fine levels. In the serial case, the costs of applying hanging-node constraints might be significant, leading to advantages of local smoothing, even though the number of solver iterations needed is slightly higher.
Submitted 10 April, 2022; v1 submitted 23 March, 2022;
originally announced March 2022.
-
Lethe-DEM: An open-source parallel discrete element solver with load balancing
Authors:
Shahab Golshan,
Peter Munch,
Rene Gassmoller,
Martin Kronbichler,
Bruno Blais
Abstract:
Approximately $75 \%$ of the raw material and $50 \%$ of the products in the chemical industry are granular materials. The Discrete Element Method (DEM) provides detailed insights into phenomena at particle scale and it is therefore often used for modeling granular materials. However, because DEM tracks the motion and contact of individual particles separately, its computational cost increases non-linearly ($O(n_p\log(n_p))$ -- $O(n_p^2)$ depending on the algorithm) with the number of particles ($n_p$). In this article, we introduce a new open-source parallel DEM software with load balancing: Lethe-DEM. Lethe-DEM, a module of Lethe, consists of solvers for two-dimensional and three-dimensional DEM simulations. Load-balancing allows Lethe-DEM to significantly increase the parallel efficiency by $\approx 25 - 70 \%$ depending on the granular simulation. We explain the fundamental modules of Lethe-DEM, its software architecture, and the governing equations. Furthermore, we verify Lethe-DEM with several tests including analytical solutions and comparison with other software. Comparisons with experiments in a flat-bottomed silo, wedge-shaped silo, and rotating drum validate Lethe-DEM. We investigate the strong and weak scaling of Lethe-DEM with $1 \leq n_c \leq 192$ and $32 \leq n_c \leq 320$ processes, respectively, with and without load-balancing. The strong-scaling analysis is performed on the wedge-shaped silo and rotating drum simulations, while for the weak-scaling analysis, we use a dam break simulation. The best scalability of Lethe-DEM is obtained in the range of $5000 \leq n_p/n_c \leq 15000$. Finally, we demonstrate that large scale simulations can be carried out with Lethe-DEM using the simulation of a three-dimensional cylindrical silo with $n_p=4.3 \times 10^6$ on 320 cores.
Submitted 4 June, 2021;
originally announced June 2021.
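The dependence of the contact-search cost on the algorithm, mentioned in the abstract above, can be illustrated with a small comparison. The following is a minimal sketch in Python under stated assumptions: it contrasts an all-pairs search, which is quadratic in the number of particles, with a cell-based (binning) search, a common way to reduce that cost. It is not claimed to be the algorithm used in Lethe-DEM, and the names `contacts_brute_force` and `contacts_cell_list` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import floor
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
Pair = Tuple[int, int]


def _dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def contacts_brute_force(points: Sequence[Point], cutoff: float) -> Set[Pair]:
    """Test every pair of particles: O(n_p^2) distance checks."""
    c2 = cutoff * cutoff
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if _dist2(points[i], points[j]) <= c2
    }


def contacts_cell_list(points: Sequence[Point], cutoff: float) -> Set[Pair]:
    """Bin particles into square cells of edge length `cutoff`.

    Two particles closer than `cutoff` lie in the same or in adjacent cells,
    so only the 3 x 3 block of cells around each particle is searched.
    """
    c2 = cutoff * cutoff
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / cutoff), floor(y / cutoff))].append(idx)

    pairs: Set[Pair] = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # dict.get does not create missing keys in a defaultdict
                for b in cells.get((cx + dx, cy + dy), ()):
                    for a in members:
                        # each candidate pair is visited from both cells;
                        # the a < b filter keeps a single ordered copy
                        if a < b and _dist2(points[a], points[b]) <= c2:
                            pairs.add((a, b))
    return pairs
```

Both functions return the same set of index pairs; for roughly uniform particle densities the cell-based variant performs a number of distance checks proportional to the number of particles rather than to its square.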
-
On the implementation of a robust and efficient finite element-based parallel solver for the compressible Navier-Stokes equations
Authors:
Jean-Luc Guermond,
Martin Kronbichler,
Matthias Maier,
Bojan Popov,
Ignacio Tomas
Abstract:
This paper describes in detail the implementation of a finite element technique for solving the compressible Navier-Stokes equations that is provably robust and demonstrates excellent performance on modern computer hardware. The method is second-order accurate in time and space. Robustness here means that the method is proved to be invariant domain preserving under the hyperbolic CFL time step restriction, and the method delivers results that are reproducible. The proposed technique is shown to be accurate on challenging 2D and 3D realistic benchmarks.
Submitted 25 October, 2021; v1 submitted 3 June, 2021;
originally announced June 2021.
-
A weakly compressible hybridizable discontinuous Galerkin formulation for fluid-structure interaction problems
Authors:
Andrea La Spina,
Martin Kronbichler,
Matteo Giacomini,
Wolfgang A. Wall,
Antonio Huerta
Abstract:
A scheme for the solution of fluid-structure interaction (FSI) problems with weakly compressible flows is proposed in this work. A novel hybridizable discontinuous Galerkin (HDG) method is derived for the discretization of the fluid equations, while the standard continuous Galerkin (CG) approach is adopted for the structural problem. The chosen HDG solver combines robustness of discontinuous Galerkin (DG) approaches in advection-dominated flows with higher order accuracy and efficient implementations. Two coupling strategies are examined in this contribution, namely a partitioned Dirichlet-Neumann scheme in the context of hybrid HDG-CG discretizations and a monolithic approach based on Nitsche's method, exploiting the definition of the numerical flux and the trace of the solution to impose the coupling conditions. Numerical experiments show optimal convergence of the HDG and CG primal and mixed variables and superconvergence of the postprocessed fluid velocity. The robustness and the efficiency of the proposed weakly compressible formulation, in comparison to a fully incompressible one, are also highlighted on a selection of two and three dimensional FSI benchmark problems.
Submitted 9 September, 2020;
originally announced September 2020.
-
Numerical evidence of anomalous energy dissipation in incompressible Euler flows: Towards grid-converged results for the inviscid Taylor-Green problem
Authors:
Niklas Fehn,
Martin Kronbichler,
Peter Munch,
Wolfgang A Wall
Abstract:
Providing evidence of finite-time singularities of the incompressible Euler equations in three space dimensions is still an unsolved problem. Likewise, the zeroth law of turbulence has not been proven to date by numerical experiments. We address this issue by high-resolution numerical simulations of the inviscid three-dimensional Taylor-Green vortex problem using a novel high-order discontinuous Galerkin discretization approach. Our main finding is that the kinetic energy evolution does not tend towards exact energy conservation for increasing spatial resolution of the numerical scheme, but instead converges to a solution with nonzero kinetic energy dissipation rate. This implies an energy dissipation anomaly in the absence of viscous dissipation according to Onsager's conjecture, and serves as an indication of finite-time singularities in incompressible inviscid flows. We demonstrate convergence to a dissipative solution for the three-dimensional inviscid Taylor-Green problem with a measured relative $L_2$-error of $0.27 \%$ for the temporal evolution of the kinetic energy and $3.52 \%$ for the kinetic energy dissipation rate.
Submitted 3 July, 2020;
originally announced July 2020.
-
Efficient parallel 3D computation of the compressible Euler equations with an invariant-domain preserving second-order finite-element scheme
Authors:
Matthias Maier,
Martin Kronbichler
Abstract:
We discuss the efficient implementation of a high-performance second-order collocation-type finite-element scheme for solving the compressible Euler equations of gas dynamics on unstructured meshes. The solver is based on the convex limiting technique introduced by Guermond et al. (SIAM J. Sci. Comput. 40, A3211-A3239, 2018). As such it is invariant-domain preserving, i.e., the solver maintains important physical invariants and is guaranteed to be stable without the use of ad-hoc tuning parameters. This stability comes at the expense of a significantly more involved algorithmic structure that renders conventional high-performance discretizations challenging. We develop an algorithmic design that allows SIMD vectorization of the compute kernel, identify the main ingredients for a good node-level performance, and report excellent weak and strong scaling of a hybrid thread/MPI parallelization.
Submitted 2 February, 2021; v1 submitted 30 June, 2020;
originally announced July 2020.
-
Scalability of High-Performance PDE Solvers
Authors:
Paul Fischer,
Misun Min,
Thilina Rathnayake,
Som Dutta,
Tzanio Kolev,
Veselin Dobrev,
Jean-Sylvain Camier,
Martin Kronbichler,
Tim Warburton,
Kasia Swirydowicz,
Jed Brown
Abstract:
Performance tests and analyses are critical to effective HPC software development and are central components in the design and implementation of computational algorithms for achieving faster simulations on existing and future computing architectures for large-scale application problems. In this paper, we explore performance and space-time trade-offs for important compute-intensive kernels of large-scale numerical solvers for PDEs that govern a wide range of physical applications. We consider a sequence of PDE-motivated bake-off problems designed to establish best practices for efficient high-order simulations across a variety of codes and platforms. We measure peak performance (degrees of freedom per second) on a fixed number of nodes and identify effective code optimization strategies for each architecture. In addition to peak performance, we identify the minimum time to solution at 80% parallel efficiency. The performance analysis is based on spectral and p-type finite elements but is equally applicable to a broad spectrum of numerical PDE discretizations, including finite difference, finite volume, and h-type finite elements.
Submitted 14 April, 2020;
originally announced April 2020.
-
High-order arbitrary Lagrangian-Eulerian discontinuous Galerkin methods for the incompressible Navier-Stokes equations
Authors:
Niklas Fehn,
Johannes Heinz,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
This paper presents robust discontinuous Galerkin methods for the incompressible Navier-Stokes equations on moving meshes. High-order accurate arbitrary Lagrangian-Eulerian formulations are proposed in a unified framework for both monolithic as well as projection or splitting-type Navier-Stokes solvers. The framework is flexible, allows implicit and explicit formulations of the convective term, and adaptive time-stepping. The Navier-Stokes equations with ALE transport term are solved on the deformed geometry storing one instance of the mesh that is updated from one time step to the next. Discretization in space is applied to the time discrete equations so that all weak forms and mass matrices are evaluated at the end of the current time step. This design ensures that the proposed formulations fulfill the geometric conservation law automatically, as is shown theoretically and demonstrated numerically by the example of the free-stream preservation test. We discuss the peculiarities related to the imposition of boundary conditions in intermediate steps of projection-type methods and the ingredients needed to preserve high-order accuracy. We show numerically that the formulations proposed in this work maintain the formal order of accuracy of the Navier-Stokes solvers. Moreover, we demonstrate robustness and accuracy for under-resolved turbulent flows.
Submitted 16 March, 2020;
originally announced March 2020.
-
hyper.deal: An efficient, matrix-free finite-element library for high-dimensional partial differential equations
Authors:
Peter Munch,
Katharina Kormann,
Martin Kronbichler
Abstract:
This work presents the efficient, matrix-free finite-element library hyper.deal for solving partial differential equations in two to six dimensions with high-order discontinuous Galerkin methods. It builds upon the low-dimensional finite-element library deal.II to create complex low-dimensional meshes and to operate on them individually. These meshes are combined via a tensor product on the fly and the library provides new special-purpose highly optimized matrix-free functions exploiting domain decomposition as well as shared memory via MPI-3.0 features. Both node-level performance analyses and strong/weak-scaling studies on up to 147,456 CPU cores confirm the efficiency of the implementation. Results of the library hyper.deal are reported for high-dimensional advection problems and for the solution of the Vlasov--Poisson equation in up to 6D phase space.
Submitted 19 February, 2020;
originally announced February 2020.
-
The deal.II finite element library: design, features, and insights
Authors:
Daniel Arndt,
Wolfgang Bangerth,
Denis Davydov,
Timo Heister,
Luca Heltai,
Martin Kronbichler,
Matthias Maier,
Jean-Paul Pelteret,
Bruno Turcksin,
David Wells
Abstract:
deal.II is a state-of-the-art finite element library focused on generality, dimension-independent programming, parallelism, and extensibility. Herein, we outline its primary design considerations and its sophisticated features such as distributed meshes, $hp$-adaptivity, support for complex geometries, and matrix-free algorithms. But deal.II is more than just a software library: It is also a diverse and worldwide community of developers and users, as well as an educational platform. We therefore also discuss some of the technical and social challenges and lessons learned in running a large community software project over the course of two decades.
Submitted 17 February, 2020; v1 submitted 24 October, 2019;
originally announced October 2019.
-
Propagating geometry information to finite element computations
Authors:
Luca Heltai,
Wolfgang Bangerth,
Martin Kronbichler,
Andrea Mola
Abstract:
The traditional workflow in continuum mechanics simulations is that a geometry description -- for example obtained using Constructive Solid Geometry or Computer Aided Design tools -- forms the input for a mesh generator. The mesh is then used as the sole input for the finite element, finite volume, and finite difference solver, which at this point no longer has access to the original, "underlying" geometry. However, many modern techniques -- for example, adaptive mesh refinement and the use of higher order geometry approximation methods -- really do need information about the underlying geometry to realize their full potential. We have undertaken an exhaustive study of where typical finite element codes use geometry information, with the goal of determining what information geometry tools would have to provide. Our study shows that nearly all geometry-related needs inside the simulators can be satisfied by just two "primitives": elementary queries posed by the simulation software to the geometry description. We then show that it is possible to provide these primitives in all of the frequently used ways in which geometries are described in common industrial workflows, and illustrate our solutions using a number of examples.
Submitted 7 July, 2021; v1 submitted 22 October, 2019;
originally announced October 2019.
-
Hybrid multigrid methods for high-order discontinuous Galerkin discretizations
Authors:
Niklas Fehn,
Peter Munch,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
The present work develops hybrid multigrid methods for high-order discontinuous Galerkin discretizations of elliptic problems. Fast matrix-free operator evaluation on tensor product elements is used to devise a computationally efficient PDE solver. The multigrid hierarchy exploits all possibilities of geometric, polynomial, and algebraic coarsening, targeting engineering applications on complex geometries. Additionally, a transfer from discontinuous to continuous function spaces is performed within the multigrid hierarchy. This does not only further reduce the problem size of the coarse-grid problem, but also leads to a discretization most suitable for state-of-the-art algebraic multigrid methods applied as coarse-grid solver. The relevant design choices regarding the selection of optimal multigrid coarsening strategies among the various possibilities are discussed with the metric of computational costs as the driving force for algorithmic selections. We find that a transfer to a continuous function space at highest polynomial degree (or on the finest mesh), followed by polynomial and geometric coarsening, shows the best overall performance. The success of this particular multigrid strategy is due to a significant reduction in iteration counts as compared to a transfer from discontinuous to continuous function spaces at lowest polynomial degree (or on the coarsest mesh). The coarsening strategy with transfer to a continuous function space on the finest level leads to a multigrid algorithm that is robust with respect to the penalty parameter of the SIPG method. Detailed numerical investigations are conducted for a series of examples ranging from academic test cases to more complex, practically relevant geometries. Performance comparisons to state-of-the-art methods from the literature demonstrate the versatility and computational efficiency of the proposed multigrid algorithms.
Submitted 4 October, 2019;
originally announced October 2019.
-
A Hermite-like basis for faster matrix-free evaluation of interior penalty discontinuous Galerkin operators
Authors:
Martin Kronbichler,
Katharina Kormann,
Niklas Fehn,
Peter Munch,
Julius Witte
Abstract:
This work proposes a basis for improved throughput of matrix-free evaluation of discontinuous Galerkin symmetric interior penalty discretizations on hexahedral elements. The basis relies on ideas of Hermite polynomials. It is used in a fully discontinuous setting not for higher order continuity but to minimize the effective stencil width, namely to limit the neighbor access of an element to one data point for the function value and one for the derivative. The basis is extended to higher orders with nodal contributions derived from roots of Jacobi polynomials and extended to multiple dimensions with tensor products, which enable the use of sum factorization. The beneficial effect of the reduced data access on modern processors is shown. Furthermore, the viability of the basis in the context of multigrid solvers is analyzed. While a plain point-Jacobi approach is less efficient than with the best nodal polynomials, a basis change via sum-factorization techniques enables the combination of the fast matrix-vector products with effective multigrid constituents. The basis change is essentially for free on modern hardware because these computations can be hidden behind the cost of the data access.
Submitted 19 July, 2019;
originally announced July 2019.
-
Algorithms and data structures for matrix-free finite element operators with MPI-parallel sparse multi-vectors
Authors:
Denis Davydov,
Martin Kronbichler
Abstract:
Traditional solution approaches for problems in quantum mechanics scale as $\mathcal O(M^3)$, where $M$ is the number of electrons. Various methods have been proposed to address this issue and obtain linear scaling $\mathcal O(M)$. One promising formulation is the direct minimization of energy. Such methods take advantage of physical localization of the solution, namely that the solution can be sought in terms of non-orthogonal orbitals with local support. In this work a numerically efficient implementation of sparse parallel vectors within the open-source finite element library deal.II is proposed. The main algorithmic ingredient is the matrix-free evaluation of the Hamiltonian operator by cell-wise quadrature. Based on an a-priori chosen support for each vector we develop algorithms and data structures to perform (i) matrix-free sparse matrix multivector products (SpMM), (ii) the projection of an operator onto a sparse sub-space (inner products), and (iii) post-multiplication of a sparse multivector with a square matrix. The node-level performance is analyzed using a roofline model. Our matrix-free implementation of finite element operators with sparse multivectors achieves the performance of 157 GFlop/s on Intel Cascade Lake architecture. Strong and weak scaling results are reported for a typical benchmark problem using quadratic and quartic finite element bases.
△ Less
Submitted 1 July, 2019;
originally announced July 2019.
-
High-order DG solvers for under-resolved turbulent incompressible flows: A comparison of $L^2$ and $H$(div) methods
Authors:
Niklas Fehn,
Martin Kronbichler,
Christoph Lehrenfeld,
Gert Lube,
Philipp W. Schroeder
Abstract:
The accurate numerical simulation of turbulent incompressible flows is a challenging topic in computational fluid dynamics. For discretisation methods to be robust in the under-resolved regime, mass conservation as well as energy stability are key ingredients to obtain robust and accurate discretisations. Recently, two approaches have been proposed in the context of high-order discontinuous Galerkin (DG) discretisations that address these aspects differently. On the one hand, standard $L^2$-based DG discretisations enforce mass conservation and energy stability weakly by the use of additional stabilisation terms. On the other hand, pointwise divergence-free $H(\operatorname{div})$-conforming approaches ensure exact mass conservation and energy stability by the use of tailored finite element function spaces. The present work raises the question whether and to which extent these two approaches are equivalent when applied to under-resolved turbulent flows. This comparative study highlights similarities and differences of these two approaches. The numerical results emphasise that both discretisation strategies are promising for under-resolved simulations of turbulent flows due to their inherent dissipation mechanisms.
△ Less
Submitted 29 April, 2019;
originally announced May 2019.
-
A hybridizable discontinuous Galerkin method for electromagnetics with a view on subsurface applications
Authors:
Luca Berardocco,
Martin Kronbichler,
Volker Gravemeier
Abstract:
Two Hybridizable Discontinuous Galerkin (HDG) schemes for the solution of Maxwell's equations in the time domain are presented. The first method is based on an electromagnetic diffusion equation, while the second is based on Faraday's and Maxwell--Ampère's laws. Both formulations include the diffusive term depending on the conductivity of the medium. The three-dimensional formulation of the electromagnetic diffusion equation in the framework of HDG methods, the introduction of the conduction current term and the choice of the electric field as hybrid variable in a mixed formulation are the key points of the current study. Numerical results are provided for validation purposes and convergence studies of spatial and temporal discretizations are carried out. The test cases include both simulation in dielectric and conductive media.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
A Flexible, Parallel, Adaptive Geometric Multigrid method for FEM
Authors:
Thomas C. Clevenger,
Timo Heister,
Guido Kanschat,
Martin Kronbichler
Abstract:
We present the design and implementation details of a geometric multigrid method on adaptively refined meshes for massively parallel computations. The method uses local smoothing on the refined part of the mesh. Partitioning is achieved by using a space filling curve for the leaf mesh and distributing ancestors in the hierarchy based on the leaves. We present a model of the efficiency of mesh hierarchy distribution and compare its predictions to runtime measurements. The algorithm is implemented as part of the deal.II finite element library and as such available to the public.
△ Less
Submitted 3 August, 2021; v1 submitted 5 April, 2019;
originally announced April 2019.
-
Modern discontinuous Galerkin methods for the simulation of transitional and turbulent flows in biomedical engineering: A comprehensive LES study of the FDA benchmark nozzle model
Authors:
Niklas Fehn,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
This work uses high-order discontinuous Galerkin discretization techniques as a generic, parameter-free, and reliable tool to accurately predict transitional and turbulent flows through medical devices. Flows through medical devices are characterized by moderate Reynolds numbers and typically involve different flow regimes such as laminar, transitional, and turbulent flows. Previous studies for the FDA benchmark nozzle model revealed limitations of Reynolds-averaged Navier-Stokes turbulence models when applied to more complex flow scenarios. Recent works based on large-eddy simulation approaches indicate that these limitations can be overcome but also highlight potential limitations due to a high sensitivity with respect to numerical parameters. The novel methodology presented in this work is based on two key ingredients. Firstly, we use high-order discontinuous Galerkin methods for discretization in space yielding a discretization approach that is robust, accurate, and generic. The inherent dissipation properties of high-order discontinuous Galerkin discretizations render this approach well-suited for transitional and turbulent flow simulations. Secondly, a precursor simulation approach is applied in order to correctly predict the inflow boundary condition for the whole range of laminar, transitional, and turbulent flow regimes. This approach eliminates the need to fit parameters of the numerical solution approach. We investigate the whole range of Reynolds numbers as suggested by the FDA benchmark nozzle problem in order to critically assess the predictive capabilities of the solver. The results presented in this study are compared to experimental data obtained by particle image velocimetry demonstrating that the approach is capable of correctly predicting the flow for different flow regimes.
△ Less
Submitted 16 November, 2018;
originally announced November 2018.
-
A matrix-free high-order discontinuous Galerkin compressible Navier-Stokes solver: A performance comparison of compressible and incompressible formulations for turbulent incompressible flows
Authors:
Niklas Fehn,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
Both compressible and incompressible Navier-Stokes solvers can be used and are used to solve incompressible turbulent flow problems. In the compressible case, the Mach number is then considered as a solver parameter that is set to a small value, $\mathrm{M}\approx 0.1$, in order to mimic incompressible flows. This strategy is widely used for high-order discontinuous Galerkin discretizations of the compressible Navier-Stokes equations. The present work raises the question regarding the computational efficiency of compressible DG solvers as compared to a genuinely incompressible formulation. Our contributions to the state-of-the-art are twofold: Firstly, we present a high-performance discontinuous Galerkin solver for the compressible Navier-Stokes equations based on a highly efficient matrix-free implementation that targets modern cache-based multicore architectures. The performance results presented in this work focus on the node-level performance and our results suggest that there is great potential for further performance improvements for current state-of-the-art discontinuous Galerkin implementations of the compressible Navier-Stokes equations. Secondly, this compressible Navier-Stokes solver is put into perspective by comparing it to an incompressible DG solver that uses the same matrix-free implementation. We discuss algorithmic differences between both solution strategies and present an in-depth numerical investigation of the performance. The considered benchmark test cases are the three-dimensional Taylor-Green vortex problem as a representative of transitional flows and the turbulent channel flow problem as a representative of wall-bounded turbulent flows.
△ Less
Submitted 8 June, 2018;
originally announced June 2018.
-
Efficient Explicit Time Stepping of High Order Discontinuous Galerkin Schemes for Waves
Authors:
Svenja Schoeder,
Katharina Kormann,
Wolfgang Wall,
Martin Kronbichler
Abstract:
This work presents algorithms for the efficient implementation of discontinuous Galerkin methods with explicit time stepping for acoustic wave propagation on unstructured meshes of quadrilaterals or hexahedra. A crucial step towards efficiency is to evaluate operators in a matrix-free way with sum-factorization kernels. The method allows for general curved geometries and variable coefficients. Temporal discretization is carried out by low-storage explicit Runge-Kutta schemes and the arbitrary derivative (ADER) method. For ADER, we propose a flexible basis change approach that combines cheap face integrals with cell evaluation using collocated nodes and quadrature points. Additionally, a degree reduction for the optimized cell evaluation is presented to decrease the computational cost when evaluating higher order spatial derivatives as required in ADER time stepping. We analyze and compare the performance of state-of-the-art Runge-Kutta schemes and ADER time stepping with the proposed optimizations. ADER involves fewer operations and additionally reaches higher throughput by higher arithmetic intensities and hence decreases the required computational time significantly. Comparison of Runge-Kutta and ADER at their respective CFL stability limit renders ADER especially beneficial for higher orders when the Butcher barrier implies an overproportional amount of stages. Moreover, vector updates in explicit Runge-Kutta schemes are shown to take a substantial amount of the computational time due to their memory intensity.
Submitted 9 May, 2018;
originally announced May 2018.
-
Effective slip over partially filled microcavities and its possible failure
Authors:
Zhouyang Ge,
Hanna Holmgren,
Martin Kronbichler,
Luca Brandt,
Gunilla Kreiss
Abstract:
Motivated by the emerging applications of liquid-infused surfaces (LIS), we study the drag reduction and robustness of transverse flows over two-dimensional microcavities partially filled with an oily lubricant. Using separate simulations at different scales, characteristic contact line velocities at the fluid-solid intersection are first extracted from nano-scale phase field simulations and then applied to micron-scale two-phase flows, thus introducing a multiscale numerical framework to model the interface displacement and deformation within the cavities. As we explore the various effects of the lubricant-to-outer-fluid viscosity ratio $\tilde{\mu}_2/\tilde{\mu}_1$, the capillary number Ca, the static contact angle $\theta_s$, and the filling fraction of the cavity $\delta$, we find that the effective slip is most sensitive to the parameter $\delta$. The effects of $\tilde{\mu}_2/\tilde{\mu}_1$ and $\theta_s$ are generally intertwined, but weakened if $\delta < 1$. Moreover, for an initial filling fraction $\delta = 0.94$, our results show that the effective slip is nearly independent of the capillary number, when it is small. Further increasing Ca to about $0.01\,\tilde{\mu}_1/\tilde{\mu}_2$, we identify a possible failure mode, associated with lubricants draining from the LIS, for $\tilde{\mu}_2/\tilde{\mu}_1 \lesssim 0.1$. Very viscous lubricants (e.g., $\tilde{\mu}_2/\tilde{\mu}_1 > 1$), on the other hand, are immune to such failure due to their generally larger contact line velocity.
Submitted 7 May, 2018;
originally announced May 2018.
-
Efficiency of high-performance discontinuous Galerkin spectral element methods for under-resolved turbulent incompressible flows
Authors:
Niklas Fehn,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
The present paper addresses the numerical solution of turbulent flows with high-order discontinuous Galerkin methods for discretizing the incompressible Navier-Stokes equations. The efficiency of high-order methods when applied to under-resolved problems is an open issue in literature. This topic is carefully investigated in the present work by the example of the 3D Taylor-Green vortex problem. Our implementation is based on a generic high-performance framework for matrix-free evaluation of finite element operators with one of the best realizations currently known. We present a methodology to systematically analyze the efficiency of the incompressible Navier-Stokes solver for high polynomial degrees. Due to the absence of optimal rates of convergence in the under-resolved regime, our results reveal that demonstrating improved efficiency of high-order methods is a challenging task and that optimal computational complexity of solvers, preconditioners, and matrix-free implementations are necessary ingredients to achieve the goal of better solution quality at the same computational costs already for a geometrically simple problem such as the Taylor-Green vortex. Although the analysis is performed for a Cartesian geometry, our approach is generic and can be applied to arbitrary geometries. We present excellent performance numbers on modern, cache-based computer architectures achieving a throughput for operator evaluation of 3e8 up to 1e9 DoFs/sec on one Intel Haswell node with 28 cores. Compared to performance results published within the last 5 years for high-order DG discretizations of the compressible Navier-Stokes equations, our approach reduces computational costs by more than one order of magnitude for the same setup.
Submitted 5 February, 2018;
originally announced February 2018.
-
Robust and efficient discontinuous Galerkin methods for under-resolved turbulent incompressible flows
Authors:
Niklas Fehn,
Wolfgang A Wall,
Martin Kronbichler
Abstract:
We present a robust and accurate discretization approach for incompressible turbulent flows based on high-order discontinuous Galerkin methods. The DG discretization of the incompressible Navier-Stokes equations uses the local Lax-Friedrichs flux for the convective term, the symmetric interior penalty method for the viscous term, and central fluxes for the velocity-pressure coupling terms. Stability of the discretization approach for under-resolved, turbulent flow problems is realized by a purely numerical stabilization approach. Consistent penalty terms that enforce the incompressibility constraint as well as inter-element continuity of the velocity field in a weak sense render the numerical method a robust discretization scheme in the under-resolved regime. The penalty parameters are derived by means of dimensional analysis using penalty factors of order 1. Applying these penalty terms in a postprocessing step leads to an efficient solution algorithm for turbulent flows. The proposed approach is applicable independently of the solution strategy used to solve the incompressible Navier-Stokes equations, i.e., it can be used for both projection-type solution methods as well as monolithic solution approaches. Since our approach is based on consistent penalty terms, it is by definition generic and provides optimal rates of convergence when applied to laminar flow problems. Robustness and accuracy are verified for the Orr-Sommerfeld stability problem, the Taylor-Green vortex problem, and turbulent channel flow. Moreover, the accuracy of high-order discretizations as compared to low-order discretizations is investigated for these flow problems. A comparison to state-of-the-art computational approaches for large-eddy simulation indicates that the proposed methods are highly attractive components for turbulent flow solvers.
Submitted 24 January, 2018;
originally announced January 2018.
-
Wall modeling via function enrichment: extension to detached-eddy simulation
Authors:
Benjamin Krank,
Martin Kronbichler,
Wolfgang A. Wall
Abstract:
We extend the approach of wall modeling via function enrichment to detached-eddy simulation. The wall model aims at using coarse cells in the near-wall region by modeling the velocity profile in the viscous sublayer and log-layer. However, unlike other wall models, the full Navier-Stokes equations are still discretely fulfilled, including the pressure gradient and convective term. This is achieved by enriching the elements of the high-order discontinuous Galerkin method with the law-of-the-wall. As a result, the Galerkin method can "choose" the optimal solution among the polynomial and enrichment shape functions. The detached-eddy simulation methodology provides a suitable turbulence model for the coarse near-wall cells. The approach is applied to wall-modeled LES of turbulent channel flow in a wide range of Reynolds numbers. Flow over periodic hills shows the superiority compared to an equilibrium wall model under separated flow conditions.
Submitted 22 December, 2017;
originally announced December 2017.
-
Fast matrix-free evaluation of discontinuous Galerkin finite element operators
Authors:
Martin Kronbichler,
Katharina Kormann
Abstract:
We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators based on sum factorization on quadrilateral and hexahedral meshes. We identify a set of kernels for fast quadrature on cells and faces targeting a wide class of weak forms originating from linear and nonlinear partial differential equations. Different algorithms and data structures for the implementation of operator evaluation are compared in an in-depth performance analysis. The sum factorization kernels are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional compute kernels. In isolation, our implementation then reaches up to 60% of arithmetic peak on Intel Haswell and Broadwell processors and up to 50% of arithmetic peak on Intel Knights Landing. The full operator evaluation reaches only about half that throughput due to memory bandwidth limitations from loading the input and output vectors, MPI ghost exchange, as well as handling variable coefficients and the geometry. Our performance analysis shows that the results are often within 10% of the available memory bandwidth for the proposed implementation, with the exception of the Cartesian mesh case where the costs of gather operations and MPI communication are more substantial.
Submitted 9 November, 2017;
originally announced November 2017.
-
On the stability of projection methods for the incompressible Navier-Stokes equations based on high-order discontinuous Galerkin discretizations
Authors:
Niklas Fehn,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
The present paper deals with the numerical solution of the incompressible Navier-Stokes equations using high-order discontinuous Galerkin (DG) methods for discretization in space. For DG methods applied to the dual splitting projection method, instabilities have recently been reported that occur for coarse spatial resolutions and small time step sizes. By means of numerical investigation we give evidence that these instabilities are related to the discontinuous Galerkin formulation of the velocity divergence term and the pressure gradient term that couple velocity and pressure. Integration by parts of these terms with a suitable definition of boundary conditions is required in order to obtain a stable and robust method. Since the intermediate velocity field does not fulfill the boundary conditions prescribed for the velocity, a consistent boundary condition is derived from the convective step of the dual splitting scheme to ensure high-order accuracy with respect to the temporal discretization. This new formulation is stable in the limit of small time steps for both equal-order and mixed-order polynomial approximations. Although the dual splitting scheme itself includes inf-sup stabilizing contributions, we demonstrate that spurious pressure oscillations appear for equal-order polynomials and small time steps highlighting the necessity to consider inf-sup stability explicitly.
Submitted 28 June, 2017;
originally announced June 2017.
-
A multiscale approach to hybrid RANS/LES wall modeling within a high-order discontinuous Galerkin scheme using function enrichment
Authors:
Benjamin Krank,
Martin Kronbichler,
Wolfgang A. Wall
Abstract:
We present a novel approach to hybrid RANS/LES wall modeling based on function enrichment, which overcomes the common problem of the RANS-LES transition and enables coarse meshes near the boundary. While the concept of function enrichment as an efficient discretization technique for turbulent boundary layers has been proposed in an earlier article by Krank & Wall (J. Comput. Phys. 316 (2016) 94-116), the contribution of this work is a rigorous derivation of a new multiscale turbulence modeling approach and a corresponding discontinuous Galerkin discretization scheme. In the near-wall area, the Navier-Stokes equations are explicitly solved for an LES and a RANS component in one single equation. This is done by providing the Galerkin method with an independent set of shape functions for each of these two methods; the standard high-order polynomial basis resolves turbulent eddies where the mesh is sufficiently fine and the enrichment automatically computes the ensemble-averaged flow if the LES mesh is too coarse. As a result of the derivation, the RANS model is consistently applied solely to the RANS degrees of freedom, which effectively prevents the typical issue of a log-layer mismatch in attached boundary layers. As the full Navier-Stokes equations are solved in the boundary layer, spatial refinement gradually yields wall-resolved LES with exact boundary conditions. Numerical tests show the outstanding characteristics of the wall model regarding grid independence and its superiority over equilibrium wall models in separated flows, and demonstrate a speed-up of two orders of magnitude compared to wall-resolved LES.
Submitted 27 December, 2017; v1 submitted 24 May, 2017;
originally announced May 2017.
-
A performance comparison of continuous and discontinuous Galerkin methods with fast multigrid solvers
Authors:
Martin Kronbichler,
Wolfgang A. Wall
Abstract:
This study presents a fair performance comparison of the continuous finite element method, the symmetric interior penalty discontinuous Galerkin method, and the hybridized discontinuous Galerkin method. Modern implementations of high-order methods with state-of-the-art multigrid solvers for the Poisson equation are considered, including fast matrix-free implementations with sum factorization on quadrilateral and hexahedral elements. For the hybridized discontinuous Galerkin method, a multigrid approach that combines a grid transfer from the trace space to the space of linear finite elements with algebraic multigrid on further levels is developed. Despite similar solver complexity of the matrix-based HDG solver and matrix-free geometric multigrid schemes with continuous and discontinuous Galerkin finite elements, the latter offer up to an order of magnitude faster time to solution, even after including the superconvergence effects. This difference is due to the vastly better performance of matrix-free operator evaluation compared to sparse matrix-vector products. A roofline performance model confirms the advantage of the matrix-free implementation.
Submitted 9 December, 2016; v1 submitted 9 November, 2016;
originally announced November 2016.
-
Wall modeling via function enrichment within a high-order DG method for RANS simulations of incompressible flow
Authors:
Benjamin Krank,
Martin Kronbichler,
Wolfgang A. Wall
Abstract:
We present a novel approach to wall modeling for RANS within the discontinuous Galerkin method. Wall functions are not used to prescribe boundary conditions as usual but they are built into the function space of the numerical method as a local enrichment, in addition to the standard polynomial component. The Galerkin method then automatically finds the optimal solution among all shape functions available. This idea is fully consistent and gives the wall model vast flexibility in separated boundary layers or high adverse pressure gradients. The wall model is implemented in a high-order discontinuous Galerkin solver for incompressible flow complemented by the Spalart-Allmaras closure model. As benchmark examples we present turbulent channel flow starting from $Re_τ=180$ and up to $Re_τ=100{,}000$ as well as flow past periodic hills at Reynolds numbers based on the hill height of $Re_H=10{,}595$ and $Re_{H}=19{,}000$.
Submitted 26 October, 2016;
originally announced October 2016.
-
A high-order semi-explicit discontinuous Galerkin solver for 3D incompressible flow with application to DNS and LES of turbulent channel flow
Authors:
Benjamin Krank,
Niklas Fehn,
Wolfgang A. Wall,
Martin Kronbichler
Abstract:
We present an efficient discontinuous Galerkin scheme for simulation of the incompressible Navier-Stokes equations including laminar and turbulent flow. We consider a semi-explicit high-order velocity-correction method for time integration as well as nodal equal-order discretizations for velocity and pressure. The non-linear convective term is treated explicitly while a linear system is solved for the pressure Poisson equation and the viscous term. The key feature of our solver is a consistent penalty term reducing the local divergence error in order to overcome recently reported instabilities in spatially under-resolved high-Reynolds-number flows as well as small time steps. This penalty method is similar to the grad-div stabilization widely used in continuous finite elements. We further review and compare our method to several other techniques recently proposed in literature to stabilize the method for such flow configurations. The solver is specifically designed for large-scale computations through matrix-free linear solvers including efficient preconditioning strategies and tensor-product elements, which have allowed us to scale this code up to 34.4 billion degrees of freedom and 147,456 CPU cores. We validate our code and demonstrate optimal convergence rates with laminar flows present in a vortex problem and flow past a cylinder and show applicability of our solver to direct numerical simulation as well as implicit large-eddy simulation of turbulent channel flow at $Re_τ=180$ as well as $590$.
Submitted 5 July, 2016;
originally announced July 2016.
-
The deal.II Library, Version 8.1
Authors:
Wolfgang Bangerth,
Timo Heister,
Luca Heltai,
Guido Kanschat,
Martin Kronbichler,
Matthias Maier,
Bruno Turcksin,
Toby D. Young
Abstract:
This paper provides an overview of the new features of the finite element library deal.II version 8.1.
Submitted 31 December, 2013; v1 submitted 8 December, 2013;
originally announced December 2013.