
Code Generation and Performance Engineering for Matrix-Free Finite Element Methods on Hybrid Tetrahedral Grids

Authors: Fabian Böhm, Daniel Bauer, Nils Kohl, Christie Alappat, Dominik Thönnes, Marcus Mohr, Harald Köstler, Ulrich Rüde

Abstract: This paper introduces a code generator designed for node-level optimized, extreme-scalable, matrix-free finite element operators on hybrid tetrahedral grids. It optimizes the local evaluation of bilinear forms through various techniques, including tabulation, relocation of loop invariants, and inter-element vectorization, implemented as transformations of an abstract syntax tree. A key contribution is the development, analysis, and generation of efficient loop patterns that leverage the local structure of the underlying tetrahedral grid. These significantly enhance cache locality and arithmetic intensity, mitigating the bandwidth pressure associated with compute-sparse, low-order operators. The paper demonstrates the generator's capabilities through a comprehensive educational cycle of performance analysis, bottleneck identification, and emission of dedicated optimizations. For three differential operators ($-\Delta$, $-\nabla \cdot (k(\mathbf{x})\, \nabla\,)$, $\alpha(\mathbf{x})\, \mathbf{curl}\,\mathbf{curl} + \beta(\mathbf{x})$), we determine the set of most effective optimizations. Applied by the generator, they result in speed-ups of up to 58$\times$ compared to reference implementations. Detailed node-level performance analysis yields matrix-free operators with a throughput of 1.3 to 2.1 GDoF/s, achieving up to 62% of peak performance on a 36-core Intel Ice Lake socket.
Finally, the solution of the curl-curl problem with more than a trillion ($10^{12}$) degrees of freedom on 21504 processes in less than 50 seconds demonstrates the generated operators' performance and extreme scalability as part of a full multigrid solver.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 22 pages

MSC Class: 65F50; 65N30; 65N55; 65Y20; 65F10
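A matrix-free operator never stores the global matrix: each application loops over the elements, gathers the local coefficients, applies the element matrix, and scatters the result back. The sketch below illustrates only this general pattern, assuming linear (P1) elements for $-\Delta$ on a uniform 1D mesh; it is not the paper's generator or one of its generated tetrahedral kernels, and the function name `apply_laplacian_matrix_free` is hypothetical. Computing the element matrix once outside the loop is a toy analogue of the loop-invariant relocation named in the abstract.

```python
def apply_laplacian_matrix_free(u, h):
    """Return y = K u for -u'' with P1 elements on a uniform 1D mesh of width h.

    K is never assembled: each element's 2x2 local stiffness matrix
    (1/h) * [[1, -1], [-1, 1]] is applied on the fly and its contribution
    is scattered (added) into the global result vector.
    """
    n = len(u)
    y = [0.0] * n
    # Loop invariant: on a uniform mesh every element has the same local
    # matrix, so it is computed once instead of once per element.
    k_loc = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    for e in range(n - 1):              # element e connects nodes e and e+1
        ue = (u[e], u[e + 1])           # gather local coefficients
        for a in range(2):              # local matrix-vector product + scatter
            y[e + a] += k_loc[a][0] * ue[0] + k_loc[a][1] * ue[1]
    return y
```

Interior entries reproduce the familiar stencil $(-u_{i-1} + 2u_i - u_{i+1})/h$; on tetrahedral grids the local matrices are larger and, for variable coefficients such as $k(\mathbf{x})$, differ from element to element.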

arXiv:2403.01579

doi 10.1080/17445760.2024.2360190

A Continuous Benchmarking Infrastructure for High-Performance Computing Applications

Authors: Christoph Alt, Martin Lanser, Jonas Plewinski, Atin Janki, Axel Klawonn, Harald Köstler, Michael Selzer, Ulrich Rüde

Abstract: For scientific software, especially software used for large-scale simulations, achieving good performance and using the available hardware resources efficiently is essential. It is important to perform benchmarks regularly to ensure efficient use of hardware and software as systems change and the software evolves. However, this can quickly become very tedious when many options for parameters, solvers, and hardware architectures are available. We present a continuous benchmarking strategy that automates benchmarking new code changes on high-performance computing clusters. This makes it possible to track how each code change affects performance and how performance evolves over time.

Submitted 3 March, 2024; originally announced March 2024.

Journal ref: International Journal of Parallel, Emergent & Distributed Systems, 2024
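The core decision in continuous benchmarking is whether a new code change made a benchmark slower. The following is a hypothetical sketch of that single comparison step, not the infrastructure described in the paper: it assumes repeated runtime measurements per commit, compares medians, and uses an arbitrarily chosen 5% threshold; the name `detect_regression` is illustrative.

```python
from statistics import median


def detect_regression(baseline_runtimes, candidate_runtimes, tolerance=0.05):
    """Compare a new commit's runtimes (seconds) against a stored baseline.

    Returns (is_regression, relative_change). Medians are used so that a
    single noisy run does not trigger a false alarm; a relative slowdown
    larger than `tolerance` (an assumed 5% by default) is flagged.
    """
    base = median(baseline_runtimes)
    cand = median(candidate_runtimes)
    rel_change = (cand - base) / base
    return rel_change > tolerance, rel_change
```

In a real setup, the baseline would be retrieved from stored results of earlier commits on the same cluster and configuration, since timings from different hardware are not comparable.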

arXiv:2402.13171

doi 10.1002/cpe.8117

waLBerla-wind: a lattice-Boltzmann-based high-performance flow solver for wind energy applications

Authors: Helen Schottenhamml, Ani Anciaux-Sedrakian, Frédéric Blondel, Harald Köstler, Ulrich Rüde

Abstract: This article presents the development of new wind turbine simulation software to study wake flow physics. To this end, it presents the design and development of waLBerla-wind, a new simulator based on the lattice-Boltzmann method, which is known for its excellent performance and scaling properties. Here, it is used for large eddy simulations (LES) coupled with actuator wind turbine models. Due to its modular software design, waLBerla-wind is flexible and extensible with regard to turbine configurations. Additionally, it is performance portable across different hardware architectures, another critical design goal. The new solver is validated by presenting force distributions and velocity profiles and comparing them with experimental data and a vortex solver. Furthermore, waLBerla-wind's performance is compared to a theoretical peak performance and analysed with weak and strong scaling benchmarks on CPU and GPU systems. This analysis demonstrates its suitability for large-scale applications and future cost-effective full wind farm simulations.

Submitted 8 December, 2023; originally announced February 2024.

Journal ref: Concurrency Computat Pract Exper. 2024;e8117

arXiv:2401.03041

Development of a central-moment phase-field lattice Boltzmann model for thermocapillary flows: Droplet capture and computational performance

Authors: Markus Holzer, Travis Mitchell, Christopher R. Leonardi, Ulrich Ruede

Abstract: This study develops a computationally efficient phase-field lattice Boltzmann model with the capability to simulate thermocapillary flows. The model was implemented in the open-source simulation framework waLBerla and extended to conduct the collision stage using central moments. The multiphase model was coupled with both a passive-scalar thermal LB and a Runge-Kutta (RK) solution of the energy equation in order to resolve temperature-dependent surface tension phenomena. Various lattice stencils (D3Q7, D3Q15, D3Q19, D3Q27) were tested for the passive-scalar LB, and both second- and fourth-order RK methods were investigated. No significant difference was observed in the accuracy of the LB or RK schemes. The passive-scalar D3Q7 LB discretisation tended to provide computational benefits, while the second-order RK scheme is superior in memory usage. This paper makes contributions relating to the modelling of thermocapillary flows and to understanding the behaviour of droplet capture with thermal sources analogous to thermal tweezers. Four primary contributions to the literature are identified. First, a new 3D thermocapillary, central-moment phase-field LB model is presented and implemented in the open-source software waLBerla. Second, the accuracy and computational performance of various techniques to resolve the energy equation for multiphase, incompressible fluids are investigated.
Third, the dynamic droplet transport behaviour in the presence of thermal sources is studied, and insight is provided into the potential to manipulate droplets through local domain heating. Finally, a concise analysis of the computational performance together with near-perfect scaling results on NVIDIA and AMD GPU clusters is shown. This research enables the detailed study of droplet manipulation and control in thermocapillary devices.

Submitted 5 January, 2024; originally announced January 2024.

arXiv:2310.06952

Generalized Golub-Kahan bidiagonalization for nonsymmetric saddle point systems

Authors: Andrei Dumitrasc, Carola Kruse, Ulrich Ruede

Abstract: The generalized Golub-Kahan bidiagonalization has been used to solve saddle-point systems where the leading block is symmetric and positive definite. We extend this iterative method to the case where the symmetry condition no longer holds. We do so by relying on the known connection between the algorithm and the Conjugate Gradient method and following the line of reasoning that adapts the latter into the Full Orthogonalization Method. We propose appropriate stopping criteria based on the residual and an estimate of the energy norm for the error associated with the primal variable. Numerical comparison with GMRES highlights the advantages of our proposed strategy regarding its low memory requirements and the associated implications.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 18 pages, 3 figures

MSC Class: 65F10 (Primary) 65F50; 65N22 (Secondary)
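For orientation, the classical (unweighted) Golub-Kahan bidiagonalization builds bases $U$ and $V$ with orthonormal columns and a lower bidiagonal matrix $B$ satisfying $AV = UB$. The sketch below assumes NumPy and implements only this classical recurrence: the generalized variant discussed in the abstract replaces the Euclidean inner products with ones induced by the saddle-point blocks, and the nonsymmetric extension is not shown. No reorthogonalization or breakdown handling is included, so it is suitable only for a few steps on well-behaved inputs.

```python
import numpy as np


def golub_kahan(A, b, k):
    """Run k steps of classical Golub-Kahan bidiagonalization started from b.

    Returns U (m x (k+1)), lower-bidiagonal B ((k+1) x k), and V (n x k)
    with A @ V == U @ B up to rounding.
    """
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    B = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    U[:, 0] = b / beta                     # beta_1 u_1 = b
    for j in range(k):
        # alpha_j v_j = A^T u_j - beta_j v_{j-1}
        w = A.T @ U[:, j]
        if j > 0:
            w -= B[j, j - 1] * V[:, j - 1]
        alpha = np.linalg.norm(w)
        V[:, j] = w / alpha
        B[j, j] = alpha
        # beta_{j+1} u_{j+1} = A v_j - alpha_j u_j
        p = A @ V[:, j] - alpha * U[:, j]
        beta = np.linalg.norm(p)
        U[:, j + 1] = p / beta
        B[j + 1, j] = beta
    return U, B, V
```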

arXiv:2308.01792

Fundamental Data Structures for Matrix-Free Finite Elements on Hybrid Tetrahedral Grids

Authors: Nils Kohl, Daniel Bauer, Fabian Böhm, Ulrich Rüde

Abstract: This paper presents efficient data structures for the implementation of matrix-free finite element methods on block-structured, hybrid tetrahedral grids. It provides a complete categorization of all geometric sub-objects that emerge from the regular refinement of the unstructured, tetrahedral coarse grid and describes efficient iteration patterns and analytical linearization functions for the mapping of coefficients to memory addresses. This foundation enables, by design, the implementation of fast, extreme-scalable, matrix-free, iterative solvers, in particular geometric multigrid methods. Their application to the variable-coefficient Stokes system subject to an enriched Galerkin discretization and to the curl-curl problem discretized with Nédélec edge elements showcases the flexibility of the implementation. Finally, the solution of a curl-curl problem with $1.6 \cdot 10^{11}$ (more than one hundred billion) unknowns on more than $32000$ processes with a matrix-free full multigrid solver demonstrates its extreme scalability.

Submitted 3 August, 2023; originally announced August 2023.

Comments: 21 pages
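To make "analytical linearization functions" concrete, here is one possible closed-form mapping, not necessarily the layout used in the paper. It assumes the vertex-associated unknowns of a regularly refined tetrahedron are indexed by integer triples $(i, j, k)$ with $i, j, k \ge 0$ and $i + j + k \le n$, stored layer by layer in $k$, row by row in $j$, and contiguously in $i$. The helper names `tet_count` and `tet_index` are hypothetical.

```python
def tet_count(n):
    """Number of lattice points (i, j, k) with i, j, k >= 0 and i + j + k <= n."""
    return (n + 1) * (n + 2) * (n + 3) // 6


def tet_index(i, j, k, n):
    """Map (i, j, k) to a unique position in [0, tet_count(n)).

    Layout: k-layers (triangles of shrinking size) follow one another; within
    a layer, rows of constant j follow one another; within a row, i has unit
    stride, so the innermost loop over i walks contiguous memory.
    """
    m = n - k                                   # edge size of layer k
    layer_offset = tet_count(n) - tet_count(m)  # points in layers 0..k-1
    row_offset = j * (2 * m + 3 - j) // 2       # points in rows 0..j-1 (row j' holds m - j' + 1 points)
    return layer_offset + row_offset + i
```

Because the index is computed arithmetically, no lookup tables are needed, and iterating in the nested order $k \to j \to i$ produces consecutive addresses.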

arXiv:2305.17693

doi 10.1137/22M1537266

Deflation for the off-diagonal block in symmetric saddle point systems

Authors: Andrei Dumitrasc, Carola Kruse, Ulrich Ruede

Abstract: Deflation techniques are typically used to shift isolated clusters of small eigenvalues in order to obtain a tighter distribution and a smaller condition number. Such changes induce a positive effect in the convergence behavior of Krylov subspace methods, which are among the most popular iterative solvers for large sparse linear systems. We develop a deflation strategy for symmetric saddle point matrices by taking advantage of their underlying block structure. The vectors used for deflation come from an elliptic singular value decomposition relying on the generalized Golub-Kahan bidiagonalization process. The block targeted by deflation is the off-diagonal one since it features a problematic singular value distribution for certain applications. One example is the Stokes flow in elongated channels, where the off-diagonal block has several small, isolated singular values, depending on the length of the channel. Applying deflation to specific parts of the saddle point system is important when using solvers such as CRAIG, which operates on individual blocks rather than the whole system. The theory is developed by extending the existing framework for deflating square matrices before applying a Krylov subspace method like MINRES. Numerical experiments confirm the merits of our strategy and lead to interesting questions about using approximate vectors for deflation.

Submitted 14 May, 2024; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 28 pages, 13 figures

MSC Class: 15A18 (Primary) 65F10; 65F15 (Secondary)

arXiv:2305.15116

doi 10.1145/3592979.3593422

Model-Based Performance Analysis of the HyTeG Finite Element Framework

Authors: Dominik Thönnes, Ulrich Rüde

Abstract: In this work, we present how code generation techniques significantly improve the performance of the computational kernels in the HyTeG software framework. This HPC framework combines the performance and memory advantages of matrix-free multigrid solvers with the flexibility of unstructured meshes. The pystencils code generation toolbox is used to replace the original abstract C++ kernels with highly optimized loop nests. The performance of one of those kernels (the matrix-vector multiplication) is thoroughly analyzed using the Execution-Cache-Memory (ECM) performance model. We validate these predictions by measurements on the SuperMUC-NG supercomputer. The experiments show that the performance mostly matches the predictions. In cases where the prediction does not match, we discuss the discrepancies. Additionally, we conduct a node-level scaling study which shows the expected behavior for a memory-bound compute kernel.

Submitted 24 May, 2023; originally announced May 2023.
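The ECM model predicts single-core runtime from in-core execution and data-transfer times through the memory hierarchy, all expressed in cycles per unit of work (for example, one cache line of updates). The sketch below uses the non-overlapping composition commonly assumed for Intel server CPUs and a simple linear-until-saturation multicore model; all input numbers are hypothetical, not measurements from the paper, and the function names are illustrative.

```python
import math


def ecm_runtime(t_ol, t_nol, t_l1l2, t_l2l3, t_l3mem):
    """Single-core ECM prediction in cycles per unit of work.

    Assumes in-core work that overlaps with data transfers (t_ol) runs
    concurrently, while load/store work (t_nol) and all inter-cache and
    memory transfers serialize.
    """
    return max(t_ol, t_nol + t_l1l2 + t_l2l3 + t_l3mem)


def saturation_point(t_ecm, t_l3mem):
    """Smallest core count at which memory bandwidth becomes the bottleneck."""
    return math.ceil(t_ecm / t_l3mem)


def predicted_throughput(cores, t_ecm, t_l3mem, work_per_unit, clock_hz):
    """Work items per second: linear scaling until saturation, then flat."""
    cycles_per_unit = max(t_ecm / cores, t_l3mem)
    return work_per_unit * clock_hz / cycles_per_unit


# Hypothetical kernel: 8 updates (one 64-byte line of doubles) per unit of work.
t = ecm_runtime(8, 4, 6, 6, 10)          # 26 cycles per unit
n_s = saturation_point(t, 10)            # bandwidth-bound from 3 cores on
```

For a memory-bound kernel such as the matrix-vector product analyzed in the paper, the plateau beyond the saturation point is the behavior a node-level scaling study is expected to show.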

arXiv:2305.09910

doi 10.1145/3587135.3592176

Scalable Flow Simulations with the Lattice Boltzmann Method

Authors: Markus Holzer, Jayesh Badwaik, Radim Vavrik, Gabriel Staffelbach, Andreas Herten, Ondrej Vysocky, Ilan Rocchi, Lubomir Riha, Romain Cuidard, Ulrich Ruede

Abstract: The primary goal of the EuroHPC JU project SCALABLE is to develop an industrial Lattice Boltzmann Method (LBM)-based computational fluid dynamics (CFD) solver capable of exploiting current and future extreme scale architectures, expanding current capabilities of existing industrial LBM solvers by at least two orders of magnitude in terms of processor cores and lattice cells, while preserving its accessibility from both the end-user and software developer's point of view. This is accomplished by transferring technology and knowledge between an academic code (waLBerla) and an industrial code (LaBS). This paper briefly introduces the characteristics and main features of both software packages involved in the process. We also highlight some of the performance achievements at scales of up to tens of thousands of cores, demonstrated on one academic and one industrial benchmark case.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2301.10674

doi 10.1017/jfm.2023.262

Particle-resolved simulation of antidunes in free-surface flows

Authors: Christoph Schwarzmeier, Christoph Rettinger, Samuel Kemmler, Jonas Plewinski, Francisco Núñez-González, Harald Köstler, Ulrich Rüde, Bernhard Vowinckel

Abstract: The interaction of supercritical turbulent flows with granular sediment beds is challenging to study both experimentally and numerically; this challenging task has hampered the advances in understanding antidunes, the most characteristic bedform of supercritical flows. This article presents the first numerical attempt to simulate upstream-migrating antidunes with geometrically resolved particles and a liquid-gas interface. Our simulations provide data at a resolution higher than laboratory experiments, and they can therefore provide new insights into the mechanisms of antidune migration and contribute to a deeper understanding of the underlying physics. To manage the simulations' computational costs and physical complexity, we employ the cumulant lattice Boltzmann method in conjunction with a discrete element method for particle interactions, as well as a volume of fluid scheme to track the deformable free surface of the fluid. By reproducing two flow configurations of previous experiments (Pascal et al., Earth Surf. Proc. Land., vol. 46(9), 2021, 1750-1765), we demonstrate that our approach is robust and accurately predicts the antidunes' amplitude, wavelength, and celerity. Furthermore, the simulated wall-shear stress, a key parameter governing sediment transport, is in excellent agreement with the experimental measurements. The highly resolved data of fluid and particle motion from our simulation approach open new perspectives for detailed studies of morphodynamics in shallow supercritical flows.

Submitted 23 March, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Journal ref: Journal of Fluid Mechanics 961 (2023)

arXiv:2211.02435

doi 10.1137/22M1531348

Advanced Automatic Code Generation for Multiple Relaxation-Time Lattice Boltzmann Methods

Authors: Frederik Hennig, Markus Holzer, Ulrich Rüde

Abstract: The scientific code generation package lbmpy supports the automated design and the efficient implementation of lattice Boltzmann methods (LBMs) through metaprogramming. It is based on a new, concise calculus for describing multiple relaxation-time LBMs, including techniques that enable the numerically advantageous subtraction of the constant background component from the populations. These techniques are generalized to a wide range of collision spaces and equilibrium distributions. The article contains an overview of lbmpy's front-end and its code generation pipeline, which implements the new LBM calculus by means of symbolic formula manipulation tools and object-oriented programming. The generated codes have only a minimal number of arithmetic operations. Their automatic derivation rests on two novel Chimera transforms that have been specifically developed for efficiently computing raw and central moments. Information contained in the symbolic representation of the methods is further exploited in a customized sequence of algebraic simplifications, further reducing computational cost. When combined, these algebraic transformations lead to concise and compact numerical kernels. Specifically, with these optimizations, the advanced central moment- and cumulant-based methods can be realized at only little additional cost compared with the simple BGK method.
The effectiveness and flexibility of the new lbmpy code generation system are demonstrated in simulating Taylor-Green vortex decay and the automatic derivation of an LBM algorithm to solve the shallow water equations.

Submitted 4 November, 2022; originally announced November 2022.

Comments: 23 pages, 6 figures
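For reference, the "simple BGK method" used as the cost baseline above relaxes all populations of a cell toward equilibrium at a single rate. The sketch below is a plain, unoptimized D2Q9 BGK collision in lattice units ($c_s^2 = 1/3$); it is not code generated by lbmpy and uses none of lbmpy's API. The helper names `equilibrium` and `bgk_collide` are illustrative.

```python
# D2Q9 lattice in lattice units (c_s^2 = 1/3): discrete velocities and weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Standard D2Q9 equilibrium, truncated at second order in the velocity."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def bgk_collide(f, omega):
    """Single-relaxation-time (BGK) collision for the nine populations of one cell.

    Density and momentum are computed from f, so the equilibrium carries the
    same conserved moments and the collision conserves them by construction.
    """
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - omega * (fi - fe) for fi, fe in zip(f, feq)]
```

Multiple relaxation-time schemes, including the central-moment and cumulant methods mentioned in the abstract, replace the single rate `omega` with separate rates for different moments, which is where symbolic simplification pays off.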

arXiv:2208.01079

doi 10.1002/nla.2484

Inexact inner-outer Golub-Kahan bidiagonalization method: A relaxation strategy

Authors: Vincent Darrigrand, Andrei Dumitrasc, Carola Kruse, Ulrich Ruede

Abstract: We study an inexact inner-outer generalized Golub-Kahan algorithm for the solution of saddle-point problems with a two-times-two block structure. In each outer iteration, an inner system has to be solved, which in theory has to be done exactly. When the system becomes large, however, an exact inner solver is no longer efficient or even feasible, and iterative methods must be used. This article focuses on a numerical study showing the influence of the accuracy of the inner iterative solution on the accuracy of the solution of the block system. Further emphasis is placed on reducing the computational cost, which is defined as the total number of inner iterations. We develop relaxation techniques intended to dynamically change the inner tolerance for each outer iteration to further minimize the total number of inner iterations. We illustrate our findings on a Stokes problem and validate them on a mixed formulation of the Poisson problem.

Submitted 1 August, 2022; originally announced August 2022.

Comments: 25 pages, 9 figures

MSC Class: 65F10; 65F50; 65N22

arXiv:2207.14496

doi 10.1063/5.0131159

Comparison of refilling schemes in the free-surface lattice Boltzmann method

Authors: Christoph Schwarzmeier, Ulrich Rüde

Abstract: Simulating mobile liquid-gas interfaces with the free-surface lattice Boltzmann method (FSLBM) requires frequent re-initialization of fluid flow information in computational cells that convert from gas to liquid. The corresponding algorithm, here referred to as the refilling scheme, is crucial for the successful application of the FSLBM in terms of accuracy and numerical stability. This study compares five refilling schemes that extract information from the surrounding liquid and interface cells by averaging, extrapolating, or assuming one of three different equilibrium states. Six numerical experiments were performed, covering a broad spectrum of possible scenarios. These include a standing gravity wave, a rectangular and cylindrical dam break, a Taylor bubble, a drop impact into liquid, and a bubbly plane Poiseuille flow. In some simulations, the averaging, extrapolation, and one equilibrium-based scheme were numerically unstable. Overall, the results have shown that the simplest equilibrium-based scheme should be preferred in terms of numerical stability, computational cost, accuracy, and ease of implementation.

Submitted 24 November, 2022; v1 submitted 29 July, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2207.13962

Journal ref: AIP Advances 12 (2022)

arXiv:2207.13962

doi 10.1002/fld.5173

Analysis and comparison of boundary condition variants in the free-surface lattice Boltzmann method

Authors: Christoph Schwarzmeier, Ulrich Rüde

Abstract: The accuracy of the free-surface lattice Boltzmann method (FSLBM) depends significantly on the boundary condition employed at the free interface. Ideally, the chosen boundary condition balances the forces exerted by the liquid and gas pressure. Different variants of the same boundary condition are possible, depending on the number and choice of the particle distribution functions (PDFs) to which it is applied. This study analyzes and compares four variants, in which (i) the boundary condition is applied to all PDFs oriented in the opposite direction of the free interface's normal vector, including or (ii) excluding the central PDF. While these variants overwrite existing information, the boundary condition can also be applied (iii) only to missing PDFs without dropping available data or (iv) only to missing PDFs but to at least three PDFs, as suggested in the literature. It is shown that none of the variants generally balances the forces exerted by the liquid and gas pressure at the free surface. The four variants' accuracy was compared in five different numerical experiments covering various applications. These include a standing gravity wave, a rectangular and cylindrical dam break, a rising Taylor bubble, and a droplet impacting a thin pool of liquid. Overall, variant (iii) was substantially more accurate than the other variants in the numerical experiments performed in this study.

Submitted 22 January, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

Journal ref: International Journal for Numerical Methods in Fluids (2023)
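For orientation, the free-surface boundary condition discussed above is commonly written in an anti-bounce-back form that reconstructs the PDF entering an interface cell $\mathbf{x}$ from the gas side. In the following, $\bar{i}$ denotes the direction opposite to $i$, $\rho_G$ the density corresponding to the gas pressure, $\mathbf{u}$ the velocity of the interface cell, and $f_i(\mathbf{x}, t)$ the post-collision PDF leaving the cell toward the gas. Index and sign conventions differ between publications, so this is a hedged summary rather than the exact formulation used in the paper; variants (i) to (iv) differ only in the set of directions $i$ to which the relation is applied.

$$
f_{\bar{i}}(\mathbf{x}, t + \Delta t) = f_i^{\mathrm{eq}}(\rho_G, \mathbf{u}) + f_{\bar{i}}^{\mathrm{eq}}(\rho_G, \mathbf{u}) - f_i(\mathbf{x}, t)
$$

Rearranged, $f_{\bar{i}}(\mathbf{x}, t + \Delta t) + f_i(\mathbf{x}, t) = f_i^{\mathrm{eq}}(\rho_G, \mathbf{u}) + f_{\bar{i}}^{\mathrm{eq}}(\rho_G, \mathbf{u})$, i.e. each treated direction pair is made consistent with the gas density $\rho_G$ and thus with the gas pressure.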

arXiv:2206.11637

doi 10.1016/j.jcp.2022.111753

Comparison of free-surface and conservative Allen-Cahn phase-field lattice Boltzmann method

Authors: Christoph Schwarzmeier, Markus Holzer, Travis Mitchell, Moritz Lehmann, Fabian Häusl, Ulrich Rüde

Abstract: This study compares the free-surface lattice Boltzmann method (FSLBM) with the conservative Allen-Cahn phase-field lattice Boltzmann method (PFLBM) in their ability to model two-phase flows in which the behavior of the system is dominated by the heavy phase. Both models are introduced and their individual properties, strengths and weaknesses are thoroughly discussed. Six numerical benchmark cases were simulated with both models, including (i) a standing gravity and (ii) capillary wave, (iii) an unconfined rising gas bubble in liquid, (iv) a Taylor bubble in a cylindrical tube, and (v) the vertical and (vi) oblique impact of a drop into a pool of liquid. Comparing the simulation results with either analytical models or experimental data from the literature, four major observations were made. Firstly, the PFLBM selected was able to simulate flows purely governed by surface tension with reasonable accuracy. Secondly, the FSLBM, a sharp interface model, generally requires a lower resolution than the PFLBM, a diffuse interface model. However, in the limit case of a standing wave, this was not observed. Thirdly, in simulations of a bubble moving in a liquid, the FSLBM accurately predicted the bubble's shape and rise velocity with low computational resolution. Finally, the PFLBM's accuracy is found to be sensitive to the choice of the model's mobility parameter and interface width.

Submitted 24 November, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

Journal ref: Journal of Computational Physics 473 (2023)

arXiv:2205.07543

Effect of Sediment Form and Form Distribution on Porosity: A Simulation Study Based on the Discrete Element Method

Authors: Christoph Rettinger, Ulrich Rüde, Stefan Vollmer, Roy M. Frings

Abstract: Porosity is one of the key properties of dense particle packings like sediment deposits and is influenced by a multitude of grain characteristics such as their size distribution and shape. In the present work, we focus on the form, a specific aspect of the overall shape, of sedimentary grains in order to investigate and quantify its effect on porosity, ultimately deriving novel porosity-prediction models. To this end, we develop a robust and accurate simulation tool based on the discrete element method which we validate against laboratory experiments. Utilizing digital representations of actual sediment from the Rhine river, we first study packings that are composed of particles with a single form. There, the porosity is found to be mainly determined by the inverse equancy, i.e., the ratio of the longest to the smallest form-defining axis. Only for small ratios, additional shape-related properties become relevant, as revealed by a direct comparison to packings of form-equivalent ellipsoids. Since sediment naturally features form mixtures, we extend our simulation tool to study sediment packings with normally-distributed forms. In agreement with our single form studies, the porosity depends primarily on the inverse of the mean equancy. By supplying additional information about a second form factor and the standard deviations, we derive an accurate model for porosity prediction. Due to its simplicity, it can be readily applied to sediment packings for which some measurements of flatness and elongation, the two most common form factors, are available.

Submitted 16 May, 2022; originally announced May 2022.
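The two quantities the entry above builds on can be computed directly. A minimal sketch, assuming a packing is summarized by its total grain (solid) volume and its bulk volume, and that a grain's form is described by three form-defining axis lengths; the function names `porosity` and `inverse_equancy` are hypothetical and not part of the authors' tool.

```python
def porosity(solid_volume: float, bulk_volume: float) -> float:
    """Void fraction of a packing: 1 - (volume occupied by grains) / (bulk volume)."""
    if bulk_volume <= 0 or not 0 <= solid_volume <= bulk_volume:
        raise ValueError("need bulk_volume > 0 and 0 <= solid_volume <= bulk_volume")
    return 1.0 - solid_volume / bulk_volume


def inverse_equancy(a: float, b: float, c: float) -> float:
    """Ratio of the longest to the smallest form-defining axis (always >= 1)."""
    shortest, _, longest = sorted((a, b, c))
    if shortest <= 0:
        raise ValueError("axis lengths must be positive")
    return longest / shortest


# Hypothetical example: 6.2 cm^3 of grains in a 10 cm^3 container,
# and a single grain with axes 2 mm, 4 mm and 1 mm.
print(porosity(6.2, 10.0))
print(inverse_equancy(2.0, 4.0, 1.0))  # 4.0
```

Sorting the axes first makes the ratio independent of the order in which the axes are measured.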

arXiv:2112.04353

A decoupled numerical method for two-phase flows of different densities and viscosities in superposed fluid and porous layers

Authors: Yali Gao, Daozhi Han, Xiaoming He, Ulrich Rüde

Abstract: In this article we consider the numerical modeling and simulation via the phase field approach of two-phase flows of different densities and viscosities in superposed fluid and porous layers. The model consists of the Cahn-Hilliard-Navier-Stokes equations in the free flow region and the Cahn-Hilliard-Darcy equations in porous media that are coupled by seven domain interface boundary conditions. We show that the coupled model satisfies an energy law. Based on the ideas of pressure stabilization and artificial compressibility, we propose an unconditionally stable time-stepping method that decouples the computation of the phase field variable, the velocity and pressure of free flow, and the velocity and pressure of porous media, and hence significantly reduces the computational cost. The energy stability of the scheme combined with the finite element spatial discretization is rigorously established. We verify numerically that our schemes are convergent and energy-law preserving. Ample numerical experiments are performed to illustrate the features of two-phase flows in the coupled free flow and porous media setting.

Submitted 8 December, 2021; originally announced December 2021.

arXiv:2103.10882

Coupling fully resolved light particles with the Lattice Boltzmann method on adaptively refined grids

Authors: Lukas Werner, Christoph Rettinger, Ulrich Rüde

Abstract: The simulation of geometrically resolved rigid particles in a fluid relies on coupling algorithms to transfer momentum both ways between the particles and the fluid. In this article, the fluid flow is modeled with a parallel Lattice Boltzmann method using adaptive grid refinement to improve numerical efficiency. The coupling with the particles is realized with the momentum exchange method. If the coupling is implemented in plain form, instabilities may arise when the particles are lighter than the fluid. The algorithm can be stabilized with a virtual mass correction specifically developed for the Lattice Boltzmann method. The method is analyzed for a wide set of physically relevant regimes, independently varying the body-to-fluid density ratio and the relative magnitude of inertial and viscous effects. These studies of a single rising particle exhibit periodic regimes of particle motion as well as chaotic behavior, as previously reported in the literature. The new method is carefully compared with available experimental and numerical results. This serves to validate the presented new coupled Lattice Boltzmann method, and additionally it leads to new physical insight for some of the parameter settings.

Submitted 19 March, 2021; originally announced March 2021.

arXiv:2103.04103

doi 10.1017/jfm.2021.870

Rheology of mobile sediment beds in laminar shear flow: effects of creep and polydispersity

Authors: Christoph Rettinger, Sebastian Eibl, Ulrich Rüde, Bernhard Vowinckel

Abstract: Classical scaling relationships for rheological quantities such as the $μ(J)$-rheology have become increasingly popular for closures of two-phase flow modeling. However, these frameworks have been derived for monodisperse particles. We aim to extend these considerations to sediment transport modeling by using a more realistic sediment composition. We investigate the rheological behavior of sheared sediment beds composed of polydisperse spherical particles in a laminar Couette-type shear flow. The sediment beds consist of particles with a diameter size ratio of up to ten, which corresponds to grains ranging from fine to coarse sand. The data were generated with fully coupled, grain-resolved direct numerical simulations using a combined lattice Boltzmann - discrete element method. These highly-resolved data yield detailed depth-resolved profiles of the relevant physical quantities that determine the rheology, i.e., the local shear rate of the fluid, particle volume fraction, total shear, and granular pressure. A comparison against experimental data shows excellent agreement for the monodisperse case. We improve upon the parameterization of the $μ(J)$-rheology by expressing its empirically derived parameters as a function of the maximum particle volume fraction. Furthermore, we extend these considerations by exploring the creeping regime for viscous numbers much lower than used by previous studies to calibrate these correlations.
Considering the low viscous numbers of our data, we found that the friction coefficient governing the quasi-static state in the creeping regime tends to a finite value for vanishing shear, which decreases the critical friction coefficient by a factor of three for all cases investigated.

Submitted 30 June, 2021; v1 submitted 6 March, 2021; originally announced March 2021.
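The $μ(J)$-rheology relates the effective friction coefficient μ = τ/p_p and the particle volume fraction φ to the viscous number J = η_f γ̇ / p_p. A minimal sketch of the commonly used form by Boyer, Guazzelli and Pouliquen (2011); the default parameter values are illustrative placeholders, not the calibrated values of the entry above, which instead expresses the empirical parameters as functions of the maximum volume fraction φ_m. Function names are hypothetical.

```python
import math


def viscous_number(eta_f: float, shear_rate: float, p_particle: float) -> float:
    """J = eta_f * shear_rate / p_p (fluid viscosity, shear rate, granular pressure)."""
    return eta_f * shear_rate / p_particle


def mu_of_J(J, mu1=0.32, mu2=0.7, J0=0.005, phi_m=0.585):
    """Effective friction coefficient; tends to mu1 as J -> 0 (quasi-static limit)."""
    if J <= 0:
        raise ValueError("J must be positive")
    contact = mu1 + (mu2 - mu1) / (1.0 + J0 / J)   # contact (granular) contribution
    hydro = J + 2.5 * phi_m * math.sqrt(J)         # hydrodynamic contribution
    return contact + hydro


def phi_of_J(J, phi_m=0.585):
    """Volume fraction; equals phi_m at J = 0 and decreases as J grows."""
    return phi_m / (1.0 + math.sqrt(J))


J = viscous_number(1e-3, 10.0, 2.0)   # hypothetical values in SI units
print(J, mu_of_J(J), phi_of_J(J))
```

In this form μ approaches the finite value μ1 for vanishing shear; the entry reports that in the creeping regime this quasi-static friction coefficient is lower than previously calibrated values.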

arXiv:2103.02388

A massively parallel Eulerian-Lagrangian method for advection-dominated transport in viscous fluids

Authors: Nils Kohl, Marcus Mohr, Sebastian Eibl, Ulrich Rüde

Abstract: Motivated by challenges in Earth mantle convection, we present a massively parallel implementation of an Eulerian-Lagrangian method for the advection-diffusion equation in the advection-dominated regime. The advection term is treated by a particle-based, characteristics method coupled to a block-structured finite-element framework. Its numerical and computational performance is evaluated in multiple, two- and three-dimensional benchmarks, including curved geometries, discontinuous solutions, pure advection, and it is applied to a coupled non-linear system modeling buoyancy-driven convection in Stokes flow. We demonstrate the parallel performance in a strong and weak scaling experiment, with scalability to up to $147,456$ parallel processes, solving for more than $5.2 \times 10^{10}$ (52 billion) degrees of freedom per time-step.

Submitted 3 March, 2021; originally announced March 2021.

Comments: 22 pages

MSC Class: 65M25; 65Y05; 65M60
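The core idea of treating advection along characteristics, rather than discretizing the advection term on the grid, can be shown in one dimension. A minimal sketch, assuming a constant velocity, a periodic uniform grid, and linear interpolation at the departure points; this is the classical semi-Lagrangian variant, not the particle-based, block-structured finite-element implementation of the entry above. The function name is hypothetical.

```python
import numpy as np


def advect_characteristics(u: np.ndarray, velocity: float, dt: float, dx: float) -> np.ndarray:
    """One step of u_t + a u_x = 0 via u_new(x_i) = u_old(x_i - a*dt).

    Assumes a periodic, uniform 1D grid and a constant velocity a; the value at
    the departure point is linearly interpolated from the two neighboring nodes.
    """
    n = u.size
    departure = np.arange(n) - velocity * dt / dx   # foot of each characteristic, in cell units
    left = np.floor(departure).astype(int)
    weight = departure - left                       # interpolation weight in [0, 1)
    return (1.0 - weight) * u[left % n] + weight * u[(left + 1) % n]


x = np.arange(32) / 32
u0 = np.sin(2 * np.pi * x)
# Courant number a*dt/dx = 3 > 1 is allowed: the scheme has no CFL stability limit.
u1 = advect_characteristics(u0, velocity=1.5, dt=2.0, dx=1.0)
```

Because values are transported along characteristics, the time step is not restricted by the advective CFL condition, which is one motivation for such methods in the advection-dominated regime.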

arXiv:2012.06144

Highly Efficient Lattice-Boltzmann Multiphase Simulations of Immiscible Fluids at High-Density Ratios on CPUs and GPUs through Code Generation

Authors: Markus Holzer, Martin Bauer, Ulrich Rüde

Abstract: A high-performance implementation of a multiphase lattice Boltzmann method based on the conservative Allen-Cahn model supporting high-density ratios and high Reynolds numbers is presented. Metaprogramming techniques are used to generate optimized code for CPUs and GPUs automatically. The coupled model is specified in a high-level symbolic description and optimized through automatic transformations. The memory footprint of the resulting algorithm is reduced through the fusion of compute kernels. A roofline analysis demonstrates the excellent efficiency of the generated code on a single GPU. The resulting single GPU code has been integrated into the multiphysics framework waLBerla to run massively parallel simulations on large domains. Communication hiding and GPUDirect-enabled MPI yield near-perfect scaling behaviour. Scaling experiments are conducted on the Piz Daint supercomputer with up to 2048 GPUs, simulating several hundred fully resolved bubbles. Further, validation of the implementation is shown in a physically relevant scenario: a three-dimensional rising air bubble in water.

Submitted 11 December, 2020; originally announced December 2020.

Comments: 17 pages, 9 figures
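The roofline analysis mentioned above bounds attainable performance by min(P_peak, I · b), where I is the arithmetic intensity (FLOP per byte) and b the memory bandwidth. A minimal sketch; the function names and the example numbers (a hypothetical 900 GB/s device, 304 bytes per D3Q19 double-precision cell update, i.e. reading and writing 19 values of 8 bytes, ignoring write-allocate traffic) are assumptions for illustration only.

```python
def attainable_performance(peak_flops: float, bandwidth: float, intensity: float) -> float:
    """Roofline bound. Units: FLOP/s, byte/s, FLOP/byte -> FLOP/s."""
    return min(peak_flops, intensity * bandwidth)


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel can become compute bound."""
    return peak_flops / bandwidth


def max_cell_updates_per_second(bandwidth: float, bytes_per_cell: float) -> float:
    """Upper bound for a memory-bound kernel that moves bytes_per_cell per cell update."""
    return bandwidth / bytes_per_cell


# Hypothetical device and kernel: 900 GB/s, D3Q19 in double precision.
bound = max_cell_updates_per_second(900e9, 2 * 19 * 8)
print(f"{bound / 1e6:.0f} MLUP/s upper bound")
```

Kernel fusion helps in this model because it lowers the bytes moved per cell update, which raises the memory-bound ceiling directly.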

arXiv:2010.13513

Textbook efficiency: massively parallel matrix-free multigrid for the Stokes system

Authors: Nils Kohl, Ulrich Rüde

Abstract: We employ textbook multigrid efficiency (TME), as introduced by Achi Brandt, to construct an asymptotically optimal monolithic multigrid solver for the Stokes system. The geometric multigrid solver builds upon the concept of hierarchical hybrid grids (HHG), which is extended to higher-order finite-element discretizations, and a corresponding matrix-free implementation. The computational cost of the full multigrid (FMG) iteration is quantified, and the solver is applied to multiple benchmark problems. Through a parameter study, we suggest configurations that achieve TME for both stabilized equal-order and Taylor-Hood discretizations. The excellent node-level performance of the relevant compute kernels is presented via a roofline analysis. Finally, we demonstrate the weak and strong scalability to up to $147,456$ parallel processes and solve Stokes systems with more than $3.6 \times 10^{12}$ (trillion) unknowns.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 22 pages, 7 figures

MSC Class: 65F10; 65N30; 65N55
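The monolithic HHG Stokes solver above is far beyond a short example, but the matrix-free geometric multigrid principle it builds on can be sketched for the 1D Poisson problem -u'' = f with zero Dirichlet boundaries. Assumptions: weighted Jacobi smoothing (ω = 2/3), full-weighting restriction, linear interpolation, rediscretized coarse operators, and n = 2^k - 1 interior points; all function names are hypothetical.

```python
import numpy as np


def apply_A(u, h):
    """Matrix-free application of the stencil [-1, 2, -1] / h^2 (zero boundary values)."""
    up = np.pad(u, 1)
    return (2.0 * up[1:-1] - up[:-2] - up[2:]) / h**2


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi; the stencil's diagonal entry is 2 / h^2."""
    for _ in range(sweeps):
        u = u + omega * (f - apply_A(u, h)) * h**2 / 2.0
    return u


def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]


def prolong(e, n_fine):
    """Linear interpolation from m coarse points to n_fine = 2m+1 fine points."""
    ef = np.zeros(n_fine)
    ef[1::2] = e                         # fine points that coincide with coarse points
    ep = np.pad(e, 1)
    ef[0::2] = 0.5 * (ep[:-1] + ep[1:])  # fine points between two coarse points
    return ef


def v_cycle(u, f, h, pre=2, post=2):
    if u.size == 1:
        return f * h**2 / 2.0            # exact solve of the 1x1 coarsest system
    u = smooth(u, f, h, pre)
    r = f - apply_A(u, h)
    e = v_cycle(np.zeros((u.size - 1) // 2), restrict(r), 2 * h, pre, post)
    u = u + prolong(e, u.size)
    return smooth(u, f, h, post)


def solve_poisson(f, h, cycles):
    u = np.zeros_like(f)
    for _ in range(cycles):
        u = v_cycle(u, f, h)
    return u


if __name__ == "__main__":
    n = 2**7 - 1
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1 - h, n)
    f = np.pi**2 * np.sin(np.pi * x)     # exact solution: sin(pi x)
    u = solve_poisson(f, h, 10)
    print(np.linalg.norm(f - apply_A(u, h)), np.max(np.abs(u - np.sin(np.pi * x))))
```

Each V-cycle costs a constant number of operator applications per unknown and reduces the residual by a roughly level-independent factor; this is the property that TME pushes to its limit.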

arXiv:2010.13342

Resiliency in Numerical Algorithm Design for Extreme Scale Simulations

Authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Goeddeke, Marco Heisig, Fabienne Jezequel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Orti, et al. (11 additional authors not shown)

Abstract: This work is based on the seminar titled ``Resiliency in Numerical Algorithm Design for Extreme Scale Simulations'' held March 1-6, 2020 at Schloss Dagstuhl, that was attended by all the authors. Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of Petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features as well as specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation and (2) how do we best design the algorithms and software to meet these requirements? One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors.
The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 45 pages, 3 figures, submitted to The International Journal of High Performance Computing Applications

ACM Class: D.4.5; G.4; G.1; D.4.4
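Why frequent synchronous checkpointing stops paying off can be quantified with Young's classical first-order model (Young, 1974), which is not taken from the entry above: with checkpoint cost C and mean time between failures M, the wasted fraction of runtime is roughly C/T + T/(2M), minimized at T = sqrt(2CM), where it equals sqrt(2C/M). A minimal sketch under that model's assumptions (C much smaller than M, recovery time ignored); function names and numbers are hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """First-order optimal time between checkpoints: sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Time spent writing checkpoints plus expected rework after a failure (first order)."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


# Hypothetical: a 10-minute checkpoint on a machine that fails once per hour.
C, M = 600.0, 3600.0
T = young_interval(C, M)
print(T, waste_fraction(T, C, M))   # about 58% of the runtime is lost even at the optimum
```

As C approaches M, the optimal waste sqrt(2C/M) exceeds one, so the model predicts no forward progress, mirroring the concern raised in the abstract.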

arXiv:2008.13046

doi 10.1063/5.0025505

Densification of Single-Walled Carbon Nanotube Films: Mesoscopic Distinct Element Method Simulations and Experimental Validation

Authors: Grigorii Drozdov, Igor Ostanin, Hao Xu, Yuezhou Wang, Traian Dumitrică, Artem Grebenko, Alexey P. Tsapenko, Yuriy Gladush, Georgy Ermolaev, Valentyn S. Volkov, Sebastian Eibl, Ulrich Rüde, Albert G. Nasibulin

Abstract: Nanometer thin single-walled carbon nanotube (CNT) films collected from the aerosol chemical deposition reactors have gathered attention for their promising applications. Densification of these pristine films provides an important way to manipulate the mechanical, electronic, and optical properties. To elucidate the underlying microstructural level restructuring, which is ultimately responsible for the change in properties, we perform large scale vector-based mesoscopic distinct element method simulations in conjunction with electron microscopy and spectroscopic ellipsometry characterization of pristine and densified films by drop-cast volatile liquid processing. Matching the microscopy observations, pristine CNT films with finite thickness are modeled as self-assembled CNT networks comprising entangled dendritic bundles with branches extending down to individual CNTs. Simulations of the film under uniaxial compression uncover an ultra-soft densification regime extending to a ~75% strain, which is likely accessible with the forces arising from liquid surface tension during evaporation. When removing the loads, the pre-compressed samples evolve into homogeneously densified films with thickness values depending on both the pre-compression level and the sample microstructure. The significant reduction in thickness, confirmed by our spectroscopic ellipsometry, is attributed to the underlying structural changes occurring at the 100 nm scale, including the zipping of the thinnest dendritic branches.

Submitted 29 August, 2020; originally announced August 2020.

Comments: 12 figures

arXiv:2003.01490

An efficient four-way coupled lattice Boltzmann - discrete element method for fully resolved simulations of particle-laden flows

Authors: Christoph Rettinger, Ulrich Rüde

Abstract: A four-way coupling scheme for the direct numerical simulation of particle-laden flows is developed and analyzed. It employs a novel adaptive multi-relaxation time lattice Boltzmann method to simulate the fluid phase efficiently. The momentum exchange method is used to couple the fluid and the particulate phase. The particle interactions in normal and tangential direction are accounted for by a discrete element method using linear contact forces. All parameters of the scheme are studied and evaluated in detail and precise guidelines for their choice are developed. The development is based on several carefully selected calibration and validation tests of increasing physical complexity. It is found that a well-calibrated lubrication model is crucial to obtain the correct trajectories of a sphere colliding with a plane wall in a viscous fluid. For adequately resolving the collision dynamics it is found that the collision time must be stretched appropriately. The complete set of tests establishes a validation pipeline that can be universally applied to other fluid-particle coupling schemes providing a systematic methodology that can guide future developments.

Submitted 3 March, 2020; originally announced March 2020.
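Stretching the collision time is possible because, for a linear spring-dashpot normal contact m_eff x'' + c x' + k x = 0, the stiffness k and damping c can be chosen from a prescribed collision duration T_c and coefficient of restitution e: k = m_eff (π² + ln² e) / T_c² and c = -2 m_eff ln e / T_c. A minimal sketch of this standard relation; whether the entry above uses exactly this parameterization is an assumption, and the function name is hypothetical.

```python
import math


def linear_contact_parameters(m1: float, m2: float, collision_time: float, restitution: float):
    """Stiffness k and damping c of a linear spring-dashpot normal contact.

    The damped half-period pi / omega_d equals collision_time, and the velocity
    ratio after/before contact equals restitution (0 < e <= 1).
    Pass m2 = math.inf for a collision with a fixed wall.
    """
    if not 0.0 < restitution <= 1.0:
        raise ValueError("restitution must be in (0, 1]")
    m_eff = m1 if math.isinf(m2) else m1 * m2 / (m1 + m2)
    log_e = math.log(restitution)
    k = m_eff * (math.pi**2 + log_e**2) / collision_time**2
    c = -2.0 * m_eff * log_e / collision_time
    return k, c


# Hypothetical sphere-wall collision stretched over 0.01 time units with e = 0.8.
k, c = linear_contact_parameters(2.0, math.inf, 0.01, 0.8)
```

Choosing the collision time as a multiple of the fluid time step, instead of the physically much shorter contact duration, keeps the contact resolvable by the coupled time integration.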

arXiv:2001.11806

lbmpy: Automatic code generation for efficient parallel lattice Boltzmann methods

Authors: Martin Bauer, Harald Köstler, Ulrich Rüde

Abstract: Lattice Boltzmann methods are a popular mesoscopic alternative to macroscopic computational fluid dynamics solvers. Many variants have been developed that vary in complexity, accuracy, and computational cost. Extensions are available to simulate multi-phase, multi-component, turbulent, or non-Newtonian flows. In this work we present lbmpy, a code generation package that supports a wide variety of different methods and provides a generic development environment for new schemes as well. A high-level domain-specific language allows the user to formulate, extend and test various lattice Boltzmann schemes. The method specification is represented in a symbolic intermediate representation. Transformations that operate on this intermediate representation optimize and parallelize the method, yielding highly efficient lattice Boltzmann compute kernels not only for single- and two-relaxation-time schemes but also for multi-relaxation-time, cumulant, and entropically stabilized methods. An integration into the HPC framework waLBerla makes massively parallel, distributed simulations possible, which is demonstrated through scaling experiments on the SuperMUC-NG supercomputing system.

Submitted 11 April, 2020; v1 submitted 31 January, 2020; originally announced January 2020.
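For orientation, the simplest kernel family mentioned above, the single-relaxation-time (BGK) collision, is shown here hand-written for one D2Q9 cell. This NumPy sketch does not use lbmpy's API; it only illustrates what such a generated kernel computes, using the standard second-order equilibrium in lattice units (c_s² = 1/3). Names are hypothetical.

```python
import numpy as np

# D2Q9 discrete velocities and lattice weights
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho: float, u: np.ndarray) -> np.ndarray:
    """Second-order truncated Maxwell-Boltzmann equilibrium in lattice units."""
    cu = C @ u
    return W * rho * (1.0 + 3.0 * cu + 4.5 * cu**2 - 1.5 * (u @ u))


def bgk_collide(f: np.ndarray, omega: float) -> np.ndarray:
    """Relax the 9 populations of one cell towards equilibrium with rate omega."""
    rho = f.sum()                 # density: zeroth moment
    u = (C.T @ f) / rho           # velocity: first moment divided by density
    return f + omega * (equilibrium(rho, u) - f)


f = W * 1.02
f[1] += 0.01                      # perturb one population to create a small flow
print(bgk_collide(f, omega=1.3))
```

Collision conserves mass and momentum of each cell; a code generator can emit the same arithmetic as fused, vectorized kernels for CPUs and GPUs.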

arXiv:2001.10424

Parallel solution of saddle point systems with nested iterative solvers based on the Golub-Kahan Bidiagonalization

Authors: Carola Kruse, Masha Sosonkina, Mario Arioli, Nicolas Tardieu, Ulrich Ruede

Abstract: We present a scalability study of Golub-Kahan bidiagonalization for the parallel iterative solution of symmetric indefinite linear systems with a 2x2 block structure. The algorithms have been implemented within the parallel numerical library PETSc. Since a nested inner-outer iteration strategy may be necessary, we investigate different choices for the inner solvers, including parallel sparse direct and multigrid accelerated iterative methods. We show the strong and weak scalability of the Golub-Kahan bidiagonalization based iterative method when applied to a two-dimensional Poiseuille flow and to two- and three-dimensional Stokes test problems.

Submitted 28 January, 2020; originally announced January 2020.
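The recurrence at the heart of the method is the Golub-Kahan bidiagonalization, which builds orthonormal bases U and V with A V_k = U_{k+1} B_k for a lower-bidiagonal B_k. A minimal sketch in the Euclidean inner product, without reorthogonalization or breakdown handling; in the saddle-point setting the same recurrence is typically run in weighted inner products that require solves with a block of the system, which is where nested inner solvers come in. The function name is hypothetical.

```python
import numpy as np


def golub_kahan(A: np.ndarray, b: np.ndarray, k: int):
    """k steps of Golub-Kahan bidiagonalization started from b.

    Returns U (m x (k+1)), B ((k+1) x k, lower bidiagonal) and V (n x k)
    with A @ V == U @ B up to rounding.
    """
    m, n = A.shape
    U, V = np.zeros((m, k + 1)), np.zeros((n, k))
    alphas, betas = np.zeros(k), np.zeros(k + 1)
    betas[0] = np.linalg.norm(b)
    U[:, 0] = b / betas[0]
    w = A.T @ U[:, 0]
    for j in range(k):
        alphas[j] = np.linalg.norm(w)
        V[:, j] = w / alphas[j]
        p = A @ V[:, j] - alphas[j] * U[:, j]
        betas[j + 1] = np.linalg.norm(p)
        U[:, j + 1] = p / betas[j + 1]
        w = A.T @ U[:, j + 1] - betas[j + 1] * V[:, j]
    B = np.zeros((k + 1, k))
    B[np.arange(k), np.arange(k)] = alphas        # diagonal
    B[np.arange(1, k + 1), np.arange(k)] = betas[1:]  # subdiagonal
    return U, B, V
```

Each step needs one product with A and one with its transpose, so the method only requires operator applications, which suits a parallel, library-based implementation.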

arXiv:1909.13772

doi 10.1016/j.camwa.2020.01.007

waLBerla: A block-structured high-performance framework for multiphysics simulations

Authors: Martin Bauer, Sebastian Eibl, Christian Godenschwager, Nils Kohl, Michael Kuron, Christoph Rettinger, Florian Schornbaum, Christoph Schwarzmeier, Dominik Thönnes, Harald Köstler, Ulrich Rüde

Abstract: Programming current supercomputers efficiently is a challenging task. Multiple levels of parallelism on the core, on the compute node, and between nodes need to be exploited to make full use of the system. Heterogeneous hardware architectures with accelerators further complicate the development process. waLBerla addresses these challenges by providing the user with highly efficient building blocks for developing simulations on block-structured grids. The block-structured domain partitioning is flexible enough to handle complex geometries, while the structured grid within each block allows for highly efficient implementations of stencil-based algorithms. We present several example applications realized with waLBerla, ranging from lattice Boltzmann methods to rigid particle simulations. Most importantly, these methods can be coupled together, enabling multiphysics simulations. The framework uses meta-programming techniques to generate highly efficient code for CPUs and GPUs from a symbolic method formulation. To ensure software quality and performance portability, a continuous integration toolchain automatically runs an extensive test suite encompassing multiple compilers, hardware architectures, and software configurations.

Submitted 30 September, 2019; originally announced September 2019.

arXiv:1908.11746

On numerical solution of full rank linear systems

Authors: A. Dumitrasc, Ph. Leleux, C. Popa, D. Ruiz, U. Ruede

Abstract: Matrices can be augmented by adding additional columns such that a partitioning of the matrix in blocks of rows defines mutually orthogonal subspaces. This augmented system can then be solved efficiently by a sum of projections onto these subspaces. The equivalence to the original linear system is ensured by adding additional rows to the matrix in a specific form. The resulting solution method is known as the augmented block Cimmino method. Here this method is extended to full rank underdetermined systems and to overdetermined systems. In the latter case, rows of the matrix, not columns, must be suitably augmented. The article presents an analysis of these methods.

Submitted 30 August, 2019; originally announced August 2019.
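Why mutually orthogonal subspaces matter becomes clear from the plain (non-augmented) block Cimmino iteration x ← x + ω Σ_i A_i⁺ (b_i − A_i x), which sums projections onto the row spaces of the blocks A_i. If those row spaces are mutually orthogonal, one sweep with ω = 1 already yields the solution of a square full-rank system; the augmentation is what enforces this property. A minimal sketch; explicit pseudo-inverses are for illustration only (practical codes factorize each block instead), and the function name is hypothetical.

```python
import numpy as np


def block_cimmino(A, b, blocks, omega=1.0, iterations=200, x0=None):
    """Plain block Cimmino iteration for A x = b with row blocks given as slices."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    pinvs = [np.linalg.pinv(A[rows]) for rows in blocks]   # A_i^+ for every row block
    for _ in range(iterations):
        update = np.zeros_like(x)
        for rows, Ai_pinv in zip(blocks, pinvs):
            update += Ai_pinv @ (b[rows] - A[rows] @ x)     # projection onto block i
        x = x + omega * update
    return x


rng = np.random.default_rng(0)
A = 4.0 * np.eye(8) + 0.1 * rng.standard_normal((8, 8))
x_true = rng.standard_normal(8)
x = block_cimmino(A, A @ x_true, [slice(0, 4), slice(4, 8)])
```

The convergence rate depends on the angles between the block row spaces, which is the dependence that the augmented variant removes.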

arXiv:1908.08666

Stencil scaling for vector-valued PDEs on hybrid grids with applications to generalized Newtonian fluids

Authors: Daniel Drzisga, Ulrich Rüde, Barbara Wohlmuth

Abstract: Matrix-free finite element implementations for large applications provide an attractive alternative to standard sparse matrix data formats due to the significantly reduced memory consumption. Here, we show that they are also competitive with respect to the run time in the low order case if combined with suitable stencil scaling techniques. We focus on variable coefficient vector-valued partial differential equations as they arise in many physical applications. The presented method is based on scaling constant reference stencils originating from a linear finite element discretization instead of evaluating the bilinear forms on-the-fly. This method assumes the usage of hierarchical hybrid grids, and it may be applied to vector-valued second-order elliptic partial differential equations directly or as a part of more complicated problems. We provide theoretical and experimental performance estimates showing the advantages of this new approach compared to the traditional on-the-fly integration and stored matrix approaches. In our numerical experiments, we consider two specific mathematical models, namely linear elastostatics and incompressible Stokes flow. The final example considers a non-linear shear-thinning generalized Newtonian fluid. For this type of non-linearity, we present an efficient approach to compute a regularized strain rate which is then used to define the node-wise viscosity. Depending on the compute architecture, we could observe maximum speedups of 64% and 122% compared to the on-the-fly integration.
The largest considered example involved solving a Stokes problem with 12288 compute cores on the state of the art supercomputer SuperMUC-NG.

Submitted 18 March, 2020; v1 submitted 23 August, 2019; originally announced August 2019.
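The stencil scaling idea can be illustrated in a scalar 1D setting for -(k u')'. A minimal sketch, assuming the scaling rule of the earlier scalar stencil-scaling approach: each off-diagonal entry of the constant reference stencil [-1, 2, -1]/h² is multiplied by the average 0.5 (k_i + k_j) of the nodal coefficients, and the diagonal is set so that the row sum vanishes. The vector-valued variant on hierarchical hybrid grids in the entry above is more involved; the function name and `REF_STENCIL` are hypothetical.

```python
import numpy as np

REF_STENCIL = np.array([-1.0, 2.0, -1.0])   # reference stencil of -u'' (times 1/h^2), k = 1


def scaled_stencil_apply(u: np.ndarray, k: np.ndarray, h: float) -> np.ndarray:
    """Apply -(k u')' matrix-free on a uniform 1D grid with zero Dirichlet values.

    u holds the n interior values, k the n+2 nodal coefficients including boundary nodes.
    Only the reference stencil and the nodal coefficients are stored, never a matrix.
    """
    up = np.pad(u, 1)
    west = 0.5 * (k[1:-1] + k[:-2]) * REF_STENCIL[0]   # scaled off-diagonal entries
    east = 0.5 * (k[1:-1] + k[2:]) * REF_STENCIL[2]
    center = -(west + east)                            # row sum zero, as in the reference
    return (west * up[:-2] + center * up[1:-1] + east * up[2:]) / h**2


n = 9
h = 1.0 / (n + 1)
x = np.linspace(0.0, 1.0, n + 2)
Au = scaled_stencil_apply(np.sin(np.pi * x[1:-1]), 1.0 + x**2, h)
```

Compared with on-the-fly integration, each row costs only a few multiplications, and the averaged coefficients keep the resulting operator symmetric.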

arXiv:1906.10963

A Modular and Extensible Software Architecture for Particle Dynamics

Authors: Sebastian Eibl, Ulrich Rüde

Abstract: Creating a highly parallel and flexible discrete element software requires an interdisciplinary approach, where expertise from different disciplines is combined. On the one hand domain specialists provide interaction models between particles. On the other hand high-performance computing specialists optimize the code to achieve good performance on different hardware architectures. In particular, the software must be carefully crafted to achieve good scaling on massively parallel supercomputers. Combining all this in a flexible and extensible, widely usable software is a challenging task. In this article we outline the design decisions and concepts of a newly developed particle dynamics code MESA-PD that is implemented as part of the waLBerla multi-physics framework. Extensibility, flexibility, but also performance and scalability are primary design goals for the new software framework. In particular, the new modular architecture is designed such that physical models can be modified and extended by domain scientists without understanding all details of the parallel computing functionality and the underlying distributed data structures that are needed to achieve good performance on current supercomputer architectures. This goal is achieved by combining the high performance simulation framework waLBerla with code generation techniques. All code and the code generator are released as open source under GPLv3 within the publicly available waLBerla framework (www.walberla.net).

Submitted 26 June, 2019; originally announced June 2019.

Comments: Proceedings Of The 8Th International Conference On Discrete Element Methods

arXiv:1906.06884

Validation and calibration of coupled porous-medium and free-flow problems using pore-scale resolved models

Authors: Iryna Rybak, Christoph Schwarzmeier, Elissa Eggenweiler, Ulrich Rüde

Abstract: The correct choice of interface conditions and effective parameters for coupled macroscale free-flow and porous-medium models is crucial for a complete mathematical description of the problem under consideration and for accurate numerical simulation of applications. We consider single-fluid-phase systems described by the Stokes-Darcy model. Different sets of coupling conditions for this model are available. However, the choice of these conditions and effective model parameters is often arbitrary. We use large scale lattice Boltzmann simulations to validate coupling conditions by comparison of the macroscale simulations against pore-scale resolved models. We analyse two settings (lid driven cavity over a porous bed and infiltration problem) with different geometrical configurations (channelised and staggered distributions of solid grains) and different sets of interface conditions. Effective parameters for the macroscale models are computed numerically for each geometrical configuration. Numerical simulation results demonstrate the sensitivity of the coupled Stokes-Darcy problem to the location of the sharp fluid-porous interface, the effective model parameters and the interface conditions.

Submitted 26 June, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

MSC Class: 68N99; 76D07; 76S05

arXiv:1905.05042

Computational Study of Ultrathin CNT Films with the Scalable Mesoscopic Distinct Element Method

Authors: Igor Ostanin, Traian Dumitrică, Sebastian Eibl, Ulrich Rüde

Abstract: In this work we present a computational study of the small-strain mechanics of freestanding ultrathin CNT films under in-plane loading. The numerical modeling of the mechanics of representatively large specimens with realistic micro- and nanostructure is presented. Our simulations utilize the scalable implementation of the mesoscopic distinct element method in the waLBerla multi-physics framework. Within our modeling approach, CNTs are represented as chains of interacting rigid segments. Neighboring segments in the chain are connected with elastic bonds, resolving tension, bending, shear and torsional deformations. These bonds represent the covalent bonding within the CNT surface and use the Enhanced Vector Model (EVM) formalism. Segments of neighboring CNTs interact via a realistic coarse-grained anisotropic vdW potential, enabling relative slip of CNTs in contact. The advanced simulation technique allowed us to gain useful insights into the behavior of CNT materials. In particular, it was established that the energy dissipation during CNT sliding leads to extended load transfer that conditions a material-like mechanical response of the weakly bonded assemblies of CNTs.

Submitted 19 October, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

arXiv:1811.12742 [pdf, other]

Dynamic Load Balancing Techniques for Particulate Flow Simulations

Authors: Christoph Rettinger, Ulrich Rüde

Abstract: Parallel multiphysics simulations often suffer from load imbalances originating from the applied coupling of algorithms with spatially and temporally varying workloads. It is thus desirable to minimize these imbalances to reduce the time to solution and to better utilize the available hardware resources. Taking particulate flows as an illustrative example application, we present and evaluate load balancing techniques that tackle this challenging task. This involves a load estimation step in which the currently generated workload is predicted. We describe in detail how such a workload estimator can be developed. In a second step, load distribution strategies like space-filling curves or graph partitioning are applied to dynamically distribute the load among the available processes. To compare and analyze their performance, we apply these techniques to a benchmark scenario and observe a reduction of the load imbalances by almost a factor of four. This results in a decrease of the overall runtime by 14% for space-filling curves.

Submitted 30 November, 2018; originally announced November 2018.
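The second step, distributing estimated workloads along a space-filling curve, can be sketched as follows. This minimal illustration assumes a 2D grid of blocks whose weights have already been supplied by a workload estimator, and uses a Morton (Z-order) curve; the paper's estimator, its 3D block structure and its specific curve are not reproduced, and all function names are hypothetical.

```python
def morton_2d(x, y, bits=16):
    """Interleave the bits of x and y into a Z-order (Morton) index."""
    code = 0
    for b in range(bits):
        code |= ((x >> b) & 1) << (2 * b)
        code |= ((y >> b) & 1) << (2 * b + 1)
    return code

def partition_along_curve(blocks, num_procs):
    """Assign weighted blocks to processes as contiguous pieces of the Z-curve.

    blocks: dict mapping (x, y) block coordinates to an estimated workload.
    Returns a dict mapping (x, y) -> process rank.
    """
    ordered = sorted(blocks, key=lambda c: morton_2d(*c))
    total = sum(blocks.values())
    assignment, acc = {}, 0.0
    for coord in ordered:
        w = blocks[coord]
        # Place each block by the midpoint of its weight interval along the curve.
        rank = min(int((acc + 0.5 * w) / total * num_procs), num_procs - 1)
        assignment[coord] = rank
        acc += w
    return assignment

def imbalance(blocks, assignment, num_procs):
    """Maximum process load divided by average load (1.0 means perfect balance)."""
    loads = [0.0] * num_procs
    for coord, rank in assignment.items():
        loads[rank] += blocks[coord]
    return max(loads) / (sum(loads) / num_procs)

# Example: particles have settled into the lower half of an 8x8 block grid,
# so those blocks carry five times the estimated work.
blocks = {(x, y): (5.0 if y < 4 else 1.0) for x in range(8) for y in range(8)}
assignment = partition_along_curve(blocks, num_procs=4)
print(imbalance(blocks, assignment, 4))
```

Because each process receives a contiguous curve segment, blocks that are close on the curve, and therefore mostly close in space, stay on the same process, which keeps communication local.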

arXiv:1808.07677 [pdf, other]

An iterative generalized Golub-Kahan algorithm for problems in structural mechanics

Authors: Mario Arioli, Carola Kruse, Ulrich Ruede, Nicolas Tardieu

Abstract: This paper studies the Craig variant of the Golub-Kahan bidiagonalization algorithm as an iterative solver for linear systems with saddle point structure. Such symmetric indefinite systems in 2x2 block form arise in many applications, but standard iterative solvers are often found to perform poorly on them and robust preconditioners may not be available. Specifically, such systems arise in structural mechanics when a semidefinite finite element stiffness matrix is augmented with linear multi-point constraints via Lagrange multipliers. Engineers often use such multi-point constraints to introduce boundary or coupling conditions into complex finite element models. The article presents a systematic convergence study of the Golub-Kahan algorithm for a sequence of test problems of increasing complexity, including concrete structures reinforced with pretension cables and the coupled finite element model of a reactor containment building. When the systems are suitably transformed using augmented Lagrangians on the semidefinite block and when the constraint equations are properly scaled, the Golub-Kahan algorithm is found to exhibit excellent convergence that depends only weakly on the size of the model. The new algorithm is found to be robust in practical cases that are otherwise considered difficult for iterative solvers.

Submitted 23 August, 2018; originally announced August 2018.
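The generalized Golub-Kahan bidiagonalization behind a Craig-type solver can be sketched in a few lines of NumPy. The sketch assumes the system [[M, A], [A^T, 0]] [u; p] = [0; b] with an already symmetric positive definite M (for example after an augmented-Lagrangian regularization of a semidefinite block) and an SPD weight N, and uses dense direct solves for M^{-1} and N^{-1}. The augmented-Lagrangian transformation, constraint scaling and preconditioning discussed in the paper are not included; the function name is hypothetical.

```python
import numpy as np

def gkb_saddle(M, A, b, N=None, tol=1e-12, maxiter=None):
    """Craig-type generalized Golub-Kahan solver for
    [[M, A], [A^T, 0]] [u; p] = [0; b] with M and N SPD (dense sketch)."""
    n, m = A.shape
    N = np.eye(m) if N is None else N
    maxiter = m if maxiter is None else maxiter
    Minv = lambda r: np.linalg.solve(M, r)
    Ninv = lambda r: np.linalg.solve(N, r)
    mnorm = lambda x: np.sqrt(x @ M @ x)
    nnorm = lambda x: np.sqrt(x @ N @ x)

    # Start vectors: beta_1 q_1 = N^{-1} b and alpha_1 v_1 = M^{-1} A q_1.
    q = Ninv(b)
    beta = nnorm(q)
    q = q / beta
    w = Minv(A @ q)
    alpha = mnorm(w)
    v = w / alpha
    z = beta / alpha        # coefficients of u in the M-orthonormal basis v_k
    d = q / alpha           # d_k = columns of Q B^{-1}, used to build p
    u, p = z * v, -z * d
    for _ in range(1, maxiter):
        g = Ninv(A.T @ v - alpha * (N @ q))
        beta = nnorm(g)
        if beta < tol:      # exact convergence: the Krylov space is exhausted
            break
        q = g / beta
        w = Minv(A @ q - beta * (M @ v))
        alpha = mnorm(w)
        v = w / alpha
        z = -beta * z / alpha
        d = (q - beta * d) / alpha
        u = u + z * v
        p = p - z * d
        if abs(z) < tol:
            break
    return u, p

rng = np.random.default_rng(0)
n, m = 8, 3
R = rng.standard_normal((n, n))
M = R @ R.T + n * np.eye(n)          # SPD (1,1) block
A = rng.standard_normal((n, m))      # full column rank constraint block
b = rng.standard_normal(m)
u, p = gkb_saddle(M, A, b)
K = np.block([[M, A], [A.T, np.zeros((m, m))]])
ref = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
print(np.allclose(np.concatenate([u, p]), ref))
```

A nonzero right-hand side in the first block row can be reduced to this form by a shift of u; the magnitude of the coefficients z is what makes a cheap error estimate and stopping test possible.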

arXiv:1808.00829 [pdf, other]

doi 10.1016/j.cpc.2019.06.020

A Systematic Comparison of Dynamic Load Balancing Algorithms for Massively Parallel Rigid Particle Dynamics

Authors: Sebastian Eibl, Ulrich Rüde

Abstract: As compute power increases with time, more involved and larger simulations become possible. However, it becomes increasingly difficult to use the provided computational resources efficiently. Especially in particle-based simulations with a spatial domain partitioning, large load imbalances can occur because the simulation is dynamic. In such cases a static domain partitioning may not be suitable, which can deteriorate the overall runtime of the simulation significantly. Sophisticated load balancing strategies must be designed to alleviate this problem. In this paper we conduct a systematic evaluation of the performance of six different load balancing algorithms. Our tests cover a wide range of simulation sizes and employ one of the largest supercomputers available. In particular, we carefully study the runtime and memory complexity of all components of the simulation. When progressing to extreme-scale simulations it is essential to identify bottlenecks and to predict the scaling behaviour. Scaling experiments are shown for more than one million processes. The performance of each algorithm is analyzed with respect to the quality of the load balancing and its runtime costs. For all tests, the waLBerla multiphysics framework is employed.

Submitted 2 August, 2019; v1 submitted 2 August, 2018; originally announced August 2018.

arXiv:1805.10167 [pdf, other]

A Scalable and Modular Software Architecture for Finite Elements on Hierarchical Hybrid Grids

Authors: Nils Kohl, Dominik Thönnes, Daniel Drzisga, Dominik Bartuschat, Ulrich Rüde

Abstract: In this article, a new generic higher-order finite-element framework for massively parallel simulations is presented. The modular software architecture is carefully designed to exploit the resources of modern and future supercomputers. Combining an unstructured topology with structured grid refinement facilitates high geometric adaptability and matrix-free multigrid implementations with excellent performance. Different abstraction levels and fully distributed data structures additionally ensure high flexibility, extensibility, and scalability. The software concepts support sophisticated load balancing and the flexible combination of finite element spaces. Example scenarios with coupled systems of PDEs show the applicability of the concepts to geophysical simulations.

Submitted 25 May, 2018; originally announced May 2018.

Comments: Preprint of an article submitted to International Journal of Parallel, Emergent and Distributed Systems (Taylor & Francis)
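The structured refinement inside each unstructured macro-element is what makes stencil-based, matrix-free kernels possible. A small sketch of the resulting data size counts the vertices of one refined macro-triangle or macro-tetrahedron, assuming uniform refinement that halves every edge per level (a standard choice for hierarchical hybrid grids, stated here as an assumption about this framework). The function name is hypothetical.

```python
def vertices_per_macro_element(level, dim=3):
    """Number of vertices (including the element's boundary) after `level`
    uniform refinements of one triangle (dim=2) or tetrahedron (dim=3).

    Each refinement halves the edge length, so every edge carries n + 1
    vertices with n = 2**level; the counts are the triangular and
    tetrahedral numbers.
    """
    n = 2 ** level
    if dim == 2:
        return (n + 1) * (n + 2) // 2
    if dim == 3:
        return (n + 1) * (n + 2) * (n + 3) // 6
    raise ValueError("dim must be 2 or 3")

for level in range(7):
    print(level, vertices_per_macro_element(level))
```

Since the vertex layout inside a macro-element is a regular lattice, neighbors can be addressed by index arithmetic instead of stored connectivity, which is the property that matrix-free stencil kernels exploit.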

arXiv:1804.06373 [pdf, other]

Adaptive control in rollforward recovery for extreme scale multigrid

Authors: Markus Huber, Ulrich Rüde, Barbara Wohlmuth

Abstract: With the increasing number of compute components, failures in future exa-scale computer systems are expected to become more frequent. This motivates the study of novel resilience techniques. Here, we extend a recently proposed algorithm-based recovery method for multigrid iterations by introducing an adaptive control. After a fault, the healthy part of the system continues the iterative solution process, while the solution in the faulty domain is reconstructed by an asynchronous online recovery. The computations in the faulty and healthy subdomains must be coordinated carefully; in particular, both under- and over-solving must be avoided, since both waste computational resources and therefore increase the overall time-to-solution. To control the local recovery and guarantee an optimal re-coupling, we introduce a stopping criterion based on a mathematical error estimator. It involves hierarchical weighted sums of residuals within the context of uniformly refined meshes and is well-suited to parallel high-performance computing. The re-coupling process is steered by local contributions of the error estimator. We propose and compare two criteria which differ in their weights. Failure scenarios when solving up to $6.9\cdot10^{11}$ unknowns on more than 245,766 parallel processes of a state-of-the-art peta-scale supercomputer are reported, demonstrating the robustness of the method.

Submitted 17 April, 2018; originally announced April 2018.

arXiv:1803.04937 [pdf, other]

An improved lattice Boltzmann D3Q19 method based on an alternative equilibrium discretization

Authors: Martin Bauer, Ulrich Rüde

Abstract: Lattice Boltzmann simulations of three-dimensional, isothermal hydrodynamics often use either the D3Q19 or the D3Q27 velocity sets. While both models correctly approximate the Navier-Stokes equations in the continuum limit, the D3Q19 model is computationally less expensive but has some known deficiencies regarding Galilean invariance, especially for high-Reynolds-number flows. In this work we present a novel methodology to construct lattice Boltzmann equilibria for hydrodynamics directly from the continuous Maxwellian equilibrium. While our new approach reproduces the well-known LBM equilibrium for the D2Q9 and D3Q27 lattice models, it yields a different equilibrium formulation for the D3Q19 stencil. This newly proposed formulation is shown to be more accurate than the widely used second-order equilibrium, while having the same computational cost. We present a steady-state Chapman-Enskog analysis of the standard and the improved D3Q19 model and conduct numerical experiments that demonstrate the superior accuracy of our newly developed D3Q19 equilibrium.

Submitted 28 August, 2018; v1 submitted 13 March, 2018; originally announced March 2018.
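For reference, the widely used second-order equilibrium that the new D3Q19 formulation is compared against fits in a few lines. The sketch below implements only that standard polynomial equilibrium, not the paper's Maxwellian-derived D3Q19 variant, using the usual D3Q19 weights 1/3, 1/18 and 1/36 and c_s^2 = 1/3, and checks that it reproduces density and momentum.

```python
import itertools
import numpy as np

# D3Q19 velocity set: rest velocity, 6 face neighbours, 12 edge neighbours.
velocities = [(0, 0, 0)]
velocities += [v for v in itertools.product((-1, 0, 1), repeat=3)
               if sum(abs(c) for c in v) == 1]
velocities += [v for v in itertools.product((-1, 0, 1), repeat=3)
               if sum(abs(c) for c in v) == 2]
C = np.array(velocities, dtype=float)            # shape (19, 3)
W = np.array([1/3] + [1/18] * 6 + [1/36] * 12)   # lattice weights
CS2 = 1.0 / 3.0                                  # squared lattice speed of sound

def equilibrium(rho, u):
    """Standard second-order polynomial equilibrium f_i^eq(rho, u)."""
    u = np.asarray(u, dtype=float)
    cu = C @ u
    usq = u @ u
    return W * rho * (1 + cu / CS2 + cu**2 / (2 * CS2**2) - usq / (2 * CS2))

f = equilibrium(1.0, [0.05, 0.0, -0.02])
print(f.sum(), f @ C)   # zeroth and first moments: density and momentum
```

The deficiency mentioned in the abstract concerns higher moments: the D3Q19 set lacks the corner velocities of D3Q27, so some third-order velocity moments cannot be matched.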

arXiv:1802.02765 [pdf, other]

A local parallel communication algorithm for polydisperse rigid body dynamics

Authors: Sebastian Eibl, Ulrich Rüde

Abstract: The simulation of large ensembles of particles is usually parallelized by partitioning the domain spatially and using message passing to communicate between the processes handling neighboring subdomains. The particles are represented as individual geometric objects and are associated with the subdomains. Handling collisions and migrating particles between subdomains, as required for proper parallel execution, requires a complex communication protocol. Typically, the parallelization is restricted to handling only particles that are smaller than a subdomain. In many applications, however, particle sizes may vary drastically, with some of them being larger than a subdomain. In this article we propose a new communication and synchronization algorithm that can handle the parallelization without size restrictions on the particles. Despite the additional complexity and extended functionality, the new algorithm introduces only minimal overhead. We demonstrate the scalability of the previous and the new communication algorithms up to almost two million parallel processes and for handling ten billion ($10^{10}$) geometrically resolved particles on a state-of-the-art petascale supercomputer. Different scenarios are presented to analyze the performance of the new algorithm and to demonstrate its capability to simulate polydisperse scenarios, where large individual particles can extend across several subdomains.

Submitted 2 August, 2018; v1 submitted 8 February, 2018; originally announced February 2018.
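When particles may be larger than a subdomain, the processes must know every subdomain a particle touches, not only the one containing its center. The geometric part of that bookkeeping can be sketched as below, assuming spherical particles and a regular grid of cubic subdomains. The paper's actual communication and synchronization protocol is not shown, and the function is hypothetical.

```python
import itertools
import math

def overlapped_subdomains(center, radius, block_size, grid_shape):
    """Return the indices of all cubic subdomains that a sphere intersects.

    Subdomains form a regular grid of cubes with edge length `block_size`
    starting at the origin; `grid_shape` is the number of blocks per axis.
    A sphere overlaps a block if the closest point of the block lies within
    `radius` of the sphere center.
    """
    # Candidate blocks: those intersecting the sphere's bounding box.
    ranges = []
    for c, n in zip(center, grid_shape):
        lo = max(int(math.floor((c - radius) / block_size)), 0)
        hi = min(int(math.floor((c + radius) / block_size)), n - 1)
        ranges.append(range(lo, hi + 1))
    hits = []
    for idx in itertools.product(*ranges):
        dist_sq = 0.0
        for c, i in zip(center, idx):
            lo, hi = i * block_size, (i + 1) * block_size
            nearest = min(max(c, lo), hi)   # clamp the center into the block
            dist_sq += (c - nearest) ** 2
        if dist_sq <= radius ** 2:
            hits.append(idx)
    return hits

# A small particle lives in one block; a large one sitting on a block corner
# touches all eight blocks that share that corner.
print(overlapped_subdomains((1.0, 1.0, 1.0), 0.4, 2.0, (4, 4, 4)))
print(len(overlapped_subdomains((4.0, 4.0, 4.0), 1.5, 2.0, (4, 4, 4))))
```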

arXiv:1712.07028 [pdf, other]

doi 10.1080/10618562.2018.1424836

Direct simulation of liquid-gas-solid flow with a free surface lattice Boltzmann method

Authors: Simon Bogner, Jens Harting, Ulrich Rüde

Abstract: Direct numerical simulation of liquid-gas-solid flows is uncommon due to the considerable computational cost. As the grid spacing is determined by the smallest involved length scale, large grid sizes become necessary, in particular if the bubble-particle aspect ratio is on the order of 10 or larger. This raises the question of both feasibility and reasonability. In this paper, we present a fully parallel, scalable method for direct numerical simulation of bubble-particle interaction at a size ratio of 1-2 orders of magnitude that makes simulations feasible on currently available supercomputing resources. With the presented approach, simulations of bubbles in suspension columns consisting of more than $100\,000$ fully resolved particles become possible. Furthermore, we demonstrate the significance of particle-resolved simulations by comparison to previous unresolved solutions. The results indicate that fully resolved direct numerical simulation is indeed necessary to predict the flow structure of bubble-particle interaction problems correctly.

Submitted 19 December, 2017; originally announced December 2017.

Comments: submitted to International Journal of Computational Fluid Dynamics

arXiv:1711.00336 [pdf, other]

A Coupled Lattice Boltzmann Method and Discrete Element Method for Discrete Particle Simulations of Particulate Flows

Authors: Christoph Rettinger, Ulrich Rüde

Abstract: Discrete particle simulations are widely used to study large-scale particulate flows in complex geometries where particle-particle and particle-fluid interactions require an adequate representation but the computational cost has to be kept low. In this work, we present a novel coupling approach for such simulations. A lattice Boltzmann formulation of the generalized Navier-Stokes equations is used to describe the fluid motion. This promises efficient simulations suitable for high-performance computing and, since volume displacement effects by the solid phase are considered, our approach is also applicable to non-dilute particulate systems. The discrete element method is combined with an explicit evaluation of interparticle lubrication forces to simulate the motion of individual submerged particles. Drag, pressure and added mass forces determine the momentum transfer by fluid-particle interactions. A stable coupling algorithm is presented and discussed in detail. We demonstrate the validity of our approach for dilute as well as dense systems by predicting the settling velocity of spheres over a broad range of solid volume fractions in good agreement with semi-empirical correlations. Additionally, the accuracy of particle-wall interactions in a viscous fluid is thoroughly tested and established. Our approach can thus be readily used for various particulate systems and can be extended straightforwardly, e.g. to non-spherical particles.

Submitted 1 November, 2017; originally announced November 2017.
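A typical semi-empirical reference curve for validating settling velocities over a range of solid volume fractions is the Richardson-Zaki hindered-settling correlation. The abstract does not say which correlations the paper uses, so the sketch below is only an assumed example of the kind of curve that simulated settling velocities are compared against; the exponent value is the classical low-Reynolds-number choice.

```python
def hindered_settling_velocity(u_terminal, solid_fraction, n=4.65):
    """Richardson-Zaki estimate of the mean settling velocity of a suspension.

    u_terminal: settling velocity of a single isolated sphere
    solid_fraction: solid volume fraction phi in [0, 1)
    n: Richardson-Zaki exponent (about 4.65 in the low-Reynolds-number regime)
    """
    if not 0.0 <= solid_fraction < 1.0:
        raise ValueError("solid fraction must lie in [0, 1)")
    return u_terminal * (1.0 - solid_fraction) ** n

for phi in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"phi={phi:.1f}: u/u_t = {hindered_settling_velocity(1.0, phi):.3f}")
```

The steep decrease with phi is why the dense regime is the demanding test case: it is governed by the volume displacement effects of the solid phase that the abstract highlights.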

arXiv:1709.06793 [pdf, other]

A stencil scaling approach for accelerating matrix-free finite element implementations

Authors: Simon Bauer, Daniel Drzisga, Marcus Mohr, Ulrich Ruede, Christian Waluga, Barbara Wohlmuth

Abstract: We present a novel approach to fast on-the-fly low-order finite element assembly for scalar elliptic partial differential equations of Darcy type with variable coefficients, optimized for matrix-free implementations. Our approach introduces a new operator that is obtained by appropriately scaling the reference stiffness matrix from the constant-coefficient case. Assuming sufficient regularity, an a priori analysis shows that solutions obtained by this approach are unique and have asymptotically optimal order convergence in the $H^1$- and the $L^2$-norm on hierarchical hybrid grids. For the pre-asymptotic regime, we present a local modification that guarantees uniform ellipticity of the operator. Cost considerations show that our novel approach requires roughly one third of the floating-point operations compared to a classical finite element assembly scheme employing nodal integration. Our theoretical considerations are illustrated by numerical tests that confirm the expectations with respect to accuracy and run-time. A large-scale application with more than a hundred billion ($1.6\cdot10^{11}$) degrees of freedom executed on 14,310 compute cores demonstrates the efficiency of the new scaling approach.

Submitted 23 July, 2018; v1 submitted 20 September, 2017; originally announced September 2017.
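The scaling idea can be illustrated in one dimension. The sketch below assumes that each off-diagonal entry of the constant-coefficient reference stencil is multiplied by the arithmetic mean of the nodal coefficients, k(x_i) and k(x_j), and that the diagonal restores the zero row sum; this is an illustrative assumption about the scaling, and the paper targets 3D stencils on hierarchical hybrid grids, which are not modeled here. Function names are hypothetical.

```python
import numpy as np

def reference_stencil(h):
    """Constant-coefficient 1D P1 stiffness row for -u'': (-1, 2, -1) / h."""
    return np.array([-1.0, 2.0, -1.0]) / h

def scaled_operator(k_nodes, h):
    """Interior rows built by scaling the reference off-diagonal entries.

    Assumed scaling (illustrative): a_ij = 0.5 * (k_i + k_j) * s_ij for j != i,
    and a diagonal chosen so that each row sums to zero. Boundary rows are
    left empty.
    """
    s = reference_stencil(h)
    n = len(k_nodes)
    A = np.zeros((n, n))
    for i in range(1, n - 1):
        for j, s_ij in ((i - 1, s[0]), (i + 1, s[2])):
            A[i, j] = 0.5 * (k_nodes[i] + k_nodes[j]) * s_ij
        A[i, i] = -(A[i, i - 1] + A[i, i + 1])
    return A

def element_assembled(k_nodes, h):
    """Classical P1 assembly with k averaged over each element's two nodes."""
    n = len(k_nodes)
    A = np.zeros((n, n))
    for e in range(n - 1):
        k_e = 0.5 * (k_nodes[e] + k_nodes[e + 1])
        A[e:e + 2, e:e + 2] += k_e / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return A

x = np.linspace(0.0, 1.0, 9)
h = x[1] - x[0]
k = 1.0 + x**2
S, E = scaled_operator(k, h), element_assembled(k, h)
print(np.allclose(S[1:-1], E[1:-1]))
```

In 1D the scaled stencil coincides with element assembly using per-element averaged coefficients; the floating-point savings reported in the abstract come from avoiding per-element work in 3D, which this sketch does not attempt to show.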

arXiv:1708.08741 [pdf, other]

doi 10.1016/j.jocs.2018.05.011

A Scalable Multiphysics Algorithm for Massively Parallel Direct Numerical Simulations of Electrophoresis

Authors: Dominik Bartuschat, Ulrich Rüde

Abstract: In this article we introduce a novel coupled algorithm for massively parallel direct numerical simulations of electrophoresis in microfluidic flows. This multiphysics algorithm employs an Eulerian description of fluid and ions, combined with a Lagrangian representation of moving charged particles. The fixed grid facilitates efficient solvers and the employed lattice Boltzmann method can efficiently handle complex geometries. Validation experiments with more than $70\,000$ time steps are presented, together with scaling experiments with over ${4\cdot10^{6}}$ particles and ${1.96\cdot10^{11}}$ grid cells for both hydrodynamics and electric potential. We achieve excellent performance and scaling on up to $65\,536$ cores of a current supercomputer.

Submitted 25 May, 2018; v1 submitted 29 August, 2017; originally announced August 2017.

Comments: Accepted manuscript of publication in Journal of Computational Science (Elsevier)

arXiv:1708.08286 [pdf, other]

A Scalable and Extensible Checkpointing Scheme for Massively Parallel Simulations

Authors: Nils Kohl, Johannes Hötzer, Florian Schornbaum, Martin Bauer, Christian Godenschwager, Harald Köstler, Britta Nestler, Ulrich Rüde

Abstract: Realistic simulations in engineering or in the materials sciences can consume enormous computing resources and thus require the use of massively parallel supercomputers. The probability of a failure increases both with the runtime and with the number of system components. For future exascale systems it is therefore considered critical that strategies are developed to make software resilient against failures. In this article, we present a scalable, distributed, diskless, and resilient checkpointing scheme that can create and recover snapshots of a partitioned simulation domain. We demonstrate the efficiency and scalability of the checkpoint strategy for simulations with up to $40$ billion computational cells operating on more than $400$ billion floating-point values. Creating a checkpoint is shown to require only a few seconds, and the new checkpointing scheme scales almost perfectly up to more than $260\,000$ ($2^{18}$) processes. To recover from a diskless checkpoint during runtime, we realize the recovery algorithms using ULFM MPI. The checkpointing mechanism is fully integrated into a state-of-the-art high-performance multi-physics simulation framework. We demonstrate the efficiency and robustness of the method with a realistic phase-field simulation originating in the material sciences and with a lattice Boltzmann method implementation.

Submitted 29 January, 2018; v1 submitted 28 August, 2017; originally announced August 2017.
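The core idea of a diskless checkpoint, keeping a redundant in-memory copy of each process's snapshot on another process so that a single failure can be recovered without the file system, can be sketched in plain Python. The ring-style partner assignment, the data layout and all function names are assumptions for illustration; the paper's scheme, its MPI communication and its ULFM-based recovery are not reproduced.

```python
import copy

def buddy_of(rank, num_procs):
    """Hypothetical pairing: rank r mirrors its snapshot on rank (r + 1) mod P."""
    return (rank + 1) % num_procs

def create_checkpoint(local_states):
    """Diskless checkpoint: each rank keeps its own snapshot plus one partner's.

    local_states: list indexed by rank with each rank's block data.
    Returns per-rank memory {"own": snapshot, "buddy": (source_rank, snapshot)}.
    Deep copies stand in for serializing and sending the data.
    """
    P = len(local_states)
    memory = [{"own": copy.deepcopy(state)} for state in local_states]
    for r in range(P):
        memory[buddy_of(r, P)]["buddy"] = (r, copy.deepcopy(local_states[r]))
    return memory

def recover(memory, failed_rank):
    """Rebuild the failed rank's snapshot from the copy held by its buddy."""
    source, snapshot = memory[buddy_of(failed_rank, len(memory))]["buddy"]
    if source != failed_rank:
        raise RuntimeError("buddy does not hold the expected snapshot")
    return copy.deepcopy(snapshot)

states = [{"block": r, "values": [float(r)] * 3} for r in range(4)]
memory = create_checkpoint(states)
memory[2] = None                 # simulate losing rank 2 and its memory
print(recover(memory, 2))
```

With a single partner per rank, any failure that does not take out both a rank and its buddy is recoverable; more partners trade memory for tolerance of simultaneous failures (and P must be at least 2 for the pairing to be meaningful).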

arXiv:1706.00221 [pdf, other]

doi 10.1007/s00466-017-1486-0

The Maximum Dissipation Principle in Rigid-Body Dynamics with Purely Inelastic Impacts

Authors: Tobias Preclik, Sebastian Eibl, Ulrich Rüde

Abstract: Formulating a consistent theory for rigid-body dynamics with impacts is an intricate problem. Twenty years ago Stewart published the first consistent theory with purely inelastic impacts and an impulsive friction model analogous to Coulomb friction. In this paper we demonstrate that the consistent impact model can exhibit multiple solutions with a varying degree of dissipation even in the single-contact case. Replacing the impulsive friction model based on Coulomb friction by a model based on the maximum dissipation principle resolves the non-uniqueness in the single-contact impact problem. The paper constructs the alternative impact model and presents integral equations describing rigid-body dynamics with a non-impulsive and non-compliant contact model and an associated purely inelastic impact model maximizing dissipation. An analytic solution is derived for the single-contact impact problem. The models are then embedded into a time-stepping scheme. The macroscopic behaviour is compared to Coulomb friction in a large-scale granular flow problem.

Submitted 1 June, 2017; originally announced June 2017.

Journal ref: Springer, Computational Mechanics, 2017

arXiv:1704.06829 [pdf, other]

doi 10.1137/17M1128411

Extreme-Scale Block-Structured Adaptive Mesh Refinement

Authors: Florian Schornbaum, Ulrich Rüde

Abstract: In this article, we present a novel approach for block-structured adaptive mesh refinement (AMR) that is suitable for extreme-scale parallelism. All data structures are designed such that the size of the metadata in each distributed processor memory remains bounded independent of the processor number. In all stages of the AMR process, we use only distributed algorithms. No central resources such as a master process or replicated data are employed, so that unlimited scalability can be achieved. For the dynamic load balancing in particular, we propose to exploit the hierarchical nature of the block-structured domain partitioning by creating a lightweight, temporary copy of the core data structure. This copy acts as a local and fully distributed proxy data structure. It does not contain simulation data, but only provides topological information about the domain partitioning into blocks. Ultimately, this approach enables an inexpensive, local, diffusion-based dynamic load balancing scheme. We demonstrate the excellent performance and the full scalability of our new AMR implementation for two architecturally different petascale supercomputers. Benchmarks on an IBM Blue Gene/Q system with a mesh containing 3.7 trillion unknowns distributed to 458,752 processes confirm the applicability for future extreme-scale parallel machines. The algorithms proposed in this article operate on blocks that result from the domain partitioning. This concept and its realization support the storage of arbitrary data. In consequence, the software framework can be used for different simulation methods, including mesh-based and meshless methods. In this article, we demonstrate fluid simulations based on the lattice Boltzmann method.

Submitted 13 April, 2018; v1 submitted 22 April, 2017; originally announced April 2017.

Comments: 38 pages, 17 figures, 11 tables

MSC Class: 68W10; 68W15; 68U20; 65Y05; 65Y20; 76P05

Journal ref: SIAM J. Sci. Comput. 40-3 (2018), pp. C358-C387
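Why a diffusion-based scheme needs only local information can be seen from a generic first-order diffusion iteration on a process graph, in which every process exchanges load only with its neighbors. This is an assumed textbook scheme with continuous loads, not the paper's algorithm, which migrates whole blocks using the proxy data structure; the function name is hypothetical.

```python
def diffusion_balance(loads, neighbors, alpha=0.25, steps=200):
    """First-order diffusion load balancing on a process graph.

    loads: workloads per process (treated as continuous quantities)
    neighbors: list of neighbor index lists (symmetric graph)
    alpha: diffusion coefficient; alpha <= 1 / (max degree + 1) keeps it stable
    Each step moves alpha * (l_i - l_j) from the more to the less loaded
    neighbor, so the total load is conserved.
    """
    loads = [float(l) for l in loads]
    for _ in range(steps):
        flow = [0.0] * len(loads)
        for i, nbrs in enumerate(neighbors):
            for j in nbrs:
                flow[i] += alpha * (loads[j] - loads[i])
        loads = [l + f for l, f in zip(loads, flow)]
    return loads

# Eight processes in a chain; all work initially sits on process 0.
P = 8
chain = [[j for j in (i - 1, i + 1) if 0 <= j < P] for i in range(P)]
balanced = diffusion_balance([8.0] + [0.0] * (P - 1), chain, steps=500)
print([round(l, 3) for l in balanced])
```

The price of locality is that load spreads only one neighbor per step, so poorly connected process graphs need many iterations to reach balance.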

arXiv:1702.04910 [pdf, other]

doi 10.1016/j.compfluid.2017.05.033

A comparative study of fluid-particle coupling methods for fully resolved lattice Boltzmann simulations

Authors: Christoph Rettinger, Ulrich Rüde

Abstract: The direct numerical simulation of particulate systems offers a unique approach to study the dynamics of fluid-solid suspensions by fully resolving the submerged particles and without introducing empirical models. For the lattice Boltzmann method, different variants exist to incorporate the fluid-particle interaction into the simulation. This paper provides a detailed and systematic comparison of two different methods, namely the momentum exchange method and the partially saturated cells method by Noble and Torczynski. Three subvariants of each method are used in the benchmark scenario of a single heavy sphere settling in ambient fluid to study their characteristics and accuracy for particle Reynolds numbers from 185 up to 365. The sphere must be resolved with at least 24 computational cells per diameter to achieve velocity errors below 5%. The momentum exchange method is found to be more accurate in predicting the streamwise velocity component whereas the partially saturated cells method is more accurate in the spanwise components. The study reveals that the resolution should be chosen with respect to the coupling dynamics, and not only based on the flow properties, to avoid large errors in the fluid-particle interaction.

Submitted 16 February, 2017; originally announced February 2017.

Comments: 29 pages, 13 figures, 4 tables
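In the partially saturated cells method, each lattice cell blends the fluid collision with a solid collision operator through a weighting that depends on the cell's solid fraction and the relaxation time. The sketch below shows the original Noble-Torczynski weighting function; which weighting each of the paper's three subvariants uses is not stated in the abstract, and the function name is hypothetical.

```python
def psm_weighting(eps, tau):
    """Noble-Torczynski weighting B(eps, tau) of the partially saturated cells method.

    eps: solid volume fraction of the lattice cell, 0 (pure fluid) to 1 (solid)
    tau: dimensionless LBM relaxation time (must exceed 0.5)
    B = 0 recovers the pure fluid collision, B = 1 the pure solid collision.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    if tau <= 0.5:
        raise ValueError("tau must exceed 0.5")
    return eps * (tau - 0.5) / ((1.0 - eps) + (tau - 0.5))

for tau in (0.6, 1.0):
    print(tau, [round(psm_weighting(e, tau), 3) for e in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

Because B depends on tau, the effective coupling of partially covered boundary cells changes with the fluid viscosity, which is one reason why resolution requirements can depend on the coupling method and not only on the flow.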

arXiv:1612.01333 [pdf, ps, other]

On the analysis of block smoothers for saddle point problems

Authors: Lorenz John, Ulrich Rüde, Barbara Wohlmuth, Walter Zulehner

Abstract: In this article, we discuss several classes of Uzawa smoothers for application in multigrid methods for saddle point problems. Besides commonly used variants, such as the inexact and block factorization versions, we also introduce a new symmetric method belonging to the class of Uzawa smoothers. For these variants we unify the analysis of the smoothing properties, which is an important part of multigrid convergence theory. These methods are applied to the Stokes problem, for which all smoothers are implemented as pointwise relaxation methods. Several numerical examples illustrate the theoretical results.

Submitted 5 December, 2016; originally announced December 2016.
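An Uzawa-type smoother alternates a velocity correction and a pressure correction for the saddle point system [[A, Bᵀ], [B, 0]] [u; p] = [f; g]. The following is a minimal NumPy sketch of one inexact Uzawa sweep, assuming dense matrices and user-supplied approximations Â ≈ A and Ŝ ≈ B A⁻¹ Bᵀ. The function name `inexact_uzawa_sweep`, the sign convention, and the demo data are illustrative choices, not the specific smoothers analyzed in the paper.

```python
import numpy as np


def inexact_uzawa_sweep(A, B, f, g, u, p, A_hat, S_hat):
    """One inexact Uzawa sweep for the saddle point system
        [[A, B^T], [B, 0]] @ [u, p] = [f, g].

    A_hat approximates A (velocity block); S_hat approximates the
    Schur complement B A^{-1} B^T (pressure block). Both are dense
    matrices here for illustration only.
    """
    # Velocity correction driven by the momentum residual f - A u - B^T p.
    u_new = u + np.linalg.solve(A_hat, f - A @ u - B.T @ p)
    # Pressure correction driven by the constraint residual, using the
    # freshly updated velocity (block Gauss-Seidel ordering).
    p_new = p + np.linalg.solve(S_hat, B @ u_new - g)
    return u_new, p_new


if __name__ == "__main__":
    # Small SPD velocity block and a full-rank constraint (hypothetical data).
    A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
    B = np.array([[1.0, 1.0, 1.0]])
    f = np.array([1.0, 2.0, 3.0])
    g = np.array([0.5])

    # Exact choices A_hat = A, S_hat = B A^{-1} B^T: two sweeps from zero
    # reproduce the solution of the coupled system.
    S = B @ np.linalg.solve(A, B.T)
    u, p = np.zeros(3), np.zeros(1)
    for _ in range(2):
        u, p = inexact_uzawa_sweep(A, B, f, g, u, p, A, S)
    print("momentum residual:", np.linalg.norm(f - A @ u - B.T @ p))
    print("constraint residual:", np.linalg.norm(B @ u - g))
```

In a multigrid smoother, Â and Ŝ would be cheap to apply (for example scaled diagonals), so each sweep only damps the error rather than solving. Choosing Â = A and Ŝ = B A⁻¹ Bᵀ exactly turns the sweep into a block factorization solve that reaches the exact solution in two sweeps from a zero initial guess, which makes a convenient correctness check.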

arXiv:1610.02608 [pdf, other]

Research and Education in Computational Science and Engineering

Authors: Ulrich Rüde, Karen Willcox, Lois Curfman McInnes, Hans De Sterck, George Biros, Hans Bungartz, James Corones, Evin Cramer, James Crowley, Omar Ghattas, Max Gunzburger, Michael Hanke, Robert Harrison, Michael Heroux, Jan Hesthaven, Peter Jimack, Chris Johnson, Kirk E. Jordan, David E. Keyes, Rolf Krause, Vipin Kumar, Stefan Mayer, Juan Meza, Knut Martin Mørken, J. Tinsley Oden, et al. (8 additional authors not shown)

Abstract: Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers of all persuasions with algorithmic inventions and software systems that transcend disciplines and scales. Carried on a wave of digital technology, CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society; and the CSE community is at the core of this transformation. However, a combination of disruptive developments (including the architectural complexity of extreme-scale computing, the data revolution that engulfs the planet, and the specialization required to follow the applications to new frontiers) is redefining the scope and reach of the CSE endeavor. This report describes the rapid expansion of CSE and the challenges to sustaining its bold advances. The report also presents strategies and directions for CSE research and education for the next decade.

Submitted 31 December, 2017; v1 submitted 8 October, 2016; originally announced October 2016.

Comments: Major revision, to appear in SIAM Review

Report number: Argonne National Laboratory Preprint ANL/MCS-P6054-0916

MSC Class: 00A72; 62-07; 68U20; 68W01; 68W10; 97A99; 97M10; 97N80; 97R20; 97R30

ACM Class: G.0; G.4; I.6; J.0; J.2; J.3; J.4; J.6; J.7; K.3.2

Showing 1–50 of 80 results for author: Rüde, U