
A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes

Authors: Simon Bauer, Marcus Mohr, Ulrich Rüde, Jens Weismüller, Markus Wittmann, Barbara Wohlmuth

Abstract: Matrix-free finite element implementations of massively parallel geometric multigrid save memory and are often significantly faster than implementations using classical sparse matrix techniques. They are especially well suited for hierarchical hybrid grids on polyhedral domains. In the case of constant coefficients all fine grid node stencils in the interior of a coarse macro element are equal. However, for non-polyhedral domains the situation changes. Then even for the Laplace operator, the non-linear element mapping leads to fine grid stencils that can vary from grid point to grid point. This observation motivates a new two-scale approach that exploits a piecewise polynomial approximation of the fine grid operator with respect to the coarse mesh size. The low-cost evaluation of these surrogate polynomials results in an efficient stencil assembly on-the-fly for non-polyhedral domains that can be significantly more efficient than matrix-free techniques that are based on an element-wise assembly. The performance analysis and additional hardware-aware code optimizations are based on the Execution-Cache-Memory model. Several aspects such as two-scale a priori error bounds and double discretization techniques are presented. Weak and strong scaling results illustrate the benefits of the new technique when used within large scale PDE solvers.

Submitted 23 August, 2016; originally announced August 2016.

arXiv:1607.03252 [pdf, other]

Scheduling massively parallel multigrid for multilevel Monte Carlo methods

Authors: Björn Gmeiner, Daniel Drzisga, Ulrich Ruede, Robert Scheichl, Barbara Wohlmuth

Abstract: The computational complexity of naive, sampling-based uncertainty quantification for 3D partial differential equations is extremely high. Multilevel approaches, such as multilevel Monte Carlo (MLMC), can reduce the complexity significantly, but to exploit them fully in a parallel environment, sophisticated scheduling strategies are needed. Often fast algorithms that are executed in parallel are essential to compute fine level samples in 3D, whereas to compute individual coarse level samples only moderate numbers of processors can be employed efficiently. We make use of multiple instances of a parallel multigrid solver combined with advanced load balancing techniques. In particular, we optimize the concurrent execution across the three layers of the MLMC method: parallelization across levels, across samples, and across the spatial grid. The overall efficiency and performance of these methods will be analyzed. Here the scalability window of the multigrid solver is revealed as being essential, i.e., the property that the solution can be computed with a range of process numbers while maintaining good parallel efficiency. We evaluate the new scheduling strategies in a series of numerical tests, and conclude the paper demonstrating large 3D scaling experiments.

Submitted 12 July, 2016; originally announced July 2016.

MSC Class: G.1.8

arXiv:1604.07994 [pdf, other]

Highly sparse surface couplings for subdomain-wise isoviscous Stokes finite element discretizations

Authors: Markus Huber, Ulrich Rüde, Christian Waluga, Barbara Wohlmuth

Abstract: The Stokes system with constant viscosity can be cast into different formulations by exploiting the incompressibility constraint. For instance the strain in the weak formulation can be replaced by the gradient to decouple the velocity components in the different coordinate directions. Thus the discretization of the simplified problem leads to fewer nonzero entries in the stiffness matrix. This is of particular interest in large scale simulations where a reduced memory bandwidth requirement can help to significantly accelerate the computations. In the case of a piecewise constant viscosity, as it typically arises in multi-phase flows, or when the boundary conditions involve traction, the situation is more complex, and one has to treat the cross derivatives in the original Stokes system with care. A naive application of the standard vectorial Laplacian results in a physically incorrect solution, while formulations based on the strain increase the computational effort everywhere, even when the inconsistencies arise only from an incorrect treatment in a small fraction of the computational domain. Here we propose a new approach that is consistent with the strain-based formulation and preserves the decoupling advantages of the gradient-based formulation in isoviscous subdomains. The modification is equivalent to locally changing the discretization stencils, hence the more expensive discretization is restricted to a lower dimensional interface, making the additional computational cost asymptotically negligible.
We demonstrate the consistency and convergence properties of the method and show that in a massively parallel setup, the multigrid solution of the resulting discrete systems is faster than for the classical strain-based formulation. Moreover, we give an application example which is inspired by geophysical research.

Submitted 27 April, 2016; originally announced April 2016.

arXiv:1604.01632 [pdf, other]

doi 10.1016/j.camwa.2013.06.007

Free Surface Lattice Boltzmann with Enhanced Bubble Model

Authors: Daniela Anderl, Simon Bogner, Cornelia Rauh, Ulrich Rüde, Antonio Delgado

Abstract: This paper presents an enhancement to the free surface lattice Boltzmann method (FSLBM) for the simulation of bubbly flows including rupture and breakup of bubbles. The FSLBM uses a volume of fluid approach to reduce the problem of a liquid-gas two-phase flow to a single-phase free surface simulation. In bubbly flows compression effects leading to an increase or decrease of pressure in the suspended bubbles cannot be neglected. Therefore, the free surface simulation is augmented by a bubble model that supplies the missing information by tracking the topological changes of the free surface in the flow. The new model presented here is capable of handling the effects of bubble breakup and coalescence without causing a significant computational overhead. Thus, the enhanced bubble model extends the applicability of the FSLBM to a new range of practically relevant problems, like bubble formation and development in chemical reactors or foaming processes.

Submitted 6 April, 2016; originally announced April 2016.

Journal ref: Computers & Mathematics with Applications 67 (2). pages 331-339. 2014

arXiv:1603.04633 [pdf, other]

Microswimming with inertia

Authors: Jayant Pande, Kristina Pickl, Oleg Trosman, Ulrich Rüde, Ana-Sunčana Smith

Abstract: Microswimmers, especially in theoretical treatments, are generally taken to be completely inertia-free, since inertial effects on their motion are typically small and assuming their absence simplifies the problem considerably. Yet in nature there is no discrete break between swimmers for which inertia is negligibly small and for which it is detectable. Here we study a microswimming model for which the effect of inertia is calculated explicitly in the regime of transition between the Stokesian and the non-Stokesian flow limits, which we term the intermediate regime. The model in the inertialess limit is the bead-spring swimmer. We first show that in the intermediate regime a mechanical microswimmer exhibits damped inertial coasting like an underdamped harmonic oscillator. We then calculate analytically the swimmer's velocity by including a mass-acceleration term in the equations of motion which are otherwise based on the Stokes flow. We show that this hybrid treatment combining aspects of underdamped and overdamped dynamics provides an accurate description of the motion in the intermediate regime, as verified here by comparison to simulations using the lattice Boltzmann method, and is a significant improvement over the results from the inertialess theory when either the mass of the swimmer or the forces driving its motion is/are large enough.

Submitted 6 November, 2016; v1 submitted 15 March, 2016; originally announced March 2016.

arXiv:1511.07261 [pdf, other]

doi 10.1080/17445760.2015.1118478

A Python Extension for the Massively Parallel Multiphysics Simulation Framework waLBerla

Authors: Martin Bauer, Florian Schornbaum, Christian Godenschwager, Matthias Markl, Daniela Anderl, Harald Köstler, Ulrich Rüde

Abstract: We present a Python extension to the massively parallel HPC simulation toolkit waLBerla. waLBerla is a framework for stencil based algorithms operating on block-structured grids, with the main application field being fluid simulations in complex geometries using the lattice Boltzmann method. Careful performance engineering results in excellent node performance and good scalability to over 400,000 cores. To increase the usability and flexibility of the framework, a Python interface was developed. Python extensions are used at all stages of the simulation pipeline: They simplify and automate scenario setup, evaluation, and plotting. We show how our Python interface outperforms the existing text-file-based configuration mechanism, providing features like automatic nondimensionalization of physical quantities and handling of complex parameter dependencies. Furthermore, Python is used to process and evaluate results while the simulation is running, leading to smaller output files and the possibility to adjust parameters dependent on the current simulation state. C++ data structures are exported such that a seamless interfacing to other numerical Python libraries is possible. The expressive power of Python and the performance of C++ make development of efficient code with low time effort possible.

Submitted 23 November, 2015; originally announced November 2015.

arXiv:1511.05759 [pdf, other]

Solution Techniques for the Stokes System: A priori and a posteriori modifications, resilient algorithms

Authors: Markus Huber, Lorenz John, Petra Pustejovska, Ulrich Rüde, Christian Waluga, Barbara Wohlmuth

Abstract: This article proposes modifications to standard low order finite element approximations of the Stokes system with the goal of improving both the approximation quality and the parallel algebraic solution process. Different from standard finite element techniques, we do not modify or enrich the approximation spaces but modify the operator itself to ensure fundamental physical properties such as mass and energy conservation. Special local a priori correction techniques at re-entrant corners lead to an improved representation of the energy in the discrete system and can suppress the global pollution effect. Local mass conservation can be achieved by an a posteriori correction to the finite element flux. This avoids artifacts in coupled multi-physics transport problems. Finally, hardware failures in large supercomputers may lead to a loss of data in solution subdomains. Within parallel multigrid, this can be compensated by the accelerated solution of local subproblems. These resilient algorithms will gain importance on future extreme scale computing systems.

Submitted 18 November, 2015; originally announced November 2015.

Comments: in Proceedings of the ICIAM, Beijing, China, 2015

MSC Class: 65N30; 65N12; 65N55; 65Y05; 76D07

arXiv:1511.02134 [pdf, other]

A quantitative performance analysis for Stokes solvers at the extreme scale

Authors: Björn Gmeiner, Markus Huber, Lorenz John, Ulrich Rüde, Barbara Wohlmuth

Abstract: This article presents a systematic quantitative performance analysis for large finite element computations on extreme scale computing systems. Three parallel iterative solvers for the Stokes system, discretized by low order tetrahedral elements, are compared with respect to their numerical efficiency and their scalability running on up to $786\,432$ parallel threads. A genuine multigrid method for the saddle point system using an Uzawa-type smoother provides the best overall performance with respect to memory consumption and time-to-solution. The largest system solved on a Blue Gene/Q system has more than ten trillion ($1.1 \cdot 10 ^{13}$) unknowns and requires about 13 minutes compute time. Despite the matrix free and highly optimized implementation, the memory requirement for the solution vector and the auxiliary vectors is about 200 TByte. Brandt's notion of "textbook multigrid efficiency" is employed to study the algorithmic performance of iterative solvers. A recent extension of this paradigm to "parallel textbook multigrid efficiency" makes it possible to assess also the efficiency of parallel iterative solvers for a given hardware architecture in absolute terms. The efficiency of the method is demonstrated for simulating incompressible fluid flow in a pipe filled with spherical obstacles.

Submitted 6 November, 2015; originally announced November 2015.

MSC Class: 65N55; 65Y05; 68Q25

arXiv:1509.07691 [pdf, other]

doi 10.1103/PhysRevE.93.043302

Curvature estimation from a volume of fluid indicator function for the simulation of surface tension and wetting with a free surface lattice Boltzmann method

Authors: Simon Bogner, Ulrich Rüde, Jens Harting

Abstract: The free surface lattice Boltzmann method (FSLBM) is a combination of the hydrodynamic lattice Boltzmann method (LBM) with a volume of fluid (VOF) interface capturing technique for the simulation of incompressible free surface flows. Capillary effects are modeled by extracting the curvature of the interface from the VOF indicator function and imposing a pressure jump at the free boundary. However, obtaining accurate curvature estimates from a VOF description can introduce significant errors. This article reports numerical results for three different surface tension models in standard test cases, and compares the according errors in the velocity field (spurious currents). Furthermore, the FSLBM is shown to be suited to simulate wetting effects at solid boundaries. To this end, a new method is developed to represent wetting boundary conditions in a least squares curvature reconstruction technique. The main limitations of the current FSLBM are analyzed and are found to be caused by its simplified advection scheme. Possible improvements are suggested.

Submitted 10 February, 2016; v1 submitted 25 September, 2015; originally announced September 2015.

Journal ref: Phys. Rev. E 93, 043302 (2016)

arXiv:1508.07982 [pdf, ps, other]

doi 10.1137/15M1035240

Massively Parallel Algorithms for the Lattice Boltzmann Method on Non-uniform Grids

Authors: Florian Schornbaum, Ulrich Rüde

Abstract: The lattice Boltzmann method exhibits excellent scalability on current supercomputing systems and has thus increasingly become an alternative method for large-scale non-stationary flow simulations, reaching up to a trillion grid nodes. Additionally, grid refinement can lead to substantial savings in memory and compute time. These savings, however, come at the cost of much more complex data structures and algorithms. In particular, the interface between subdomains with different grid sizes must receive special treatment. In this article, we present parallel algorithms, distributed data structures, and communication routines that are implemented in the software framework waLBerla in order to support large-scale, massively parallel lattice Boltzmann-based simulations on non-uniform grids. Additionally, we evaluate the performance of our approach on two current petascale supercomputers. On an IBM Blue Gene/Q system, the largest weak scaling benchmarks with refined grids are executed with almost two million threads, demonstrating not only near-perfect scalability but also an absolute performance of close to a trillion lattice Boltzmann cell updates per second. On an Intel-based system, the strong scaling of a simulation with refined grids and a total of more than 8.5 million cells is demonstrated to reach a performance of less than one millisecond per time step. This enables simulations with complex, non-uniform grids and four million time steps per hour compute time.

Submitted 21 January, 2016; v1 submitted 31 August, 2015; originally announced August 2015.

Comments: 32 pages, 20 figures, 4 tables

Journal ref: SIAM J. Sci. Comput. 38-2 (2016), pp. C96-C126

arXiv:1508.02960 [pdf, other]

Pore-scale lattice Boltzmann simulation of laminar and turbulent flow through a sphere pack

Authors: Ehsan Fattahi, Christian Waluga, Barbara Wohlmuth, Ulrich Rüde, Michael Manhart, Rainer Helmig

Abstract: The lattice Boltzmann method can be used to simulate flow through porous media with full geometrical resolution. With such a direct numerical simulation, it becomes possible to study fundamental effects which are difficult to assess either by developing macroscopic mathematical models or experiments. We first evaluate the lattice Boltzmann method with various boundary handling of the solid-wall and various collision operators to assess their suitability for large scale direct numerical simulation of porous media flow. A periodic pressure drop boundary condition is used to mimic the pressure driven flow through the simple sphere pack in a periodic domain. The evaluation of the method is done in the Darcy regime and the results are compared to a semi-analytic solution. Taking into account computational cost and accuracy, we choose the most efficient combination of the solid boundary condition and collision operator. We apply this method to perform simulations for a wide range of Reynolds numbers from Stokes flow over seven orders of magnitude to turbulent flow. Contours and streamlines of the flow field are presented to show the flow behavior in different flow regimes. Moreover, unknown parameters of the Forchheimer, the Barree--Conway and friction factor models are evaluated numerically for the considered flow regimes.

Submitted 12 August, 2015; originally announced August 2015.

arXiv:1507.06565 [pdf, ps, other]

Large scale lattice Boltzmann simulation for the coupling of free and porous media flow

Authors: Ehsan Fattahi, Christian Waluga, Barbara Wohlmuth, Ulrich Rüde

Abstract: In this work, we investigate the interaction of free and porous media flow by large scale lattice Boltzmann simulations. We study the transport phenomena at the porous interface on multiple scales, i.e., we consider both, computationally generated pore-scale geometries and homogenized models at a macroscopic scale. The pore-scale results are compared to those obtained by using different transmission models. Two-domain approaches with sharp interface conditions, e.g., of Beavers--Joseph--Saffman type, as well as a single-domain approach with a porosity depending viscosity are taken into account. For the pore-scale simulations, we use a highly scalable communication-reducing scheme with a robust second order boundary handling. We comment on computational aspects of the pore-scale simulation and on how to generate pore-scale geometries. The two-domain approaches depend sensitively on the choice of the exact position of the interface, whereas a well-designed single-domain approach can significantly better recover the averaged pore-scale results.

Submitted 23 July, 2015; originally announced July 2015.

arXiv:1506.06185 [pdf, other]

Resilience for Multigrid Software at the Extreme Scale

Authors: Markus Huber, Björn Gmeiner, Ulrich Rüde, Barbara Wohlmuth

Abstract: Fault tolerant algorithms for the numerical approximation of elliptic partial differential equations on modern supercomputers play a more and more important role in the future design of exa-scale enabled iterative solvers. Here, we combine domain partitioning with highly scalable geometric multigrid schemes to obtain fast and fault-robust solvers in three dimensions. The recovery strategy is based on a hierarchical hybrid concept where the values on lower dimensional primitives such as faces are stored redundantly and thus can be recovered easily in case of a failure. The lost volume unknowns in the faulty region are re-computed approximately with multigrid cycles by solving a local Dirichlet problem on the faulty subdomain. Different strategies are compared and evaluated with respect to performance, computational cost, and speed up. Especially effective are strategies in which the local recovery in the faulty region is executed in parallel with global solves and when the local recovery is additionally accelerated. This results in an asynchronous multigrid iteration that can fully compensate faults. Excellent parallel performance on a current peta-scale system is demonstrated.

Submitted 19 June, 2015; originally announced June 2015.

MSC Class: 65N55; 65Y05; 68Q85

arXiv:1506.01684 [pdf, other]

Massively Parallel Phase-Field Simulations for Ternary Eutectic Directional Solidification

Authors: Martin Bauer, Johannes Hötzer, Philipp Steinmetz, Marcus Jainta, Marco Berghoff, Florian Schornbaum, Christian Godenschwager, Harald Köstler, Britta Nestler, Ulrich Rüde

Abstract: Microstructures forming during ternary eutectic directional solidification processes have significant influence on the macroscopic mechanical properties of metal alloys. For a realistic simulation, we use the well established thermodynamically consistent phase-field method and improve it with a new grand potential formulation to couple the concentration evolution. This extension is very compute intensive due to a temperature dependent diffusive concentration. We significantly extend previous simulations that have used simpler phase-field models or were performed on smaller domain sizes. The new method has been implemented within the massively parallel HPC framework waLBerla that is designed to exploit current supercomputers efficiently. We apply various optimization techniques, including buffering techniques, explicit SIMD kernel vectorization, and communication hiding. Simulations utilizing up to 262,144 cores have been run on three different supercomputing architectures and weak scalability results are shown. Additionally, a hierarchical, mesh-based data reduction strategy is developed to keep the I/O problem manageable at scale.

Submitted 4 June, 2015; originally announced June 2015.

Comments: submitted to Supercomputing 2015

arXiv:1503.06869 [pdf, other]

Two Computational Models for Simulating the Tumbling Motion of Elongated Particles in Fluids

Authors: Dominik Bartuschat, Ellen Fischermeier, Katarina Gustavsson, Ulrich Rüde

Abstract: Suspensions with fiber-like particles in the low Reynolds number regime are modeled by two different approaches that both use a Lagrangian representation of individual particles. The first method is the well-established formulation based on Stokes flow that is formulated as integral equations. It uses a slender body approximation for the fibers to represent the interaction between them directly without explicitly computing the flow field. The second is a new technique using the 3D lattice Boltzmann method on parallel supercomputers. Here the flow computation is coupled to a computational model of the dynamics of rigid bodies using fluid-structure interaction techniques. Both methods can be applied to simulate fibers in fluid flow. They are carefully validated and compared against each other, exposing systematically their strengths and weaknesses regarding their accuracy, the computational cost, and possible model extensions.

Submitted 23 March, 2015; originally announced March 2015.

Comments: Submitted to the Journal Computers & Fluids (Elsevier)

arXiv:1501.07400 [pdf, other]

Resilience for Exascale Enabled Multigrid Methods

Authors: Markus Huber, Björn Gmeiner, Ulrich Rüde, Barbara Wohlmuth

Abstract: With the increasing number of components and further miniaturization the mean time between faults in supercomputers will decrease. System level fault tolerance techniques are expensive and cost energy, since they are often based on redundancy. Also classical check-point-restart techniques reach their limits when the time for storing the system state to backup memory becomes excessive. Therefore, algorithm-based fault tolerance mechanisms can become an attractive alternative. This article investigates the solution process for elliptic partial differential equations that are discretized by finite elements. Faults that occur in the parallel geometric multigrid solver are studied in various model scenarios. In a standard domain partitioning approach, the impact of a failure of a core or a node will affect one or several subdomains. Different strategies are developed to compensate the effect of such a failure algorithmically. The recovery is achieved by solving a local subproblem with Dirichlet boundary conditions using local multigrid cycling algorithms. Additionally, we propose a superman strategy where extra compute power is employed to minimize the time of the recovery process.

Submitted 29 January, 2015; originally announced January 2015.

MSC Class: 68W10; 68N30; 65N55

arXiv:1501.05810 [pdf, other]

Ultrascale Simulations of Non-smooth Granular Dynamics

Authors: Tobias Preclik, Ulrich Rüde

Abstract: This article presents new algorithms for massively parallel granular dynamics simulations on distributed memory architectures using a domain partitioning approach. Collisions are modelled with hard contacts in order to hide their micro-dynamics and thus to extend the time and length scales that can be simulated. The multi-contact problem is solved using a non-linear block Gauss-Seidel method that is conforming to the subdomain structure. The parallel algorithms employ a sophisticated protocol between processors that delegate algorithmic tasks such as contact treatment and position integration uniquely and robustly to the processors. Communication overhead is minimized through aggressive message aggregation, leading to excellent strong and weak scaling. The robustness and scalability are assessed on three clusters including two peta-scale supercomputers with up to 458752 processor cores. The simulations can reach unprecedented resolution of up to ten billion non-spherical particles and contacts.

Submitted 23 January, 2015; originally announced January 2015.

MSC Class: 65Y05 (Primary); 70F35; 70F40; 70E55 (Secondary)

ACM Class: I.6.0

arXiv:1410.7254 [pdf, other]

doi 10.1016/j.camwa.2012.12.006

Optimization of the Multigrid-Convergence Rate on Semi-structured Meshes by Local Fourier Analysis

Authors: B. Gmeiner, T. Gradl, F. Gaspar, U. Rüde

Abstract: In this paper a local Fourier analysis for multigrid methods on tetrahedral grids is presented. Different smoothers for the discretization of the Laplace operator by linear finite elements on such grids are analyzed. A four-color smoother is presented as an efficient choice for regular tetrahedral grids, whereas line and plane relaxations are needed for poorly shaped tetrahedra. A novel partitioning of the Fourier space is proposed to analyze the four-color smoother. Numerical test calculations validate the theoretical predictions. A multigrid method is constructed in a block-wise form, by using different smoothers and different numbers of pre- and post-smoothing steps in each tetrahedron of the coarsest grid of the domain. Some numerical experiments are presented to illustrate the efficiency of this multigrid algorithm.

Submitted 27 October, 2014; originally announced October 2014.

Journal ref: Computers & Mathematics with Applications, 65(4), 694-711 (2013)

arXiv:1410.6609 [pdf, other]

Parallel Multiphysics Simulations of Charged Particles in Microfluidic Flows

Authors: Dominik Bartuschat, Ulrich Rüde

Abstract: The article describes parallel multiphysics simulations of charged particles in microfluidic flows with the waLBerla framework. To this end, three physical effects are coupled: rigid body dynamics, fluid flow modelled by a lattice Boltzmann algorithm, and electric potentials represented by a finite volume discretisation. For solving the finite volume discretisation for the electrostatic forces, a cell-centered multigrid algorithm is developed that conforms to the lattice Boltzmann meshes and the parallel communication structure of waLBerla. The new functionality is validated with suitable benchmark scenarios. Additionally, the parallel scaling and the numerical efficiency of the algorithms are analysed on an advanced supercomputer.

Submitted 24 October, 2014; originally announced October 2014.

Comments: Submitted to Journal of Computational Science (Elsevier)

arXiv:1409.5645 [pdf, ps, other]

doi 10.1016/j.jcp.2015.04.055

Boundary Conditions for Free Interfaces with the Lattice Boltzmann Method

Authors: Simon Bogner, Regina Ammer, Ulrich Rüde

Abstract: In this paper we analyze the boundary treatment of the lattice Boltzmann method (LBM) for simulating 3D flows with free surfaces. The widely used free surface boundary condition of Körner et al. (2005) is shown to be first order accurate. The article presents a new free surface boundary scheme that is suitable for second order accurate simulations based on the LBM. The new method takes into account the free surface position and its orientation with respect to the computational lattice. Numerical experiments confirm the theoretical findings and illustrate the different behavior of the original method and the new method.

Submitted 16 February, 2015; v1 submitted 19 September, 2014; originally announced September 2014.

Comments: Preprint submitted to Elsevier

Journal ref: International Journal of Computational Physics 297, 2015, pp. 1 - 12

arXiv:1406.5369 [pdf, other]

A Scala Prototype to Generate Multigrid Solver Implementations for Different Problems and Target Multi-Core Platforms

Authors: Harald Koestler, Christian Schmitt, Sebastian Kuckuk, Frank Hannig, Juergen Teich, Ulrich Ruede

Abstract: Many problems in computational science and engineering involve partial differential equations and thus require the numerical solution of large, sparse (non)linear systems of equations. Multigrid is known to be one of the most efficient methods for this purpose. However, the concrete multigrid algorithm and its implementation highly depend on the underlying problem and hardware. Therefore, changes in the code or many different variants are necessary to cover all relevant cases. In this article we provide a prototype implementation in Scala for a framework that allows abstract descriptions of PDEs, their discretization, and their numerical solution via multigrid algorithms. From these, one is able to generate data structures and implementations of multigrid components required to solve elliptic PDEs on structured grids. Two different test problems showcase our proposed automatic generation of multigrid solvers for both CPU and GPU target platforms.

Submitted 20 June, 2014; originally announced June 2014.

arXiv:1403.3251 [pdf, other]

doi 10.1007/s00170-014-6594-9

Numerical Investigations on Hatching Process Strategies for Powder Bed Based Additive Manufacturing using an Electron Beam

Authors: Matthias Markl, Regina Ammer, Ulrich Rüde, Carolin Körner

Abstract: This paper investigates hatching process strategies for additive manufacturing using an electron beam by numerical simulations. The underlying physical model and the corresponding three dimensional thermal free surface lattice Boltzmann method of the simulation software are briefly presented. The simulation software has already been validated on the basis of experiments up to 1.2 kW beam power by hatching a cuboid with a basic process strategy, whereby the results are classified into 'porous', 'good' and 'uneven', depending on their relative density and top surface smoothness. In this paper we study the limitations of this basic process strategy in terms of higher beam powers and scan velocities to exploit the future potential of high power electron beam guns up to 10 kW. Subsequently, we introduce modified process strategies, which circumvent these restrictions, to build the part as fast as possible under the restriction of a fully dense part with a smooth top surface. These process strategies are suitable to reduce the build time and costs, maximize the beam power usage and therefore use the potential of high power electron beam guns.

Submitted 30 March, 2015; v1 submitted 13 March, 2014; originally announced March 2014.

Journal ref: The International Journal of Advanced Manufacturing Technology: Volume 78, Issue 1 (2015), Page 239-247

arXiv:1402.2440 [pdf, ps, other]

Validation Experiments for LBM Simulations of Electron Beam Melting

Authors: Regina Ammer, Matthias Markl, Vera Jüchter, Carolin Körner, Ulrich Rüde

Abstract: This paper validates 3D simulation results of electron beam melting (EBM) processes comparing experimental and numerical data. The physical setup is presented which is discretized by a three dimensional (3D) thermal lattice Boltzmann method (LBM). An experimental process window is used for the validation depending on the line energy injected into the metal powder bed and the scan velocity of the electron beam. In the process window the EBM products are classified into the categories, porous, good and swelling, depending on the quality of the surface. The same parameter sets are used to generate a numerical process window. A comparison of numerical and experimental process windows shows a good agreement. This validates the EBM model and justifies simulations for future improvements of EBM processes. In particular numerical simulations can be used to explain future process window scenarios and find the best parameter set for a good surface quality and dense products.

Submitted 11 February, 2014; originally announced February 2014.

Comments: submitted to "International Journal of Modern Physics C"

arXiv:1401.2025 [pdf, other]

doi 10.1016/j.ijmultiphaseflow.2014.10.001

Drag correlation for dilute and moderately dense fluid-particle systems using the lattice Boltzmann method

Authors: Simon Bogner, Swati Mohanty, Ulrich Rüde

Abstract: This paper presents a numerical study of flow through static random assemblies of monodisperse, spherical particles. A lattice Boltzmann approach based on a two relaxation time collision operator is used to obtain reliable predictions of the particle drag by direct numerical simulation. From these predictions a closure law $F(Re, {\varphi})$ of the drag force relationship to the bed density ${\varphi}$ and the particle Reynolds number $Re$ is derived. The present study includes densities ${\varphi}$ ranging from $0.01$ to $0.35$ with $Re$ ranging up to $300$, that is compiled into a single drag correlation valid for the whole range. The correlation has a more compact expression compared to others previously reported in literature. At low particle densities, the new correlation is close to the widely used Wen & Yu correlation. Recently, there has been reported a discrepancy between results obtained using different numerical methods, namely the comprehensive lattice Boltzmann study of Beetstra et al. (2007) and the predictions based on an immersed boundary - pseudo-spectral Navier-Stokes approach (Tenneti et al., 2011). The present study excludes significant finite resolution effects, which have been suspected to cause the reported deviations, but does not coincide exactly with either of the previous studies. This indicates the need for yet more accurate simulation methods in the future.

Submitted 7 October, 2014; v1 submitted 9 January, 2014; originally announced January 2014.

Comments: Preprint submitted to Elsevier. Comments welcome!

arXiv:1303.1651 [pdf, ps, other]

Model-guided Performance Analysis of the Sparse Matrix-Matrix Multiplication

Authors: Tobias Scharpff, Klaus Iglberger, Georg Hager, Ulrich Ruede

Abstract: Achieving high efficiency with numerical kernels for sparse matrices is of utmost importance, since they are part of many simulation codes and tend to use most of the available compute time and resources. In addition, especially in large scale simulation frameworks the readability and ease of use of mathematical expressions are essential components for the continuous maintenance, modification, and extension of software. In this context, the sparse matrix-matrix multiplication is of special interest. In this paper we thoroughly analyze the single-core performance of sparse matrix-matrix multiplication kernels in the Blaze Smart Expression Template (SET) framework. We develop simple models for estimating the achievable maximum performance, and use them to assess the efficiency of our implementations. Additionally, we compare these kernels with several commonly used SET-based C++ libraries, which, just as Blaze, aim at combining the requirements of high performance with an elegant user interface. For the different sparse matrix structures considered here, we show that our implementations are competitive or faster than those of the other SET libraries for most problem sizes on a current Intel multicore processor.

Submitted 6 May, 2013; v1 submitted 7 March, 2013; originally announced March 2013.

Comments: 8 pages, 12 figures. Small corrections w.r.t. previous version

arXiv:1211.6885 [pdf, other]

doi 10.1103/PhysRevLett.109.264504

Permeability of porous materials determined from the Euler characteristic

Authors: Christian Scholz, Frank Wirner, Jan Götz, Ulrich Rüde, Gerd E. Schröder-Turk, Klaus Mecke, Clemens Bechinger

Abstract: We study the permeability of quasi two-dimensional porous structures of randomly placed overlapping monodisperse circular and elliptical grains. Measurements in microfluidic devices and lattice Boltzmann simulations demonstrate that the permeability is determined by the Euler characteristic of the conducting phase. We obtain an expression for the permeability that is independent of the percolation threshold and shows agreement with experimental and simulated data over a wide range of porosities. Our approach suggests that the permeability explicitly depends on the overlapping probability of grains rather than their shape.

Submitted 29 November, 2012; originally announced November 2012.

Journal ref: Phys. Rev. Lett. 109, 264504 (2012)

arXiv:1201.0351 [pdf, ps, other]

doi 10.1016/j.camwa.2012.09.012

Liquid-gas-solid flows with lattice Boltzmann: Simulation of floating bodies

Authors: Simon Bogner, Ulrich Rüde

Abstract: This paper presents a model for the simulation of liquid-gas-solid flows by means of the lattice Boltzmann method. The approach is built upon previous works for the simulation of liquid-solid particle suspensions on the one hand, and on a liquid-gas free surface model on the other. We show how the two approaches can be unified by a novel set of dynamic cell conversion rules. For evaluation, we concentrate on the rotational stability of non-spherical rigid bodies floating on a plane water surface - a classical hydrostatic problem known from naval architecture. We show the consistency of our method in this kind of flows and obtain convergence towards the ideal solution for the measured heeling stability of a floating box.

Submitted 1 January, 2012; originally announced January 2012.

Comments: 22 pages, Preprint submitted to Computers and Mathematics with Applications Special Issue ICMMES 2011, Proceedings of the Eighth International Conference for Mesoscopic Methods in Engineering and Science

arXiv:1108.0786 [pdf, other]

All good things come in threes - Three beads learn to swim with lattice Boltzmann and a rigid body solver

Authors: Kristina Pickl, Jan Götz, Klaus Iglberger, Jayant Pande, Klaus Mecke, Ana-Suncana Smith, Ulrich Rüde

Abstract: We simulate the self-propulsion of devices in a fluid in the regime of low Reynolds numbers. Each device consists of three bodies (spheres or capsules) connected with two damped harmonic springs. Sinusoidal driving forces compress the springs which are resolved within a rigid body physics engine. The latter is consistently coupled to a 3D lattice Boltzmann framework for the fluid dynamics. In simulations of three-sphere devices, we find that the propulsion velocity agrees well with theoretical predictions. In simulations where some or all spheres are replaced by capsules, we find that the asymmetry of the design strongly affects the propelling efficiency.

Submitted 3 August, 2011; originally announced August 2011.

arXiv:1104.1729 [pdf, ps, other]

doi 10.1137/110830125

Expression Templates Revisited: A Performance Analysis of the Current ET Methodology

Authors: Klaus Iglberger, Georg Hager, Jan Treibig, Ulrich Ruede

Abstract: In the last decade, Expression Templates (ET) have gained a reputation as an efficient performance optimization tool for C++ codes. This reputation builds on several ET-based linear algebra frameworks focused on combining both elegant and high-performance C++ code. However, on closer examination the assumption that ETs are a performance optimization technique cannot be maintained. In this paper we demonstrate and explain the inability of current ET-based frameworks to deliver high performance for dense and sparse linear algebra operations, and introduce a new "smart" ET implementation that truly allows the combination of high performance code with the elegance and maintainability of a domain-specific language.

Submitted 9 April, 2011; originally announced April 2011.

Comments: 16 pages, 7 figures

Journal ref: SIAM Journal on Scientific Computing 34(2), C42-C69 (2012)

arXiv:1007.1388 [pdf, ps, other]

doi 10.1016/j.parco.2011.03.005

A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

Authors: Christian Feichtinger, Johannes Habich, Harald Koestler, Georg Hager, Ulrich Ruede, Gerhard Wellein

Abstract: Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail and it is demonstrated that the kernel performance can be sustained to a large extent. With our GPU implementation, we achieve nearly perfect weak scalability on InfiniBand clusters. However, in strong scaling scenarios multi-GPUs make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost analysis must determine the best course of action for a particular simulation task. Additionally, weak scaling results of heterogeneous simulations conducted on CPUs and GPUs simultaneously are presented using clusters equipped with varying node configurations.

Submitted 8 July, 2010; originally announced July 2010.

Comments: 20 pages, 12 figures

Journal ref: Parallel Computing 37(9), 536-549 (2011)
