Search | arXiv e-print repository

Scheduling massively parallel multigrid for multilevel Monte Carlo methods

Authors: Björn Gmeiner, Daniel Drzisga, Ulrich Ruede, Robert Scheichl, Barbara Wohlmuth

Abstract: The computational complexity of naive, sampling-based uncertainty quantification for 3D partial differential equations is extremely high. Multilevel approaches, such as multilevel Monte Carlo (MLMC), can reduce the complexity significantly, but to exploit them fully in a parallel environment, sophisticated scheduling strategies are needed. Often fast algorithms that are executed in parallel are es… ▽ More The computational complexity of naive, sampling-based uncertainty quantification for 3D partial differential equations is extremely high. Multilevel approaches, such as multilevel Monte Carlo (MLMC), can reduce the complexity significantly, but to exploit them fully in a parallel environment, sophisticated scheduling strategies are needed. Often fast algorithms that are executed in parallel are essential to compute fine level samples in 3D, whereas to compute individual coarse level samples only moderate numbers of processors can be employed efficiently. We make use of multiple instances of a parallel multigrid solver combined with advanced load balancing techniques. In particular, we optimize the concurrent execution across the three layers of the MLMC method: parallelization across levels, across samples, and across the spatial grid. The overall efficiency and performance of these methods will be analyzed. Here the scalability window of the multigrid solver is revealed as being essential, i.e., the property that the solution can be computed with a range of process numbers while maintaining good parallel efficiency. We evaluate the new scheduling strategies in a series of numerical tests, and conclude the paper demonstrating large 3D scaling experiments. △ Less

Submitted 12 July, 2016; originally announced July 2016.

MSC Class: G.1.8

arXiv:1511.02134 [pdf, other]

A quantitative performance analysis for Stokes solvers at the extreme scale

Authors: Björn Gmeiner, Markus Huber, Lorenz John, Ulrich Rüde, Barbara Wohlmuth

Abstract: This article presents a systematic quantitative performance analysis for large finite element computations on extreme scale computing systems. Three parallel iterative solvers for the Stokes system, discretized by low order tetrahedral elements, are compared with respect to their numerical efficiency and their scalability running on up to $786\,432$ parallel threads. A genuine multigrid method for… ▽ More This article presents a systematic quantitative performance analysis for large finite element computations on extreme scale computing systems. Three parallel iterative solvers for the Stokes system, discretized by low order tetrahedral elements, are compared with respect to their numerical efficiency and their scalability running on up to $786\,432$ parallel threads. A genuine multigrid method for the saddle point system using an Uzawa-type smoother provides the best overall performance with respect to memory consumption and time-to-solution. The largest system solved on a Blue Gene/Q system has more than ten trillion ($1.1 \cdot 10 ^{13}$) unknowns and requires about 13 minutes compute time. Despite the matrix free and highly optimized implementation, the memory requirement for the solution vector and the auxiliary vectors is about 200 TByte. Brandt's notion of "textbook multigrid efficiency" is employed to study the algorithmic performance of iterative solvers. A recent extension of this paradigm to "parallel textbook multigrid efficiency" makes it possible to assess also the efficiency of parallel iterative solvers for a given hardware architecture in absolute terms. The efficiency of the method is demonstrated for simulating incompressible fluid flow in a pipe filled with spherical obstacles. △ Less

Submitted 6 November, 2015; originally announced November 2015.

MSC Class: 65N55; 65Y05; 68Q25

arXiv:1506.06185 [pdf, other]

Resilience for Multigrid Software at the Extreme Scale

Authors: Markus Huber, Björn Gmeiner, Ulrich Rüde, Barbara Wohlmuth

Abstract: Fault tolerant algorithms for the numerical approximation of elliptic partial differential equations on modern supercomputers play a more and more important role in the future design of exa-scale enabled iterative solvers. Here, we combine domain partitioning with highly scalable geometric multigrid schemes to obtain fast and fault-robust solvers in three dimensions. The recovery strategy is based… ▽ More Fault tolerant algorithms for the numerical approximation of elliptic partial differential equations on modern supercomputers play a more and more important role in the future design of exa-scale enabled iterative solvers. Here, we combine domain partitioning with highly scalable geometric multigrid schemes to obtain fast and fault-robust solvers in three dimensions. The recovery strategy is based on a hierarchical hybrid concept where the values on lower dimensional primitives such as faces are stored redundantly and thus can be recovered easily in case of a failure. The lost volume unknowns in the faulty region are re-computed approximately with multigrid cycles by solving a local Dirichlet problem on the faulty subdomain. Different strategies are compared and evaluated with respect to performance, computational cost, and speed up. Especially effective are strategies in which the local recovery in the faulty region is executed in parallel with global solves and when the local recovery is additionally accelerated. This results in an asynchronous multigrid iteration that can fully compensate faults. Excellent parallel performance on a current peta-scale system is demonstrated. △ Less

Submitted 19 June, 2015; originally announced June 2015.

MSC Class: 65N55; 65Y05; 68Q85

arXiv:1501.07400 [pdf, other]

Resilience for Exascale Enabled Multigrid Methods

Authors: Markus Huber, Björn Gmeiner, Ulrich Rüde, Barbara Wohlmuth

Abstract: With the increasing number of components and further miniaturization the mean time between faults in supercomputers will decrease. System level fault tolerance techniques are expensive and cost energy, since they are often based on redundancy. Also classical check-point-restart techniques reach their limits when the time for storing the system state to backup memory becomes excessive. Therefore, a… ▽ More With the increasing number of components and further miniaturization the mean time between faults in supercomputers will decrease. System level fault tolerance techniques are expensive and cost energy, since they are often based on redundancy. Also classical check-point-restart techniques reach their limits when the time for storing the system state to backup memory becomes excessive. Therefore, algorithm-based fault tolerance mechanisms can become an attractive alternative. This article investigates the solution process for elliptic partial differential equations that are discretized by finite elements. Faults that occur in the parallel geometric multigrid solver are studied in various model scenarios. In a standard domain partitioning approach, the impact of a failure of a core or a node will affect one or several subdomains. Different strategies are developed to compensate the effect of such a failure algorithmically. The recovery is achieved by solving a local subproblem with Dirichlet boundary conditions using local multigrid cycling algorithms. Additionally, we propose a superman strategy where extra compute power is employed to minimize the time of the recovery process. △ Less

Submitted 29 January, 2015; originally announced January 2015.

MSC Class: 68W10; 68N30; 65N55

Showing 1–4 of 4 results for author: Gmeiner, B