-
Distributed convergence detection based on global residual error under asynchronous iterations
Authors:
Frédéric Magoulès,
Guillaume Gbikpi-Benissan
Abstract:
Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration in order to compute a residual error relative to a potential solution vector. To efficiently run asynchronous iterations, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. While some termination protocols have been proposed for asynchronous iterations, very few of them are based on global residual computation and guarantee effective convergence. Moreover, the most effective and efficient existing solutions require two reduction operations, which is an important factor of termination delay. In this paper, we present new, non-intrusive protocols to compute a residual error under asynchronous iterations, requiring only one reduction operation. Various communication models show that some heuristics can even be introduced and formally evaluated. Extensive experiments with up to 5600 processor cores confirm the practical effectiveness and efficiency of our approach.
Submitted 29 December, 2023;
originally announced December 2023.
-
Asynchronous iterations of HSS method for non-Hermitian linear systems
Authors:
Guillaume Gbikpi-Benissan,
Qinmeng Zou,
Frédéric Magoulès
Abstract:
A general asynchronous alternating iterative model is designed, for which convergence is theoretically ensured both under a classical spectral radius bound and for a classical class of matrix splittings for $\mathsf H$-matrices. The computational model can be thought of as a two-stage alternating iterative method, which is well suited to the well-known Hermitian and skew-Hermitian splitting (HSS) approach, with the particularity here of considering only one inner iteration. An experimental parallel performance comparison is conducted between the generalized minimal residual (GMRES) algorithm, the standard HSS and our asynchronous variant, on both real and complex non-Hermitian linear systems arising from convection-diffusion and structural dynamics problems, respectively. A significant gain in execution time is observed in both cases.
Submitted 27 December, 2023;
originally announced December 2023.
-
Accurate Coarse Residual for Two-Level Asynchronous Domain Decomposition Methods
Authors:
Guillaume Gbikpi-Benissan,
Frédéric Magoulès
Abstract:
Recently, asynchronous coarse-space correction has been achieved within both the overlapping Schwarz and the primal Schur frameworks. Both additive and multiplicative corrections have been discussed. In this paper, we address some implementation drawbacks of the proposed additive correction scheme. In the existing approach, each coarse solution is applied only once, leaving most of the iterations of the solver without coarse-space information while building the right-hand side of the coarse problem. Moreover, one-sided routines of the Message Passing Interface (MPI) standard were considered, which introduced the need for a sleep statement in the iteration loop of the coarse solver. This implies a tuning of the sleep period, which is a non-discrete quantity. In this paper, we improve the accuracy of the coarse right-hand side, which allows for more frequent corrections. In addition, we highlight a two-sided implementation which better suits the asynchronous coarse-space correction scheme. Numerical experiments show a significant performance gain with such increased incorporation of the coarse space.
Submitted 19 October, 2023;
originally announced October 2023.
-
JACK2: a new high-level communication library for parallel iterative methods
Authors:
Guillaume Gbikpi-Benissan,
Frederic Magoules
Abstract:
In this paper, we address the problem of designing a distributed application meant to run both classical and asynchronous iterations. MPI libraries are very popular and widely used in the scientific community; however, asynchronous iterative methods raise non-negligible difficulties in the efficient management of communication requests and buffers. Moreover, a convergence detection issue is introduced, which requires the implementation of one of the various state-of-the-art termination methods, which are not necessarily highly reliable for most computational environments. We propose here an MPI-based communication library which handles all these issues in a non-intrusive manner, providing a unique interface for implementing both classical and asynchronous iterations. A few details are highlighted about our approach to achieving the best communication rates and ensuring accurate convergence detection. Experimental results on two supercomputers confirm the low communication overhead introduced and the effectiveness of our library.
Submitted 30 June, 2022;
originally announced June 2022.
-
Distributed asynchronous convergence detection without detection protocol
Authors:
Guillaume Gbikpi-Benissan,
Frederic Magoules
Abstract:
In this paper, we address the problem of detecting the moment when an ongoing asynchronous parallel iterative process can be terminated to provide a sufficiently precise solution to the fixed-point problem being solved. Formulating the detection problem as a global solution identification problem, we analyze the snapshot-based approach, which is the only one that allows for exact global residual error computation. Starting from a recently developed approximate snapshot protocol that provides a reliable global residual error, we also experimentally investigate the reliability of a global residual error computed without any prior dedicated detection mechanism. Results on a single-site supercomputer show that such high-performance computing platforms can provide computational environments stable enough to simply resort to non-blocking reduction operations for computing reliable global residual errors, which saves noticeable time at both the implementation and execution levels.
Submitted 30 June, 2022;
originally announced June 2022.
-
Asynchronous parareal time discretization for partial differential equations
Authors:
Frederic Magoules,
Guillaume Gbikpi-Benissan
Abstract:
Asynchronous iterations are increasingly investigated for both scaling and fault-resilience purposes on high-performance computing platforms. While they have so far been applied exclusively within space domain decomposition frameworks, this paper advocates a novel application direction targeting time-decomposed, time-parallel approaches. Specifically, an asynchronous iterative model is derived from the Parareal scheme, for which convergence and speedup analyses are then conducted. It turns out that Parareal and async-Parareal feature very close, asymptotically equivalent convergence conditions, including the finite-time termination property. Based on a computational cost model aware of unsteady communication delays, our speedup analysis shows the potential performance gain from asynchronous iterations, which is confirmed by an experimental case of heat evolution on a homogeneous supercomputer. This preliminary work clearly suggests possible further benefits from asynchronous iterations.
Submitted 20 October, 2021;
originally announced October 2021.
-
Spectral domain decomposition method for physically-based rendering of photochromic/electrochromic glass windows
Authors:
Guillaume Gbikpi-Benissan,
Patrick Callet,
Frederic Magoules
Abstract:
This paper covers the time-consuming issues intrinsic to physically-based image rendering algorithms. First, the optical properties of glass materials were measured on samples of real glasses, and the materials of other objects inside a hotel room were characterized by deducing spectral data from multiple trichromatic images. We then present the rendering model and ray-tracing algorithm implemented in Virtuelium, an open-source software package. In order to accelerate the computation of the interactions between light rays and objects, the ray-tracing algorithm is parallelized by means of domain decomposition techniques. Numerical experiments show that the speedups obtained with classical parallelization techniques are significantly lower than those achieved with parallel domain decomposition methods.
Submitted 9 December, 2019;
originally announced December 2019.
-
Spectral Domain Decomposition Method for Natural Lighting and Medieval Glass Rendering
Authors:
Guillaume Gbikpi-Benissan,
Remi Cerise,
Patrick Callet,
Frederic Magoules
Abstract:
In this paper, we use an original ray-tracing domain decomposition method to address image rendering of naturally lit scenes. This new method makes it possible to analyze rendering problems on parallel architectures, particularly in the case of interactions between light rays and glass material. Numerical experiments, for medieval glass rendering within the church of the Royaumont abbey, illustrate the performance of the proposed ray-tracing domain decomposition method (DDM) on multi-core and multi-processor architectures. On the one hand, applying domain decomposition techniques increases the speedups obtained by parallelizing the computation. On the other hand, for a fixed number of parallel processes, we notice that speedups increase as the number of sub-domains does.
Submitted 9 December, 2019;
originally announced December 2019.
-
Spectral domain decomposition method for physically-based rendering of Royaumont abbey
Authors:
Guillaume Gbikpi-Benissan,
Patrick Callet,
Frederic Magoules
Abstract:
In the context of a virtual reconstruction of the destroyed Royaumont abbey church, this paper investigates computer science issues intrinsic to physically-based image rendering. First, a virtual model was designed from historical sources and archaeological descriptions. Then the physical properties of some materials were measured on remains of the church and on pieces from similar ancient churches. We specify the properties of our light source, which is a representation of the sun, and present the rendering algorithm implemented in our software Virtuelium. In order to accelerate the computation of the interactions between light rays and objects, this ray-tracing algorithm is parallelized by means of domain decomposition techniques. Numerical experiments show that the computational time saved by a classical parallelization is much less significant than that gained with our approach.
Submitted 9 December, 2019;
originally announced December 2019.
-
Beam-tracing domain decomposition method for urban acoustic pollution
Authors:
Guillaume Gbikpi-Benissan,
Frederic Magoules
Abstract:
This paper covers the fast solution of large acoustic problems on low-resource parallel platforms. A domain decomposition method is coupled with a dynamic load balancing scheme to efficiently accelerate a geometrical acoustic method. The geometrical method studied implements a beam-tracing method where intersections are handled as in a ray-tracing method. Beyond the distribution of the global processing over multiple sub-domains, a second parallelization level is implemented by means of multi-threading and shared memory mechanisms.
Numerical experiments show that this method makes it possible to handle large-scale open domains for parallel computing purposes on a few machines. Urban acoustic pollution arising from car traffic was simulated on a large model of the Shinjuku district of Tokyo, Japan. The good speed-up results illustrate the performance of this new domain decomposition method.
Submitted 9 December, 2019;
originally announced December 2019.
-
Asynchronous Communications Library for the Parallel-in-Time Solution of Black-Scholes Equation
Authors:
Qinmeng Zou,
Guillaume Gbikpi-Benissan,
Frederic Magoules
Abstract:
The advent of asynchronous iterative schemes brings high efficiency to numerical computations. However, it is generally difficult to handle the problems of resource management and convergence detection. This paper uses JACK2, an asynchronous communication kernel library for iterative algorithms, to implement both classical and asynchronous parareal algorithms, especially the latter. We illustrate the measures whereby one can tackle the problems above elegantly for the time-dependent case. Finally, experiments are presented to demonstrate the applicability and efficiency of such an application.
Submitted 2 July, 2019;
originally announced July 2019.
-
Asynchronous Parareal Algorithm Applied to European Option Pricing
Authors:
Qinmeng Zou,
Guillaume Gbikpi-Benissan,
Frederic Magoules
Abstract:
Asynchronous iterations arise naturally in parallel computing if one wants to solve large problems while minimizing idle times. This paper presents an original model of asynchronous iterations for a time-domain decomposition method, namely the parareal method. The asynchronous parareal algorithm is here applied to European option pricing, and numerical experiments performed on a parallel supercomputer illustrate the performance and efficiency of this new method.
Submitted 2 July, 2019;
originally announced July 2019.