Search | arXiv e-print repository

doi 10.1109/TPDS.2017.2780856

Distributed convergence detection based on global residual error under asynchronous iterations

Authors: Frédéric Magoulès, Guillaume Gbikpi-Benissan

Abstract: Convergence of classical parallel iterations is detected by performing a reduction operation at each iteration in order to compute a residual error relative to a potential solution vector. To efficiently run asynchronous iterations, blocking communication requests are avoided, which makes it hard to isolate and handle any global vector. While some termination protocols were proposed for asynchronous iterations, only very few of them are based on global residual computation and guarantee effective convergence. But the most effective and efficient existing solutions feature two reduction operations, which constitutes an important factor of termination delay. In this paper, we present new, non-intrusive, protocols to compute a residual error under asynchronous iterations, requiring only one reduction operation. Various communication models show that some heuristics can even be introduced and formally evaluated. Extensive experiments with up to 5600 processor cores confirm the practical effectiveness and efficiency of our approach.

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2312.16505 [pdf, ps, other]

doi 10.1080/00207160.2021.1952572

Asynchronous iterations of HSS method for non-Hermitian linear systems

Authors: Guillaume Gbikpi-Benissan, Qinmeng Zou, Frédéric Magoulès

Abstract: A general asynchronous alternating iterative model is designed, for which convergence is theoretically ensured both under a classical spectral radius bound and, then, for a classical class of matrix splittings for $\mathsf H$-matrices. The computational model can be thought of as a two-stage alternating iterative method, which is well suited to the well-known Hermitian and skew-Hermitian splitting (HSS) approach, with the particularity here of considering only one inner iteration. Experimental parallel performance comparison is conducted between the generalized minimal residual (GMRES) algorithm, the standard HSS and our asynchronous variant, on both real and complex non-Hermitian linear systems respectively arising from convection-diffusion and structural dynamics problems. A significant gain in execution time is observed in both cases.

Submitted 27 December, 2023; originally announced December 2023.

arXiv:2310.14948 [pdf, other]

doi 10.4203/ccc.5.4.2

Physics-Informed Graph Convolutional Networks: Towards a generalized framework for complex geometries

Authors: Marien Chenaud, José Alves, Frédéric Magoulès

Abstract: Since the seminal work of [9] and their Physics-Informed neural networks (PINNs), many efforts have been conducted towards solving partial differential equations (PDEs) with Deep Learning models. However, some challenges remain, for instance the extension of such models to complex three-dimensional geometries, and a study on how such approaches could be combined with classical numerical solvers. In this work, we justify the use of graph neural networks for these problems, based on the similarity between these architectures and the meshes used in traditional numerical techniques for solving partial differential equations. After proving an issue with the Physics-Informed framework for complex geometries, during the computation of PDE residuals, an alternative procedure is proposed, by combining classical numerical solvers and the Physics-Informed framework. Finally, we propose an implementation of this approach, which we test on a three-dimensional problem on an irregular geometry.

Submitted 24 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

Journal ref: Civil-Comp Conferences, Volume 5, Paper 4.2, Civil-Comp Press, Edinburgh, United Kingdom, 2023

arXiv:2310.14707 [pdf, other]

A Hybrid GNN approach for predicting node data for 3D meshes

Authors: Shwetha Salimath, Francesca Bugiotti, Frederic Magoules

Abstract: Metal forging is used to manufacture dies. We require the best set of input parameters for the process to be efficient. Currently, we predict the best parameters using the finite element method by generating simulations for the different initial conditions, which is a time-consuming process. In this paper, we introduce a hybrid approach that helps in processing and generating new data simulations using a surrogate graph neural network model based on graph convolutions, having a cheaper time cost. We also introduce a hybrid approach that helps in processing and generating new data simulations using the model. Given a dataset representing meshes, our focus is on the conversion of the available information into a graph or point cloud structure. This new representation enables deep learning. The predicted result is similar, with a low error when compared to that produced using the finite element method. The new models have outperformed existing PointNet and simple graph neural network models when applied to produce the simulations.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.13494 [pdf, other]

Enhancing the Global/Local Coupling Method: An Asynchronous Parallel Framework

Authors: Ahmed El Kerim, Pierre Gosselet, Frédéric Magoulès

Abstract: A novel approach is being developed to introduce a parallel asynchronous implementation of non-intrusive global-local coupling. This study examines scenarios involving numerous patches, including those covering the entire structure. By leveraging asynchronism, the method aims to minimize reliance on communication, handle failures effectively, and address load imbalances. Detailed insights into the methodology are presented, accompanied by a demonstration of its performance through an academic case study.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.12605 [pdf, ps, other]

doi 10.4203/ccc.4.1.1

Accurate Coarse Residual for Two-Level Asynchronous Domain Decomposition Methods

Authors: Guillaume Gbikpi-Benissan, Frédéric Magoulès

Abstract: Recently, asynchronous coarse-space correction has been achieved within both the overlapping Schwarz and the primal Schur frameworks. Both additive and multiplicative corrections have been discussed. In this paper, we address some implementation drawbacks of the proposed additive correction scheme. In the existing approach, each coarse solution is applied only once, leaving most of the iterations of the solver without coarse-space information while building the right-hand side of the coarse problem. Moreover, one-sided routines of the Message Passing Interface (MPI) standard were considered, which introduced the need for a sleep statement in the iterations loop of the coarse solver. This implies a tuning of the sleep period, which is a non-discrete quantity. In this paper, we improve the accuracy of the coarse right-hand side, which allowed for more frequent corrections. In addition, we highlight a two-sided implementation which better suits the asynchronous coarse-space correction scheme. Numerical experiments show a significant performance gain with such increased incorporation of the coarse space.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2211.13073 [pdf, other]

doi 10.1016/j.cma.2023.115910

Asynchronous global-local non-invasive coupling for linear elliptic problems

Authors: Ahmed El Kerim, Pierre Gosselet, Frédéric Magoulès

Abstract: This paper presents the first asynchronous version of the Global/Local non-invasive coupling, capable of dealing efficiently with multiple, possibly adjacent, patches. We give a new interpretation of the coupling in terms of primal domain decomposition method, and we prove the convergence of the relaxed asynchronous iteration. The asynchronous paradigm lifts many bottlenecks of the Global/Local coupling performance. We illustrate the method on several linear elliptic problems as encountered in thermal and elasticity studies.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.09380 [pdf, other]

Multilayer Perceptron-based Surrogate Models for Finite Element Analysis

Authors: Lawson Oliveira Lima, Julien Rosenberger, Esteban Antier, Frederic Magoules

Abstract: Many Partial Differential Equations (PDEs) do not have an analytical solution, and can only be solved by numerical methods. In this context, Physics-Informed Neural Networks (PINNs) have become important in the last decades, since they use a neural network and physical conditions to approximate any function. This paper focuses on hypertuning of a PINN, used to solve a PDE. The behavior of the approximated solution when we change the learning rate or the activation function (sigmoid, hyperbolic tangent, GELU, ReLU and ELU) is here analyzed. A comparative study is done to determine the best characteristics in the problem, as well as to find a learning rate that allows fast and satisfactory learning. GELU and hyperbolic tangent activation functions exhibit better performance than other activation functions. A suitable choice of the learning rate results in higher accuracy and faster convergence.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.09373 [pdf, other]

Graph Neural Network-based Surrogate Models for Finite Element Analysis

Authors: Meduri Venkata Shivaditya, José Alves, Francesca Bugiotti, Frederic Magoules

Abstract: Current simulations of metal forging processes use advanced finite element methods. Such methods consist of solving mathematical equations, which takes a significant amount of time for the simulation to complete. Computational time can be prohibitive for parametric response surface exploration tasks. In this paper, we propose, as an alternative, a Graph Neural Network-based graph prediction model that acts as a surrogate model for parameter search space exploration and exhibits a time cost reduced by an order of magnitude. Numerical experiments show that this new model outperforms the Point-Net model and the Dynamic Graph Convolutional Neural Net model.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2208.03707 [pdf, other]

Asynchronous scalable version of the Global-Local non-invasive coupling

Authors: Ahmed El Kerim, Pierre Gosselet, Frederic Magoules

Abstract: The Global-Local non-invasive coupling is an improvement of the submodeling technique, which makes it possible to locally enhance structure computations by introducing patches with refined models and to take into account all the interactions. In order to circumvent its inherently limited computational performance, we propose and implement an asynchronous version of the method. The asynchronous coupling reduces the dependency on communications, failures, and load imbalance. We present the theory and the implementation of the method in the linear case and illustrate its performance on academic cases inspired by actual industrial problems.

Submitted 7 August, 2022; originally announced August 2022.

arXiv:2207.09159 [pdf, other]

Couplage Global-Local en asynchrone pour des problèmes linéaires

Authors: Ahmed El Kerim, Pierre Gosselet, Frederic Magoules

Abstract: An asynchronous parallel version of the non-intrusive global-local coupling is implemented. The case of many patches, including those covering the entire structure, is studied. The asynchronism limits the dependency on communications, failures, and load imbalance. We detail the method and illustrate its performance in an academic case.

Submitted 19 July, 2022; originally announced July 2022.

Comments: in French language

arXiv:2206.15420 [pdf, other]

doi 10.4203/ccp.111.39

JACK2: a new high-level communication library for parallel iterative methods

Authors: Guillaume Gbikpi-Benissan, Frederic Magoules

Abstract: In this paper, we address the problem of designing a distributed application meant to run both classical and asynchronous iterations. MPI libraries are very popular and widely used in the scientific community, however asynchronous iterative methods raise non-negligible difficulties about the efficient management of communication requests and buffers. Moreover, a convergence detection issue is introduced, which requires the implementation of one of the various state-of-the-art termination methods, which are not necessarily highly reliable for most computational environments. We propose here an MPI-based communication library which handles all these issues in a non-intrusive manner, providing a unique interface for implementing both classical and asynchronous iterations. Few details are highlighted about our approach to achieve best communication rates and ensure accurate convergence detection. Experimental results on two supercomputers confirmed the low overhead communication costs introduced, and the effectiveness of our library.

Submitted 30 June, 2022; originally announced June 2022.

arXiv:2206.15418 [pdf, other]

doi 10.4203/ccp.112.17

Distributed asynchronous convergence detection without detection protocol

Authors: Guillaume Gbikpi-Benissan, Frederic Magoules

Abstract: In this paper, we address the problem of detecting the moment when an ongoing asynchronous parallel iterative process can be terminated to provide a sufficiently precise solution to a fixed-point problem being solved. Formulating the detection problem as a global solution identification problem, we analyze the snapshot-based approach, which is the only one that allows for exact global residual error computation. From a recently developed approximate snapshot protocol providing a reliable global residual error, we experimentally investigate here, as well, the reliability of a global residual error computed without any prior particular detection mechanism. Results on a single-site supercomputer successfully show that such high-performance computing platforms possibly provide computational environments stable enough to allow for simply resorting to non-blocking reduction operations for computing reliable global residual errors, which provides noticeable time saving, at both implementation and execution levels.

Submitted 30 June, 2022; originally announced June 2022.

arXiv:2112.11880 [pdf, other]

Iterative Krylov Methods for Acoustic Problems on Graphics Processing Unit

Authors: Abal-Kassim Cheik Ahamed, Frederic Magoules

Abstract: This paper deals with linear algebra operations on Graphics Processing Unit (GPU) with complex number arithmetic using double precision. An analysis of their uses within iterative Krylov methods is presented to solve acoustic problems. Numerical experiments performed on a set of acoustic matrices arising from the modelisation of acoustic phenomena inside a car compartment are collected, and outline the performance, robustness and effectiveness of our algorithms, with a speed-up up to 28x for dot product, 9.8x for sparse matrix-vector product and solvers.

Submitted 22 December, 2021; originally announced December 2021.

arXiv:2112.06465 [pdf, other]

Accelerated solution of Helmholtz equation with Iterative Krylov Methods on GPU

Authors: Abal-Kassim Cheik Ahamed, Frederic Magoules

Abstract: This paper gives an analysis and an evaluation of linear algebra operations on Graphics Processing Unit (GPU) with complex number arithmetics with double precision. Knowing the performance of these operations, iterative Krylov methods are considered to solve the acoustic problem efficiently. Numerical experiments carried out on a set of acoustic matrices arising from the modelisation of acoustic phenomena within a cylinder and a car compartment are exposed, exhibiting the performance, robustness and efficiency of our algorithms, with a ratio up to 27x for dot product, 10x for sparse matrix-vector product and solvers in complex double precision arithmetics.

Submitted 13 December, 2021; originally announced December 2021.

arXiv:2112.03851 [pdf, ps, other]

Stochastic Optimized Schwarz Methods for the Gravity Equations on Graphics Processing Unit

Authors: Abal-Kassim Cheik Ahamed, Frederic Magoules

Abstract: Low order, sequential or non-massively parallel finite elements are generally used for three-dimensional gravity modelling. In this paper, in order to obtain better gravity anomaly solutions in heterogeneous media, we solve the gravimetry problem using massively parallel high order finite elements on hybrid multi-CPU/GPU clusters. Parallel algorithms well suited for such hybrid architectures have to be designed. A new stochastic-based optimization procedure for the optimized Schwarz method is here presented, implemented and tuned to graphics processing units. Numerical experiments performed on a realistic test case demonstrate the robustness and efficiency of the proposed method and of its implementation on massive multi-CPU/GPU architectures.

Submitted 7 December, 2021; originally announced December 2021.

arXiv:2112.02377 [pdf, other]

On the stability and performance of the solution of sparse linear systems by partitioned procedures

Authors: Abal-Kassim Cheik Ahamed, Frederic Magoules

Abstract: In this paper, we present, evaluate and analyse the performance of parallel synchronous Jacobi algorithms by different partitioned procedures including band-row splitting, band-row sparsity pattern splitting and substructuring splitting, when solving sparse large linear systems. Numerical experiments performed on a set of academic 3D Laplace equations and on real gravity matrices arising from the Chicxulub crater are exhibited, and show the impact of splitting on parallel synchronous iterations when solving sparse large linear systems. The numerical results clearly show the interest of substructuring methods compared to band-row splitting strategies.

Submitted 4 December, 2021; originally announced December 2021.

Comments: arXiv admin note: text overlap with arXiv:2108.13162

arXiv:2112.00087 [pdf, other]

Coupling and Simulation of Fluid-Structure Interaction Problems for Automotive Sun-roof on Graphics Processing Unit

Authors: Liang S. Lai, Choi-Hong Lai, Abal-Kassim Cheik Ahamed, Frederic Magoules

Abstract: In this paper, the authors propose an analysis of the frequency response function in a car compartment, subject to some fluctuating pressure distribution along the open cavity of the sun-roof at the top of a car. Coupling of a computational fluid dynamics and of a computational acoustics code is considered to simulate the acoustic fluid-structure interaction problem. Iterative Krylov methods and domain decomposition methods, tuned on Graphic Processing Unit (GPU), are considered to solve the acoustic problem with complex number arithmetics with double precision. Numerical simulations illustrate the efficiency, robustness and accuracy of the proposed approaches.

Submitted 30 November, 2021; originally announced December 2021.

arXiv:2110.10762 [pdf, other]

Asynchronous parareal time discretization for partial differential equations

Authors: Frederic Magoules, Guillaume Gbikpi-Benissan

Abstract: Asynchronous iterations are more and more investigated for both scaling and fault-resilience purpose on high performance computing platforms. While so far, they have been exclusively applied within space domain decomposition frameworks, this paper advocates a novel application direction targeting time-decomposed time-parallel approaches. Specifically, an asynchronous iterative model is derived from the Parareal scheme, for which convergence and speedup analysis are then conducted. It turned out that Parareal and async-Parareal feature very close convergence conditions, asymptotically equivalent, including the finite-time termination property. Based on a computational cost model aware of unsteady communication delays, our speedup analysis shows the potential performance gain from asynchronous iterations, which is confirmed by some experimental case of heat evolution on a homogeneous supercomputer. This primary work clearly suggests possible further benefits from asynchronous iterations.

Submitted 20 October, 2021; originally announced October 2021.

arXiv:2110.10446 [pdf, other]

doi 10.2312/egs.20211022

Interactive simulation for easy decision-making in fluid dynamics

Authors: Mengchen Wang, Nicolas Férey, Frédéric Magoulès, Patrick Bourdot

Abstract: A conventional study of fluid simulation involves different stages including conception, simulation, visualization, and analysis tasks. It is, therefore, necessary to switch between different software and interactive contexts which implies costly data manipulation and increases the time needed for decision making. Our interactive simulation approach was designed to shorten this loop, allowing users to visualize and steer a simulation in progress without waiting for the end of the simulation. The methodology allows the users to control, start, pause, or stop a simulation in progress, to change global physical parameters, to interact with its 3D environment by editing boundary conditions such as walls or obstacles. This approach is made possible by using a methodology such as the Lattice Boltzmann Method (LBM) to achieve interactive time while remaining physically relevant. In this work, we present our platform dedicated to interactive fluid simulation based on LBM. The contribution of our interactive simulation approach to decision making will be evaluated in a study based on a simple but realistic use case.

Submitted 20 October, 2021; originally announced October 2021.

Journal ref: H. Theisel and M. Wimmer, editors, Eurographics 2021 - Short Papers. The Eurographics Association, 2021

arXiv:2110.01414 [pdf, other]

doi 10.1109/HPCC.2014.54

Power Consumption Analysis of Parallel Algorithms on GPUs

Authors: Frédéric Magoulès, Abal-Kassim Cheik Ahamed, Alban Desmaison, Jean-Christophe Léchenet, François Mayer, Haifa Ben Salem, Thomas Zhu

Abstract: Due to their highly parallel multi-core architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs' execution in an energy-efficient way. Therefore GPGPU computing is useful for high performance computing applications and in many scientific research fields. In order to bring further performance improvements, GPU clusters are increasingly adopted. The energy consumed by GPUs cannot be neglected. Therefore, an energy-efficient time scheduling of the programs that are going to be executed by the parallel GPUs based on their deadline as well as the assigned priorities could be deployed to face their energetic avidity. For this reason, we present in this paper a model enabling the measure of the power consumption and the execution time of some elementary operations running on a single GPU using a newly developed energy measurement protocol. Consequently, using our methodology, energy needs of a program could be predicted, allowing a better task scheduling.

Submitted 28 September, 2021; originally announced October 2021.

MSC Class: 14Q65; 15A60; 65E10; 65F10; 68W10; 65Y05; 68M20

ACM Class: G.1.3; G.1.6; I.3.1; D.3.4

arXiv:2108.13162 [pdf, other]

doi 10.1109/HPCC.2014.24

Parallel Sub-Structuring Methods for solving Sparse Linear Systems on a cluster of GPU

Authors: Abal-Kassim Cheik Ahamed, Frédéric Magoulès

Abstract: The main objective of this work consists in analyzing the sub-structuring method for the parallel solution of sparse linear systems with matrices arising from the discretization of partial differential equations such as finite element, finite volume and finite difference. With the success encountered by general-purpose processing on graphics processing units (GPGPU), we develop a hybrid multi-GPU and CPU sub-structuring algorithm. GPU computing, with CUDA, is used to accelerate the operations performed on each processor. Numerical experiments have been performed on a set of matrices arising from engineering problems. We compare a C+MPI implementation on a classical CPU cluster with C+MPI+CUDA on a cluster of GPUs. The performance comparison shows a speed-up for the sub-structuring method of up to 19 times in double precision by using CUDA.

Submitted 8 August, 2021; originally announced August 2021.

MSC Class: 14Q65; 15A60; 65E10; 65F10; 68W10; 65Y05 ACM Class: G.1.3; G.1.6; I.3.1; D.3.4

Journal ref: 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014, pp. 121-128

arXiv:1912.06474 [pdf, other]

doi 10.1109/DCABES.2014.27

Spectral domain decomposition method for physically-based rendering of photochromic/electrochromic glass windows

Authors: Guillaume Gbikpi-Benissan, Patrick Callet, Frederic Magoules

Abstract: This paper covers the time-consuming issues intrinsic to physically-based image rendering algorithms. First, the optical properties of glass materials were measured on samples of real glasses, and the materials of other objects inside a hotel room were characterized by deducing spectral data from multiple trichromatic images. We then present the rendering model and ray-tracing algorithm implemented in Virtuelium, an open source software. In order to accelerate the computation of the interactions between light rays and objects, the ray-tracing algorithm is parallelized by means of domain decomposition method techniques. Numerical experiments show that the speedups obtained with classical parallelization techniques are significantly smaller than those achieved with parallel domain decomposition methods.

Submitted 9 December, 2019; originally announced December 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1912.05494

arXiv:1912.05494 [pdf, other]

doi 10.1109/HPCC.2014.17

Spectral Domain Decomposition Method for Natural Lighting and Medieval Glass Rendering

Authors: Guillaume Gbikpi-Benissan, Remi Cerise, Patrick Callet, Frederic Magoules

Abstract: In this paper, we use an original ray-tracing domain decomposition method to address image rendering of naturally lighted scenes. This new method allows, in particular, the analysis of rendering problems on parallel architectures, in the case of interactions between light rays and glass material. Numerical experiments, for medieval glass rendering within the church of the Royaumont abbey, illustrate the performance of the proposed ray-tracing domain decomposition method (DDM) on multi-core and multi-processor architectures. On one hand, applying domain decomposition techniques increases the speedups obtained by parallelizing the computation. On the other hand, for a fixed number of parallel processes, we notice that speedups increase as the number of sub-domains does.

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1912.04356 [pdf, other]

Interactive 3D fluid simulation: steering the simulation in progress using Lattice Boltzmann Method

Authors: Mengchen Wang, Nicolas Ferey, Patrick Bourdot, Frederic Magoules

Abstract: This paper describes a work in progress about software and hardware architecture to steer and control an ongoing fluid simulation in the context of a serious game application. We propose to use the Lattice Boltzmann Method as the simulation approach, considering that it can provide fully parallel algorithms to reach interactive time and because it is easier to change parameters while the simulation is in progress, while remaining physically relevant, than with more classical simulation approaches. We describe which parameters we can modify and how we solve technical issues of interactive steering, and we finally show an application of our interactive fluid simulation approach to water dam phenomena.

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1912.04352 [pdf, other]

Using asynchronous simulation approach for interactive simulation

Authors: Mengchen Wang, Nicolas Ferey, Patrick Bourdot, Frederic Magoules

Abstract: This paper discusses the advantage of using asynchronous simulation in the case of interactive simulation, in which the user can steer and control parameters during a simulation in progress. Asynchronous models allow each iteration to be computed faster, addressing the performance issues of a highly interactive context, and our hypothesis is that getting partial results faster is better than getting synchronized and final results for taking a decision in an interactive simulation context.

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1912.04000 [pdf, ps, other]

doi 10.1109/CSE-EUC-DCABES.2016.212

Spectral domain decomposition method for physically-based rendering of Royaumont abbey

Authors: Guillaume Gbikpi-Benissan, Patrick Callet, Frederic Magoules

Abstract: In the context of a virtual reconstitution of the destroyed Royaumont abbey church, this paper investigates computer science issues intrinsic to physically-based image rendering. First, a virtual model was designed from historical sources and archaeological descriptions. Then the physical properties of some materials were measured on remains of the church and on pieces from similar ancient churches. We specify the properties of our lighting source, which is a representation of the sun, and present the rendering algorithm implemented in our software Virtuelium. In order to accelerate the computation of the interactions between light rays and objects, this ray-tracing algorithm is parallelized by means of domain decomposition techniques. Numerical experiments show that the computational time saved by a classic parallelization is much less significant than that gained with our approach.

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1912.03998 [pdf, other]

doi 10.1109/DCABES.2014.34

Beam-tracing domain decomposition method for urban acoustic pollution

Authors: Guillaume Gbikpi-Benissan, Frederic Magoules

Abstract: This paper covers the fast solution of large acoustic problems on low-resource parallel platforms. A domain decomposition method is coupled with a dynamic load balancing scheme to efficiently accelerate a geometrical acoustic method. The geometrical method studied implements a beam-tracing method where intersections are handled as in a ray-tracing method. Beyond the distribution of the global processing upon multiple sub-domains, a second parallelization level is operated by means of multi-threading and shared memory mechanisms. Numerical experiments show that this method makes it possible to handle large-scale open domains for parallel computing purposes on few machines. Urban acoustic pollution arising from car traffic was simulated on a large model of the Shinjuku district of Tokyo, Japan. The good speed-up results illustrate the performance of this new domain decomposition method.

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1912.00816 [pdf, ps, other]

doi 10.1109/DCABES48411.2019.00048

Recent Developments in Iterative Methods for Reducing Synchronization

Authors: Qinmeng Zou, Frederic Magoules

Abstract: On modern parallel architectures, the cost of synchronization among processors can often dominate the cost of floating-point computation. Several modifications of the existing methods have been proposed in order to keep the communication cost as low as possible. This paper aims at providing a brief overview of recent advances in parallel iterative methods for solving large-scale problems. We refer the reader to the related references for more details on the derivation, implementation, performance, and analysis of these techniques.

Submitted 2 December, 2019; originally announced December 2019.

Journal ref: 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2019, IEEE

arXiv:1909.01473 [pdf, ps, other]

Asynchronous Time-Parallel Method based on Laplace Transform

Authors: Frederic Magoules, Qinmeng Zou

Abstract: The Laplace transform method has proved to be very efficient and easy to parallelize for the solution of time-dependent problems. However, the synchronization delay among processors implies an upper bound on the expectable acceleration factor, which leads to a lot of wasted time. In this paper, we propose an original asynchronous Laplace transform method formalized for quasilinear problems, based on the well-known Gaver-Stehfest algorithm. Parallel experiments show the convergence of our new method, as well as several interesting properties compared with the classical algorithms.

Submitted 3 September, 2019; originally announced September 2019.

arXiv:1907.04393 [pdf, other]

doi 10.1109/dcabes.2017.9

GPU Accelerated Contactless Human Machine Interface for Driving Car

Authors: Frederic Magoules, Qinmeng Zou

Abstract: In this paper we present an original contactless human machine interface for driving a car. The proposed framework is based on the image sent by a simple camera device, which is then processed by various computer vision algorithms. These algorithms isolate the user's hand in the camera frame and translate its movements into orders sent to the computer in a real-time process. The optimization of the implemented algorithms on a graphics processing unit leads to real-time interaction between the user, the computer and the machine. The user can easily modify or create the interfaces displayed by the proposed framework to fit his personal needs. A contactless car-driving interface is produced here to illustrate the principle of our framework.

Submitted 9 July, 2019; originally announced July 2019.

Journal ref: 16th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2017, IEEE

arXiv:1907.04390 [pdf, other]

doi 10.1109/dcabes.2017.37

A Novel Contactless Human Machine Interface based on Machine Learning

Authors: Frederic Magoules, Qinmeng Zou

Abstract: This paper describes a global framework that enables contactless human machine interaction using computer vision and machine learning techniques. The main originality of our framework is that only a very simple image acquisition device, such as a computer camera, is sufficient to establish a human machine interaction as rich as that of traditional devices such as a mouse or keyboard. This framework is based on well-known computer vision techniques, and efficient machine learning techniques are used to detect and track user hand gestures so that the end user can control his computer using virtual interfaces with very simple gestures.

Submitted 9 July, 2019; originally announced July 2019.

Journal ref: 16th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2017, IEEE

arXiv:1907.01201 [pdf, other]

doi 10.1109/dcabes.2018.00081

Convergence Detection of Asynchronous Iterations based on Modified Recursive Doubling

Authors: Qinmeng Zou, Frederic Magoules

Abstract: This paper addresses the distributed convergence detection problem in asynchronous iterations. A modified recursive doubling algorithm is investigated in order to adapt to the non-power-of-two case. Some convergence detection algorithms are illustrated based on the reduction operation. Finally, a concluding discussion about the implementation and the applicability is presented.

Submitted 2 July, 2019; originally announced July 2019.

Journal ref: 17th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2018, IEEE

arXiv:1907.01199 [pdf, other]

doi 10.1109/dcabes.2017.17

Asynchronous Communications Library for the Parallel-in-Time Solution of Black-Scholes Equation

Authors: Qinmeng Zou, Guillaume Gbikpi-Benissan, Frederic Magoules

Abstract: The advent of asynchronous iterative schemes gives high efficiency to numerical computations. However, it is generally difficult to handle the problems of resource management and convergence detection. This paper uses JACK2, an asynchronous communication kernel library for iterative algorithms, to implement both classical and asynchronous parareal algorithms, especially the latter. We illustrate the measures whereby one can tackle the problems above elegantly for the time-dependent case. Finally, experiments are presented to prove the availability and efficiency of such an application.

Submitted 2 July, 2019; originally announced July 2019.

Journal ref: 16th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2017, IEEE

arXiv:1907.01198 [pdf, ps, other]

doi 10.1109/dcabes.2017.15

Asynchronous Parareal Algorithm Applied to European Option Pricing

Authors: Qinmeng Zou, Guillaume Gbikpi-Benissan, Frederic Magoules

Abstract: Asynchronous iterations arise naturally in parallel computing if one wants to solve large problems with a minimization of the idle times. This paper presents an original model of asynchronous iterations for a time-domain decomposition method, namely the parareal method. The asynchronous parareal algorithm is here applied to European option pricing, and numerical experiments performed on a parallel supercomputer illustrate the performance and efficiency of this new method.

Submitted 2 July, 2019; originally announced July 2019.

Journal ref: 16th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES), 2017, IEEE

Showing 1–35 of 35 results for author: Magoules, F