-
Relevance of the Basset history term for Lagrangian particle dynamics
Authors:
Julio Urizarna-Carasa,
Daniel Ruprecht,
Alexandra von Kameke,
Kathrin Padberg-Gehle
Abstract:
The movement of small but finite spherical particles in a fluid can be described by the Maxey-Riley equation (MRE) if they are too large to be considered passive tracers. The MRE contains an integral "history term" modeling wake effects, which causes the force acting on a particle at some given time to depend on its full past trajectory. The history term causes complications in the numerical solution of the MRE and is therefore often neglected, despite both numerical and experimental evidence that its effects are generally not negligible. By numerically computing trajectories with and without the history term of a large number of particles in different flow fields, we investigate its impact on the large-scale Lagrangian dynamics of simulated particles. We show that for moderate to large Stokes numbers, ignoring the history term leads to significant differences in clustering patterns. Furthermore, we compute finite-time Lyapunov exponents and show that, even for small particles, the differences in the resulting scalar field from ignoring the history term can be significant, in particular if the underlying flow is turbulent.
Submitted 1 July, 2024;
originally announced July 2024.
-
Dynamic Deep Learning Based Super-Resolution For The Shallow Water Equations
Authors:
Maximilian Witte,
Fabricio Rodrigues Lapolli,
Philip Freese,
Sebastian Götschel,
Daniel Ruprecht,
Peter Korn,
Christopher Kadow
Abstract:
Using the nonlinear shallow water equations as a benchmark, we demonstrate that a simulation with the ICON-O ocean model with a 20 km resolution that is frequently corrected by a U-net-type neural network can achieve discretization errors of a simulation with 10 km resolution. The network, originally developed for image-based super-resolution in post-processing, is trained to compute the difference between solutions on both meshes and is used to correct the coarse mesh every 12 h. Our setup is the Galewsky test case, modeling transition of a barotropic instability into turbulent flow. We show that the ML-corrected coarse resolution run correctly maintains a balanced flow and captures the transition to turbulence in line with the higher resolution simulation. After 8 days of simulation, the $L_2$-error of the corrected run is similar to a simulation run on the finer mesh. While mass is conserved in the corrected runs, we observe some spurious generation of kinetic energy.
Submitted 9 April, 2024;
originally announced April 2024.
-
Bathymetry reconstruction from experimental data using PDE-constrained optimisation
Authors:
Judith Angel,
Jörn Behrens,
Sebastian Götschel,
Marten Hollm,
Daniel Ruprecht,
Robert Seifried
Abstract:
Knowledge of the bottom topography, also called bathymetry, of rivers, seas or the ocean is important for many areas of maritime science and civil engineering. While direct measurements are possible, they are time-consuming and expensive. Therefore, many approaches have been proposed to infer the bathymetry from measurements of surface waves. Mathematically, this is an inverse problem where an unknown system state needs to be reconstructed from observations with a suitable model for the flow as constraint. In many cases, the shallow water equations can be used to describe the flow. While theoretical studies of the efficacy of such a PDE-constrained optimisation approach for bathymetry reconstruction exist, there seem to be few publications that study its application to data obtained from real-world measurements. This paper shows that the approach can, at least qualitatively, reconstruct a Gaussian-shaped bathymetry in a wave flume from measurements of the water height at up to three points. Achieved normalized root mean square errors (NRMSE) are in line with other approaches.
Submitted 8 April, 2024;
originally announced April 2024.
-
Space-time parallel scaling of Parareal with a Fourier Neural Operator as coarse propagator
Authors:
Abdul Qadir Ibrahim,
Sebastian Götschel,
Daniel Ruprecht
Abstract:
Iterative parallel-in-time algorithms like Parareal can extend scaling beyond the saturation of purely spatial parallelization when solving initial value problems. However, they require the user to build coarse models to handle the inevitably serial transport of information in time. This is a time-consuming and difficult process since there is still only limited theoretical insight into what constitutes a good and efficient coarse model. Novel approaches from machine learning to solve differential equations could provide a more generic way to find coarse level models for parallel-in-time algorithms. This paper demonstrates that a physics-informed Fourier Neural Operator (PINO) is an effective coarse model for the parallelization in time of the two-asset Black-Scholes equation using Parareal. We demonstrate that PINO-Parareal converges as fast as a bespoke numerical coarse model and that, in combination with spatial parallelization by domain decomposition, it provides better overall speedup than both purely spatial parallelization and space-time parallelization with a numerical coarse propagator.
Submitted 3 April, 2024;
originally announced April 2024.
-
Parallel performance of shared memory parallel spectral deferred corrections
Authors:
Philip Freese,
Sebastian Götschel,
Thibaut Lunet,
Daniel Ruprecht,
Martin Schreiber
Abstract:
We investigate parallel performance of parallel spectral deferred corrections, a numerical approach that provides small-scale parallelism for the numerical solution of initial value problems. The scheme is applied to the shallow water equation and uses an IMEX splitting that integrates fast modes implicitly and slow modes explicitly in order to be efficient. We describe parallel $\texttt{OpenMP}$-based implementations of parallel SDC in two well established simulation codes: the finite volume based operational ocean model $\texttt{ICON-O}$ and the spherical harmonics based research code $\texttt{SWEET}$. The implementations are benchmarked on a single node of the JUSUF ($\texttt{SWEET}$) and JUWELS ($\texttt{ICON-O}$) systems at Jülich Supercomputing Centre. We demonstrate a reduction of time-to-solution across a range of accuracies. For $\texttt{ICON-O}$, we show speedup over the currently used Adams--Bashforth-2 integrator with $\texttt{OpenMP}$ loop parallelization. For $\texttt{SWEET}$, we show speedup over serial spectral deferred corrections and a second order implicit-explicit integrator.
Submitted 29 March, 2024;
originally announced March 2024.
-
Physics-Informed Neural Networks for Satellite State Estimation
Authors:
Jacob Varey,
Jessica D. Ruprecht,
Michael Tierney,
Ryan Sullenberger
Abstract:
The Space Domain Awareness (SDA) community routinely tracks satellites in orbit by fitting an orbital state to observations made by the Space Surveillance Network (SSN). In order to fit such orbits, an accurate model of the forces that are acting on the satellite is required. Over the past several decades, high-quality, physics-based models have been developed for satellite state estimation and propagation. These models are exceedingly good at estimating and propagating orbital states for non-maneuvering satellites; however, there are several classes of anomalous accelerations that a satellite might experience which are not well-modeled, such as satellites that use low-thrust electric propulsion to modify their orbit. Physics-Informed Neural Networks (PINNs) are a valuable tool for these classes of satellites as they combine physics models with Deep Neural Networks (DNNs), which are highly expressive and versatile function approximators. By combining a physics model with a DNN, the machine learning model need not learn astrodynamics, which results in more efficient and effective utilization of machine learning resources. This paper details the application of PINNs to estimate the orbital state and a continuous, low-amplitude anomalous acceleration profile for satellites. The PINN is trained to learn the unknown acceleration by minimizing the mean square error of observations. We compare the performance of pure physics models with that of PINNs in terms of their observation residuals and their propagation accuracy beyond the fit span of the observations. For a two-day simulation of a GEO satellite using an unmodeled acceleration profile on the order of $10^{-8} \text{ km/s}^2$, the PINN outperformed the best-fit physics model by orders of magnitude for both observation residuals (123 arcsec vs 1.00 arcsec) as well as propagation accuracy (3860 km vs 164 km after five days).
Submitted 28 March, 2024;
originally announced March 2024.
-
Improving Efficiency of Parallel Across the Method Spectral Deferred Corrections
Authors:
Gayatri Čaklović,
Thibaut Lunet,
Sebastian Götschel,
Daniel Ruprecht
Abstract:
Parallel-across-the-method time integration can provide small scale parallelism when solving initial value problems. Spectral deferred corrections (SDC) with a diagonal sweeper, which is closely related to iterated Runge-Kutta methods proposed by Van der Houwen and Sommeijer, can use a number of threads equal to the number of quadrature nodes in the underlying collocation method. However, convergence speed, efficiency and stability depend critically on the coefficients used. Previous approaches have used numerical optimization to find good parameters. Instead, we propose an ansatz that allows one to find optimal parameters analytically. We show that the resulting parallel SDC methods provide stability domains and convergence order very similar to those of well established serial SDC variants. Using a model for computational cost that assumes 80% efficiency of an implementation of parallel SDC, we show that our variants are competitive with serial SDC, previously published parallel SDC coefficients as well as Picard iteration, explicit RKM-4 and an implicit fourth-order diagonally implicit Runge-Kutta method.
Submitted 27 March, 2024;
originally announced March 2024.
-
Efficient numerical methods for the Maxey-Riley equations with Basset history term
Authors:
Julio Urizarna-Carasa,
Leon Schlegel,
Daniel Ruprecht
Abstract:
The Maxey-Riley equations (MRE) describe the motion of a finite-sized, spherical particle in a fluid. Because of wake effects, the force acting on a particle depends on its past trajectory. This is modelled by an integral term in the MRE, also called Basset force, that makes its numerical solution challenging and memory intensive. A recent approach proposed by Prasath, Vasan and Govindarajan exploits connections between the integral term and fractional derivatives to reformulate the MRE as a time-dependent partial differential equation on a semi-infinite pseudo-space. They also propose a numerical algorithm based on polynomial expansions. This paper instead develops a numerical approach based on finite differences, adopting techniques by Koleva and by Fazio and Janelli to cope with the issues of having an unbounded spatial domain. For particles of varying size and density, we compare convergence order and computational efficiency of the polynomial expansion by Prasath et al., our finite difference schemes and a direct integrator for the MRE based on multi-step methods proposed by Daitche.
Submitted 20 March, 2024;
originally announced March 2024.
-
Adaptive time step selection for Spectral Deferred Corrections
Authors:
Thomas Baumann,
Sebastian Götschel,
Thibaut Lunet,
Daniel Ruprecht,
Robert Speck
Abstract:
Spectral Deferred Corrections (SDC) is an iterative method for the numerical solution of ordinary differential equations. It works by refining the numerical solution for an initial value problem by approximately solving differential equations for the error, and can be interpreted as a preconditioned fixed-point iteration for solving the fully implicit collocation problem. We adopt techniques from embedded Runge-Kutta Methods (RKM) to SDC in order to provide a mechanism for adaptive time step size selection and thus increase computational efficiency of SDC. We propose two SDC-specific estimates of the local error that are generic and require only minimal problem specific tuning. We demonstrate a gain in efficiency over standard SDC with fixed step size, compare efficiency favorably against state-of-the-art adaptive RKM and show that due to its iterative nature, adaptive SDC can cope efficiently with silent data corruption.
Submitted 20 March, 2024;
originally announced March 2024.
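The SDC idea summarised in the abstract above (repeatedly correcting a provisional solution at quadrature nodes until it approaches the collocation solution) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' adaptive scheme: the function names `integration_matrix` and `sdc_step` are hypothetical, the base method is explicit Euler, the problem is scalar, and the nodes are three points on the unit interval. No step size control is included.

```python
import numpy as np

def integration_matrix(nodes):
    """S[m, j] = integral of the j-th Lagrange polynomial from nodes[m] to nodes[m+1]."""
    n = len(nodes)
    S = np.zeros((n - 1, n))
    for j in range(n):
        coeffs = np.poly(np.delete(nodes, j))           # monic polynomial vanishing at the other nodes
        coeffs = coeffs / np.polyval(coeffs, nodes[j])  # normalise so that l_j(nodes[j]) = 1
        antideriv = np.polyint(coeffs)
        for m in range(n - 1):
            S[m, j] = np.polyval(antideriv, nodes[m + 1]) - np.polyval(antideriv, nodes[m])
    return S

def sdc_step(f, u0, h, nodes, sweeps):
    """One time step of size h for u' = f(u) using explicit-Euler SDC sweeps.

    nodes are quadrature nodes on [0, 1] with nodes[0] = 0 and nodes[-1] = 1
    (an assumption of this sketch). Returns the approximation at the end of the step.
    """
    S = integration_matrix(nodes)
    u = np.full(len(nodes), float(u0))  # provisional solution: copy the initial value to all nodes
    for _ in range(sweeps):
        f_old = f(u)
        u_new = u.copy()                # u_new[0] keeps the initial value
        for m in range(len(nodes) - 1):
            dt = h * (nodes[m + 1] - nodes[m])
            # low-order correction plus high-order quadrature of the previous iterate
            u_new[m + 1] = u_new[m] + dt * (f(u_new[m]) - f_old[m]) + h * (S[m] @ f_old)
        u = u_new
    return u[-1]

if __name__ == "__main__":
    rhs = lambda u: -u                  # Dahlquist test equation u' = -u
    nodes = np.array([0.0, 0.5, 1.0])
    for k in (1, 2, 3, 4):
        approx = sdc_step(rhs, 1.0, 0.1, nodes, k)
        print(k, abs(approx - np.exp(-0.1)))
```

Running the script prints the error after one to four sweeps, which shrinks with each sweep for this test problem. An adaptive variant would additionally need a local error estimate to choose `h`, which this sketch deliberately leaves out.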
-
Spectral deferred correction methods for second-order problems
Authors:
Ikrom Akramov,
Sebastian Götschel,
Michael Minion,
Daniel Ruprecht,
Robert Speck
Abstract:
Spectral deferred corrections (SDC) are a class of iterative methods for the numerical solution of ordinary differential equations. SDC can be interpreted as a Picard iteration to solve a fully implicit collocation problem, preconditioned with a low-order method. It has been widely studied for first-order problems, using explicit, implicit or implicit-explicit Euler and other low-order methods as preconditioner. For first-order problems, SDC achieves arbitrary order of accuracy and possesses good stability properties. While numerical results for SDC applied to the second-order Lorentz equations exist, no theoretical results are available for SDC applied to second-order problems.
We present an analysis of the convergence and stability properties of SDC using velocity-Verlet as the base method for general second-order initial value problems. Our analysis proves that the order of convergence depends on whether the force in the system depends on the velocity. We also demonstrate that the SDC iteration is stable under certain conditions. Finally, we show that SDC can be computationally more efficient than a simple Picard iteration or a fourth-order Runge-Kutta-Nyström method.
Submitted 12 February, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.
-
Parareal with a physics-informed neural network as coarse propagator
Authors:
Abdul Qadir Ibrahim,
Sebastian Götschel,
Daniel Ruprecht
Abstract:
Parallel-in-time algorithms provide an additional layer of concurrency for the numerical integration of models based on time-dependent differential equations. Methods like Parareal, which parallelize across multiple time steps, rely on a computationally cheap and coarse integrator to propagate information forward in time, while a parallelizable expensive fine propagator provides accuracy. Typically, the coarse method is a numerical integrator using lower resolution, reduced order or a simplified model. Our paper proposes to use a physics-informed neural network (PINN) instead. We demonstrate for the Black-Scholes equation, a partial differential equation from computational finance, that Parareal with a PINN coarse propagator provides better speedup than a numerical coarse propagator. Training and evaluating a neural network are both tasks whose computing patterns are well suited for GPUs. By contrast, mesh-based algorithms with their low computational intensity struggle to perform well. We show that moving the coarse propagator PINN to a GPU while running the numerical fine propagator on the CPU further improves Parareal's single-node performance. This suggests that integrating machine learning techniques into parallel-in-time integration methods and exploiting their differences in computing patterns might offer a way to better utilize heterogeneous architectures.
Submitted 5 June, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
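The division of labour between coarse and fine propagator described in the abstract above can be sketched for a scalar test problem. This is only an illustration under stated assumptions: the coarse propagator here is a single backward Euler step rather than a PINN, the fine propagator is backward Euler with many substeps, the equation is the linear test problem y' = λy rather than Black-Scholes, and the function names `coarse`, `fine` and `parareal` are hypothetical. The fine evaluations are written as a plain loop; in an actual parallel run they would be distributed across time slices.

```python
import numpy as np

def coarse(y, dt, lam):
    """Cheap propagator: one backward Euler step over a whole time slice."""
    return y / (1.0 - lam * dt)

def fine(y, dt, lam, substeps=20):
    """Accurate propagator: backward Euler with many small substeps."""
    h = dt / substeps
    for _ in range(substeps):
        y = y / (1.0 - lam * h)
    return y

def parareal(y0, lam, t_end, n_slices, n_iter):
    """Parareal for y' = lam * y; returns the values at the slice boundaries."""
    dt = t_end / n_slices
    u = np.empty(n_slices + 1)
    u[0] = y0
    for n in range(n_slices):           # initial guess from a serial coarse sweep
        u[n + 1] = coarse(u[n], dt, lam)
    for _ in range(n_iter):
        # these evaluations are independent of each other (the parallel part)
        f_old = np.array([fine(u[n], dt, lam) for n in range(n_slices)])
        g_old = np.array([coarse(u[n], dt, lam) for n in range(n_slices)])
        u_new = np.empty_like(u)
        u_new[0] = y0
        for n in range(n_slices):       # serial correction sweep with the coarse propagator
            u_new[n + 1] = coarse(u_new[n], dt, lam) + f_old[n] - g_old[n]
        u = u_new
    return u

if __name__ == "__main__":
    lam, t_end, n_slices = -1.0, 2.0, 10
    reference = np.empty(n_slices + 1)
    reference[0] = 1.0
    for n in range(n_slices):           # serial fine solution for comparison
        reference[n + 1] = fine(reference[n], t_end / n_slices, lam)
    for k in (0, 1, 2, 3):
        print(k, np.max(np.abs(parareal(1.0, lam, t_end, n_slices, k) - reference)))
```

The script prints the maximum difference to the serial fine solution after each iteration count. After as many iterations as there are time slices, Parareal reproduces the serial fine solution up to rounding, so speedup requires convergence in far fewer iterations.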
-
Parallel Computation of Inverse Compton Scattering Radiation Spectra based on Liénard-Wiechert Potentials
Authors:
Yi-Kai Kan,
Franz X. Kärtner,
Sabine Le Borne,
Daniel Ruprecht,
Jens-Peter M. Zemke
Abstract:
Inverse Compton Scattering (ICS) has gained much attention recently because of its promise for the development of table-top-size X-ray light sources. Precise and fast simulation is an indispensable tool for predicting the radiation property of a given machine design and to optimize its parameters. Instead of the conventional approach to compute radiation spectra which directly evaluates the discretized Fourier integral of the Liénard-Wiechert field given analytically (referred to as the frequency-domain method), this article focuses on an approach where the field is recorded along the observer time on a uniform time grid which is then used to compute the radiation spectra after completion of the simulation, referred to as the time-domain method. Besides the derivation and implementation details of the proposed method, we analyze possible parallelization schemes and compare the parallel performance of the proposed time-domain method with the frequency-domain method. We will characterize scenarios/conditions under which one method is expected to outperform the other.
Submitted 7 June, 2022;
originally announced June 2022.
-
Tracing Milky Way substructure with an RR Lyrae hierarchical clustering forest
Authors:
Brian T. Cook,
Deborah F. Woods,
Jessica D. Ruprecht,
Jacob Varey,
Radha Mastandrea,
Kaylee de Soto,
Jacob F. Harburg,
Umaa Rebbapragada,
Ashish A. Mahabal
Abstract:
RR Lyrae variable stars have long been reliable standard candles used to discern structure in the Local Group. With this in mind, we present a routine to identify groupings containing a statistically significant number of RR Lyrae variables in the Milky Way environment. RR Lyrae variable groupings, or substructures, with potential Galactic archaeology applications are found using a forest of agglomerative, hierarchical clustering trees, whose leaves are Milky Way RR Lyrae variables. Each grouping is validated by ensuring that the internal RR Lyrae variable proper motions are sufficiently correlated. Photometric information was collected from the Gaia second data release and proper motions from the (early) third data release. After applying this routine to the catalogue of 91234 variables, we are able to report sixteen unique RR Lyrae substructures with physical sizes of less than 1 kpc. Five of these substructures are in close proximity to Milky Way globular clusters with previously known tidal tails and/or a potential connection to Galactic merger events. One candidate substructure is in the neighbourhood of the Large Magellanic Cloud but is more distant (and older) than known satellites of the dwarf galaxy. Our study ends with a discussion of ways in which future surveys could be applied to the discovery of Milky Way stellar streams.
Submitted 12 April, 2022;
originally announced April 2022.
-
A unified analysis framework for iterative parallel-in-time algorithms
Authors:
M. J. Gander,
T. Lunet,
D. Ruprecht,
R. Speck
Abstract:
Parallel-in-time integration has been the focus of intensive research efforts over the past two decades due to the advent of massively parallel computer architectures and the scaling limits of purely spatial parallelization. Various iterative parallel-in-time (PinT) algorithms have been proposed, like Parareal, PFASST, MGRIT, and Space-Time Multi-Grid (STMG). These methods have been described using different notations, and the convergence estimates that are available are difficult to compare. We describe Parareal, PFASST, MGRIT and STMG for the Dahlquist model problem using a common notation and give precise convergence estimates using generating functions. This allows us, for the first time, to directly compare their convergence. We prove that all four methods eventually converge super-linearly, and also compare them numerically. The generating function framework provides further opportunities to explore and analyze existing and new methods.
Submitted 28 April, 2023; v1 submitted 30 March, 2022;
originally announced March 2022.
-
Impact of spatial coarsening on Parareal convergence
Authors:
Judith Angel,
Sebastian Götschel,
Daniel Ruprecht
Abstract:
We study the impact of spatial coarsening on the convergence of the Parareal algorithm, both theoretically and numerically. For initial value problems with a normal system matrix, we prove a lower bound for the Euclidean norm of the iteration matrix. When there is no physical or numerical diffusion, an immediate consequence is that the norm of the iteration matrix cannot be smaller than unity as soon as the coarse problem has fewer degrees-of-freedom than the fine. This prevents a theoretical guarantee for monotonic convergence, which is necessary to obtain meaningful speedups. For diffusive problems, in the worst case where the iteration error contracts only as fast as the powers of the iteration matrix norm, making Parareal as accurate as the fine method will take about as many iterations as there are processors, making meaningful speedup impossible. Numerical examples with a non-normal system matrix show that for diffusive problems good speedup is possible, but that for non-diffusive problems the negative impact of spatial coarsening on convergence is large.
Submitted 19 November, 2021;
originally announced November 2021.
-
New applications for the Boris Spectral Deferred Correction algorithm for plasma simulations
Authors:
Kris Smedt,
Daniel Ruprecht,
Jitse Niesen,
Steven Tobias,
Joonas Nättilä
Abstract:
The paper investigates two new use cases for the Boris Spectral Deferred Corrections (Boris-SDC) time integrator for plasma simulations. First, we show that using Boris-SDC as a particle pusher in an electrostatic particle-in-cell (PIC) code can, at least in the linear regime, improve simulation accuracy compared with the standard second order Boris method. In some instances, the higher order of Boris-SDC even allows a much larger time step, leading to modest computational gains. Second, we propose a modification of Boris-SDC for the relativistic regime. Based on an implementation of Boris-SDC in the \textsc{runko} PIC code, we demonstrate for a relativistic Penning trap that Boris-SDC retains its high order of convergence for velocities ranging from $0.5c$ to $>0.99c$. We also show that for the force-free case where the accelerations from the electric and magnetic fields cancel, Boris-SDC produces less numerical drift than Boris.
Submitted 15 October, 2021;
originally announced October 2021.
-
Twelve Ways To Fool The Masses When Giving Parallel-In-Time Results
Authors:
Sebastian Goetschel,
Michael Minion,
Daniel Ruprecht,
Robert Speck
Abstract:
Getting good speedup -- let alone high parallel efficiency -- for parallel-in-time (PinT) integration examples can be frustratingly difficult. The high complexity and large number of parameters in PinT methods can easily (and unintentionally) lead to numerical experiments that overestimate the algorithm's performance. In the tradition of Bailey's article "Twelve ways to fool the masses when giving performance results on parallel computers", we discuss and demonstrate pitfalls to avoid when evaluating performance of PinT methods. Despite being written in a light-hearted tone, this paper is intended to raise awareness that there are many ways to unintentionally fool yourself and others and that by avoiding these fallacies more meaningful PinT performance results can be obtained.
Submitted 23 February, 2021;
originally announced February 2021.
-
Performance of the BGSDC integrator for computing fast ion trajectories in nuclear fusion reactors
Authors:
Krasymyr Tretiak,
James Buchanan,
Rob Akers,
Daniel Ruprecht
Abstract:
Modelling neutral beam injection (NBI) in fusion reactors requires computing the trajectories of large ensembles of particles. Slowing down times of up to one second combined with nanosecond time steps make these simulations computationally very costly. This paper explores the performance of BGSDC, a new numerical time stepping method, for tracking ions generated by NBI in the DIII-D and JET reactors. BGSDC is a high-order generalisation of the Boris method, combining it with spectral deferred corrections and the Generalized Minimal Residual method GMRES. Without collision modelling, where numerical drift can be quantified accurately, we find that BGSDC can deliver higher quality particle distributions than the standard Boris integrator at comparable cost or comparable distributions at lower cost. With collision models, quantifying accuracy is difficult but we show that BGSDC produces stable distributions at larger time steps than Boris.
Submitted 3 December, 2020; v1 submitted 15 May, 2020;
originally announced May 2020.
-
Performance of parallel-in-time integration for Rayleigh Bénard Convection
Authors:
Andrew Clarke,
Chris Davies,
Daniel Ruprecht,
Steven Tobias,
Jeffrey S. Oishi
Abstract:
Rayleigh-Bénard convection (RBC) is a fundamental problem of fluid dynamics, with many applications to geophysical, astrophysical, and industrial flows. Understanding RBC at parameter regimes of interest requires complex physical or numerical experiments. Numerical simulations require large amounts of computational resources; in order to more efficiently use the large numbers of processors now available in large high performance computing clusters, novel parallelisation strategies are required. To this end, we investigate the performance of the parallel-in-time algorithm Parareal when used in numerical simulations of RBC. We present the first parallel-in-time speedups for RBC simulations at finite Prandtl number. We also investigate the problem of convergence of Parareal with respect to statistical numerical quantities, such as the Nusselt number, and discuss the importance of reliable online stopping criteria in these cases.
Submitted 6 January, 2020;
originally announced January 2020.
-
Parareal with a Learned Coarse Model for Robotic Manipulation
Authors:
Wisdom Agboh,
Oliver Grainger,
Daniel Ruprecht,
Mehmet Dogar
Abstract:
A key component of many robotics model-based planning and control algorithms is physics predictions, that is, forecasting a sequence of states given an initial state and a sequence of controls. This process is slow and a major computational bottleneck for robotics planning algorithms. Parallel-in-time integration methods can help to leverage parallel computing to accelerate physics predictions and thus planning. The Parareal algorithm iterates between a coarse serial integrator and a fine parallel integrator. A key challenge is to devise a coarse model that is computationally cheap but accurate enough for Parareal to converge quickly. Here, we investigate the use of a deep neural network physics model as a coarse model for Parareal in the context of robotic manipulation. In simulated experiments using the physics engine Mujoco as fine propagator we show that the learned coarse model leads to faster Parareal convergence than a coarse physics-based model. We further show that the learned coarse model allows to apply Parareal to scenarios with multiple objects, where the physics-based coarse model is not applicable. Finally, we conduct experiments on a real robot and show that Parareal predictions are close to real-world physics predictions for robotic pushing of multiple objects. Videos are at https://youtu.be/wCh2o1rf-gA.
Submitted 19 June, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
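The Parareal iteration described in the abstract above (a serial coarse sweep corrected by fine solves that are independent per time slice) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the forward Euler propagators, the scalar ODE setting and the function names `euler_steps` and `parareal` are assumptions chosen for illustration, and the fine solves are executed serially here instead of in parallel.

```python
def euler_steps(y, t0, t1, n, f):
    """Integrate y' = f(t, y) from t0 to t1 with n forward Euler steps."""
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t0, t1, n_slices, n_iter, n_coarse=1, n_fine=100):
    """Return approximations at the time-slice boundaries after n_iter iterations."""
    ts = [t0 + (t1 - t0) * i / n_slices for i in range(n_slices + 1)]

    def coarse(y, a, b):
        # Cheap, inaccurate propagator (hypothetical choice: few Euler steps).
        return euler_steps(y, a, b, n_coarse, f)

    def fine(y, a, b):
        # Expensive, accurate propagator (hypothetical choice: many Euler steps).
        return euler_steps(y, a, b, n_fine, f)

    # Initial guess: one serial sweep with the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], ts[n], ts[n + 1]))

    for _ in range(n_iter):
        # These solves only depend on the previous iterate, so a real code
        # would distribute them over processors, one time slice each.
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]

        # Serial correction sweep: new coarse value plus fine-minus-coarse defect.
        u_new = [y0]
        for n in range(n_slices):
            g_new = coarse(u_new[n], ts[n], ts[n + 1])
            u_new.append(g_new + f_old[n] - g_old[n])
        u = u_new
    return u
```

After as many iterations as there are time slices, the result agrees with a serial run of the fine propagator up to rounding, so a speedup is only possible when the iteration converges in far fewer iterations than slices. This is why the accuracy of the coarse model matters.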
-
Characterization of material around the centaur (2060) Chiron from a visible and near-infrared stellar occultation in 2011
Authors:
A. A. Sickafoose,
A. S. Bosh,
J. P. Emery,
M. J. Person,
C. A. Zuluaga,
M. Womack,
J. D. Ruprecht,
F. B. Bianco,
A. M. Zangari
Abstract:
The centaur (2060) Chiron has exhibited outgassing behaviour and possibly hosts a ring system. On 2011 November 29, Chiron occulted a fairly bright star (R approximately 15 mag) as observed from the 3-m NASA Infrared Telescope Facility (IRTF) on Mauna Kea and the 2-m Faulkes Telescope North (FTN) at Haleakala. Data were taken as visible wavelength images and simultaneous, low-resolution, near-infrared (NIR) spectra. Here, we present a detailed examination of the light-curve features in the optical data and an analysis of the near-infrared spectra. We place a lower limit on the diameter of Chiron's nucleus of 160.2+/-1.3 km. Sharp, narrow dips were observed between 280-360 km from the centre (depending on event geometry). For a central chord and assumed ring plane, the separated features are 298.5 to 302 and 308 to 310.5 km from the nucleus, with normal optical depth approximately 0.5-0.9, and a gap of 9.1+/-1.3 km. These features are similar in equivalent depth to Chariklo's inner ring. The absence of absorbing or scattering material near the nucleus suggests that these sharp dips are more likely to be planar rings than a shell of material. The region of relatively-increased transmission is within the 1:2 spin-orbit resonance, which is consistent with the proposed clearing pattern for a non-axisymmetric nucleus. Characteristics of additional, azimuthally incomplete features are presented, which are likely to be transient, as well as detection of an extended shell or diffuse ring from approximately 900-1500 km. There are no significant features in the NIR light curves, nor any correlation between optical features and NIR spectral slope.
Submitted 18 November, 2019; v1 submitted 11 October, 2019;
originally announced October 2019.
-
Combining Coarse and Fine Physics for Manipulation using Parallel-in-Time Integration
Authors:
Wisdom C. Agboh,
Daniel Ruprecht,
Mehmet R. Dogar
Abstract:
We present a method for fast and accurate physics-based predictions during non-prehensile manipulation planning and control. Given an initial state and a sequence of controls, the problem of predicting the resulting sequence of states is a key component of a variety of model-based planning and control algorithms. We propose combining a coarse (i.e. computationally cheap but not very accurate) predictive physics model, with a fine (i.e. computationally expensive but accurate) predictive physics model, to generate a hybrid model that is at the required speed and accuracy for a given manipulation task. Our approach is based on the Parareal algorithm, a parallel-in-time integration method used for computing numerical solutions for general systems of ordinary differential equations. We adapt Parareal to combine a coarse pushing model with an off-the-shelf physics engine to deliver physics-based predictions that are as accurate as the physics engine but run in substantially less wall-clock time, thanks to parallelization across time. We use these physics-based predictions in a model-predictive-control framework based on trajectory optimization, to plan pushing actions that avoid an obstacle and reach a goal location. We show that with hybrid physics models, we can achieve the same success rates as the planner that uses the off-the-shelf physics engine directly, but significantly faster. We present experiments in simulation and on a real robotic setup. Videos are available here: https://youtu.be/5e9oTeu4JOU
Submitted 30 August, 2019; v1 submitted 20 March, 2019;
originally announced March 2019.
-
Parallel-in-time integration of Kinematic Dynamos
Authors:
Andrew T. Clarke,
Christopher J. Davies,
Daniel Ruprecht,
Steven M. Tobias
Abstract:
The precise mechanisms responsible for the natural dynamos in the Earth and Sun are still not fully understood. Numerical simulations of natural dynamos are extremely computationally intensive, and are carried out in parameter regimes many orders of magnitude away from real conditions. Parallelization in space is a common strategy to speed up simulations on high performance computers, but eventually hits a scaling limit. Additional directions of parallelization are desirable to utilise the high number of processor cores now available. Parallel-in-time methods can deliver speed up in addition to that offered by spatial partitioning but have not yet been applied to dynamo simulations. This paper investigates the feasibility of using the parallel-in-time algorithm Parareal to speed up initial value problem simulations of the kinematic dynamo, using the open source Dedalus spectral solver. Both the time independent Roberts and time dependent Galloway-Proctor 2.5D dynamos are investigated over a range of magnetic Reynolds numbers.
Speed ups beyond those possible from spatial parallelization are found in both cases. Results for the Galloway-Proctor flow are promising, with Parareal efficiency found to be close to 0.3. Roberts flow results are less efficient, but Parareal still shows some speed up over spatial parallelization alone.
Parallel in space and time speed ups of $\sim300$ were found for 1600 cores for the Galloway-Proctor flow, with total parallel efficiency of $\sim0.16$.
Submitted 1 February, 2019;
originally announced February 2019.
-
An arbitrary order time-stepping algorithm for tracking particles in inhomogeneous magnetic fields
Authors:
Krasymyr Tretiak,
Daniel Ruprecht
Abstract:
The Lorentz equations describe the motion of electrically charged particles in electric and magnetic fields and are used widely in plasma physics. The most popular numerical algorithm for solving them is the Boris method, a variant of the Störmer-Verlet algorithm. Boris' method is phase space volume conserving and simulated particles typically remain near the correct trajectory. However, it is only second order accurate. Therefore, in scenarios where it is not enough to know that a particle stays on the right trajectory but one needs to know where on the trajectory the particle is at a given time, Boris method requires very small time steps to deliver accurate phase information, making it computationally expensive. We derive an improved version of the high-order Boris spectral deferred correction algorithm (Boris-SDC) by adopting a convergence acceleration strategy for second order problems based on the Generalised Minimum Residual (GMRES) method. Our new algorithm is easy to implement as it still relies on the standard Boris method. Like Boris-SDC it can deliver arbitrary order of accuracy through simple changes of runtime parameter but possesses better long-term energy stability. We demonstrate for two examples, a magnetic mirror trap and the Solov'ev equilibrium, that the new method can deliver better accuracy at lower computational cost compared to the standard Boris method. While our examples are motivated by tracking ions in the magnetic field of a nuclear fusion reactor, the introduced algorithm can potentially deliver similar improvements in efficiency for other applications.
Submitted 2 August, 2019; v1 submitted 19 December, 2018;
originally announced December 2018.
-
Toward transient finite element simulation of thermal deformation of machine tools in real-time
Authors:
Andreas Naumann,
Daniel Ruprecht,
Joerg Wensch
Abstract:
Finite element models without simplifying assumptions can accurately describe the spatial and temporal distribution of heat in machine tools as well as the resulting deformation. In principle, this allows to correct for displacements of the Tool Centre Point and enables high precision manufacturing. However, the computational cost of FEM models and restriction to generic algorithms in commercial tools like ANSYS prevent their operational use since simulations have to run faster than real-time. For the case where heat diffusion is slow compared to machine movement, we introduce a tailored implicit-explicit multi-rate time stepping method of higher order based on spectral deferred corrections. Using the open-source FEM library DUNE, we show that fully coupled simulations of the temperature field are possible in real-time for a machine consisting of a stock sliding up and down on rails attached to a stand.
Submitted 12 July, 2017;
originally announced July 2017.
-
Parallel-in-Space-and-Time Simulation of the Three-Dimensional, Unsteady Navier-Stokes Equations for Incompressible Flow
Authors:
Roberto Croce,
Daniel Ruprecht,
Rolf Krause
Abstract:
In this paper we combine the Parareal parallel-in-time method together with spatial parallelization and investigate this space-time parallel scheme by means of solving the three-dimensional incompressible Navier-Stokes equations. Parallelization of time stepping provides a new direction of parallelization and allows to employ additional cores to further speed up simulations after spatial parallelization has saturated. We report on numerical experiments performed on a Cray XE6, simulating a driven cavity flow with and without obstacles. Distributed memory parallelization is used in both space and time, featuring up to 2,048 cores in total. It is confirmed that the space-time-parallel method can provide speedup beyond the saturation of the spatial parallelization.
Submitted 17 May, 2017;
originally announced May 2017.
-
Wave propagation characteristics of Parareal
Authors:
Daniel Ruprecht
Abstract:
The paper derives and analyses the (semi-)discrete dispersion relation of the Parareal parallel-in-time integration method. It investigates Parareal's wave propagation characteristics with the aim to better understand what causes the well documented stability problems for hyperbolic equations. The analysis shows that the instability is caused by convergence of the amplification factor to the exact value from above for medium to high wave numbers. Phase errors in the coarse propagator are identified as the culprit, which suggests that specifically tailored coarse level methods could provide a remedy.
Submitted 14 October, 2017; v1 submitted 5 January, 2017;
originally announced January 2017.
-
Spectral deferred corrections with fast-wave slow-wave splitting
Authors:
Daniel Ruprecht,
Robert Speck
Abstract:
The paper investigates a variant of semi-implicit spectral deferred corrections (SISDC) in which the stiff, fast dynamics correspond to fast propagating waves ("fast-wave slow-wave problem"). We show that for a scalar test problem with two imaginary eigenvalues $i \lambda_{fast}$, $i \lambda_{slow}$, having $\Delta t \left(\left| \lambda_{fast} \right| + \left| \lambda_{slow} \right| \right) < 1$ is sufficient for the fast-wave slow-wave SDC (FWSW-SDC) iteration to converge and that in the limit of infinitely fast waves the convergence rate of the non-split version is retained. Stability function and discrete dispersion relation are derived and show that the method is stable for essentially arbitrary fast-wave CFL numbers as long as the slow dynamics are resolved. The method causes little numerical diffusion and its semi-discrete phase speed is accurate also for large wave number modes. Performance is studied for an acoustic-advection problem and for the linearised Boussinesq equations, describing compressible, stratified flow. FWSW-SDC is compared to a diagonally implicit Runge-Kutta (DIRK) and IMEX Runge-Kutta (IMEX) method and found to be competitive in terms of both accuracy and cost.
Submitted 13 June, 2016; v1 submitted 4 February, 2016;
originally announced February 2016.
-
Toward fault-tolerant parallel-in-time integration with PFASST
Authors:
Robert Speck,
Daniel Ruprecht
Abstract:
We introduce and analyze different strategies for the parallel-in-time integration method PFASST to recover from hard faults and subsequent data loss. Since PFASST stores solutions at multiple time steps on different processors, information from adjacent steps can be used to recover after a processor has failed. PFASST's multi-level hierarchy allows to use the coarse level for correcting the reconstructed solution, which can help to minimize overhead. A theoretical model is devised linking overhead to the number of additional PFASST iterations required for convergence after a fault. The potential efficiency of different strategies is assessed in terms of required additional iterations for examples of diffusive and advective type.
Submitted 31 May, 2016; v1 submitted 28 October, 2015;
originally announced October 2015.
-
Explicit Parallel-in-time Integration of a Linear Acoustic-Advection System
Authors:
Daniel Ruprecht,
Rolf Krause
Abstract:
The applicability of the Parareal parallel-in-time integration scheme for the solution of a linear, two-dimensional hyperbolic acoustic-advection system, which is often used as a test case for integration schemes for numerical weather prediction (NWP), is addressed. Parallel-in-time schemes are a possible way to increase, on the algorithmic level, the amount of parallelism, a requirement arising from the rapidly growing number of CPUs in high performance computer systems. A recently introduced modification of the "parallel implicit time-integration algorithm" could successfully solve hyperbolic problems arising in structural dynamics. It has later been cast into the framework of Parareal. The present paper adapts this modified Parareal and employs it for the solution of a hyperbolic flow problem, where the initial value problem solved in parallel arises from the spatial discretization of a partial differential equation by a finite difference method. It is demonstrated that the modified Parareal is stable and can produce reasonably accurate solutions while allowing for a noticeable reduction of the time-to-solution. The implementation relies on integration schemes already widely used in NWP (RK-3, partially split forward Euler, forward-backward). It is demonstrated that using an explicit partially split scheme for the coarse integrator allows to avoid the use of an implicit scheme while still achieving speedup.
Submitted 8 October, 2015;
originally announced October 2015.
-
Shared Memory Pipelined Parareal
Authors:
Daniel Ruprecht
Abstract:
For the parallel-in-time integration method Parareal, pipelining can be used to hide some of the cost of the serial correction step and improve its efficiency. The paper introduces a basic OpenMP implementation of pipelined Parareal and compares it to a standard MPI-based variant. Both versions yield almost identical runtimes, but, depending on the compiler, the OpenMP variant consumes about 7% less energy and has a significantly smaller memory footprint. However, its higher implementation complexity might make it difficult to use in legacy codes and in combination with spatial parallelisation.
Submitted 11 November, 2019; v1 submitted 23 September, 2015;
originally announced September 2015.
-
Parareal convergence for 2D unsteady flow around a cylinder
Authors:
Andreas Kreienbuehl,
Arne Naegel,
Daniel Ruprecht,
Andreas Vogel,
Gabriel Wittum,
Rolf Krause
Abstract:
In this technical report we study the convergence of Parareal for 2D incompressible flow around a cylinder for different viscosities. Two methods are used as fine integrator: backward Euler and a fractional step method. It is found that Parareal converges better for the implicit Euler, likely because it under-resolves the fine-scale dynamics as a result of numerical diffusion.
Submitted 14 September, 2015;
originally announced September 2015.
-
Time parallel gravitational collapse simulation
Authors:
Andreas Kreienbuehl,
Pietro Benedusi,
Daniel Ruprecht,
Rolf Krause
Abstract:
This article demonstrates the applicability of the parallel-in-time method Parareal to the numerical solution of the Einstein gravity equations for the spherical collapse of a massless scalar field. To account for the shrinking of the spatial domain in time, a tailored load balancing scheme is proposed and compared to load balancing based on number of time steps alone. The performance of Parareal is studied for both the sub-critical and black hole case; our experiments show that Parareal generates substantial speedup and, in the super-critical regime, can reproduce Choptuik's black hole mass scaling law.
Submitted 28 December, 2016; v1 submitted 4 September, 2015;
originally announced September 2015.
-
Numerical simulation of skin transport using Parareal
Authors:
Andreas Kreienbuehl,
Arne Naegel,
Daniel Ruprecht,
Robert Speck,
Gabriel Wittum,
Rolf Krause
Abstract:
In-silico investigation of skin permeation is an important but also computationally demanding problem. To resolve all scales involved in full detail will not only require exascale computing capacities but also suitable parallel algorithms. This article investigates the applicability of the time-parallel Parareal algorithm to a brick and mortar setup, a precursory problem to skin permeation. The C++ library Lib4PrM implementing Parareal is combined with the UG4 simulation framework, which provides the spatial discretization and parallelization. The combination's performance is studied with respect to convergence and speedup. It is confirmed that anisotropies in the domain and jumps in diffusion coefficients only have a minor impact on Parareal's convergence. The influence of load imbalances in time due to differences in number of iterations required by the spatial solver as well as spatio-temporal weak scaling is discussed.
Submitted 27 July, 2015; v1 submitted 12 February, 2015;
originally announced February 2015.
-
A stencil-based implementation of Parareal in the C++ domain specific embedded language STELLA
Authors:
Andrea Arteaga,
Daniel Ruprecht,
Rolf Krause
Abstract:
In view of the rapid rise of the number of cores in modern supercomputers, time-parallel methods that introduce concurrency along the temporal axis are becoming increasingly popular. For the solution of time-dependent partial differential equations, these methods can add another direction for concurrency on top of spatial parallelization. The paper presents an implementation of the time-parallel Parareal method in a C++ domain specific language for stencil computations (STELLA). STELLA provides both an OpenMP and a CUDA backend for a shared memory parallelization, using the CPU or GPU inside a node for the spatial stencils. Here, we intertwine this node-wise spatial parallelism with the time-parallel Parareal. This is done by adding an MPI-based implementation of Parareal, which allows us to parallelize in time across nodes. The performance of Parareal with both backends is analyzed in terms of speedup, parallel efficiency and energy-to-solution for an advection-diffusion problem with a time-dependent diffusion coefficient.
Submitted 3 December, 2014; v1 submitted 30 September, 2014;
originally announced September 2014.
-
A high-order Boris integrator
Authors:
Mathias Winkel,
Robert Speck,
Daniel Ruprecht
Abstract:
This work introduces the high-order Boris-SDC method for integrating the equations of motion for electrically charged particles in an electric and magnetic field. Boris-SDC relies on a combination of the Boris-integrator with spectral deferred corrections (SDC). SDC can be considered as preconditioned Picard iteration to compute the stages of a collocation method. In this interpretation, inverting the preconditioner corresponds to a sweep with a low-order method. In Boris-SDC, the Boris method, a second-order Lorentz force integrator based on velocity-Verlet, is used as a sweeper/preconditioner. The presented method provides a generic way to extend the classical Boris integrator, which is widely used in essentially all particle-based plasma physics simulations involving magnetic fields, to a high-order method. Stability, convergence order and conservation properties of the method are demonstrated for different simulation setups. Boris-SDC reproduces the expected high order of convergence for a single particle and for the center-of-mass of a particle cloud in a Penning trap and shows good long-term energy stability.
Submitted 14 April, 2015; v1 submitted 19 September, 2014;
originally announced September 2014.
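For readers unfamiliar with the Boris method that the abstract above uses as its sweeper, a single Boris step can be sketched in Python as follows. This shows only the classical second-order push in its common leapfrog form, not the Boris-SDC method of the paper. The function names (`cross`, `boris_step`), the tuple-based vectors and the assumption that the field values `E` and `B` are already evaluated at the particle position are illustrative choices.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_step(x, v, E, B, q_over_m, dt):
    """Advance position x and velocity v by one Boris step of size dt.

    E and B are the electric and magnetic field at the particle
    (assumed given; a real code would evaluate them at x).
    """
    half = 0.5 * q_over_m * dt

    # First half of the electric acceleration.
    v_minus = tuple(vi + half * Ei for vi, Ei in zip(v, E))

    # Rotation about the magnetic field direction.
    t = tuple(half * Bi for Bi in B)
    t_sq = sum(ti * ti for ti in t)
    s = tuple(2.0 * ti / (1.0 + t_sq) for ti in t)
    v_prime = tuple(a + b for a, b in zip(v_minus, cross(v_minus, t)))
    v_plus = tuple(a + b for a, b in zip(v_minus, cross(v_prime, s)))

    # Second half of the electric acceleration, then the position update.
    v_new = tuple(vi + half * Ei for vi, Ei in zip(v_plus, E))
    x_new = tuple(xi + dt * vi for xi, vi in zip(x, v_new))
    return x_new, v_new
```

With a vanishing electric field the magnetic rotation leaves the particle speed unchanged up to rounding, which is one reason the method behaves well in long simulations.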
-
Interweaving PFASST and Parallel Multigrid
Authors:
Michael Minion,
Robert Speck,
Matthias Bolten,
Matthew Emmett,
Daniel Ruprecht
Abstract:
The parallel full approximation scheme in space and time (PFASST) introduced by Emmett and Minion in 2012 is an iterative strategy for the temporal parallelization of ODEs and discretized PDEs. As the name suggests, PFASST is similar in spirit to a space-time FAS multigrid method performed over multiple time-steps in parallel. However, since the original focus of PFASST has been on the performance of the method in terms of time parallelism, the solution of any spatial system arising from the use of implicit or semi-implicit temporal methods within PFASST has simply been assumed to be solved to some desired accuracy completely at each sub-step and each iteration by some unspecified procedure. It hence is natural to investigate how iterative solvers in the spatial dimensions can be interwoven with the PFASST iterations and whether this strategy leads to a more efficient overall approach. This paper presents an initial investigation on the relative performance of different strategies for coupling PFASST iterations with multigrid methods for the implicit treatment of diffusion terms in PDEs. In particular, we compare full accuracy multigrid solves at each sub-step with a small fixed number of multigrid V-cycles. This reduces the cost of each PFASST iteration at the possible expense of a corresponding increase in the number of PFASST iterations needed for convergence. Parallel efficiency of the resulting methods is explored through numerical examples.
Submitted 30 March, 2015; v1 submitted 24 July, 2014;
originally announced July 2014.
-
Parareal for diffusion problems with space- and time-dependent coefficients
Authors:
Daniel Ruprecht,
Robert Speck,
Rolf Krause
Abstract:
For the time-parallel Parareal method, there is both numerical and analytical evidence that it converges very well for diffusive problems like the heat equation. Many applications, however, do not lead to simple homogeneous diffusive scenarios but feature strongly inhomogeneous and possibly anisotropic coefficients. The paper presents results from a numerical study of how space- and time-dependent coefficients in a diffusion setup affect Parareal's convergence behaviour. It is shown that, for the presented examples, non-constant diffusion coefficients have only marginal influence on how fast Parareal converges. Furthermore, an example is shown that illustrates how, for linear problems, the maximum singular value of the Parareal iteration matrix can be used to estimate convergence rates.
Submitted 30 January, 2014;
originally announced January 2014.
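The Parareal iteration referred to in this abstract can be illustrated with a minimal sketch. It is not the setup used in the paper: it assumes the scalar test equation y' = lam*y with a constant negative lam as a stand-in for a single diffusive mode, and uses implicit Euler for both the coarse and the fine propagator. The names `backward_euler`, `parareal` and `serial_fine` are hypothetical helpers introduced only for this illustration.

```python
def backward_euler(y, lam, dt, steps):
    """Advance y' = lam*y by `steps` implicit Euler steps of size dt."""
    for _ in range(steps):
        y = y / (1.0 - lam * dt)
    return y


def serial_fine(y0, lam, t_end, n_slices, fine_steps=100):
    """Reference: run the fine propagator serially over all time slices."""
    dT = t_end / n_slices
    u = [y0]
    for n in range(n_slices):
        u.append(backward_euler(u[n], lam, dT / fine_steps, fine_steps))
    return u


def parareal(y0, lam, t_end, n_slices, n_iter, fine_steps=100):
    """Parareal with one coarse Euler step and `fine_steps` fine steps per slice."""
    dT = t_end / n_slices

    def coarse(y):
        return backward_euler(y, lam, dT, 1)

    def fine(y):
        return backward_euler(y, lam, dT / fine_steps, fine_steps)

    # Initial guess: one serial sweep with the cheap coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n]))

    for _ in range(n_iter):
        # These evaluations are independent per slice and could run in parallel.
        f_old = [fine(u[n]) for n in range(n_slices)]
        g_old = [coarse(u[n]) for n in range(n_slices)]
        # Serial correction sweep: new coarse value plus fine-minus-coarse defect.
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n]) + f_old[n] - g_old[n])
        u = new
    return u


if __name__ == "__main__":
    ref = serial_fine(1.0, -1.0, 1.0, 10)
    for k in (0, 1, 2, 3):
        u = parareal(1.0, -1.0, 1.0, 10, k)
        print(k, abs(u[-1] - ref[-1]))
```

In this sketch the difference to the serial fine solution shrinks from one iteration to the next, and after as many iterations as there are time slices the two agree up to rounding error. How quickly this happens for space- and time-dependent diffusion coefficients is the question the paper studies; the sketch makes no claim about that case.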
-
Inexact spectral deferred corrections
Authors:
Robert Speck,
Daniel Ruprecht,
Michael Minion,
Matthew Emmett,
Rolf Krause
Abstract:
Spectral deferred correction (SDC) methods are an attractive approach to iteratively computing collocation solutions to an ODE by performing so-called sweeps with a low-order time stepping method. SDC makes it easy to construct high-order split methods where, e.g., stiff terms of the ODE are treated implicitly. This requires the solution to full accuracy of multiple linear systems of equations during each sweep, e.g. with a multigrid method. In this paper, we present an inexact variant of SDC, where each solve of a linear system is replaced by a single multigrid V-cycle, which significantly reduces the cost of each sweep. For the investigated examples, this strategy results in only a small increase in the number of required sweeps, and we demonstrate that "inexact spectral deferred corrections" can provide a dramatic reduction of the overall number of multigrid V-cycles required to complete an SDC time step.
Submitted 12 June, 2014; v1 submitted 30 January, 2014;
originally announced January 2014.
-
Convergence of Parareal for the Navier-Stokes equations depending on the Reynolds number
Authors:
Johannes Steiner,
Daniel Ruprecht,
Robert Speck,
Rolf Krause
Abstract:
The paper first presents a linear stability analysis for the time-parallel Parareal method, using an IMEX Euler method as coarse propagator and a Runge-Kutta-3 method as fine propagator, confirming that dominant imaginary eigenvalues negatively affect Parareal's convergence. This suggests that when Parareal is applied to the nonlinear Navier-Stokes equations, problems for small viscosities could arise. Numerical results for a driven cavity benchmark are presented, confirming that Parareal's convergence can indeed deteriorate as viscosity decreases and the flow becomes increasingly dominated by convection. The effect is found to strongly depend on the spatial resolution.
Submitted 15 October, 2014; v1 submitted 18 November, 2013;
originally announced November 2013.
-
A space-time parallel solver for the three-dimensional heat equation
Authors:
Robert Speck,
Daniel Ruprecht,
Matthew Emmett,
Matthias Bolten,
Rolf Krause
Abstract:
The paper presents a combination of the time-parallel "parallel full approximation scheme in space and time" (PFASST) with a parallel multigrid method (PMG) in space, resulting in a mesh-based solver for the three-dimensional heat equation with a uniquely high degree of efficient concurrency. Parallel scaling tests are reported on the Cray XE6 machine "Monte Rosa" on up to 16,384 cores and on the IBM Blue Gene/Q system "JUQUEEN" on up to 65,536 cores. The efficacy of the combined spatial- and temporal parallelization is shown by demonstrating that using PFASST in addition to PMG significantly extends the strong-scaling limit. Implications of using spatial coarsening strategies in PFASST's multi-level hierarchy in large-scale parallel simulations are discussed.
Submitted 14 July, 2014; v1 submitted 30 July, 2013;
originally announced July 2013.
-
A multi-level spectral deferred correction method
Authors:
Robert Speck,
Daniel Ruprecht,
Matthew Emmett,
Michael Minion,
Matthias Bolten,
Rolf Krause
Abstract:
The spectral deferred correction (SDC) method is an iterative scheme for computing a higher-order collocation solution to an ODE by performing a series of correction sweeps using a low-order timestepping method. This paper examines a variation of SDC for the temporal integration of PDEs called multi-level spectral deferred corrections (MLSDC), where sweeps are performed on a hierarchy of levels and an FAS correction term, as in nonlinear multigrid methods, couples solutions on different levels. Three different strategies to reduce the computational cost of correction sweeps on the coarser levels are examined: reducing the degrees of freedom, reducing the order of the spatial discretization, and reducing the accuracy when solving linear systems arising in implicit temporal integration. Several numerical examples demonstrate the effect of multi-level coarsening on the convergence and cost of SDC integration. In particular, MLSDC can provide significant savings in compute time compared to SDC for a three-dimensional problem.
Submitted 25 August, 2014; v1 submitted 4 July, 2013;
originally announced July 2013.
-
Transparent boundary conditions based on the Pole Condition for time-dependent, two-dimensional problems
Authors:
Daniel Ruprecht,
Achim Schädle,
Frank Schmidt
Abstract:
The pole condition approach for deriving transparent boundary conditions is extended to the time-dependent, two-dimensional case. Non-physical modes of the solution are identified by the position of poles of the solution's spatial Laplace transform in the complex plane. By requiring the Laplace transform to be analytic on some problem-dependent complex half-plane, these modes can be suppressed. The resulting algorithm computes a finite number of coefficients of a series expansion of the Laplace transform, thereby providing an approximation to the exact boundary condition. The resulting error decays super-algebraically with the number of coefficients, so relatively few additional degrees of freedom are sufficient to reduce the error to the level of the discretization error in the interior of the computational domain. The approach shows good results for the Schrödinger and the drift-diffusion equations but, in contrast to the one-dimensional case, exhibits instabilities for the wave and Klein-Gordon equations. Numerical examples are shown that demonstrate the good performance in the former and the instabilities in the latter case.
Submitted 17 April, 2012;
originally announced April 2012.