Search | arXiv e-print repository

doi 10.1016/j.jcp.2023.112234

Exact conservation laws for neural network integrators of dynamical systems

Abstract: The solution of time dependent differential equations with neural networks has attracted a lot of attention recently. The central idea is to learn the laws that govern the evolution of the solution from data, which might be polluted with random noise. However, in contrast to other machine learning applications, usually a lot is known about the system at hand. For example, for many dynamical systems physical quantities such as energy or (angular) momentum are exactly conserved. Hence, the neural network has to learn these conservation laws from data and they will only be satisfied approximately due to finite training time and random noise. In this paper we present an alternative approach which uses Noether's Theorem to inherently incorporate conservation laws into the architecture of the neural network. We demonstrate that this leads to better predictions for three model systems: the motion of a non-relativistic particle in a three-dimensional Newtonian gravitational potential, the motion of a massive relativistic particle in the Schwarzschild metric and a system of two interacting particles in four dimensions.

Submitted 14 May, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: 24 pages, 16 figures; to appear in Journal of Computational Physics

MSC Class: 65L05; 68T07; 70H33; 70H40; 83C10 ACM Class: G.1.7

arXiv:2007.10054 [pdf, other]

Parallel Performance of ARM ThunderX2 for Atomistic Simulation Algorithms

Authors: William Robert Saunders, James Grant, Eike Hermann Müller

Abstract: Atomistic simulation drives scientific advances in modern material science and accounts for a significant proportion of wall time on High Performance Computing facilities. It is important that algorithms are efficient and implementations are performant in a continuously diversifying hardware landscape. Furthermore, they have to be portable to make best use of the available computing resource. In this paper we assess the parallel performance of some key algorithms implemented in a performance portable framework developed by us. We consider Molecular Dynamics with short range interactions, the Fast Multipole Method and Kinetic Monte Carlo. To assess the performance of emerging architectures, we compare the Marvell ThunderX2 (ARM) architecture to traditional x86_64 hardware made available through the Azure cloud computing service.

Submitted 20 July, 2020; originally announced July 2020.

Comments: 10 pages, 3 figures, 1 table; submitted to EAHPC-2020 (Embracing Arm: a journey of porting and optimization to the latest Arm-based processors 2020)

arXiv:2002.00756 [pdf, ps, other]

doi 10.1002/qj.3880

Multigrid preconditioners for the mixed finite element dynamical core of the LFRic atmospheric model

Authors: Christopher Maynard, Thomas Melvin, Eike Hermann Müller

Abstract: Due to the wide separation of time scales in geophysical fluid dynamics, semi-implicit time integrators are commonly used in operational atmospheric forecast models. They guarantee the stable treatment of fast (acoustic and gravity) waves, while not suffering from severe restrictions on the timestep size. To propagate the state of the atmosphere forward in time, a non-linear equation for the prognostic variables has to be solved at every timestep. Since the nonlinearity is typically weak, this is done with a small number of Newton- or Picard- iterations, which in turn require the efficient solution of a large system of linear equations with $\mathcal{O}(10^6-10^9)$ unknowns. This linear solve is often the computationally most costly part of the model. In this paper an efficient linear solver for the LFRic next-generation model, currently developed by the Met Office, is described. The model uses an advanced mimetic finite element discretisation which makes the construction of efficient solvers challenging compared to models using standard finite-difference and finite-volume methods. The linear solver hinges on a bespoke multigrid preconditioner of the Schur-complement system for the pressure correction. By comparing to Krylov-subspace methods, the superior performance and robustness of the multigrid algorithm is demonstrated for standard test cases and realistic model setups. In production mode, the model will have to run in parallel on 100,000s of processing elements. As confirmed by numerical experiments, one particular advantage of the multigrid solver is its excellent parallel scalability due to avoiding expensive global reduction operations.

Submitted 21 July, 2020; v1 submitted 31 January, 2020; originally announced February 2020.

Comments: 22 pages, 6 figures, 5 tables, to appear in Quarterly Journal of the Royal Meteorological Society

MSC Class: 65F08; 65N55; 76M10; 86A10; 65Y05 ACM Class: G.1.3; G.1.8; J.2

arXiv:1905.04065 [pdf, other]

doi 10.1016/j.jcp.2020.109379

Fast electrostatic solvers for kinetic Monte Carlo simulations

Authors: William Robert Saunders, James Grant, Eike Hermann Müller, Ian Thompson

Abstract: Kinetic Monte Carlo (KMC) is an important computational tool in physics and chemistry. In contrast to standard Monte Carlo, KMC permits the description of time dependent dynamical processes and is not restricted to systems in equilibrium. Recently KMC has been applied successfully in modelling of novel energy materials such as Lithium-ion batteries and solar cells. We consider general solid state systems which contain free, interacting particles which can hop between localised sites in the material. The KMC transition rates for those hops depend on the change in total potential energy of the system. For charged particles this requires the frequent calculation of electrostatic interactions, which is usually the bottleneck of the simulation. To avoid this issue and obtain results in reasonable times, many studies replace the long-range potential by a short range approximation. This, however, leads to systematic errors and unphysical results. On the other hand standard electrostatic solvers such as Ewald summation or fast Poisson solvers are highly inefficient or introduce uncontrollable systematic errors at high resolution. In this paper we describe how the Fast Multipole Method by Greengard and Rokhlin can be adapted to overcome this issue by dramatically reducing computational costs. We exploit the fact that each update in the transition rate calculation corresponds to a single particle move and changes the configuration only by a small amount. This allows us to construct an algorithm which scales linearly in the number of charges for each KMC step, something which had not been deemed to be possible before. We demonstrate the performance and parallel scalability of the method by implementing it in a performance portable software library. We describe the high-level Python interface of the code which makes it easy to adapt to specific cases.

Submitted 1 March, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

Comments: 26 pages, 19 figures, 7 tables; accepted for publication in Computer Physics Communications

MSC Class: 78M16; 82C80; 82D37; 65Y05; 65Y20 ACM Class: J.2; G.4; D.1.3; D.2.11

arXiv:1809.07267 [pdf, other]

doi 10.1016/j.jpdc.2019.02.007

LFRic: Meeting the challenges of scalability and performance portability in Weather and Climate models

Authors: S. V. Adams, R. W. Ford, M. Hambley, J. M. Hobson, I. Kavcic, C. M. Maynard, T. Melvin, E. H. Mueller, S. Mullerworth, A. R. Porter, M. Rezny, B. J. Shipway, R. Wong

Abstract: This paper describes LFRic: the new weather and climate modelling system being developed by the UK Met Office to replace the existing Unified Model in preparation for exascale computing in the 2020s. LFRic uses the GungHo dynamical core and runs on a semi-structured cubed-sphere mesh. The design of the supporting infrastructure follows object orientated principles to facilitate modularity and the use of external libraries where possible. In particular, a "separation of concerns" between the science code and parallel code is imposed to promote performance portability. An application called PSyclone, developed at the STFC Hartree centre, can generate the parallel code enabling deployment of a single source science code onto different machine architectures. This paper provides an overview of the scientific requirement, the design of the software infrastructure, and examples of PSyclone usage. Preliminary performance results show strong scaling and an indication that hybrid MPI/OpenMP performs better than pure MPI.

Submitted 12 July, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

Comments: 41 pages, 10 figures. EASC2018

Journal ref: Journal of Parallel and Distributed Computing, 132 (2019), 383–396

arXiv:1708.01135 [pdf, other]

Long range forces in a performance portable Molecular Dynamics framework

Authors: William R. Saunders, James Grant, Eike H. Müller

Abstract: Molecular Dynamics (MD) codes predict the fundamental properties of matter by following the trajectories of a collection of interacting model particles. To exploit diverse modern manycore hardware, efficient codes must use all available parallelism. At the same time they need to be portable and easily extendible by the domain specialist (physicist/chemist) without detailed knowledge of this hardware. To address this challenge, we recently described a new Domain Specific Language (DSL) for the development of performance portable MD codes based on a "Separation of Concerns": a Python framework automatically generates efficient parallel code for a range of target architectures. Electrostatic interactions between charged particles are important in many physical systems and often dominate the runtime. Here we discuss the inclusion of long-range interaction algorithms in our code generation framework. These algorithms require global communications and careful consideration has to be given to any impact on parallel scalability. We implemented an Ewald summation algorithm for electrostatic forces, present scaling comparisons for different system sizes and compare to the performance of existing codes. We also report on further performance optimisations delivered with OpenMP shared memory parallelism.

Submitted 3 August, 2017; originally announced August 2017.

Comments: 9 pages, 3 figures, submitted to ParCo 2017 Parallel Computing Conference

ACM Class: D.1.3; D.2.11; J.2; G.4

arXiv:1704.03329 [pdf, other]

doi 10.1016/j.cpc.2017.11.006

A Domain Specific Language for Performance Portable Molecular Dynamics Algorithms

Authors: William R. Saunders, James Grant, Eike H. Müller

Abstract: Developers of Molecular Dynamics (MD) codes face significant challenges when adapting existing simulation packages to new hardware. In a continuously diversifying hardware landscape it becomes increasingly difficult for scientists to be experts both in their own domain (physics/chemistry/biology) and specialists in the low level parallelisation and optimisation of their codes. To address this challenge, we describe a "Separation of Concerns" approach for the development of parallel and optimised MD codes: the science specialist writes code at a high abstraction level in a domain specific language (DSL), which is then translated into efficient computer code by a scientific programmer. In a related context, an abstraction for the solution of partial differential equations with grid based methods has recently been implemented in the (Py)OP2 library. Inspired by this approach, we develop a Python code generation system for molecular dynamics simulations on different parallel architectures, including massively parallel distributed memory systems and GPUs. We demonstrate the efficiency of the auto-generated code by studying its performance and scalability on different hardware and compare it to other state-of-the-art simulation packages. With growing data volumes the extraction of physically meaningful information from the simulation becomes increasingly challenging and requires equally efficient implementations. A particular advantage of our approach is the easy expression of such analysis algorithms. We consider two popular methods for deducing the crystalline structure of a material from the local environment of each atom, show how they can be expressed in our abstraction and implement them in the code generation framework.

Submitted 13 November, 2017; v1 submitted 11 April, 2017; originally announced April 2017.

Comments: 24 pages, 12 figures, 11 tables, accepted for publication in Computer Physics Communications on 12 Nov 2017

ACM Class: D.1.3, D.2.11, J.2, G.4

arXiv:1605.00492 [pdf, other]

doi 10.1016/j.jcp.2016.09.037

High level implementation of geometric multigrid solvers for finite element problems: applications in atmospheric modelling

Authors: Lawrence Mitchell, Eike Hermann Müller

Abstract: The implementation of efficient multigrid preconditioners for elliptic partial differential equations (PDEs) is a challenge due to the complexity of the resulting algorithms and corresponding computer code. For sophisticated finite element discretisations on unstructured grids an efficient implementation can be very time consuming and requires the programmer to have in-depth knowledge of the mathematical theory, parallel computing and optimisation techniques on manycore CPUs. In this paper we show how the development of bespoke multigrid preconditioners can be simplified significantly by using a framework which allows the expression of each component of the algorithm at the correct abstraction level. Our approach (1) allows the expression of the finite element problem in a language which is close to the mathematical formulation of the problem, (2) guarantees the automatic generation and efficient execution of parallel optimised low-level computer code and (3) is flexible enough to support different abstraction levels and give the programmer control over details of the preconditioner. We use the composable abstractions of the Firedrake/PyOP2 package to demonstrate the efficiency of this approach for the solution of strongly anisotropic PDEs in atmospheric modelling. The weak formulation of the PDE is expressed in Unified Form Language (UFL) and the lower PyOP2 abstraction layer allows the manual design of computational kernels for a bespoke geometric multigrid preconditioner. We compare the performance of this preconditioner to a single-level method and hypre's BoomerAMG algorithm. The Firedrake/PyOP2 code is inherently parallel and we present a detailed performance analysis for a single node (24 cores) on the ARCHER supercomputer. Our implementation utilises a significant fraction of the available memory bandwidth and shows very good weak scaling on up to 6,144 compute cores.

Submitted 14 September, 2016; v1 submitted 2 May, 2016; originally announced May 2016.

Comments: 22 pages, 5 figures, 9 tables. Submitted to JCP

MSC Class: 65F08; 65N55; 76M10; 86A10 ACM Class: D.2.2; G.1.3; G.1.8; G.4; J.2

Journal ref: Journal of Computational Physics 327:1-18 (2016)

arXiv:1408.2981 [pdf, ps, other]

Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio

Authors: Andreas Dedner, Eike Hermann Müller, Robert Scheichl

Abstract: Many problems in fluid modelling require the efficient solution of highly anisotropic elliptic partial differential equations (PDEs) in "flat" domains. For example, in numerical weather- and climate-prediction an elliptic PDE for the pressure correction has to be solved at every time step in a thin spherical shell representing the global atmosphere. This elliptic solve can be one of the computationally most demanding components in semi-implicit semi-Lagrangian time stepping methods which are very popular as they allow for larger model time steps and better overall performance. With increasing model resolution, algorithmically efficient and scalable algorithms are essential to run the code under tight operational time constraints. We discuss the theory and practical application of bespoke geometric multigrid preconditioners for equations of this type. The algorithms deal with the strong anisotropy in the vertical direction by using the tensor-product approach originally analysed by Börm and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219-234]. We extend the analysis to three dimensions under slightly weakened assumptions, and numerically demonstrate its efficiency for the solution of the elliptic PDE for the global pressure correction in atmospheric forecast models. For this we compare the performance of different multigrid preconditioners on a tensor-product grid with a semi-structured and quasi-uniform horizontal mesh and a one dimensional vertical grid. The code is implemented in the Distributed and Unified Numerics Environment (DUNE), which provides an easy-to-use and scalable environment for algorithms operating on tensor-product grids. Parallel scalability of our solvers on up to 20,480 cores is demonstrated on the HECToR supercomputer.

Submitted 10 February, 2015; v1 submitted 13 August, 2014; originally announced August 2014.

Comments: 22 pages, 6 Figures, 2 Tables

MSC Class: 65N55; 65Y20; 65F08; 65Y05; 35J57; 86A10 ACM Class: G.1.8; J.2; G.1.0

arXiv:1402.3545 [pdf, ps, other]

Petascale elliptic solvers for anisotropic PDEs on GPU clusters

Authors: Eike Hermann Müller, Robert Scheichl, Eero Vainikko

Abstract: Memory bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with up to one trillion ($10^{12}$) unknowns the code has to make efficient use of several million individual processor cores on large GPU clusters. We describe the multi-GPU implementation of two algorithmically optimal iterative solvers for anisotropic elliptic PDEs which are encountered in atmospheric modelling. In this application the condition number is large but independent of the grid resolution and both methods are asymptotically optimal, albeit with different absolute performance. We parallelise the solvers and adapt them to the specific features of GPU architectures, paying particular attention to efficient global memory access. We achieve a performance of up to 0.78 PFLOPs when solving an equation with $0.55\cdot 10^{12}$ unknowns on 16384 GPUs; this corresponds to about $3\%$ of the theoretical peak performance of the machine and we use more than $40\%$ of the peak memory bandwidth with a Conjugate Gradient (CG) solver. Although the other solver, a geometric multigrid algorithm, has a slightly worse performance in terms of FLOPs per second, overall it is faster as it needs fewer iterations to converge; the multigrid algorithm can solve a linear PDE with half a trillion unknowns in about one second.

Submitted 29 May, 2015; v1 submitted 14 February, 2014; originally announced February 2014.

Comments: 20 pages, 6 figures. Additional explanations and clarifications of the characteristics of the PDE; discussion and estimate of the condition number. Added section and figure on the robustness of both the single-level and the multigrid method under variations of the Courant number. Clarified the terminology in the performance analysis. Added section on preliminary strong scaling results

MSC Class: 65Y05 (Primary); 65N55 (Secondary) ACM Class: G.1.0; G.1.8; C.1

arXiv:1307.2036 [pdf, ps, other]

doi 10.1002/qj.2327

Massively parallel solvers for elliptic PDEs in Numerical Weather- and Climate Prediction

Authors: Eike H. Mueller, Robert Scheichl

Abstract: The demand for substantial increases in the spatial resolution of global weather- and climate- prediction models makes it necessary to use numerically efficient and highly scalable algorithms to solve the equations of large scale atmospheric fluid dynamics. For stability and efficiency reasons several of the operational forecasting centres, in particular the Met Office and the ECMWF in the UK, use semi-implicit semi-Lagrangian time stepping in the dynamical core of the model. The additional burden with this approach is that a three dimensional elliptic partial differential equation (PDE) for the pressure correction has to be solved at every model time step and this often constitutes a significant proportion of the time spent in the dynamical core. To run within tight operational time scales the solver has to be parallelised and there seems to be a (perceived) misconception that elliptic solvers do not scale to large processor counts and hence implicit time stepping can not be used in very high resolution global models. After reviewing several methods for solving the elliptic PDE for the pressure correction and their application in atmospheric models we demonstrate the performance and very good scalability of Krylov subspace solvers and multigrid algorithms for a representative model equation with more than $10^{10}$ unknowns on 65536 cores on HECToR, the UK's national supercomputer. For this we tested and optimised solvers from two existing numerical libraries (DUNE and hypre) and implemented both a Conjugate Gradient solver and a geometric multigrid algorithm based on a tensor-product approach which exploits the strong vertical anisotropy of the discretised equation. We study both weak and strong scalability and compare the absolute solution times for all methods; in contrast to one-level methods the multigrid solver is robust with respect to parameter variations.

Submitted 8 July, 2013; originally announced July 2013.

Comments: 24 pages, 7 figures, 7 tables

MSC Class: 35J25; 65F08; 65F10; 65N55; 68W10 ACM Class: D.1.3; G.1.3; G.1.8; J.2

Showing 1–11 of 11 results for author: Mueller, E H