-
SWIFT: A modern highly-parallel gravity and smoothed particle hydrodynamics solver for astrophysical and cosmological applications
Authors:
Matthieu Schaller,
Josh Borrow,
Peter W. Draper,
Mladen Ivkovic,
Stuart McAlpine,
Bert Vandenbroucke,
Yannick Bahé,
Evgenii Chaikin,
Aidan B. G. Chalk,
Tsang Keung Chan,
Camila Correa,
Marcel van Daalen,
Willem Elbers,
Pedro Gonnet,
Loïc Hausammann,
John Helly,
Filip Huško,
Jacob A. Kegerreis,
Folkert S. J. Nobels,
Sylvia Ploeckinger,
Yves Revaz,
William J. Roper,
Sergio Ruiz-Bonilla,
Thomas D. Sandnes,
Yolan Uyttenhove, et al. (2 additional authors not shown)
Abstract:
Numerical simulations have become one of the key tools used by theorists in all the fields of astrophysics and cosmology. The development of modern tools that target the largest existing computing systems and exploit state-of-the-art numerical methods and algorithms is thus crucial. In this paper, we introduce the fully open-source, highly-parallel, versatile, and modular coupled hydrodynamics, gravity, cosmology, and galaxy-formation code SWIFT. The software package exploits hybrid shared- and distributed-memory task-based parallelism, asynchronous communications, and domain-decomposition algorithms based on balancing the workload, rather than the data, to efficiently exploit modern high-performance computing cluster architectures. Gravity is solved for using a fast multipole method, optionally coupled to a particle-mesh solver in Fourier space to handle periodic volumes. For gas evolution, multiple modern flavours of Smoothed Particle Hydrodynamics are implemented. SWIFT also evolves neutrinos using a state-of-the-art particle-based method. Two complementary networks of sub-grid models for galaxy formation as well as extensions to simulate planetary physics are also released as part of the code. An extensive set of output options, including snapshots, light-cones, power spectra, and a coupling to structure finders, is also included. We describe the overall code architecture, summarise the consistency and accuracy tests that were performed, and demonstrate the excellent weak-scaling performance of the code using a representative cosmological hydrodynamical problem with $\approx 300$ billion particles. The code is released to the community alongside extensive documentation for both users and developers, a large selection of example test problems, and a suite of tools to aid in the analysis of large simulations run with SWIFT.
Submitted 29 March, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Continuous Simulation Data Stream: A dynamical timescale-dependent output scheme for simulations
Authors:
Loic Hausammann,
Pedro Gonnet,
Matthieu Schaller
Abstract:
Exa-scale simulations are on the horizon, but almost no new design for the output has been proposed in recent years. In simulations using individual time steps, traditional snapshots over-resolve particles/cells with large time steps and under-resolve particles/cells with short time steps. Therefore, they are unable to follow fast events and do not use the storage space efficiently. The Continuous Simulation Data Stream (CSDS) is designed to decrease this space while providing an accurate state of the simulation at any time. It takes advantage of the individual time steps to ensure the same relative accuracy for all the particles. The output consists of a single file representing the full evolution of the simulation. Within this file, the particles are written independently and at their own frequency. Through the interpolation of the records, the state of the simulation can be recovered at any point in time. In this paper, we show that the CSDS can reduce the storage space by 2.76x for the same accuracy as snapshots, or increase the accuracy by 67.8x for the same storage space, whilst retaining an acceptable reading speed for analysis. By using interpolation between records, the CSDS provides the state of the simulation, with high accuracy, at any time. This should greatly improve the analysis of fast events such as supernovae and simplify the construction of light-cone outputs.
Submitted 3 October, 2022;
originally announced October 2022.
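The record-interpolation idea in the abstract above can be illustrated with a minimal Python sketch. It assumes each particle stores its own list of (time, value) records written at its own frequency, and it uses plain linear interpolation; the function name `interpolate_record` and the linear scheme are illustrative assumptions, not the interpolation used by the CSDS itself.

```python
from bisect import bisect_right


def interpolate_record(times, values, t):
    """Linearly interpolate one particle's records at time t.

    `times` must be sorted in increasing order; `values` holds the
    quantity recorded at each of those times (hypothetical layout).
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the recorded time range")
    # Index of the last record written at or before t.
    k = bisect_right(times, t) - 1
    if k == len(times) - 1:
        return values[-1]
    # Fractional position of t between record k and record k + 1.
    w = (t - times[k]) / (times[k + 1] - times[k])
    return values[k] + w * (values[k + 1] - values[k])


# A particle with short time steps early on and a long step later.
times = [0.0, 1.0, 4.0]
values = [0.0, 2.0, 8.0]
state = interpolate_record(times, values, 2.5)
```

Because every particle carries its own record times, a particle on short time steps simply contributes more records than one on long time steps, and the same lookup works for both.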
-
TF-GNN: Graph Neural Networks in TensorFlow
Authors:
Oleksandr Ferludin,
Arno Eigenwillig,
Martin Blais,
Dustin Zelle,
Jan Pfeifer,
Alvaro Sanchez-Gonzalez,
Wai Lok Sibon Li,
Sami Abu-El-Haija,
Peter Battaglia,
Neslihan Bulut,
Jonathan Halcrow,
Filipe Miguel Gonçalves de Almeida,
Pedro Gonnet,
Liangze Jiang,
Parth Kothari,
Silvio Lattanzi,
André Linhares,
Brandon Mayer,
Vahab Mirrokni,
John Palowitch,
Mihir Paradkar,
Jennifer She,
Anton Tsitsulin,
Kevin Villela,
Lisa Wang, et al. (2 additional authors not shown)
Abstract:
TensorFlow-GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occur in today's information ecosystems. In addition to enabling machine learning researchers and advanced developers, TF-GNN offers low-code solutions to empower the broader developer community in graph learning. Many production models at Google use TF-GNN, and it has recently been released as an open source project. In this paper we describe the TF-GNN data model, its Keras message passing API, and relevant capabilities such as graph sampling and distributed training.
Submitted 23 July, 2023; v1 submitted 7 July, 2022;
originally announced July 2022.
-
A Hybrid MPI+Threads Approach to Particle Group Finding Using Union-Find
Authors:
James S. Willis,
Matthieu Schaller,
Pedro Gonnet,
John C. Helly
Abstract:
The Friends-of-Friends (FoF) algorithm is a standard technique used in cosmological $N$-body simulations to identify structures. Its goal is to find clusters of particles (called groups) that are separated by at most a cut-off radius. $N$-body simulations typically use most of the memory present on a node, leaving very little free for a FoF algorithm to run on-the-fly. We propose a new method that utilises the common Union-Find data structure and a hybrid MPI+threads approach. The algorithm can also be expressed elegantly in a task-based formalism if such a framework is used in the rest of the application. We have implemented our algorithm in the open-source cosmological code SWIFT. Our implementation displays excellent strong- and weak-scaling behaviour on realistic problems and compares favourably (speed-up of 18x) with other methods commonly used in the $N$-body community.
Submitted 25 March, 2020;
originally announced March 2020.
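As an illustration of how Union-Find supports Friends-of-Friends grouping, the following is a minimal serial Python sketch. It links every pair of particles closer than the cut-off radius using a brute-force pair loop; the names `UnionFind` and `friends_of_friends` are hypothetical, and the sketch deliberately omits the hybrid MPI+threads machinery and the neighbour search that the paper describes.

```python
import itertools
import math


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, i):
        # Path halving: point each visited node at its grandparent.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        # Attach the smaller tree below the larger one.
        if self.size[ri] < self.size[rj]:
            ri, rj = rj, ri
        self.parent[rj] = ri
        self.size[ri] += self.size[rj]


def friends_of_friends(positions, linking_length):
    """Return one group root per particle (brute-force O(N^2) pair search)."""
    uf = UnionFind(len(positions))
    for (i, p), (j, q) in itertools.combinations(enumerate(positions), 2):
        if math.dist(p, q) <= linking_length:
            uf.union(i, j)
    return [uf.find(i) for i in range(len(positions))]


# Three particles chained within the linking length, one isolated particle.
positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
groups = friends_of_friends(positions, linking_length=0.6)
```

The first and third particles are farther apart than the linking length, yet they end up in the same group because both are linked to the second one; this transitive merging is exactly what the Union-Find structure provides.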
-
IndyLSTMs: Independently Recurrent LSTMs
Authors:
Pedro Gonnet,
Thomas Deselaers
Abstract:
We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. These differ from regular LSTM cells in that the recurrent weights are not modeled as a full matrix, but as a diagonal matrix, i.e. the output and state of each LSTM cell depend on the inputs and its own output/state, as opposed to the input and the outputs/states of all the cells in the layer. The number of parameters per IndyLSTM layer, and thus the number of FLOPS per evaluation, is linear in the number of nodes in the layer, as opposed to quadratic for regular LSTM layers, resulting in potentially both smaller and faster models. We evaluate their performance experimentally by training several models on the popular IAM-OnDB and CASIA online handwriting datasets, as well as on several of our in-house datasets. We show that IndyLSTMs, despite their smaller size, consistently outperform regular LSTMs both in terms of accuracy per parameter, and in best accuracy overall. We attribute this improved performance to the IndyLSTMs being less prone to overfitting.
Submitted 19 March, 2019;
originally announced March 2019.
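The linear-versus-quadratic parameter scaling mentioned in the abstract above can be checked with a short Python sketch. It assumes a standard four-gate LSTM layer without peephole connections and with one bias vector per gate; the function names `lstm_params` and `indylstm_params` are illustrative and do not come from the paper.

```python
def lstm_params(num_inputs, num_units):
    """Parameter count of a regular LSTM layer (assumed four gates, no peepholes)."""
    input_weights = num_inputs * num_units
    recurrent_weights = num_units * num_units  # full recurrent matrix
    biases = num_units
    return 4 * (input_weights + recurrent_weights + biases)


def indylstm_params(num_inputs, num_units):
    """Parameter count when the recurrent matrix is replaced by a diagonal."""
    input_weights = num_inputs * num_units
    recurrent_weights = num_units  # one recurrent weight per unit
    biases = num_units
    return 4 * (input_weights + recurrent_weights + biases)


regular = lstm_params(num_inputs=10, num_units=64)
indy = indylstm_params(num_inputs=10, num_units=64)
```

Under these assumptions, doubling the number of units at a fixed input size doubles the IndyLSTM count, while the regular LSTM count grows faster because of its recurrent-matrix term.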
-
Fast Multi-language LSTM-based Online Handwriting Recognition
Authors:
Victor Carbune,
Pedro Gonnet,
Thomas Deselaers,
Henry A. Rowley,
Alexander Daryin,
Marcos Calvo,
Li-Lun Wang,
Daniel Keysers,
Sandro Feuz,
Philippe Gervais
Abstract:
We describe an online handwriting system that is able to support 102 languages using a deep neural network architecture. This new system has completely replaced our previous Segment-and-Decode-based system and reduced the error rate by 20%-40% relative for most languages. Further, we report new state-of-the-art results on IAM-OnDB for both the open and closed dataset settings. The system combines methods from sequence recognition with a new input encoding using Bézier curves. This leads to up to 10x faster recognition times compared to our previous system. Through a series of experiments we determine the optimal configuration of our models and report the results of our setup on a number of additional public datasets.
Submitted 24 January, 2020; v1 submitted 22 February, 2019;
originally announced February 2019.
-
Planetary Giant Impacts: Convergence of High-Resolution Simulations using Efficient Spherical Initial Conditions and SWIFT
Authors:
J. A. Kegerreis,
V. R. Eke,
P. G. Gonnet,
D. G. Korycansky,
R. J. Massey,
M. Schaller,
L. F. A. Teodoro
Abstract:
We perform simulations of giant impacts onto the young Uranus using smoothed particle hydrodynamics (SPH) with over 100 million particles. This 100-1000$\times$ improvement in particle number reveals that simulations with below $10^7$ particles fail to converge on even bulk properties like the post-impact rotation period, or on the detailed erosion of the atmosphere. Higher resolutions appear to determine these large-scale results reliably, but even $10^8$ particles may not be sufficient to study the detailed composition of the debris: we find that almost an order of magnitude more rock is ejected beyond the Roche radius than with $10^5$ particles. We present two software developments that enable this increase in the feasible number of particles. First, we present an algorithm to place any number of particles in a spherical shell such that they all have an SPH density within 1% of the desired value. Particles in model planets built from these nested shells have a root-mean-squared velocity below 1% of the escape speed, which avoids the need for long precursor simulations to produce relaxed initial conditions. Second, we develop the hydrodynamics code SWIFT for planetary simulations. SWIFT uses task-based parallelism and other modern algorithmic approaches to take full advantage of contemporary supercomputer architectures. Both the particle placement code and SWIFT are publicly released.
Submitted 7 April, 2020; v1 submitted 28 January, 2019;
originally announced January 2019.
-
SWIFT: Maintaining weak-scalability with a dynamic range of $10^4$ in time-step size to harness extreme adaptivity
Authors:
Josh Borrow,
Richard G. Bower,
Peter W. Draper,
Pedro Gonnet,
Matthieu Schaller
Abstract:
Cosmological simulations require the use of a multiple time-stepping scheme. Without such a scheme, cosmological simulations would be impossible due to their high level of dynamic range: over eleven orders of magnitude in density. Such a large dynamic range leads to a range of over four orders of magnitude in time-step, which presents a significant load-balancing challenge. In this work, the extreme adaptivity that cosmological simulations present is tackled in three main ways through the use of the code SWIFT. First, an adaptive mesh is used to ensure that only the relevant particles are interacted in a given time-step. Second, task-based parallelism is used to ensure efficient load-balancing within a single node, using pthreads and SIMD vectorisation. Finally, a domain decomposition strategy is presented, using the graph domain decomposition library METIS, that bisects the work that must be performed by the simulation between nodes using MPI. These three strategies are shown to give SWIFT near-perfect weak-scaling characteristics, only losing 25% performance when scaling from 1 to 4096 cores on a representative problem, whilst being more than 30x faster than the de-facto standard Gadget-2 code.
Submitted 3 July, 2018;
originally announced July 2018.
-
An Efficient SIMD Implementation of Pseudo-Verlet Lists for Neighbour Interactions in Particle-Based Codes
Authors:
James S. Willis,
Matthieu Schaller,
Pedro Gonnet,
Richard G. Bower,
Peter W. Draper
Abstract:
In particle-based simulations, neighbour finding (i.e. finding pairs of particles to interact within a given range) is the most time-consuming part of the computation. One of the best such algorithms, which can be used for both Molecular Dynamics (MD) and Smoothed Particle Hydrodynamics (SPH) simulations, is the pseudo-Verlet list algorithm. This algorithm, however, does not vectorise trivially, which makes it difficult to exploit SIMD-parallel architectures. In this paper, we present several novel modifications as well as a vectorisation strategy for the algorithm which lead to overall speed-ups over the scalar version of the algorithm of 2.24x for the AVX instruction set (SIMD width of 8), 2.43x for AVX2, and 4.07x for AVX-512 (SIMD width of 16).
Submitted 17 April, 2018;
originally announced April 2018.
-
SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100,000 cores
Authors:
Matthieu Schaller,
Pedro Gonnet,
Aidan B. G. Chalk,
Peter W. Draper
Abstract:
We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smoothed Particle Hydrodynamics) on hybrid shared/distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100-supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches: (1) Task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores. (2) Graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, as opposed to just the data, as is the case with most partitioning schemes, is equally distributed across all nodes. (3) Fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring tasks that rely on data from other nodes until it arrives. In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures.
Submitted 8 June, 2016;
originally announced June 2016.
-
QuickSched: Task-based parallelism with dependencies and conflicts
Authors:
Pedro Gonnet,
Aidan B. G. Chalk,
Matthieu Schaller
Abstract:
This paper describes QuickSched, a compact and efficient Open-Source C-language library for task-based shared-memory parallel programming. QuickSched extends the standard dependency-only scheme of task-based programming with the concept of task conflicts, i.e. sets of tasks that can be executed in any order, yet not concurrently. These conflicts are modelled using exclusively lockable hierarchical resources. The scheduler itself prioritizes tasks along the critical path of execution and is shown to perform and scale well on a 64-core parallel shared-memory machine for two example problems: a tiled QR decomposition and a task-based Barnes-Hut tree code.
Submitted 20 January, 2016;
originally announced January 2016.
-
SWIFT: task-based hydrodynamics and gravity for cosmological simulations
Authors:
Tom Theuns,
Aidan Chalk,
Matthieu Schaller,
Pedro Gonnet
Abstract:
Simulations of galaxy formation follow the gravitational and hydrodynamical interactions between gas, stars and dark matter through cosmic time. The huge dynamic range of such calculations severely limits strong scaling behaviour of the community codes in use, with load-imbalance, cache inefficiencies and poor vectorisation limiting performance. The new SWIFT code exploits task-based parallelism designed for many-core compute nodes interacting via MPI using asynchronous communication to improve speed and scaling. A graph-based domain decomposition schedules interdependent tasks over available resources. Strong scaling tests on realistic particle distributions yield excellent parallel efficiency, and efficient cache usage provides a large speed-up compared to current codes even on a single core. SWIFT is designed to be easy to use by shielding the astronomer from computational details such as the construction of the tasks or MPI communication. The techniques and algorithms used in SWIFT may benefit other computational physics areas as well, for example that of compressible hydrodynamics. For details of this open-source project, see www.swiftsim.com
Submitted 1 August, 2015;
originally announced August 2015.
-
Efficient and Scalable Algorithms for Smoothed Particle Hydrodynamics on Hybrid Shared/Distributed-Memory Architectures
Authors:
Pedro Gonnet
Abstract:
This paper describes a new fast and implicitly parallel approach to neighbour-finding in multi-resolution Smoothed Particle Hydrodynamics (SPH) simulations. This new approach is based on hierarchical cell decompositions and sorted interactions, within a task-based formulation. It is shown to be faster than traditional tree-based codes, and to scale better than domain decomposition-based approaches on hybrid shared/distributed-memory parallel architectures, e.g. clusters of multi-cores, achieving a $40\times$ speedup over the Gadget-2 simulation code.
Submitted 8 April, 2014;
originally announced April 2014.
-
SWIFT: Fast algorithms for multi-resolution SPH on multi-core architectures
Authors:
Pedro Gonnet,
Matthieu Schaller,
Tom Theuns,
Aidan B. G. Chalk
Abstract:
This paper describes a novel approach to neighbour-finding in Smoothed Particle Hydrodynamics (SPH) simulations with large dynamic range in smoothing length. This approach is based on hierarchical cell decompositions, sorted interactions, and a task-based formulation. It is shown to be faster than traditional tree-based codes, and to scale better than domain decomposition-based approaches on shared-memory parallel architectures such as multi-cores.
Submitted 15 September, 2013;
originally announced September 2013.
-
Increasing the Reliability of Adaptive Quadrature Using Explicit Interpolants
Authors:
Pedro Gonnet
Abstract:
We present two new adaptive quadrature routines. Both routines differ from previously published algorithms in many aspects, most significantly in how they represent the integrand, how they treat non-numerical values of the integrand, how they deal with improper divergent integrals and how they estimate the integration error. The main focus of these improvements is to increase the reliability of the algorithms without significantly impacting their efficiency. Both algorithms are implemented in Matlab and tested using both the "families" suggested by Lyness and Kaganove and the battery test used by Gander and Gautschi and Kahaner. They are shown to be more reliable, albeit in some cases less efficient, than other commonly-used adaptive integrators.
Submitted 20 June, 2010;
originally announced June 2010.
-
A Review of Error Estimation in Adaptive Quadrature
Authors:
Pedro Gonnet
Abstract:
The most critical component of any adaptive numerical quadrature routine is the estimation of the integration error. Since the publication of the first algorithms in the 1960s, many error estimation schemes have been presented, evaluated and discussed. This paper presents a review of existing error estimation techniques and discusses their differences and their common features. Some common shortcomings of these algorithms are discussed and a new general error estimation technique is presented.
Submitted 7 November, 2010; v1 submitted 24 March, 2010;
originally announced March 2010.
-
Efficient Construction, Update and Downdate Of The Coefficients Of Interpolants Based On Polynomials Satisfying A Three-Term Recurrence Relation
Authors:
Pedro Gonnet
Abstract:
In this paper, we consider methods to compute the coefficients of interpolants relative to a basis of polynomials satisfying a three-term recurrence relation. Two new algorithms are presented: the first constructs the coefficients of the interpolation incrementally and can be used to update the coefficients whenever a node is added to or removed from the interpolation. The second algorithm, which constructs the interpolation coefficients by decomposing the Vandermonde-like matrix iteratively, cannot be used to update or downdate an interpolation, yet is more numerically stable than the first algorithm and is more efficient when the coefficients of multiple interpolations are to be computed over the same set of nodes.
Submitted 30 March, 2010; v1 submitted 24 March, 2010;
originally announced March 2010.