-
A hybrid approach for solving the gravitational N-body problem with Artificial Neural Networks
Authors:
Veronica Saz Ulibarrena,
Philipp Horn,
Simon Portegies Zwart,
Elena Sellentin,
Barry Koren,
Maxwell X. Cai
Abstract:
Simulating the evolution of the gravitational N-body problem becomes extremely computationally expensive as N increases since the problem complexity scales quadratically with the number of bodies. We study the use of Artificial Neural Networks (ANNs) to replace expensive parts of the integration of planetary systems. Neural networks that include physical knowledge have grown in popularity in the last few years, although few attempts have been made to use them to speed up the simulation of the motion of celestial bodies. We study the advantages and limitations of using Hamiltonian Neural Networks to replace computationally expensive parts of the numerical simulation. We compare the results of the numerical integration of a planetary system with asteroids with those obtained by a Hamiltonian Neural Network and a conventional Deep Neural Network, with special attention to understanding the challenges of this problem. Due to the non-linear nature of the gravitational equations of motion, errors in the integration propagate. To increase the robustness of a method that uses neural networks, we propose a hybrid integrator that evaluates the prediction of the network and replaces it with the numerical solution if considered inaccurate. Hamiltonian Neural Networks can make predictions that resemble the behavior of symplectic integrators but are challenging to train and in our case fail when the inputs differ ~7 orders of magnitude. In contrast, Deep Neural Networks are easy to train but fail to conserve energy, leading to fast divergence from the reference solution. The hybrid integrator designed to include the neural networks increases the reliability of the method and prevents large energy errors without increasing the computing cost significantly. For this problem, the use of neural networks results in faster simulations when the number of asteroids is >70.
Submitted 31 October, 2023;
originally announced October 2023.
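The hybrid scheme described in the abstract above (accept the network's prediction unless it is judged inaccurate, otherwise fall back to the numerical solution) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the energy-based acceptance test, the tolerance `tol`, the 2D leapfrog scheme, and all function names are assumptions, and `biased_surrogate` is a deliberately wrong stand-in for a trained network.

```python
import math

G = 1.0  # gravitational constant in N-body units (assumed)


def direct_accelerations(pos, mass):
    """Pairwise direct summation in 2D; the cost grows as N^2."""
    acc = [[0.0, 0.0] for _ in pos]
    for i, (xi, yi) in enumerate(pos):
        for j, (xj, yj) in enumerate(pos):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            r3 = math.hypot(dx, dy) ** 3
            acc[i][0] += G * mass[j] * dx / r3
            acc[i][1] += G * mass[j] * dy / r3
    return acc


def total_energy(pos, vel, mass):
    """Kinetic plus pairwise potential energy."""
    kinetic = sum(0.5 * m * (vx * vx + vy * vy) for m, (vx, vy) in zip(mass, vel))
    potential = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = math.hypot(pos[j][0] - pos[i][0], pos[j][1] - pos[i][1])
            potential -= G * mass[i] * mass[j] / r
    return kinetic + potential


def leapfrog_step(pos, vel, mass, dt, acc_fn):
    """One kick-drift-kick step using accelerations from acc_fn."""
    a0 = acc_fn(pos, mass)
    half = [[vx + 0.5 * dt * ax, vy + 0.5 * dt * ay]
            for (vx, vy), (ax, ay) in zip(vel, a0)]
    new_pos = [[x + dt * vx, y + dt * vy] for (x, y), (vx, vy) in zip(pos, half)]
    a1 = acc_fn(new_pos, mass)
    new_vel = [[vx + 0.5 * dt * ax, vy + 0.5 * dt * ay]
               for (vx, vy), (ax, ay) in zip(half, a1)]
    return new_pos, new_vel


def hybrid_step(pos, vel, mass, dt, surrogate, tol):
    """Try the surrogate first; keep its step only if the relative energy
    change stays within tol (assumed criterion, and assumes nonzero energy).
    Otherwise redo the step with direct summation."""
    e0 = total_energy(pos, vel, mass)
    trial_pos, trial_vel = leapfrog_step(pos, vel, mass, dt, surrogate)
    rel_err = abs((total_energy(trial_pos, trial_vel, mass) - e0) / e0)
    if rel_err <= tol:
        return trial_pos, trial_vel, True
    new_pos, new_vel = leapfrog_step(pos, vel, mass, dt, direct_accelerations)
    return new_pos, new_vel, False


def biased_surrogate(pos, mass):
    """Hypothetical bad 'network': overestimates every acceleration 5x."""
    return [[5.0 * ax, 5.0 * ay] for ax, ay in direct_accelerations(pos, mass)]
```

The third return value records whether the surrogate step was kept, which makes it easy to count how often the fallback is triggered. Note that the fallback costs an extra direct evaluation, so a surrogate that is rejected often would make the hybrid slower than the plain integrator.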
-
Neural Symplectic Integrator with Hamiltonian Inductive Bias for the Gravitational $N$-body Problem
Authors:
Maxwell X. Cai,
Simon Portegies Zwart,
Damian Podareanu
Abstract:
The gravitational $N$-body problem, which is fundamentally important in astrophysics to predict the motion of $N$ celestial bodies under the mutual gravity of each other, is usually solved numerically because there is no known general analytical solution for $N>2$. Can an $N$-body problem be solved accurately by a neural network (NN)? Can a NN observe long-term conservation of energy and orbital angular momentum? Inspired by the symplectic map of Wisdom & Holman (1991), we present a neural $N$-body integrator that splits the Hamiltonian into a two-body part, solvable analytically, and an interaction part that we approximate with a NN. Our neural symplectic $N$-body code integrates a general three-body system for $10^{5}$ steps without diverging from the ground truth dynamics obtained from a traditional $N$-body integrator. Moreover, it exhibits good inductive bias by successfully predicting the evolution of $N$-body systems that are not part of the training set.
Submitted 28 November, 2021;
originally announced November 2021.
-
Deep-learning enhancement of large scale numerical simulations
Authors:
Caspar van Leeuwen,
Damian Podareanu,
Valeriu Codreanu,
Maxwell X. Cai,
Axel Berg,
Simon Portegies Zwart,
Robin Stoffer,
Menno Veerman,
Chiel van Heerwaarden,
Sydney Otten,
Sascha Caron,
Cunliang Geng,
Francesco Ambrosetti,
Alexandre M. J. J. Bonvin
Abstract:
Traditional simulations on High-Performance Computing (HPC) systems typically involve modeling very large domains and/or very complex equations. HPC systems allow running large models, but limits in performance increase that have become more prominent in the last 5-10 years will likely be experienced. Therefore new approaches are needed to increase application performance. Deep learning appears to be a promising way to achieve this. Recently deep learning has been employed to enhance solving problems that traditionally are solved with large-scale numerical simulations using HPC. This type of application, deep learning for high-performance computing, is the theme of this whitepaper. Our goal is to provide concrete guidelines to scientists and others that would like to explore opportunities for applying deep learning approaches in their own large-scale numerical simulations. These guidelines have been extracted from a number of experiments that have been undertaken in various scientific domains over the last two years, and which are described in more detail in the Appendix. Additionally, we share the most important lessons that we have learned.
Submitted 30 March, 2020;
originally announced April 2020.
-
Newton vs the machine: solving the chaotic three-body problem using deep neural networks
Authors:
Philip G. Breen,
Christopher N. Foley,
Tjarda Boekholt,
Simon Portegies Zwart
Abstract:
Since its formulation by Sir Isaac Newton, the problem of solving the equations of motion for three bodies under their own gravitational force has remained practically unsolved. Currently, the solution for a given initialization can only be found by performing laborious iterative calculations that have unpredictable and potentially infinite computational cost, due to the system's chaotic nature. We show that an ensemble of solutions obtained using an arbitrarily precise numerical integrator can be used to train a deep artificial neural network (ANN) that, over a bounded time interval, provides accurate solutions at fixed computational cost and up to 100 million times faster than a state-of-the-art solver. Our results provide evidence that, for computationally challenging regions of phase-space, a trained ANN can replace existing numerical solvers, enabling fast and scalable simulations of many-body systems to shed light on outstanding phenomena such as the formation of black-hole binary systems or the origin of the core collapse in dense star clusters.
Submitted 16 October, 2019;
originally announced October 2019.
-
Bonsai-SPH: A GPU accelerated astrophysical Smoothed Particle Hydrodynamics code
Authors:
Jeroen Bédorf,
Simon Portegies Zwart
Abstract:
We present the smoothed-particle hydrodynamics simulation code, Bonsai-SPH, which is a continuation of our previously developed gravity-only hierarchical $N$-body code (called Bonsai). The code is optimized for Graphics Processing Unit (GPU) accelerators, which enables researchers to take advantage of these powerful computational resources. Bonsai-SPH produces simulation results comparable with state-of-the-art CPU-based codes, but using an order of magnitude less computation time. The code is freely available online and the details are described in this work.
Submitted 14 February, 2020; v1 submitted 16 September, 2019;
originally announced September 2019.
-
Numerical verification of the microscopic time reversibility of Newton's equations of motion: Fighting exponential divergence
Authors:
Simon Portegies Zwart,
Tjarda Boekholt
Abstract:
Numerical solutions to Newton's equations of motion for chaotic self gravitating systems of more than 2 bodies are often regarded to be irreversible. This is due to the exponential growth of errors introduced by the integration scheme and the numerical round-off in the least significant figure. This secular growth of error is sometimes attributed to the increase in entropy of the system even though Newton's equations of motion are strictly time reversible. We demonstrate that when numerical errors are reduced to below the physical perturbation and its exponential growth during integration the microscopic reversibility is retrieved. Time reversibility itself is not a guarantee for a definitive solution to the chaotic N-body problem. However, time reversible algorithms may be used to find initial conditions for which perturbed trajectories converge rather than diverge. The ability to calculate such a converging pair of solutions is a striking illustration which shows that it is possible to compute a definitive solution to a highly unstable problem. This works as follows: If you (i) use a code which is capable of producing a definitive solution (and which will therefore handle converging pairs of solutions correctly), (ii) use it to study the statistical result of some other problem, and then (iii) find that some other code produces a solution S with statistical properties which are indistinguishable from those of the definitive solution, then solution S may be deemed veracious.
Submitted 3 February, 2018;
originally announced February 2018.
-
Sapporo2: A versatile direct $N$-body library
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Simon Portegies Zwart
Abstract:
Astrophysical direct $N$-body methods were among the first production algorithms to be implemented using NVIDIA's CUDA architecture. Now, almost seven years later, the GPU is the most used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use the GPU for $N$-body simulations with little to no effort. The first version, released five years ago, is actively used, but lacks advanced features and versatility in numerical precision and support for higher order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision and higher order integrators. We show how to tune these codes for different GPU architectures and present how to continue utilizing the GPU optimally even when only a small number of particles ($N < 100$) is integrated. This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the added options and double precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single and double precision accuracy. With the addition of OpenCL support the library is also able to run on CPUs and other accelerators that support OpenCL.
Submitted 14 October, 2015;
originally announced October 2015.
-
From Thread to Transcontinental Computer: Disturbing Lessons in Distributed Supercomputing
Authors:
Derek Groen,
Simon Portegies Zwart
Abstract:
We describe the political and technical complications encountered during the astronomical CosmoGrid project. CosmoGrid is a numerical study on the formation of large scale structure in the universe. The simulations are challenging due to the enormous dynamic range in spatial and temporal coordinates, as well as the enormous computer resources required. In CosmoGrid we dealt with the computational requirements by connecting up to four supercomputers via an optical network and making them operate as a single machine. This was challenging, if only for the fact that the supercomputers of our choice are separated by half the planet, as three of them are scattered across Europe and the fourth is in Tokyo. The co-scheduling of multiple computers and the 'gridification' of the code enabled us to achieve an efficiency of up to $93\%$ for this distributed intercontinental supercomputer. In this work, we find that high-performance computing on a grid can be done much more effectively if the sites involved are willing to be flexible about their user policies, and that having facilities to provide such flexibility could be key to strengthening the position of the HPC community in an increasingly Cloud-dominated computing landscape. Given that smaller computer clusters owned by research groups or university departments usually have flexible user policies, we argue that it could be easier to instead realize distributed supercomputing by combining tens, hundreds or even thousands of these resources.
Submitted 4 July, 2015;
originally announced July 2015.
-
24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Michiko S. Fujii,
Keigo Nitadori,
Tomoaki Ishiyama,
Simon Portegies Zwart
Abstract:
We have simulated, for the first time, the long term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our $N$-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar structure and spiral arms were fully formed. This improves upon previous simulations by using 1000 times more particles, and provides a wealth of new data that can be directly compared with observations. We also report the scalability on both the Swiss Piz Daint and the US ORNL Titan. On Piz Daint the parallel efficiency of Bonsai was above 95%. The highest performance was achieved with a 242 billion particle Milky Way model using 18600 GPUs on Titan, thereby reaching a sustained GPU and application performance of 33.49 Pflops and 24.77 Pflops respectively.
Submitted 1 December, 2014;
originally announced December 2014.
-
Computational Gravitational Dynamics with Modern Numerical Accelerators
Authors:
Simon Portegies Zwart,
Jeroen Bédorf
Abstract:
We review the recent optimizations of gravitational $N$-body kernels for running them on graphics processing units (GPUs), on single hosts and massive parallel platforms. For each of the two main $N$-body techniques, direct summation and tree-codes, we discuss the optimization strategy, which is different for each algorithm. Because both the accuracy as well as the performance characteristics differ, hybridizing the two algorithms is essential when simulating a large $N$-body system with high-density structures containing few particles, and with low-density structures containing many particles. We demonstrate how this can be realized by splitting the underlying Hamiltonian, and we subsequently demonstrate the efficiency and accuracy of the hybrid code by simulating a group of 11 merging galaxies with massive black holes in the nuclei.
Submitted 18 September, 2014;
originally announced September 2014.
-
On the minimal accuracy required for simulating self-gravitating systems by means of direct N-body methods
Authors:
Simon Portegies Zwart,
Tjarda Boekholt
Abstract:
The conservation of energy, linear momentum and angular momentum are important drivers for our physical understanding of the evolution of the Universe. These quantities are also conserved in Newton's laws of motion under gravity \citep{Newton:1687}. Numerical integration of the associated equations of motion is extremely challenging, in particular due to the steady growth of numerical errors (by round-off and discrete time-stepping, \cite{1981PAZh....7..752B,1993ApJ...415..715G,1993ApJ...402L..85H,1994LNP...430..131M}) and the exponential divergence \citep{1964ApJ...140..250M,2009MNRAS.392.1051U} between two nearby solutions. As a result, numerical solutions to the general N-body problem are intrinsically questionable \citep{2003gmbp.book.....H,1994JAM....61..226L}. Using brute force integrations to arbitrary numerical precision we demonstrate empirically that ensembles of different realizations of resonant 3-body interactions produce statistically indistinguishable results. Although individual solutions using common integration methods are notoriously unreliable, we conjecture that an ensemble of approximate 3-body solutions accurately represents an ensemble of true solutions, so long as the energy during integration is conserved to better than 1/10. We therefore provide an independent confirmation that previous work on self-gravitating systems can actually be trusted, irrespective of the intrinsic chaotic nature of the N-body problem.
Submitted 26 February, 2014;
originally announced February 2014.
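The 1/10 energy-conservation criterion quoted in the abstract above could be checked along a run roughly as follows. This is only a sketch: it assumes the criterion refers to the relative energy error with respect to the initial energy, and the function names and the sampled energy list are hypothetical.

```python
def max_relative_energy_error(energies):
    """Largest |E(t) - E(0)| / |E(0)| over the sampled energies.
    Assumes energies[0] is the (nonzero) initial energy."""
    e0 = energies[0]
    return max(abs((e - e0) / e0) for e in energies)


def is_acceptable(energies, threshold=0.1):
    """True if energy is conserved to better than `threshold` (1/10 by default)."""
    return max_relative_energy_error(energies) < threshold
```

For example, `is_acceptable([-1.0, -1.05, -0.98])` holds because the largest relative deviation is about 0.05, whereas `is_acceptable([-1.0, -1.2])` does not.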
-
MPWide: a light-weight library for efficient message passing over wide area networks
Authors:
Derek Groen,
Steven Rieder,
Simon Portegies Zwart
Abstract:
We present MPWide, a light-weight communication library which allows efficient message passing over a distributed network. MPWide has been designed to connect applications running on distributed (super)computing resources, and to maximize the communication performance on wide area networks for those without administrative privileges. It can be used to provide message-passing between applications, move files, and make very fast connections in client-server environments. MPWide has already been applied to enable distributed cosmological simulations across up to four supercomputers on two continents, and to couple two different bloodflow simulations to form a multiscale simulation.
Submitted 3 December, 2013;
originally announced December 2013.
-
Bonsai: A GPU Tree-Code
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Simon Portegies Zwart
Abstract:
We present a gravitational hierarchical N-body code that is designed to run efficiently on Graphics Processing Units (GPUs). All parts of the algorithm are executed on the GPU which eliminates the need for data transfer between the Central Processing Unit (CPU) and the GPU. Our tests indicate that the gravitational tree-code outperforms tuned CPU code for all parts of the algorithm and show an overall performance improvement of more than a factor 20, resulting in a processing rate of more than 2.8 million particles per second.
Submitted 10 April, 2012;
originally announced April 2012.
-
High-Performance Distributed Multi-Model / Multi-Kernel Simulations: A Case-Study in Jungle Computing
Authors:
Niels Drost,
Jason Maassen,
Maarten A. J. van Meersbergen,
Henri E. Bal,
F. Inti Pelupessy,
Simon Portegies Zwart,
Michael Kliphuis,
Henk A. Dijkstra,
Frank J. Seinstra
Abstract:
High-performance scientific applications require more and more compute power. The concurrent use of multiple distributed compute resources is vital for making scientific progress. The resulting distributed system, a so-called Jungle Computing System, is both highly heterogeneous and hierarchical, potentially consisting of grids, clouds, stand-alone machines, clusters, desktop grids, mobile devices, and supercomputers, possibly with accelerators such as GPUs.
One striking example of applications that can benefit greatly from Jungle Computing Systems is Multi-Model / Multi-Kernel simulations. In these simulations, multiple models, possibly implemented using different techniques and programming models, are coupled into a single simulation of a physical system. Examples include the domain of computational astrophysics and climate modeling.
In this paper we investigate the use of Jungle Computing Systems for such Multi-Model / Multi-Kernel simulations. We make use of the software developed in the Ibis project, which addresses many of the problems faced when running applications on Jungle Computing Systems. We create a prototype Jungle-aware version of AMUSE, an astrophysical simulation framework. We show preliminary experiments with the resulting system, using clusters, grids, stand-alone machines, and GPUs.
Submitted 1 March, 2012;
originally announced March 2012.
-
High performance cosmological simulations on a grid of supercomputers
Authors:
Derek Groen,
Steven Rieder,
Simon Portegies Zwart
Abstract:
We present results from our cosmological N-body simulation which consisted of 2048x2048x2048 particles and ran distributed across three supercomputers throughout Europe. The run, which was performed as the concluding phase of the Gravitational Billion Body Problem DEISA project, integrated a 30 Mpc box of dark matter using an optimized Tree/Particle Mesh N-body integrator. We ran the simulation up to the present day (z=0), and obtained an efficiency of about 0.93 over 2048 cores compared to a single supercomputer run. In addition, we share our experiences on using multiple supercomputers for high performance computing and provide several recommendations for future projects.
Submitted 26 September, 2011;
originally announced September 2011.
-
A sparse octree gravitational N-body code that runs entirely on the GPU processor
Authors:
Jeroen Bédorf,
Evghenii Gaburov,
Simon Portegies Zwart
Abstract:
We present parallel algorithms for constructing and traversing sparse octrees on graphics processing units (GPUs). The algorithms are based on parallel-scan and sort methods. To test the performance and feasibility, we implemented them in CUDA in the form of a gravitational tree-code which completely runs on the GPU. (The code is publicly available at: http://castle.strw.leidenuniv.nl/software.html) The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second.
Submitted 10 April, 2012; v1 submitted 9 June, 2011;
originally announced June 2011.
-
High Performance Gravitational N-body Simulations on a Planet-wide Distributed Supercomputer
Authors:
Derek Groen,
Simon Portegies Zwart,
Tomoaki Ishiyama,
Junichiro Makino
Abstract:
We report on the performance of our cold dark matter cosmological N-body simulation which was carried out concurrently using supercomputers across the globe. We ran simulations on 60 to 750 cores distributed over a variety of supercomputers in Amsterdam (the Netherlands, Europe), in Tokyo (Japan, Asia), Edinburgh (UK, Europe) and Espoo (Finland, Europe). Despite the network latency of 0.32 seconds and the communication over 30,000 km of optical network cable, we are able to achieve about 87% of the performance compared to an equal number of cores on a single supercomputer. We argue that using widely distributed supercomputers in order to acquire more compute power is technically feasible, and that the largest obstacle is introduced by local scheduling and reservation policies.
Submitted 3 January, 2011;
originally announced January 2011.
-
A Light-Weight Communication Library for Distributed Computing
Authors:
Derek Groen,
Steven Rieder,
Paola Grosso,
Cees de Laat,
Simon Portegies Zwart
Abstract:
We present MPWide, a platform independent communication library for performing message passing between computers. Our library allows coupling of several local MPI applications through a long distance network and is specifically optimized for such communications. The implementation is deliberately kept light-weight, platform independent and the library can be installed and used without administrative privileges. The only requirements are a C++ compiler and at least one open port to a wide area network on each site. In this paper we present the library, describe the user interface, present performance tests and apply MPWide in a large scale cosmological N-body simulation on a network of two computers, one in Amsterdam and the other in Tokyo.
Submitted 16 August, 2010;
originally announced August 2010.
-
Gravitational tree-code on graphics processing units: implementation in CUDA
Authors:
Evghenii Gaburov,
Jeroen Bédorf,
Simon Portegies Zwart
Abstract:
We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way we achieve a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s. It takes about a second to compute forces on a million particles with an opening angle of $\theta \approx 0.5$. The code has a convenient user interface and is freely available for use (http://castle.strw.leidenuniv.nl/software/octgrav.html).
Submitted 28 May, 2010;
originally announced May 2010.
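To illustrate what the opening angle in the abstract above controls, here is a sketch of the classic Barnes-Hut acceptance test: a tree cell of side length s at distance d from a particle is treated as a single multipole when s/d is below the opening angle. This is the textbook criterion and may differ from the one the code actually uses; the function and parameter names are hypothetical.

```python
import math


def accept_cell(cell_size, cell_center, particle_pos, theta=0.5):
    """Textbook Barnes-Hut test: approximate the whole cell by its multipole
    if it subtends a small enough angle (cell_size / distance < theta).
    Returns False when the particle sits at the cell centre, so that the
    cell is opened and its children are examined instead."""
    d = math.dist(cell_center, particle_pos)
    return d > 0.0 and cell_size / d < theta
```

A smaller `theta` opens more cells, which raises both accuracy and cost; `theta = 0.5` here simply mirrors the value quoted in the abstract.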
-
Simulating the universe on an intercontinental grid of supercomputers
Authors:
Simon Portegies Zwart,
Tomoaki Ishiyama,
Derek Groen,
Keigo Nitadori,
Junichiro Makino,
Cees de Laat,
Stephen McMillan,
Kei Hiraki,
Stefan Harfst,
Paola Grosso
Abstract:
Understanding the universe is hampered by the elusiveness of its most common constituent, cold dark matter. Almost impossible to observe, dark matter can be studied effectively by means of simulation and there is probably no other research field where simulation has led to so much progress in the last decade. Cosmological N-body simulations are an essential tool for evolving density perturbations in the nonlinear regime. Simulating the formation of large-scale structures in the universe, however, is still a challenge due to the enormous dynamic range in spatial and temporal coordinates, and due to the enormous computer resources required. The dynamic range is generally dealt with by the hybridization of numerical techniques. We deal with the computational requirements by connecting two supercomputers via an optical network and making them operate as a single machine. This is challenging, if only for the fact that the supercomputers of our choice are separated by half the planet, as one is located in Amsterdam and the other is in Tokyo. The co-scheduling of the two computers and the 'gridification' of the code enable us to achieve a 90% efficiency for this distributed intercontinental supercomputer.
Submitted 5 January, 2010;
originally announced January 2010.
-
The Living Application: a Self-Organising System for Complex Grid Tasks
Authors:
D. Groen,
S. Harfst,
S. Portegies Zwart
Abstract:
We present the living application, a method to autonomously manage applications on the grid. During its execution on the grid, the living application makes choices on the resources to use in order to complete its tasks. These choices can be based on the internal state, or on autonomously acquired knowledge from external sensors. By giving limited user capabilities to a living application, the living application is able to port itself from one resource topology to another. The application performs these actions at run-time without depending on users or external workflow tools. We demonstrate this new concept in a special case of a living application: the living simulation. Today, many simulations require a wide range of numerical solvers and run most efficiently if specialized nodes are matched to the solvers. The idea of the living simulation is that it decides itself which grid machines to use based on the numerical solver currently in use. In this paper we apply the living simulation to modelling the collision between two galaxies in a test setup with two specialized computers. This simulation switches at run-time between a GPU-enabled computer in the Netherlands and a GRAPE-enabled machine that resides in the United States, using an oct-tree N-body code whenever it runs in the Netherlands and a direct N-body solver in the United States.
Submitted 23 July, 2009;
originally announced July 2009.
-
SAPPORO: A way to turn your graphics cards into a GRAPE-6
Authors:
Evghenii Gaburov,
Stefan Harfst,
Simon Portegies Zwart
Abstract:
We present Sapporo, a library for performing high-precision gravitational N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can switch to Sapporo by a simple relinking of the library. The precision of our library is comparable to that of GRAPE-6, even though internally the GPU hardware is limited to single-precision arithmetic. This limitation is effectively overcome by emulating double precision for calculating the distance between particles. The performance loss of this operation is small (< 20%) compared to the advantage of being able to run at high precision. We tested the library using several GRAPE-6-enabled N-body codes, in particular with Starlab and phiGRAPE. We measured peak performance of 800 Gflop/s for running with 10^6 particles on a PC with four commercial G92 architecture GPUs (two GeForce 9800GX2). As a production test, we simulated a 32k Plummer model with equal-mass stars well beyond core collapse. The simulation took 41 days, during which the mean performance was 113 Gflop/s. The GPU did not show any problems from running in a production environment for such an extended period of time.
Submitted 5 March, 2009; v1 submitted 25 February, 2009;
originally announced February 2009.
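The Sapporo abstract above mentions emulating double precision for the distance between particles on single-precision hardware. One common way to do this, sketched below, is to store each coordinate as a pair of single-precision numbers (a leading part and a small correction) and subtract the pairs component-wise. This is an illustrative assumption, not Sapporo's actual code; the helper names `f32`, `split`, `diff_naive`, and `diff_emulated` are hypothetical, and single precision is simulated in plain Python via `struct`.

```python
import struct

def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def split(x):
    """Represent x as two singles (hi, lo) such that hi + lo is close to x."""
    hi = f32(x)
    lo = f32(x - hi)  # remainder that a single float could not hold
    return hi, lo

def diff_naive(xa, xb):
    """Coordinate difference using one single-precision number per coordinate."""
    return f32(f32(xa) - f32(xb))

def diff_emulated(xa, xb):
    """Coordinate difference using (hi, lo) pairs; every operation is single precision."""
    a_hi, a_lo = split(xa)
    b_hi, b_lo = split(xb)
    d_hi = f32(a_hi - b_hi)  # leading parts largely cancel for nearby particles
    d_lo = f32(a_lo - b_lo)  # the corrections carry the small separation
    return f32(d_hi + d_lo)

# Two nearby particles far from the origin (values chosen for illustration only).
xa, xb = 1000.123456789, 1000.123456111
exact = xa - xb
err_naive = abs(diff_naive(xa, xb) - exact)
err_emulated = abs(diff_emulated(xa, xb) - exact)
print(err_naive, err_emulated)
```

For close pairs at large coordinates the naive single-precision difference loses the separation almost entirely, while the paired representation recovers it to roughly single-precision relative accuracy, which is the quantity the force calculation needs.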
-
A parallel gravitational N-body kernel
Authors:
Simon Portegies Zwart,
Steve McMillan,
Derek Groen,
Alessia Gualandris,
Michael Sipior,
Willem Vermin
Abstract:
We describe source code level parallelization for the kira direct gravitational $N$-body integrator, the workhorse of the starlab production environment for simulating dense stellar systems. The parallelization strategy, called "j-parallelization", involves the partition of the computational domain by distributing all particles in the system among the available processors. Partial forces on the particles to be advanced are calculated in parallel by their parent processors, and are then summed in a final global operation. Once total forces are obtained, the computing elements proceed to the computation of their particle trajectories. We report the results of timing measurements on four different parallel computers, and compare them with theoretical predictions. The computers either employ a high-speed interconnect or a NUMA architecture to minimize the communication overhead, or are distributed in a grid. The code scales well in the domain tested, which ranges from 1024 to 65536 stars on 1 to 128 processors, providing satisfactory speedup. Running the production environment on a grid becomes inefficient for more than 60 processors distributed across three sites.
Submitted 5 November, 2007;
originally announced November 2007.
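The j-parallelization described in the abstract above can be sketched in a few lines: the source (j) particles are divided among processors, each processor computes partial accelerations on all particles being advanced, and the partial results are summed globally. The sketch below is not the kira implementation; it runs the "processors" serially in plain Python, and the function names, the round-robin distribution, and the softening parameter `eps2` are assumptions made for illustration.

```python
def partial_acc(targets, sources, eps2=1e-4):
    """Acceleration (G = 1) on every target from one processor's share of sources."""
    acc = []
    for (xi, yi, zi) in targets:
        ax = ay = az = 0.0
        for (m, xj, yj, zj) in sources:
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            # Softened distance: a particle acting on itself contributes zero.
            inv_r3 = (dx * dx + dy * dy + dz * dz + eps2) ** -1.5
            ax += m * dx * inv_r3
            ay += m * dy * inv_r3
            az += m * dz * inv_r3
        acc.append((ax, ay, az))
    return acc

def j_parallel_acc(particles, n_proc, eps2=1e-4):
    """Split the j-particles over n_proc shares, then sum the partial forces."""
    targets = [(x, y, z) for (_, x, y, z) in particles]
    # Distribute sources round-robin (an assumed, simple distribution).
    shares = [particles[p::n_proc] for p in range(n_proc)]
    # Each share would run on its own processor; here they run one after another.
    partials = [partial_acc(targets, share, eps2) for share in shares]
    # Final global sum over processors, component by component.
    return [tuple(sum(c) for c in zip(*per_target)) for per_target in zip(*partials)]

# Particles as (mass, x, y, z); values are arbitrary test data.
particles = [
    (1.0, 0.0, 0.0, 0.0),
    (0.5, 1.0, 0.0, 0.0),
    (0.3, 0.0, 2.0, 0.0),
    (0.2, -1.0, 0.5, 1.0),
    (0.1, 0.5, -1.5, 0.7),
]
acc_serial = j_parallel_acc(particles, n_proc=1)
acc_split = j_parallel_acc(particles, n_proc=3)
```

Because the work per processor shrinks with the number of processors while the global sum does not, a scheme like this is expected to become communication-bound at some processor count, consistent with the grid result reported in the abstract.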
-
Distributed N-body Simulation on the Grid Using Dedicated Hardware
Authors:
Derek Groen,
Simon Portegies Zwart,
Steve McMillan,
Jun Makino
Abstract:
We present performance measurements of direct gravitational N-body simulation on the grid, with and without specialized (GRAPE-6) hardware. Our inter-continental virtual organization consists of three sites, one in Tokyo, one in Philadelphia and one in Amsterdam. We run simulations with up to 196608 particles for a variety of topologies. In many cases, high performance simulations over the entire planet are dominated by network bandwidth rather than latency. With this global grid of GRAPEs our calculation time remains dominated by communication over the entire range of N, which was limited due to the use of three sites. Increasing the number of particles will result in a more efficient execution. Based on these timings we construct and calibrate a model to predict the performance of our simulation on any grid infrastructure with or without GRAPE. We apply this model to predict the simulation performance on the Netherlands DAS-3 wide area computer. Equipping the DAS-3 with GRAPE-6Af hardware would achieve break-even between calculation and communication at a few million particles, resulting in a compute time of just over ten hours for 1 N-body time unit. Key words: high-performance computing, grid, N-body simulation, performance modelling
Submitted 5 November, 2007; v1 submitted 28 September, 2007;
originally announced September 2007.
-
High Performance Direct Gravitational N-body Simulations on Graphics Processing Units
Authors:
Simon Portegies Zwart,
Robert Belleman,
Peter Geldof
Abstract:
We present the results of gravitational direct $N$-body simulations using the commercial graphics processing units (GPU) NVIDIA Quadro FX1400 and GeForce 8800GTX, and compare the results with GRAPE-6Af special purpose hardware. The force evaluation of the $N$-body problem was implemented in Cg using the GPU directly to speed up the calculations. The integration of the equations of motion, running on the host computer, was implemented in C using the 4th order predictor-corrector Hermite integrator with block time steps. We find that for a large number of particles ($N \gtrsim 10^4$) modern graphics processing units offer an attractive low cost alternative to GRAPE special purpose hardware. A modern GPU continues to give a relatively flat scaling with the number of particles, comparable to that of the GRAPE. Using the same time step criterion, the total energy of the $N$-body system was conserved to better than one part in $10^6$ on the GPU, which is only about an order of magnitude worse than obtained with GRAPE. For $N \gtrsim 10^6$ the GeForce 8800GTX was about 20 times faster than the host computer. Though still about an order of magnitude slower than GRAPE, modern GPUs outperform GRAPE in their low cost, long mean time between failures and the much larger onboard memory; the GRAPE-6Af holds at most 256k particles whereas the GeForce 8800GTX can hold 9 million particles in memory.
Submitted 23 February, 2007;
originally announced February 2007.