-
Survey of adaptive containerization architectures for HPC
Authors:
Tiziano Müller,
Nina Mujkanovic,
Juan J. Durillo,
Nicolay Hammer
Abstract:
Containers offer an array of advantages that benefit research reproducibility and portability across groups and systems. As container tools mature, container security improves, and high-performance computing (HPC) and cloud system tools converge, supercomputing centers are increasingly integrating containers in their workflows. The technology selection process requires sufficient information on the diverse tools available, yet the majority of research into containers still focuses on cloud environments. We consider an adaptive containerization approach, with a focus on accelerating the deployment of applications and workflows on HPC systems using containers. To this end, we discuss the specific HPC requirements regarding container tools, and analyze the entire containerization stack, including container engines and registries, in depth. Finally, we consider various orchestrator and HPC workload manager integration scenarios.
Submitted 23 August, 2023;
originally announced August 2023.
-
Towards Confidential Computing: A Secure Cloud Architecture for Big Data Analytics and AI
Authors:
Naweiluo Zhou,
Florent Dufour,
Vinzent Bode,
Peter Zinterhof,
Nicolay J Hammer,
Dieter Kranzlmüller
Abstract:
Cloud computing provisions computer resources in a cost-effective way based on demand. Therefore, it has become a viable solution for big data analytics and artificial intelligence, which have been widely adopted in various domain sciences. Data security in certain fields such as biomedical research remains a major concern when moving workflows to the cloud, because cloud environments are generally outsourced and thus more exposed to risks. We present a secure cloud architecture and describe how it enables workflow packaging and scheduling while keeping its data, logic and computation secure in transit, in use and at rest.
Submitted 28 May, 2023;
originally announced May 2023.
-
Space-Efficient Graph Coarsening with Applications to Succinct Planar Encodings
Authors:
Nina Hammer,
Frank Kammer,
Johannes Meintrup
Abstract:
We present a novel space-efficient graph coarsening technique for $n$-vertex planar graphs $G$, called cloud partition, which partitions the vertices $V(G)$ into disjoint sets $C$ of size $O(\log n)$ such that each $C$ induces a connected subgraph of $G$. Using this partition $P$ we construct a so-called structure-maintaining minor $F$ of $G$ via specific contractions within the disjoint sets such that $F$ has $O(n/\log n)$ vertices. The combination of $(F, P)$ is referred to as a cloud decomposition.
For planar graphs we show that a cloud decomposition can be constructed in $O(n)$ time and using $O(n)$ bits. Given a cloud decomposition $(F, P)$ constructed for a planar graph $G$ we are able to find a balanced separator of $G$ in $O(n/\log n)$ time. Contrary to related publications, we do not make use of an embedding of the planar input graph. We generalize our cloud decomposition from planar graphs to $H$-minor-free graphs for any fixed graph $H$. This allows us to construct the succinct encoding scheme for $H$-minor-free graphs due to Blelloch and Farzan (CPM 2010) in $O(n)$ time and $O(n)$ bits improving both runtime and space by a factor of $Θ(\log n)$.
As an additional application of our cloud decomposition we show that, for $H$-minor-free graphs, a tree decomposition of width $O(n^{1/2 + ε})$ for any $ε> 0$ can be constructed in $O(n)$ bits and a time linear in the size of the tree decomposition. Finally, we implemented our cloud decomposition algorithm and experimentally verified its practical effectiveness on both randomly generated graphs and real-world graphs such as road networks. The obtained data shows that a simplified version of our algorithms suffices in a practical setting, as many of the theoretical worst-case scenarios are not present in the graphs we encountered.
Submitted 18 June, 2024; v1 submitted 12 May, 2022;
originally announced May 2022.
-
Exploiting the Space Filling Curve Ordering of Particles in the Neighbour Search of Gadget3
Authors:
Antonio Ragagnin,
Nikola Tchipev,
Michael Bader,
Klaus Dolag,
Nicolay J. Hammer
Abstract:
Gadget3 is nowadays one of the most frequently used high performing parallel codes for cosmological hydrodynamical simulations. Recent analyses have shown that the Neighbour Search process of Gadget3 is one of the most time-consuming parts. Thus, a considerable speedup can be expected from improvements of the underlying algorithms. In this work we propose a novel approach for speeding up the Neighbour Search which takes advantage of the space-filling-curve particle ordering. Instead of performing Neighbour Search for all particles individually, nearby active particles can be grouped and one single Neighbour Search can be performed to obtain a common superset of neighbours. Thus, with this approach we reduce the number of searches. On the other hand, tree walks are performed within a larger searching radius. There is an optimal grouping size that maximizes the speedup, which we found by numerical experiments. We tested the algorithm within the boxes of the Magneticum project. As a result we obtained a speedup of $1.65$ in the Density and of $1.30$ in the Hydrodynamics computation, respectively, and a total speedup of $1.34$.
Submitted 23 October, 2018;
originally announced October 2018.
-
Performance Optimisation of Smoothed Particle Hydrodynamics Algorithms for Multi/Many-Core Architectures
Authors:
Fabio Baruffa,
Luigi Iapichino,
Nicolay J. Hammer,
Vasileios Karakasis
Abstract:
We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel® architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include threading parallelism optimisation, change of the data layout into Structure of Arrays (SoA), auto-vectorisation and algorithmic improvements in the particle sorting. We obtain shorter execution time and improved threading scalability both on Intel Xeon® ($2.6 \times$ on Ivy Bridge) and Xeon Phi™ ($13.7 \times$ on Knights Corner) systems. First tests of the optimised code result in $19.1 \times$ faster execution on second generation Xeon Phi (Knights Landing), thus demonstrating the portability of the devised optimisation solutions to upcoming architectures.
Submitted 10 May, 2017; v1 submitted 19 December, 2016;
originally announced December 2016.
-
Extreme Scale-out SuperMUC Phase 2 - lessons learned
Authors:
Nicolay Hammer,
Ferdinand Jamitzky,
Helmut Satzger,
Momme Allalen,
Alexander Block,
Anupam Karmakar,
Matthias Brehm,
Reinhold Bader,
Luigi Iapichino,
Antonio Ragagnin,
Vasilios Karakasis,
Dieter Kranzlmüller,
Arndt Bode,
Herbert Huber,
Martin Kühn,
Rui Machado,
Daniel Grünewald,
Philipp V. F. Edelmann,
Friedrich K. Röpke,
Markus Wittmann,
Thomas Zeiser,
Gerhard Wellein,
Gerald Mathias,
Magnus Schwörer,
Konstantin Lorenzen, et al. (14 additional authors not shown)
Abstract:
In spring 2015, the Leibniz Supercomputing Centre (Leibniz-Rechenzentrum, LRZ) installed its new Peta-Scale System SuperMUC Phase2. Selected users were invited for a 28-day extreme scale-out block operation during which they were allowed to use the full system for their applications. The following projects participated in the extreme scale-out workshop: BQCD (Quantum Physics), SeisSol (Geophysics, Seismics), GPI-2/GASPI (Toolkit for HPC), Seven-League Hydro (Astrophysics), ILBDC (Lattice Boltzmann CFD), Iphigenie (Molecular Dynamics), FLASH (Astrophysics), GADGET (Cosmological Dynamics), PSC (Plasma Physics), waLBerla (Lattice Boltzmann CFD), Musubi (Lattice Boltzmann CFD), Vertex3D (Stellar Astrophysics), CIAO (Combustion CFD), and LS1-Mardyn (Material Science). The projects were allowed to use the machine exclusively during the 28-day period, which corresponds to a total of 63.4 million core-hours, of which 43.8 million core-hours were used by the applications, resulting in a utilization of 69%. The top 3 users used 15.2, 6.4, and 4.7 million core-hours, respectively.
Submitted 6 September, 2016;
originally announced September 2016.