-
ScalSALE: Scalable SALE Benchmark Framework for Supercomputers
Authors:
Re'em Harel,
Matan Rusanovsky,
Ron Wagner,
Harel Levin,
Gal Oren
Abstract:
Supercomputers worldwide provide the necessary infrastructure for groundbreaking research. However, most supercomputers are not designed equally due to different desired figure of merit, which is derived from the computational bounds of the targeted scientific applications' portfolio. In turn, the design of such computers becomes an optimization process that strives to achieve the best performance…
▽ More
Supercomputers worldwide provide the necessary infrastructure for groundbreaking research. However, most supercomputers are not designed equally due to different desired figure of merit, which is derived from the computational bounds of the targeted scientific applications' portfolio. In turn, the design of such computers becomes an optimization process that strives to achieve the best performances possible in a multi-parameters search space. Therefore, verifying and evaluating whether a supercomputer can achieve its desired goal becomes a tedious and complex task. For this purpose, many full, mini, proxy, and benchmark applications have been introduced in the attempt to represent scientific applications. Nevertheless, as these benchmarks are hard to expand, and most importantly, are over-simplified compared to scientific applications that tend to couple multiple scientific domains, they fail to represent the true scaling capabilities. We suggest a new physical scalable benchmark framework, namely ScalSALE, based on the well-known SALE scheme. ScalSALE's main goal is to provide a simple, flexible, scalable infrastructure that can be easily expanded to include multi-physical schemes while maintaining scalable and efficient execution times. By expanding ScalSALE, the gap between the over-simplified benchmarks and scientific applications can be bridged. To achieve this goal, ScalSALE is implemented in Modern Fortran with simple OOP design patterns and supported by transparent MPI-3 blocking and non-blocking communication that allows such a scalable framework. ScalSALE is compared to LULESH via simulating the Sedov-Taylor blast wave problem using strong and weak scaling tests. ScalSALE is executed and evaluated with both rezoning options - Lagrangian and Eulerian.
△ Less
Submitted 5 September, 2022;
originally announced September 2022.
-
Recovery of Distributed Iterative Solvers for Linear Systems Using Non-Volatile RAM
Authors:
Yehonatan Fridman,
Yaniv Snir,
Harel Levin,
Danny Hendler,
Hagit Attiya,
Gal Oren
Abstract:
HPC systems are a critical resource for scientific research. The increased demand for computational power and memory ushers in the exascale era, in which supercomputers are designed to provide enormous computing power to meet these needs. These complex supercomputers consist of numerous compute nodes and are consequently expected to experience frequent faults and crashes.
Mathematical solvers, i…
▽ More
HPC systems are a critical resource for scientific research. The increased demand for computational power and memory ushers in the exascale era, in which supercomputers are designed to provide enormous computing power to meet these needs. These complex supercomputers consist of numerous compute nodes and are consequently expected to experience frequent faults and crashes.
Mathematical solvers, in particular, iterative linear solvers are key building block in numerous large-scale scientific applications. Consequently, supporting the recovery of distributed solvers is necessary for scaling scientific applications to exascale platforms. Previous recovery methods for iterative solvers are based on Checkpoint-Restart (CR), which incurs high fault tolerance overhead, or intrinsic fault tolerance, which require extra computation time to converge after failures.
Exact state reconstruction (ESR) was proposed as an alternative mechanism to alleviate the impact of frequent failures on long-term computations. ESR has been shown to provide exact reconstruction of the computation state while avoiding the need for costly checkpointing. However, ESR currently relies on volatile memory for fault tolerance, and must therefore maintain redundancies in the RAM of multiple nodes, incurring high memory and network overheads.
Recent supercomputer designs feature emerging non-volatile RAM (NVRAM) technology. This paper investigates how NVRAM can be utilized to devise an enhanced ESR-based recovery mechanism that is more efficient and provides full resilience. Our mechanism, called in-NVRAM ESR, is based on a novel MPI One-Sided Communication (OSC) over RDMA implementation, and provides full resiliency while significantly reducing both the memory footprint and the time overhead in comparison with the original ESR design (in-RAM ESR).
△ Less
Submitted 9 August, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Assessing the Use Cases of Persistent Memory in High-Performance Scientific Computing
Authors:
Yehonatan Fridman,
Yaniv Snir,
Matan Rusanovsky,
Kfir Zvi,
Harel Levin,
Danny Hendler,
Hagit Attiya,
Gal Oren
Abstract:
As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, the computation should either be distributed between several compute node…
▽ More
As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, the computation should either be distributed between several compute nodes, or some portion of these data objects must be stored in a non-volatile block device such as a hard disk drive or an SSD storage device. Optane DataCenter Persistent Memory Module (DCPMM), a new technology introduced by Intel, provides non-volatile memory that can be plugged into standard memory bus slots and therefore be accessed much faster than standard storage devices. In this work, we present and analyze the results of a comprehensive performance assessment of several ways in which DCPMM can 1) replace standard storage devices, and 2) replace or augment DRAM for improving the performance of HPC scientific computations. To achieve this goal, we have configured an HPC system such that DCPMM can service I/O operations of scientific applications, replace standard storage devices and file systems (specifically for diagnostics and checkpoint-restarting), and serve for expanding applications' main memory. We focus on kee** the scientific codes with as few changes as possible, while allowing them to access the NVM transparently as if they access persistent storage. Our results show that DCPMM allows scientific applications to fully utilize nodes' locality by providing them with sufficiently-large main memory. Moreover, it can be used for providing a high-performance replacement for persistent storage. Thus, the usage of DCPMM has the potential of replacing standard HDD and SSD storage devices in HPC architectures and enabling a more efficient platform for modern supercomputing applications.
△ Less
Submitted 5 September, 2021;
originally announced September 2021.
-
Mixture Model Framework for Traumatic Brain Injury Prognosis Using Heterogeneous Clinical and Outcome Data
Authors:
Alan D. Kaplan,
Qi Cheng,
K. Aditya Mohan,
Lindsay D. Nelson,
Sonia Jain,
Harvey Levin,
Abel Torres-Espin,
Austin Chou,
J. Russell Huie,
Adam R. Ferguson,
Michael McCrea,
Joseph Giacino,
Shivshankar Sundaram,
Amy J. Markowitz,
Geoffrey T. Manley
Abstract:
Prognoses of Traumatic Brain Injury (TBI) outcomes are neither easily nor accurately determined from clinical indicators. This is due in part to the heterogeneity of damage inflicted to the brain, ultimately resulting in diverse and complex outcomes. Using a data-driven approach on many distinct data elements may be necessary to describe this large set of outcomes and thereby robustly depict the n…
▽ More
Prognoses of Traumatic Brain Injury (TBI) outcomes are neither easily nor accurately determined from clinical indicators. This is due in part to the heterogeneity of damage inflicted to the brain, ultimately resulting in diverse and complex outcomes. Using a data-driven approach on many distinct data elements may be necessary to describe this large set of outcomes and thereby robustly depict the nuanced differences among TBI patients' recovery. In this work, we develop a method for modeling large heterogeneous data types relevant to TBI. Our approach is geared toward the probabilistic representation of mixed continuous and discrete variables with missing values. The model is trained on a dataset encompassing a variety of data types, including demographics, blood-based biomarkers, and imaging findings. In addition, it includes a set of clinical outcome assessments at 3, 6, and 12 months post-injury. The model is used to stratify patients into distinct groups in an unsupervised learning setting. We use the model to infer outcomes using input data, and show that the collection of input data reduces uncertainty of outcomes over a baseline approach. In addition, we quantify the performance of a likelihood scoring technique that can be used to self-evaluate the extrapolation risk of prognosis on unseen patients.
△ Less
Submitted 20 July, 2021; v1 submitted 22 December, 2020;
originally announced December 2020.
-
Secured Distributed Algorithms Without Hardness Assumptions
Authors:
Leonid Barenboim,
Harel Levin
Abstract:
We study algorithms in the distributed message-passing model that produce secured output, for an input graph $G$. Specifically, each vertex computes its part in the output, the entire output is correct, but each vertex cannot discover the output of other vertices, with a certain probability. This is motivated by high-performance processors that are embedded nowadays in a large variety of devices.…
▽ More
We study algorithms in the distributed message-passing model that produce secured output, for an input graph $G$. Specifically, each vertex computes its part in the output, the entire output is correct, but each vertex cannot discover the output of other vertices, with a certain probability. This is motivated by high-performance processors that are embedded nowadays in a large variety of devices. In such situations, it no longer makes sense, and in many cases it is not feasible, to leave the whole processing task to a single computer or even a group of central computers. As the extensive research in the distributed algorithms field yielded efficient decentralized algorithms for many classic problems, the discussion about the security of distributed algorithms was somewhat neglected. Nevertheless, many protocols and algorithms were devised in the research area of secure multi-party computation problem (MPC or SMC). However, the notions and terminology of these protocols are quite different than in classic distributed algorithms. As a consequence, the focus in those protocols was to work for every function $f$ at the expense of increasing the round complexity, or the necessity of several computational assumptions. In this work, we present a novel approach, which rather than turning existing algorithms into secure ones, identifies and develops those algorithms that are inherently secure (which means they do not require any further constructions). This approach yields efficient secure algorithms for various locality problems, such as coloring, network decomposition, forest decomposition, and a variety of additional labeling problems. Remarkably, our approach does not require any hardness assumption, but only a private randomness generator in each vertex. This is in contrast to previously known techniques in this setting that are based on public-key encryption schemes.
△ Less
Submitted 17 February, 2021; v1 submitted 16 November, 2020;
originally announced November 2020.
-
BACKUS: Comprehensive High-Performance Research Software Engineering Approach for Simulations in Supercomputing Systems
Authors:
Matan Rusanovsky,
Re'em Harel,
Lee-or Alon,
Idan Mosseri,
Harel Levin,
Gal Oren
Abstract:
High-Performance Computing (HPC) platforms enable scientific software to achieve breakthroughs in many research fields such as physics, biology, and chemistry, by employing Research Software Engineering (RSE) techniques. These include 1) novel parallelism paradigms such as Shared Memory Parallelism (with e.g. OpenMP 4.5); Distributed Memory Parallelism (with e.g. MPI 4); Hybrid Parallelism which c…
▽ More
High-Performance Computing (HPC) platforms enable scientific software to achieve breakthroughs in many research fields such as physics, biology, and chemistry, by employing Research Software Engineering (RSE) techniques. These include 1) novel parallelism paradigms such as Shared Memory Parallelism (with e.g. OpenMP 4.5); Distributed Memory Parallelism (with e.g. MPI 4); Hybrid Parallelism which combines them; and Heterogeneous Parallelism (for CPUs, co-processors and accelerators), 2) introducing advanced Software Engineering concepts such as Object Oriented Parallel Programming (OOPP); Parallel Unit testing; Parallel I/O Formats; Hybrid Parallel Visualization; and 3) Selecting the Best Practices in other necessary areas such as User Interface; Automatic Documentation; Version Control and Project Management. In this work we present BACKUS: Comprehensive High-Performance Research Software Engineering Approach for Simulations in Supercomputing Systems, which we found to fit best for long-lived parallel scientific codes.
△ Less
Submitted 14 October, 2019;
originally announced October 2019.
-
Adaptive Distributed Hierarchical Sensing Algorithm for Reduction of Wireless Sensor Network Cluster-Heads Energy Consumption
Authors:
Gal Oren,
Leonid Barenboim,
Harel Levin
Abstract:
Energy efficiency is a crucial performance metric in sensor networks, directly determining the network lifetime. Consequently, a key factor in WSN is to improve overall energy efficiency to extend the network lifetime. Although many algorithms have been presented to optimize the energy factor, energy efficiency is still one of the major problems of WSNs, especially when there is a need to sample a…
▽ More
Energy efficiency is a crucial performance metric in sensor networks, directly determining the network lifetime. Consequently, a key factor in WSN is to improve overall energy efficiency to extend the network lifetime. Although many algorithms have been presented to optimize the energy factor, energy efficiency is still one of the major problems of WSNs, especially when there is a need to sample an area with different types of loads. Unlike other energy-efficient schemes for hierarchical sampling, our hypothesis is that it is achievable, in terms of prolonging the network lifetime, to adaptively re-modify CHs sensing rates (the processing and transmitting stages in particular) in some specific regions that are triggered significantly less than other regions. In order to do so we introduce the Adaptive Distributed Hierarchical Sensing (ADHS) algorithm. This algorithm employs a homogenous sensor network in a distributed fashion and changes the sampling rates of the CHs based on the variance of the sampled data without damaging significantly the accuracy of the sensed area.
△ Less
Submitted 24 July, 2017;
originally announced July 2017.