-
Distributed OpenMP Offloading of OpenMC on Intel GPU MAX Accelerators
Authors:
Yehonatan Fridman,
Guy Tamir,
Uri Steinitz,
Gal Oren
Abstract:
Monte Carlo (MC) simulations play a pivotal role in diverse scientific and engineering domains, with applications ranging from nuclear physics to materials science. Harnessing the computational power of high-performance computing (HPC) systems, especially Graphics Processing Units (GPUs), has become essential for accelerating MC simulations. This paper focuses on the adaptation and optimization of the OpenMC neutron and photon transport Monte Carlo code for Intel GPUs, specifically the Intel Data Center Max 1100 GPU (codename Ponte Vecchio, PVC), through distributed OpenMP offloading. Building upon prior work by Tramm J.R., et al. (2022), which laid the groundwork for GPU adaptation, our study meticulously extends the OpenMC code's capabilities to Intel GPUs. We present a comprehensive benchmarking and scaling analysis, comparing performance on Intel MAX GPUs to state-of-the-art CPU execution (Intel Xeon Platinum 8480+ Processor, codename 4th generation Sapphire Rapids). The results demonstrate a remarkable acceleration factor compared to CPU execution, showcasing the GPU-adapted code's superiority over its CPU counterpart as computational load increases.
Submitted 12 March, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
A Random-Player Game and Derangement Numbers
Authors:
Yehonatan Fridman
Abstract:
Consider the following game between a random player R and a deterministic player D. There is a pile of n elements at the beginning. The rules for playing are as follows: In each turn of R, if the pile contains exactly m elements, R removes k elements from the pile, where k is independently identically distributed from {1, ..., m}. In each turn of D, D removes only one element. The winner is the player that, at the end of its round, has no elements remaining. R plays first. This short paper shows that Dn, which is defined as the probability of D winning the game (when the pile is initialized with n elements), approaches 1/e as n increases; and more specifically, Dn = dn/n!, where dn is the n-th derangement number.
Submitted 24 March, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
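The identity stated in the abstract above, Dn = dn/n!, can be checked numerically for small n. The following is a minimal Python sketch, not the paper's own code. It assumes that R's draw k is uniform on {1, ..., m} and adopts the convention D_0 = 1 (D has just emptied the pile). Under those assumptions, conditioning on R's first move gives D_n = (1/n) * (D_0 + D_1 + ... + D_{n-2}). The function names are illustrative only.

```python
from fractions import Fraction
from math import e, factorial


def d_win_probabilities(n_max):
    """Return [D_0, ..., D_n_max], the exact probabilities that D wins.

    Assumption: R removes k uniformly at random from {1, ..., m}.
    Convention: D_0 = 1, i.e. D has just taken the last element.
    """
    probs = [Fraction(1)]
    for n in range(1, n_max + 1):
        # R leaves j = n - k elements. j = 0 means R wins. Otherwise D takes
        # one element and R faces a pile of j - 1, so D wins with D_{j-1}.
        probs.append(sum(probs[: n - 1], Fraction(0)) / n)
    return probs


def derangements(n_max):
    """Return [d_0, ..., d_n_max] via d_n = (n - 1) * (d_{n-1} + d_{n-2})."""
    d = [1, 0]
    for n in range(2, n_max + 1):
        d.append((n - 1) * (d[n - 1] + d[n - 2]))
    return d[: n_max + 1]


if __name__ == "__main__":
    n_max = 12
    probs = d_win_probabilities(n_max)
    d = derangements(n_max)
    for n in range(1, n_max + 1):
        # Exact rational comparison of the game recurrence with d_n / n!.
        assert probs[n] == Fraction(d[n], factorial(n))
    # The two printed values should agree to several decimal places.
    print(float(probs[n_max]), 1 / e)
```

Exact rational arithmetic (fractions.Fraction) is used so that the comparison with dn/n! is an equality test rather than a floating-point approximation.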
-
Dynamical Mean Field Theory of the Bilayer Hubbard Model with Inchworm Monte Carlo
Authors:
Dolev Goldberger,
Yehonatan Fridman,
Emanuel Gull,
Eitan Eidelstein,
Guy Cohen
Abstract:
Dynamical mean-field theory allows access to the physics of strongly correlated materials with nontrivial orbital structure, but relies on the ability to solve auxiliary multi-orbital impurity problems. The most successful approaches to date for solving these impurity problems are the various continuous time quantum Monte Carlo algorithms. Here, we consider perhaps the simplest realization of multi-orbital physics: the bilayer Hubbard model on an infinite-coordination Bethe lattice. Despite its simplicity, the majority of this model's phase diagram cannot be predicted by using traditional Monte Carlo methods. We show that these limitations can be largely circumvented by recently introduced Inchworm Monte Carlo techniques. We then explore the model's phase diagram at a variety of interaction strengths, temperatures and filling ratios.
Submitted 28 November, 2023;
originally announced November 2023.
-
CXL Memory as Persistent Memory for Disaggregated HPC: A Practical Approach
Authors:
Yehonatan Fridman,
Suprasad Mutalik Desai,
Navneet Singh,
Thomas Willhalm,
Gal Oren
Abstract:
In the landscape of High-Performance Computing (HPC), the quest for efficient and scalable memory solutions remains paramount. The advent of Compute Express Link (CXL) introduces a promising avenue with its potential to function as a Persistent Memory (PMem) solution in the context of disaggregated HPC systems. This paper presents a comprehensive exploration of CXL memory's viability as a candidate for PMem, supported by physical experiments conducted on cutting-edge multi-NUMA nodes equipped with CXL-attached memory prototypes. Our study not only benchmarks the performance of CXL memory but also illustrates the seamless transition from traditional PMem programming models to CXL, reinforcing its practicality.
To substantiate our claims, we establish a tangible CXL prototype using an FPGA card embodying CXL 1.1/2.0 compliant endpoint designs (Intel FPGA CXL IP). Performance evaluations, executed through the STREAM and STREAM-PMem benchmarks, showcase CXL memory's ability to mirror PMem characteristics in App-Direct and Memory Mode while achieving impressive bandwidth metrics with Intel 4th generation Xeon (Sapphire Rapids) processors.
The results elucidate the feasibility of CXL memory as a persistent memory solution, outperforming previously established benchmarks. In contrast to published DCPMM results, our CXL-DDR4 memory module offers comparable bandwidth to local DDR4 memory configurations, albeit with a moderate decrease in performance. The modified STREAM-PMem application underscores the ease of transitioning programming models from PMem to CXL, thus underscoring the practicality of adopting CXL memory.
Submitted 21 August, 2023;
originally announced August 2023.
-
Portability and Scalability of OpenMP Offloading on State-of-the-art Accelerators
Authors:
Yehonatan Fridman,
Guy Tamir,
Gal Oren
Abstract:
Over the last decade, most of the increase in computing power has been gained by advances in accelerated many-core architectures, mainly in the form of GPGPUs. While accelerators achieve phenomenal performance in various computing tasks, their utilization requires code adaptations and transformations. Thus, OpenMP, the most common standard for multi-threading in scientific computing applications, has offered offloading capabilities between host (CPUs) and accelerators since v4.0, with increasing support in the successive v4.5, v5.0, v5.1, and the latest v5.2 versions. Recently, two state-of-the-art GPUs -- the Intel Ponte Vecchio Max 1100 and the NVIDIA A100 GPUs -- were released to the market, with the oneAPI and NVHPC compilers for offloading, respectively. In this work, we present early performance results of OpenMP offloading capabilities to these devices while specifically analyzing the portability of advanced directives (using SOLLVE's OMPVV test suite) and the scalability of the hardware in a representative scientific mini-app (the LULESH benchmark). Our results show that the coverage for version 4.5 is nearly complete in both the latest NVHPC and oneAPI tools. However, we observed a lack of support in versions 5.0, 5.1, and 5.2, which is particularly noticeable when using NVHPC. From the performance perspective, we found that the PVC1100 and A100 are relatively comparable on the LULESH benchmark. While the A100 is slightly better due to faster memory bandwidth, the PVC1100 reaches the next problem size (400^3) scalably due to the larger memory size.
Submitted 14 May, 2023; v1 submitted 9 April, 2023;
originally announced April 2023.
-
The Case for Non-Volatile RAM in Cloud HPCaaS
Authors:
Yehonatan Fridman,
Re'em Harel,
Gal Oren
Abstract:
HPC as a service (HPCaaS) is a new way to expose HPC resources via cloud services. However, continued effort to port large-scale tightly coupled applications with high interprocessor communication to multiple (and many) nodes synchronously, as in on-premise supercomputers, is still far from satisfactory due to network latencies. As a consequence, in said cases, HPCaaS is recommended to be used with one or few instances. In this paper we claim that a new piece of memory hardware, namely Non-Volatile RAM (NVRAM), can allow such computations to scale up to an order of magnitude with marginal penalty in comparison to RAM. Moreover, we suggest that the introduction of NVRAM to HPCaaS can be cost-effective to the users and the suppliers in numerous forms.
Submitted 3 August, 2022;
originally announced August 2022.
-
Recovery of Distributed Iterative Solvers for Linear Systems Using Non-Volatile RAM
Authors:
Yehonatan Fridman,
Yaniv Snir,
Harel Levin,
Danny Hendler,
Hagit Attiya,
Gal Oren
Abstract:
HPC systems are a critical resource for scientific research. The increased demand for computational power and memory ushers in the exascale era, in which supercomputers are designed to provide enormous computing power to meet these needs. These complex supercomputers consist of numerous compute nodes and are consequently expected to experience frequent faults and crashes.
Mathematical solvers, in particular iterative linear solvers, are key building blocks in numerous large-scale scientific applications. Consequently, supporting the recovery of distributed solvers is necessary for scaling scientific applications to exascale platforms. Previous recovery methods for iterative solvers are based on Checkpoint-Restart (CR), which incurs high fault tolerance overhead, or intrinsic fault tolerance, which requires extra computation time to converge after failures.
Exact state reconstruction (ESR) was proposed as an alternative mechanism to alleviate the impact of frequent failures on long-term computations. ESR has been shown to provide exact reconstruction of the computation state while avoiding the need for costly checkpointing. However, ESR currently relies on volatile memory for fault tolerance, and must therefore maintain redundancies in the RAM of multiple nodes, incurring high memory and network overheads.
Recent supercomputer designs feature emerging non-volatile RAM (NVRAM) technology. This paper investigates how NVRAM can be utilized to devise an enhanced ESR-based recovery mechanism that is more efficient and provides full resilience. Our mechanism, called in-NVRAM ESR, is based on a novel MPI One-Sided Communication (OSC) over RDMA implementation, and provides full resiliency while significantly reducing both the memory footprint and the time overhead in comparison with the original ESR design (in-RAM ESR).
Submitted 9 August, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
ChangeChip: A Reference-Based Unsupervised Change Detection for PCB Defect Detection
Authors:
Yehonatan Fridman,
Matan Rusanovsky,
Gal Oren
Abstract:
The usage of electronic devices increases, and becomes predominant in most aspects of life. Surface Mount Technology (SMT) is the most common industrial method for manufacturing electric devices in which electrical components are mounted directly onto the surface of a Printed Circuit Board (PCB). Although the expansion of electronic devices affects our lives in a productive way, failures or defects in the manufacturing procedure of those devices might also be counterproductive and even harmful in some cases. It is therefore desired and sometimes crucial to ensure zero-defect quality in electronic devices and their production. While traditional Image Processing (IP) techniques are not sufficient to produce a complete solution, other promising methods like Deep Learning (DL) might also be challenging for PCB inspection, mainly because such methods require big adequate datasets which are missing, not available or not updated in the rapidly growing field of PCBs. Thus, PCB inspection is conventionally performed manually by human experts. Unsupervised Learning (UL) methods may potentially be suitable for PCB inspection, having learning capabilities on the one hand, while not relying on large datasets on the other. In this paper, we introduce ChangeChip, an automated and integrated change detection system for defect detection in PCBs, from soldering defects to missing or misaligned electronic elements, based on Computer Vision (CV) and UL. We achieve good quality defect detection by applying an unsupervised change detection between images of a golden PCB (reference) and the inspected PCB under various settings. In this work, we also present CD-PCB, a synthesized labeled dataset of 20 pairs of PCB images for evaluation of defect detection algorithms.
Submitted 13 September, 2021;
originally announced September 2021.
-
Assessing the Use Cases of Persistent Memory in High-Performance Scientific Computing
Authors:
Yehonatan Fridman,
Yaniv Snir,
Matan Rusanovsky,
Kfir Zvi,
Harel Levin,
Danny Hendler,
Hagit Attiya,
Gal Oren
Abstract:
As the High Performance Computing world moves towards the Exa-Scale era, huge amounts of data should be analyzed, manipulated and stored. In the traditional storage/memory hierarchy, each compute node retains its data objects in its local volatile DRAM. Whenever the DRAM's capacity becomes insufficient for storing this data, the computation should either be distributed between several compute nodes, or some portion of these data objects must be stored in a non-volatile block device such as a hard disk drive or an SSD storage device. Optane DataCenter Persistent Memory Module (DCPMM), a new technology introduced by Intel, provides non-volatile memory that can be plugged into standard memory bus slots and therefore be accessed much faster than standard storage devices. In this work, we present and analyze the results of a comprehensive performance assessment of several ways in which DCPMM can 1) replace standard storage devices, and 2) replace or augment DRAM for improving the performance of HPC scientific computations. To achieve this goal, we have configured an HPC system such that DCPMM can service I/O operations of scientific applications, replace standard storage devices and file systems (specifically for diagnostics and checkpoint-restarting), and serve for expanding applications' main memory. We focus on keeping the scientific codes with as few changes as possible, while allowing them to access the NVM transparently as if they access persistent storage. Our results show that DCPMM allows scientific applications to fully utilize nodes' locality by providing them with sufficiently-large main memory. Moreover, it can be used for providing a high-performance replacement for persistent storage. Thus, the usage of DCPMM has the potential of replacing standard HDD and SSD storage devices in HPC architectures and enabling a more efficient platform for modern supercomputing applications.
Submitted 5 September, 2021;
originally announced September 2021.
-
Complete CVDL Methodology for Investigating Hydrodynamic Instabilities
Authors:
Re'em Harel,
Matan Rusanovsky,
Yehonatan Fridman,
Assaf Shimony,
Gal Oren
Abstract:
In fluid dynamics, one of the most important research fields is hydrodynamic instabilities and their evolution in different flow regimes. The investigation of said instabilities is concerned with highly non-linear dynamics. Currently, three main methods are used for understanding such phenomena - namely analytical models, experiments and simulations - and all of them are primarily investigated and correlated using human expertise. In this work we claim and demonstrate that a major portion of this research effort could and should be analysed using recent breakthrough advancements in the field of Computer Vision with Deep Learning (CVDL, or Deep Computer-Vision). Specifically, we target and evaluate specific state-of-the-art techniques - such as Image Retrieval, Template Matching, Parameters Regression and Spatiotemporal Prediction - for the quantitative and qualitative benefits they provide. In order to do so we focus in this research on one of the most representative instabilities, the Rayleigh-Taylor one, simulate its behaviour and create an open-sourced state-of-the-art annotated database (RayleAI). Finally, we use adjusted experimental results and novel physical loss methodologies to validate the correspondence of the predicted results to actual physical reality to prove the models' efficiency. The techniques which were developed and proved in this work can serve as essential tools for physicists in the field of hydrodynamics for investigating a variety of physical systems, and could also be applied via Transfer Learning to research on other instabilities. Some of the techniques can be easily applied to existing simulation results. All models, as well as the data-set that was created for this work, are publicly available at: https://github.com/scientific-computing-nrcn/SimulAI.
Submitted 26 April, 2020; v1 submitted 3 April, 2020;
originally announced April 2020.
-
Multistability and regime shifts in microbial communities explained by competition for essential nutrients
Authors:
Veronika Dubinkina,
Yulia Fridman,
Parth Pratim Pandey,
Sergei Maslov
Abstract:
Microbial communities routinely have several possible species compositions or community states observed for the same environmental parameters. Changes in these parameters can trigger abrupt and persistent transitions (regime shifts) between such community states. Yet little is known about the main determinants and mechanisms of multistability in microbial communities. Here we introduce and study a resource-explicit model in which microbes compete for two types of essential nutrients. We adapt game-theoretical methods of the stable matching problem to identify all possible species compositions of a microbial community. We then classify them by their resilience against three types of perturbations: fluctuations in nutrient supply, invasions by new species, and small changes of abundances of existing ones. We observe multistability and explore an intricate network of regime shifts between stable states in our model. Our results suggest that multistability requires microbial species to have different stoichiometries of essential nutrients. We also find that balanced nutrient supply promotes multistability and species diversity yet makes individual community states less stable.
Submitted 29 June, 2019;
originally announced July 2019.
-
Alternative stable states in a model of microbial community limited by multiple essential nutrients
Authors:
Veronika Dubinkina,
Yulia Fridman,
Parth Pandey,
Sergei Maslov
Abstract:
Microbial communities routinely have several alternative stable states observed for the same environmental parameters. Sudden and irreversible transitions between these states make external manipulation of these systems more complicated. To better understand the mechanisms and origins of multistability in microbial communities, we introduce and study a model of a microbial ecosystem colonized by multiple specialist species selected from a fixed pool. Growth of each species can be limited by essential nutrients of two types, e.g. carbon and nitrogen, each represented in the environment by multiple metabolites. We demonstrate that our model has an exponentially large number of potential stable states realized for different environmental parameters. Using game theoretical methods adapted from the stable marriage problem we predict all of these states based only on ranked lists of competitive abilities of species for each of the nutrients. We show that for every set of nutrient influxes, several mutually uninvadable stable states are generally feasible and we distinguish them based upon their dynamic stability. We further explore an intricate network of discontinuous transitions (regime shifts) between these alternative states both in the course of community assembly, or upon changes of nutrient influxes.
Submitted 10 October, 2018;
originally announced October 2018.
-
Inertial longitudinal magnetization reversal for non-Heisenberg ferromagnets
Authors:
E. G. Galkina,
V. I. Butrim,
Yu. A. Fridman,
B. A. Ivanov,
F. Nori
Abstract:
We analyze theoretically the novel pathway of ultrafast spin dynamics for ferromagnets with high enough single-ion anisotropy (non-Heisenberg ferromagnets). This longitudinal spin dynamics includes the coupled oscillations of the modulus of the magnetization together with the quadrupolar spin variables, which are expressed through quantum expectation values of operators bilinear on the spin components. Even for a simple single-element ferromagnet, such a dynamics can lead to an inertial magnetization reversal under the action of an ultrashort laser pulse.
Submitted 30 June, 2013; v1 submitted 5 June, 2013;
originally announced June 2013.
-
Spin nematic state for a spin S=3/2 isotropic non-Heisenberg magnet
Authors:
Yu. A. Fridman,
O. A. Kosmachev,
B. A. Ivanov
Abstract:
$S=3/2$ system with general isotropic nearest-neighbor exchange within a mean-field approximation possesses a magnetically ordered ferromagnetic state and antiferromagnetic state, and two different spin nematic states, with zero spin expectation values. Both spin nematic phases display complicated symmetry break, including standard rotational break described by the vector-director $\vec{u}$ and specific symmetry break with respect to the time reversal. The break of time reversal is determined by non-trivial quantum averages cubic over the spin components and can be described by unit "pseudospin" vector $\vec{\sigma}$. The vectors $\vec{\sigma}$ on different sites are parallel for a nematic state, and $\vec{\sigma}$'s are antiparallel for different sublattices for an antinematic phase.
Submitted 26 July, 2009;
originally announced July 2009.
-
On the Possibility of Development of the Explosion Instability in a Two-Component Gravitating System
Authors:
A. S. Kingsep,
Yu. A. Fridman
Abstract:
We obtain an expression for the energy of the density wave propagating in a multicomponent gravitating medium in the form well known from electrodynamics. Using the above, the possibility of "triple production" of the quasi-particles, or waves, with their energies summing up to zero, in a non-equilibrium medium is demonstrated. That kind of resonance wave interaction is shown to result in the development of an explosion instability. By the method developed in plasma physics, the characteristic time of the instability is evaluated.
Submitted 30 April, 2008;
originally announced April 2008.
-
Magnitoelastic interaction and long-range magnetic ordering in two-dimesional ferromagnetics
Authors:
Yu. N. Mitsay,
Yu. A. Fridman,
D. V. Spirin,
C. N. Alexeev,
M. S. Kochmaśki
Abstract:
The influence of magnetoelastic (ME) interaction on the stabilization of long-range magnetic order (LMO) in a two-dimensional easy-plane ferromagnet is investigated in this work. Taking the ME exchange into account results in a root dispersion law of magnons and the appearance of an ME gap in the spectra of elementary excitations. Such behavior of the spectra testifies to the stabilization of LMO and a finite Curie temperature.
Submitted 9 July, 1998;
originally announced July 1998.