
Showing 1–22 of 22 results for author: Peng, I B

  1. Leveraging HPC Profiling & Tracing Tools to Understand the Performance of Particle-in-Cell Monte Carlo Simulations

    Authors: Jeremy J. Williams, David Tskhakaya, Stefan Costea, Ivy B. Peng, Marta Garcia-Gasulla, Stefano Markidis

    Abstract: Large-scale plasma simulations are critical for designing and developing next-generation fusion energy devices and modeling industrial plasmas. BIT1 is a massively parallel Particle-in-Cell code designed for specifically studying plasma material interaction in fusion devices. Its most salient characteristic is the inclusion of collision Monte Carlo models for different plasma species. In this work…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by the Euro-Par 2023 workshops (TDLPP 2023), prepared in the standardized Springer LNCS format and consists of 12 pages, which includes the main text, references, and figures

  2. arXiv:2304.03748  [pdf, other]

    cs.LG cs.AI physics.comp-ph physics.data-an

    Perspectives on AI Architectures and Co-design for Earth System Predictability

    Authors: Maruti K. Mudunuru, James A. Ang, Mahantesh Halappanavar, Simon D. Hammond, Maya B. Gokhale, James C. Hoe, Tushar Krishna, Sarat S. Sreepathi, Matthew R. Norman, Ivy B. Peng, Philip W. Jones

    Abstract: Recently, the U.S. Department of Energy (DOE), Office of Science, Biological and Environmental Research (BER), and Advanced Scientific Computing Research (ASCR) programs organized and held the Artificial Intelligence for Earth System Predictability (AI4ESP) workshop series. From this workshop, a critical conclusion that the DOE BER and ASCR community came to is the requirement to develop a new par…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: 23 pages, 1 figure

  3. Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems

    Authors: Jacob Wahlgren, Maya Gokhale, Ivy B. Peng

    Abstract: Current HPC systems provide memory resources that are statically configured and tightly coupled with compute nodes. However, workloads on HPC systems are evolving. Diverse workloads lead to a need for configurable memory resources to achieve high performance and utilization. In this study, we evaluate a memory subsystem design leveraging CXL-enabled memory pooling. Two promising use cases of compo…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 10 pages, 13 figures. Accepted for publication in Workshop on Memory Centric High Performance Computing (MCHPC'22) at SC22

  4. arXiv:2106.05373  [pdf, other]

    cs.DC cs.LG cs.NE

    StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

    Authors: Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis

    Abstract: The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at the International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021)

  5. sputniPIC: an Implicit Particle-in-Cell Code for Multi-GPU Systems

    Authors: Steven W. D. Chien, Jonas Nylund, Gabriel Bengtsson, Ivy B. Peng, Artur Podobas, Stefano Markidis

    Abstract: Large-scale simulations of plasmas are essential for advancing our understanding of fusion devices, space, and astrophysical systems. Particle-in-Cell (PIC) codes have demonstrated their success in simulating numerous plasma phenomena on HPC systems. Today, flagship supercomputers feature multiple GPUs per compute node to achieve unprecedented computing power at high power efficiency. PIC codes re…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2020)

  6. tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

    Authors: Steven W. D. Chien, Artur Podobas, Ivy B. Peng, Stefano Markidis

    Abstract: Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and al…

    Submitted 11 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 2020 International Conference on Cluster Computing (CLUSTER 2020)

  7. Performance Evaluation of Advanced Features in CUDA Unified Memory

    Authors: Steven W. D. Chien, Ivy B. Peng, Stefano Markidis

    Abstract: CUDA Unified Memory improves the GPU programmability and also enables GPU memory oversubscription. Recently, two advanced memory features, memory advises and asynchronous prefetch, have been introduced. In this work, we evaluate the new features on two platforms that feature different CPUs, GPUs, and interconnects. We derive a benchmark suite for the experiments and stress the memory system to eva…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at Workshop on Memory Centric High Performance Computing (MCHPC'19) in SC19

  8. arXiv:1910.07566  [pdf]

    cs.DC

    UMap: Enabling Application-driven Optimizations for Page Management

    Authors: Ivy B. Peng, Marty McFadden, Eric Green, Keita Iwabuchi, Kai Wu, Dong Li, Roger Pearce, Maya Gokhale

    Abstract: Leadership supercomputers feature a diversity of storage, from node-local persistent memory and NVMe SSDs to network-interconnected flash memory and HDD. Memory mapping files on different tiers of storage provides a uniform interface in applications. However, system-wide services like mmap are optimized for generality and lack flexibility for enabling application-specific optimizations. In this wo…

    Submitted 16 October, 2019; originally announced October 2019.

  9. System Evaluation of the Intel Optane Byte-addressable NVM

    Authors: Ivy B. Peng, Maya B. Gokhale, Eric W. Green

    Abstract: Byte-addressable non-volatile memory (NVM) features high density, DRAM comparable performance, and persistence. These characteristics position NVM as a promising new tier in the memory hierarchy. Nevertheless, NVM has asymmetric read and write performance, and considerably higher write energy than DRAM. Our work provides an in-depth evaluation of the first commercially available byte-addressable N…

    Submitted 18 August, 2019; originally announced August 2019.

    Journal ref: In Proceedings of the International Symposium on Memory Systems, 2019

  10. Posit NPB: Assessing the Precision Improvement in HPC Scientific Applications

    Authors: Steven W. D. Chien, Ivy B. Peng, Stefano Markidis

    Abstract: Floating-point operations can significantly impact the accuracy and performance of scientific applications on large-scale parallel systems. Recently, an emerging floating-point format called Posit has attracted attention as an alternative to the standard IEEE floating-point formats because it could enable higher precision than IEEE formats using the same number of bits. In this work, we first expl…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: Accepted for publication in PPAM 2019 conference

  11. arXiv:1810.04110  [pdf, other]

    cs.DC

    MPI Windows on Storage for HPC Applications

    Authors: Sergio Rivas-Gomez, Roberto Gioiosa, Ivy Bo Peng, Gokcen Kestor, Sai Narasimhamurthy, Erwin Laure, Stefano Markidis

    Abstract: Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as a unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present its benefits for out-of-core execution, parallel I/O and fault-tolerance. In addition, we e…

    Submitted 9 October, 2018; originally announced October 2018.

  12. The SAGE Project: a Storage Centric Approach for Exascale Computing

    Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Steven Wei-der Chien, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Shaun de Witt, Dirk Pleiter, Stefano Markidis

    Abstract: SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale r…

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: Submitted to Computing Frontiers 2018. arXiv admin note: substantial text overlap with arXiv:1805.00556

  13. SAGE: Percipient Storage for Exascale Data Centric Computing

    Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Stefano Markidis, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Dirk Pleiter, Shaun de Witt

    Abstract: We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infras…

    Submitted 1 May, 2018; originally announced May 2018.

    Journal ref: Parallel Computing, 23 March 2018

  14. NVIDIA Tensor Core Programmability, Performance & Precision

    Authors: Stefano Markidis, Steven Wei Der Chien, Erwin Laure, Ivy Bo Peng, Jeffrey S. Vetter

    Abstract: The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called "Tensor Core" that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta microarchitecture, provides 640 Tensor Cores with a theoretical peak performance of 125 Tflops/s in mixed precision. In this paper, we investigate current approaches to pro…

    Submitted 11 March, 2018; originally announced March 2018.

    Comments: This paper has been accepted by the Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES) 2018

  15. arXiv:1708.01306  [pdf, other]

    cs.DC

    MPI Streams for HPC Applications

    Authors: Ivy Bo Peng, Stefano Markidis, Roberto Gioiosa, Gokcen Kestor, Erwin Laure

    Abstract: Data streams are a sequence of data flowing between source and destination processes. Streaming is widely used for signal, image and video processing for its efficiency in pipelining and effectiveness in reducing demand for memory. The goal of this work is to extend the use of data streams to support both conventional scientific applications and emerging data analytic applications running on HPC p…

    Submitted 3 August, 2017; originally announced August 2017.

    Comments: Advances in Parallel Computing

  16. arXiv:1708.01304  [pdf, other]

    cs.DC

    Preparing HPC Applications for the Exascale Era: A Decoupling Strategy

    Authors: Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis

    Abstract: Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of parallel applications, each process performs all the operations, which might result in inefficiency and seriously limit scalability, especially at large scale. We propo…

    Submitted 3 August, 2017; originally announced August 2017.

    Comments: The 46th International Conference on Parallel Processing (ICPP-2017)

  17. arXiv:1704.08492  [pdf]

    cs.DC

    Extending Message Passing Interface Windows to Storage

    Authors: Sergio Rivas-Gomez, Stefano Markidis, Ivy Bo Peng, Erwin Laure, Gokcen Kestor, Roberto Gioiosa

    Abstract: This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could poten…

    Submitted 27 April, 2017; originally announced April 2017.

  18. Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

    Authors: Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis

    Abstract: Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventi…

    Submitted 26 April, 2017; originally announced April 2017.

  19. Idle Period Propagation in Message-Passing Applications

    Authors: Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa

    Abstract: Idle periods on different processes of Message Passing applications are unavoidable. While the origin of idle periods on a single process is well understood as the effect of system and architectural random delays, it is unclear how these idle periods propagate from one process to another. It is important to understand idle period propagation in Message Passing applications as it allows applica…

    Submitted 26 April, 2017; originally announced April 2017.

    Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016

  20. Exploring Application Performance on Emerging Hybrid-Memory Supercomputers

    Authors: Ivy Bo Peng, Stefano Markidis, Erwin Laure, Gokcen Kestor, Roberto Gioiosa

    Abstract: Next-generation supercomputers will feature more hierarchical and heterogeneous memory systems with different memory technologies working side-by-side. A critical question is whether at large scale existing HPC applications and emerging data-analytics workloads will have performance improvement or degradation on these systems. We propose a systematic and fair methodology to identify the trend of a…

    Submitted 26 April, 2017; originally announced April 2017.

    Comments: 18th International Conference on High Performance Computing and Communications, IEEE, 2016

  21. arXiv:1704.03803  [pdf, other]

    physics.space-ph

    Global three-dimensional simulation of Earth's dayside reconnection using a two-way coupled magnetohydrodynamics with embedded particle-in-cell model: initial results

    Authors: Yuxi Chen, Gabor Toth, Paul Cassak, Xianzhe Jia, Tamas I. Gombosi, James A. Slavin, Stefano Markidis, Ivy Bo Peng, Vania K. Jordanova

    Abstract: We perform a three-dimensional (3D) global simulation of Earth's magnetosphere with kinetic reconnection physics to study the flux transfer events (FTEs) and dayside magnetic reconnection with the recently developed magnetohydrodynamics with embedded particle-in-cell model (MHD-EPIC). During the one-hour long simulation, the FTEs are generated quasi-periodically near the subsolar point and move to…

    Submitted 12 April, 2017; originally announced April 2017.

  22. arXiv:1512.02018  [pdf, other]

    physics.space-ph astro-ph.EP physics.plasm-ph

    Magnetic null points in kinetic simulations of space plasmas

    Authors: Vyacheslav Olshevsky, Jan Deca, Andrey Divin, Ivy Bo Peng, Stefano Markidis, Maria Elena Innocenti, Emanuele Cazzola, Giovanni Lapenta

    Abstract: We present a systematic attempt to study magnetic null points and the associated magnetic energy conversion in kinetic Particle-in-Cell simulations of various plasma configurations. We address three-dimensional simulations performed with the semi-implicit kinetic electromagnetic code iPic3D in different setups: variations of a Harris current sheet, dipolar and quadrupolar magnetospheres interactin…

    Submitted 7 December, 2015; originally announced December 2015.

    Comments: Nordita program on Magnetic Reconnection in Plasmas 2015

    Report number: NORDITA-2015-127

    Journal ref: The Astrophysical Journal 2016, Volume 819, Number 1