Search | arXiv e-print repository

Evaluating Portable Parallelization Strategies for Heterogeneous Architectures in High Energy Physics

Authors: Mohammad Atif, Meghna Battacharya, Paolo Calafiura, Taylor Childers, Mark Dewing, Zhihua Dong, Oliver Gutsche, Salman Habib, Kyle Knoepfel, Matti Kortelainen, Ka Hei Martin Kwok, Charles Leggett, Meifeng Lin, Vincent Pascuzzi, Alexei Strelchenko, Vakhtang Tsulaia, Brett Viren, Tianle Wang, Beomki Yeo, Haiwang Yu

Abstract: High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with t… ▽ More High-energy physics (HEP) experiments have developed millions of lines of code over decades that are optimized to run on traditional x86 CPU systems. However, we are seeing a rapidly increasing fraction of floating point computing power in leadership-class computing facilities and traditional data centers coming from new accelerator architectures, such as GPUs. HEP experiments are now faced with the untenable prospect of rewriting millions of lines of x86 CPU code, for the increasingly dominant architectures found in these computational accelerators. This task is made more challenging by the architecture-specific languages and APIs promoted by manufacturers such as NVIDIA, Intel and AMD. Producing multiple, architecture-specific implementations is not a viable scenario, given the available person power and code maintenance issues. The Portable Parallelization Strategies team of the HEP Center for Computational Excellence is investigating the use of Kokkos, SYCL, OpenMP, std::execution::parallel and alpaka as potential portability solutions that promise to execute on multiple architectures from the same source code, using representative use cases from major HEP experiments, including the DUNE experiment of the Long Baseline Neutrino Facility, and the ATLAS and CMS experiments of the Large Hadron Collider. This cross-cutting evaluation of portability solutions using real applications will help inform and guide the HEP community when choosing their software and hardware suites for the next generation of experimental frameworks. We present the outcomes of our studies, including performance metrics, porting challenges, API evaluations, and build system integration. △ Less

Submitted 27 June, 2023; originally announced June 2023.

Comments: 18 pages, 9 Figures, 2 Tables

arXiv:2304.01841 [pdf, other]

Portable Programming Model Exploration for LArTPC Simulation in a Heterogeneous Computing Environment: OpenMP vs. SYCL

Authors: Meifeng Lin, Zhihua Dong, Tianle Wang, Mohammad Atif, Meghna Battacharya, Kyle Knoepfel, Charles Leggett, Brett Viren, Haiwang Yu

Abstract: The evolution of the computing landscape has resulted in the proliferation of diverse hardware architectures, with different flavors of GPUs and other compute accelerators becoming more widely available. To facilitate the efficient use of these architectures in a heterogeneous computing environment, several programming models are available to enable portability and performance across different com… ▽ More The evolution of the computing landscape has resulted in the proliferation of diverse hardware architectures, with different flavors of GPUs and other compute accelerators becoming more widely available. To facilitate the efficient use of these architectures in a heterogeneous computing environment, several programming models are available to enable portability and performance across different computing systems, such as Kokkos, SYCL, OpenMP and others. As part of the High Energy Physics Center for Computational Excellence (HEP-CCE) project, we investigate if and how these different programming models may be suitable for experimental HEP workflows through a few representative use cases. One of such use cases is the Liquid Argon Time Projection Chamber (LArTPC) simulation which is essential for LArTPC detector design, validation and data analysis. Following up on our previous investigations of using Kokkos to port LArTPC simulation in the Wire-Cell Toolkit (WCT) to GPUs, we have explored OpenMP and SYCL as potential portable programming models for WCT, with the goal to make diverse computing resources accessible to the LArTPC simulations. In this work, we describe how we utilize relevant features of OpenMP and SYCL for the LArTPC simulation module in WCT. We also show performance benchmark results on multi-core CPUs, NVIDIA and AMD GPUs for both the OpenMP and the SYCL implementations. Comparisons with different compilers will also be given where appropriate. △ Less

Submitted 4 April, 2023; originally announced April 2023.

Comments: 6 pages, 6 figures. Proceedings of 21st International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2022)

arXiv:2203.14345 [pdf, other]

Evolution of HEP Processing Frameworks

Authors: Christopher D. Jones, Kyle Knoepfel, Paolo Calafiura, Charles Leggett, Vakhtang Tsulaia

Abstract: HEP data-processing software must support the disparate physics needs of many experiments. For both collider and neutrino environments, HEP experiments typically use data-processing frameworks to manage the computational complexities of their large-scale data processing needs. Data-processing frameworks are being faced with new challenges this decade. The computing landscape has changed from the p… ▽ More HEP data-processing software must support the disparate physics needs of many experiments. For both collider and neutrino environments, HEP experiments typically use data-processing frameworks to manage the computational complexities of their large-scale data processing needs. Data-processing frameworks are being faced with new challenges this decade. The computing landscape has changed from the past three decades of homogeneous single-core x86 batch jobs running on grid sites. Frameworks must now work on a heterogeneous mixture of different platforms: multi-core machines, different CPU architectures, and computational accelerators; and different computing sites: grid, cloud, and high-performance computing. We describe these challenges in more detail and how frameworks may confront them. Given their historic success, frameworks will continue to be critical software systems that enable HEP experiments to meet their computing needs. Frameworks have weathered computing revolutions in the past; they will do so again with support from the HEP community △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: Contribution to Snowmass 2021

arXiv:2103.14737 [pdf, ps, other]

Porting HEP Parameterized Calorimeter Simulation Code to GPUs

Authors: Zhihua Dong, Heather Gray, Charles Leggett, Meifeng Lin, Vincent R. Pascuzzi, Kwangmin Yu

Abstract: The High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider (LHC), traditionally consume large amounts of CPU cycles for detector simulations and data analysis, but rarely use compute accelerators such as GPUs. As the LHC is upgraded to allow for higher luminosity, resulting in much higher data rates, purely relying on CPUs may not provide enough computing power to suppor… ▽ More The High Energy Physics (HEP) experiments, such as those at the Large Hadron Collider (LHC), traditionally consume large amounts of CPU cycles for detector simulations and data analysis, but rarely use compute accelerators such as GPUs. As the LHC is upgraded to allow for higher luminosity, resulting in much higher data rates, purely relying on CPUs may not provide enough computing power to support the simulation and data analysis needs. As a proof of concept, we investigate the feasibility of porting a HEP parameterized calorimeter simulation code to GPUs. We have chosen to use FastCaloSim, the ATLAS fast parametrized calorimeter simulation. While FastCaloSim is sufficiently fast such that it does not impose a bottleneck in detector simulations overall, significant speed-ups in the processing of large samples can be achieved from GPU parallelization at both the particle (intra-event) and event levels; this is especially beneficial in conditions expected at the high-luminosity LHC, where extremely high per-event particle multiplicities will result from the many simultaneous proton-proton collisions. We report our experience with porting FastCaloSim to NVIDIA GPUs using CUDA. A preliminary Kokkos implementation of FastCaloSim for portability to other parallel architectures is also described. △ Less

Submitted 18 May, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: 15 pages, 1 figure, 8 tables, 2 listings, submitted to Frontiers in Big Data (Big Data in AI and High Energy Physics)

arXiv:cs/0306089 [pdf, ps, other]

The StoreGate: a Data Model for the Atlas Software Architecture

Authors: P. Calafiura, C. G. Leggett, D. R. Quarrie, H. Ma, S. Rajagopalan

Abstract: The Atlas collaboration at CERN has adopted the Gaudi software architecture which belongs to the blackboard family: data objects produced by knowledge sources (e.g. reconstruction modules) are posted to a common in-memory data base from where other modules can access them and produce new data objects. The StoreGate has been designed, based on the Atlas requirements and the experience of other HE… ▽ More The Atlas collaboration at CERN has adopted the Gaudi software architecture which belongs to the blackboard family: data objects produced by knowledge sources (e.g. reconstruction modules) are posted to a common in-memory data base from where other modules can access them and produce new data objects. The StoreGate has been designed, based on the Atlas requirements and the experience of other HENP systems such as Babar, CDF, CLEO, D0 and LHCB, to identify in a simple and efficient fashion (collections of) data objects based on their type and/or the modules which posted them to the Transient Data Store (the blackboard). The developer also has the freedom to use her preferred key class to uniquely identify a data object according to any other criterion. Besides this core functionality, the StoreGate provides the developers with a powerful interface to handle in a coherent fashion persistable references, object lifetimes, memory management and access control policy for the data objects in the Store. It also provides a Handle/Proxy mechanism to define and hide the cache fault mechanism: upon request, a missing Data Object can be transparently created and added to the Transient Store presumably retrieving it from a persistent data-base, or even reconstructing it on demand. △ Less

Submitted 14 June, 2003; originally announced June 2003.

Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 4 pages, LaTeX, MOJT008

ACM Class: D.2.11

Showing 1–5 of 5 results for author: Leggett, C