-
Porting numerical integration codes from CUDA to oneAPI: a case study
Authors:
Ioannis Sakiotis,
Kamesh Arumugam,
Marc Paterno,
Desh Ranjan,
Balsa Terzic,
Mohammad Zubair
Abstract:
We present our experience in porting optimized CUDA implementations to oneAPI. We focus on the use case of numerical integration, particularly the CUDA implementations of PAGANI and m-Cubes. We faced several challenges that caused performance degradation in the oneAPI ports. These include differences in the number of registers used per thread, compiler optimizations, and mappings of CUDA library calls to oneAPI equivalents. After addressing those challenges, we tested both the PAGANI and m-Cubes integrators on numerous integrands with varied characteristics. To evaluate the quality of the ports, we collected performance metrics of the CUDA and oneAPI implementations on the Nvidia V100 GPU. We found that the oneAPI ports often achieve performance comparable to the CUDA versions and are at most 10% slower.
Submitted 17 February, 2023; v1 submitted 11 February, 2023;
originally announced February 2023.
-
A Case Study on Parallel HDF5 Dataset Concatenation for High Energy Physics Data Analysis
Authors:
Sunwoo Lee,
Kai-yuan Hou,
Kewei Wang,
Saba Sehrish,
Marc Paterno,
James Kowalkowski,
Quincey Koziol,
Robert Ross,
Ankit Agrawal,
Alok Choudhary,
Wei-keng Liao
Abstract:
In High Energy Physics (HEP), experimentalists generate large volumes of data that, when analyzed, help us better understand the fundamental particles and their interactions. This data is often captured in many small files, creating a data management challenge for scientists. To better facilitate data management, transfer, and analysis on large-scale platforms, it is advantageous to aggregate the data further into a smaller number of larger files. However, this translation process can consume significant time and resources, and if performed incorrectly, the resulting aggregated files can be inefficient for highly parallel access during analysis on large-scale platforms. In this paper, we present our case study on parallel I/O strategies and HDF5 features for reducing data aggregation time, making effective use of compression, and ensuring efficient access to the resulting data during analysis at scale. We focus on data from NOvA, a large-scale HEP experiment that generates many terabytes of data. The lessons learned from our case study inform the handling of similar datasets, thus expanding community knowledge related to this common data management task.
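A recurring ingredient of parallel concatenation, shown here as an assumption rather than the paper's documented method, is deciding where each small file's rows land in the single aggregated dataset. An exclusive prefix sum over per-file row counts gives every writer a disjoint target range, which is the role `MPI_Exscan` typically plays in an MPI program. The helper names, the round-robin file assignment, and the row counts are hypothetical, and the actual HDF5 writes (hyperslab selections) are omitted.

```python
from itertools import accumulate


def concat_offsets(row_counts):
    """Starting row of each input file in the concatenated dataset.

    This is an exclusive prefix sum: file i starts where files 0..i-1 end.
    """
    # accumulate(..., initial=0) yields [0, c0, c0+c1, ...]; drop the grand total.
    return list(accumulate(row_counts, initial=0))[:-1]


def assign_files(num_files, num_ranks):
    """Hypothetical round-robin assignment of input files to writer processes."""
    return {rank: list(range(rank, num_files, num_ranks)) for rank in range(num_ranks)}


def write_plan(row_counts, num_ranks):
    """For each rank, the (file_index, start_row, end_row) slices it would write.

    Because offsets are precomputed, ranks write disjoint slices of one shared
    dataset and need no coordination during the write itself.
    """
    offsets = concat_offsets(row_counts)
    plan = {}
    for rank, files in assign_files(len(row_counts), num_ranks).items():
        plan[rank] = [(f, offsets[f], offsets[f] + row_counts[f]) for f in files]
    return plan


if __name__ == "__main__":
    counts = [5, 3, 0, 7]  # made-up row counts of four small input files
    print(concat_offsets(counts))  # [0, 5, 8, 8]
    print(write_plan(counts, 2))
```

Empty input files simply get a zero-length slice, so they need no special casing in the plan.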
Submitted 2 May, 2022;
originally announced May 2022.
-
m-Cubes: An Efficient and Portable Implementation of Multi-Dimensional Integration for GPUs
Authors:
Ioannis Sakiotis,
Kamesh Arumugam,
Marc Paterno,
Desh Ranjan,
Balsa Terzic,
Mohammad Zubair
Abstract:
The task of multi-dimensional numerical integration is frequently encountered in physics and other scientific fields, e.g., in modeling the effects of systematic uncertainties in physical systems and in Bayesian parameter estimation. Multi-dimensional integration is often time-prohibitive on CPUs. Efficient implementation on many-core architectures is challenging because the workload across the integration space cannot be predicted a priori. We propose m-Cubes, a novel implementation of the well-known Vegas algorithm for execution on GPUs. Vegas transforms the integration variables and then computes a Monte Carlo estimate of the integral using adaptive partitioning of the transformed space. m-Cubes improves performance on GPUs by maintaining a relatively uniform workload across the processors. As a result, our optimized CUDA implementation for Nvidia GPUs outperforms parallelization approaches proposed in past literature. We further demonstrate the efficiency of m-Cubes by evaluating a six-dimensional integral from a cosmology application, achieving significant speedup and greater precision than the Cuba library's CPU implementation of Vegas. We also evaluate m-Cubes on a standard integrand test suite. m-Cubes outperforms the serial implementations of the Cuba and GSL libraries by orders of magnitude while maintaining comparable accuracy. Our approach yields a speedup of at least 10 compared with publicly available Monte Carlo based GPU implementations. In summary, m-Cubes can solve integrals that are prohibitively expensive using standard libraries and custom implementations. A header-only implementation with a modern C++ interface makes m-Cubes portable, allowing its use in complicated pipelines with easy-to-define stateful integrals. Compatibility with non-Nvidia GPUs is provided by our initial implementation of m-Cubes using the Kokkos framework.
Submitted 21 June, 2022; v1 submitted 3 February, 2022;
originally announced February 2022.
-
PAGANI: A Parallel Adaptive GPU Algorithm for Numerical Integration
Authors:
Ioannis Sakiotis,
Kamesh Arumugam,
Marc Paterno,
Desh Ranjan,
Balša Terzić,
Mohammad Zubair
Abstract:
We present a new adaptive parallel algorithm for the challenging problem of multi-dimensional numerical integration on massively parallel architectures. Adaptive algorithms have demonstrated the best performance, but efficient many-core utilization is difficult to achieve because the adaptive workload can vary greatly across the integration space and is impossible to predict a priori. Existing parallel algorithms utilize sequential computations on independent processors, which results in bottlenecks due to the need for data redistribution and processor synchronization. Our algorithm employs a high-throughput approach in which all existing sub-regions are processed and sub-divided in parallel. Repeated sub-region classification and filtering improves upon a brute-force approach and allows the algorithm to make efficient use of computation and memory resources. A CUDA implementation shows orders-of-magnitude speedup over the fastest open-source CPU method and extends the achievable accuracy for difficult integrands. Our algorithm typically outperforms other existing deterministic parallel methods.
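The breadth-first scheme described above (evaluate every active sub-region in one pass, classify, filter out finished regions, split the rest) can be sketched serially. This is a simplified illustration under our own assumptions, not PAGANI's actual rules: a one-point midpoint rule paired with its 2^d-point refinement stands in for PAGANI's cubature and error estimate, longest-axis bisection stands in for its split-axis choice, and `adaptive_integrate`, `max_regions`, and `max_iters` are hypothetical names.

```python
import itertools
import math


def midpoint(f, lo, hi):
    """One-point midpoint rule on the box [lo, hi]."""
    vol = math.prod(h - l for l, h in zip(lo, hi))
    return vol * f([(l + h) / 2 for l, h in zip(lo, hi)])


def refined(f, lo, hi):
    """Midpoint rule applied to the 2^d half-boxes: a finer estimate of the same box."""
    total = 0.0
    for corner in itertools.product((0, 1), repeat=len(lo)):
        sub_lo = [l + c * (h - l) / 2 for l, h, c in zip(lo, hi, corner)]
        sub_hi = [s + (h - l) / 2 for s, l, h in zip(sub_lo, lo, hi)]
        total += midpoint(f, sub_lo, sub_hi)
    return total


def evaluate(f, region):
    """Return (estimate, error estimate) for one region."""
    lo, hi = region
    fine = refined(f, lo, hi)
    return fine, abs(fine - midpoint(f, lo, hi))


def split(region):
    """Bisect a box along its longest axis."""
    lo, hi = region
    axis = max(range(len(lo)), key=lambda k: hi[k] - lo[k])
    mid = (lo[axis] + hi[axis]) / 2
    left_hi, right_lo = list(hi), list(lo)
    left_hi[axis] = right_lo[axis] = mid
    return [(lo, tuple(left_hi)), (tuple(right_lo), hi)]


def adaptive_integrate(f, dim, rel_tol=1e-4, abs_tol=1e-10,
                       max_regions=100_000, max_iters=60):
    """Integrate f over [0, 1]^dim; returns (estimate, error estimate)."""
    active = [((0.0,) * dim, (1.0,) * dim)]  # start from the unit cube
    done_val = done_err = 0.0  # contributions of regions already filtered out
    total = err = 0.0
    for _ in range(max_iters):
        if not active:
            break
        # Every active region is evaluated in the same pass; on a GPU each
        # region could map to a thread block. A comprehension stands in here.
        results = [evaluate(f, r) for r in active]
        total = done_val + sum(v for v, _ in results)
        err = done_err + sum(e for _, e in results)
        tol = max(abs_tol, rel_tol * abs(total))
        if err <= tol:
            break
        # Classify: a region is finished if its error fits its volume share of tol.
        survivors = []
        for region, (v, e) in zip(active, results):
            lo, hi = region
            if e <= tol * math.prod(h - l for l, h in zip(lo, hi)):
                done_val += v
                done_err += e
            else:
                survivors.append(region)
        if 2 * len(survivors) > max_regions:
            break  # memory budget exhausted; report the best estimate so far
        active = [child for region in survivors for child in split(region)]
    return total, err
```

Filtering finished regions keeps the active list, and therefore memory use, proportional to the parts of the domain that still need work, rather than growing by a factor of two everywhere as a brute-force uniform subdivision would.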
Submitted 23 June, 2021; v1 submitted 13 April, 2021;
originally announced April 2021.
-
HEPCloud, an Elastic Hybrid HEP Facility using an Intelligent Decision Support System
Authors:
Parag Mhashilkar,
Mine Altunay,
Eileen Berman,
David Dagenhart,
Stuart Fuess,
Burt Holzman,
James Kowalkowski,
Dmitry Litvintsev,
Qiming Lu,
Alexander Moibenko,
Marc Paterno,
Panagiotis Spentzouris,
Steven Timm,
Anthony Tiradani,
Eric Vaandering,
John Hover,
Jose Caballero Bejar
Abstract:
HEPCloud is rapidly becoming the primary system for provisioning compute resources for all Fermilab-affiliated experiments. To reliably meet the peak demands of the next generation of High Energy Physics experiments, Fermilab must plan to elastically expand its computational capabilities to cover the forecasted need. Commercial cloud and allocation-based High Performance Computing (HPC) resources both have explicit and implicit costs that must be considered when deciding when to provision these resources and at what scale. To support such provisioning in a manner consistent with organizational business rules and budget constraints, we have developed a modular intelligent decision support system (IDSS) to aid in the automatic provisioning of resources spanning multiple cloud providers, multiple HPC centers, and grid computing federations. In this paper, we discuss the goals and architecture of the HEPCloud Facility, the architecture of the IDSS, and our early experience in using the IDSS for automated facility expansion at both Fermilab and Brookhaven National Laboratory.
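The abstract does not spell out the IDSS decision rules, so the following is only a hypothetical toy illustrating the kind of cost trade-off it mentions: fill the forecast demand cheapest-first, where each resource's cost is an explicit price plus an assumed implicit per-core-hour overhead, until demand is met or the budget runs out. The `Resource` fields, the cost figures, and `provision` are our own assumptions, not HEPCloud Decision Engine APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    capacity: float             # core-hours available in the planning window
    unit_cost: float            # explicit cost per core-hour (e.g. a cloud price)
    overhead_cost: float = 0.0  # assumed implicit cost per core-hour

    @property
    def effective_cost(self) -> float:
        return self.unit_cost + self.overhead_cost


def provision(demand, resources, budget):
    """Cheapest-first plan covering `demand` core-hours without exceeding `budget`.

    Returns (allocation by resource name, total cost, unmet demand).
    """
    plan, spent, remaining = {}, 0.0, demand
    for r in sorted(resources, key=lambda r: r.effective_cost):
        if remaining <= 0:
            break
        take = min(remaining, r.capacity)
        if r.effective_cost > 0:
            # Never buy more than the remaining budget allows.
            take = min(take, (budget - spent) / r.effective_cost)
        if take <= 0:
            continue
        plan[r.name] = take
        spent += take * r.effective_cost
        remaining -= take
    return plan, spent, max(remaining, 0.0)


if __name__ == "__main__":
    options = [Resource("grid", 400, 0.0),
               Resource("hpc", 300, 0.01, 0.005),
               Resource("cloud", 10_000, 0.05)]
    print(provision(1000, options, budget=10))
```

Cheapest-first is cost-optimal here only because costs are assumed linear in core-hours with fixed capacities; minimum commitments or non-linear pricing would call for a different solver.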
Submitted 18 April, 2019;
originally announced April 2019.
-
Intelligently-automated facilities expansion with the HEPCloud Decision Engine
Authors:
Mine Altunay,
W. David Dagenhart,
Stuart Fuess,
Burt Holzman,
Jim Kowalkowski,
Dmitry Litvintsev,
Qiming Lu,
Parag Mhashilkar,
Alexander Moibenko,
Marc Paterno,
Panagiotis Spentzouris,
Steven Timm,
Anthony Tiradani
Abstract:
The next generation of High Energy Physics experiments is expected to generate exabytes of data, two orders of magnitude more than the current generation. To reliably meet peak demands, facilities must either provision enough resources to cover the forecasted need or find ways to elastically expand their computational capabilities. Commercial cloud and allocation-based High Performance Computing (HPC) resources both have explicit and implicit costs that must be considered when deciding when to provision these resources and at what scale. To support such provisioning in a manner consistent with organizational business rules and budget constraints, we have developed a modular intelligent decision support system (IDSS) to aid in the automatic provisioning of resources spanning multiple cloud providers, multiple HPC centers, and grid computing federations.
Submitted 11 June, 2018; v1 submitted 8 June, 2018;
originally announced June 2018.
-
Adapting SAM for CDF
Authors:
D. Bonham,
G. Garzoglio,
R. Herber,
J. Kowalkowski,
D. Litvintsev,
L. Lueking,
M. Paterno,
D. Petravick,
L. Piccoli,
R. Pordes,
N. Stanfield,
I. Terekhov,
J. Trumbo,
J. Tseng,
S. Veseli,
M. Votava,
V. White,
T. Huffman,
S. Stonjek,
K. Waltkins,
P. Crosby,
D. Waters,
R. St. Denis
Abstract:
The CDF and D0 experiments probe the high-energy frontier and, as they do so, have accumulated hundreds of terabytes of data on the way to petabytes of data over the next two years. The experiments have committed to using the developing Grid, based on the SAM system, to handle these data. D0's SAM system has been extended for use in CDF as common design patterns emerged to meet the similar requirements of these experiments. We explain the process by which the merger was achieved, with particular emphasis on lessons learned concerning database design patterns and the realization of the use cases.
Submitted 18 June, 2003;
originally announced June 2003.