-
Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame
Authors:
Jolly Chen,
Monica Dessole,
Ana Lucia Varbanescu
Abstract:
The world's largest particle accelerator, located at CERN, produces petabytes of data that need to be analysed efficiently, to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework, developed for this purpose. Its high-level data analysis interface, RDataFrame, currently only supports CPU parallelism. Given the increasing heterogeneity in computing facilities, it becomes crucial to efficiently support GPGPUs to take advantage of the available resources. SYCL allows for a single-source implementation, which enables support for different architectures. In this paper, we describe a CUDA implementation and the migration process to SYCL, focusing on a core high energy physics operation in RDataFrame -- histogramming. We detail the challenges that we faced when integrating SYCL into a large and complex code base. Furthermore, we perform an extensive comparative performance analysis of two SYCL compilers, AdaptiveCpp and DPC++, and the reference CUDA implementation. We highlight the performance bottlenecks that we encountered, and the methodology used to detect these. Based on our findings, we provide actionable insights for developers of SYCL applications.
Submitted 24 January, 2024;
originally announced January 2024.
-
GenVectorX: A performance-portable SYCL library for Lorentz Vectors operations
Authors:
Monica Dessole,
Jolly Chen,
Axel Naumann
Abstract:
The Large Hadron Collider (LHC) at CERN will see an upgraded hardware configuration which will bring a new era of physics data taking and related computational challenges. To this end, it is necessary to exploit the ever-increasing variety of computational architectures, featuring GPUs from multiple vendors and new accelerators. Performance-portable frameworks, like SYCL, make it possible to offload computational work to non-CPU resources while retaining performance, without the need to maintain different implementations of the same code. The High Energy Physics (HEP) community employs a wide variety of algorithms and tools for accelerators, but it still lacks a streamlined, coherent approach that can target many use cases without compromising usability. In this paper, we present our efforts in creating GenVectorX, a C++ package that provides classes and functionalities to represent and manipulate particle events using the SYCL programming model. The SYCL-based implementation exhibits performance and scalability comparable to those of the CUDA implementation when targeting NVIDIA GPUs.
Submitted 5 December, 2023;
originally announced December 2023.
-
Exact sinogram: an analytical approach to the Radon transform of phantoms
Authors:
Monica Dessole,
Marta Gatto,
Davide Poggiali,
Francesca Tedeschi
Abstract:
Phantoms can serve as a gold standard for the validation of MRI numerical methods. In some special cases, it is possible to compute analytically the Radon transform, or sinogram, of a phantom. In this work, we present analytical formulae to compute the exact sinograms of three classes of phantoms. We compare the use of the discrete Radon transform, which yields an approximate sinogram, and the corresponding analytical sinogram for image reconstruction.
Submitted 13 February, 2023;
originally announced February 2023.
-
The Lawson-Hanson Algorithm with Deviation Maximization: Finite Convergence and Sparse Recovery
Authors:
Monica Dessole,
Marco Dell'Orto,
Fabio Marcuzzi
Abstract:
In this work we apply "deviation maximization", a new column selection strategy, to the Lawson-Hanson algorithm for the solution of NonNegative Least Squares (NNLS), devising a new algorithm we call Lawson-Hanson with Deviation Maximization (LHDM). This algorithm makes it possible to exploit BLAS-3 operations, leading to higher performance. We show the finite convergence of this algorithm and explore the sparse recovery ability of LHDM. The results are presented with an extensive campaign of experiments, where we compare its performance against several $\ell_1$-minimization solvers. An implementation of the proposed algorithm is available in a public repository.
Submitted 14 July, 2022; v1 submitted 11 August, 2021;
originally announced August 2021.
-
Deviation Maximization for Rank-Revealing QR Factorizations
Authors:
Monica Dessole,
Fabio Marcuzzi
Abstract:
In this paper we introduce a new column selection strategy, named here "Deviation Maximization", and apply it to compute rank-revealing QR factorizations as an alternative to the well-known block version of the QR factorization with the column pivoting method, called QP3 and currently implemented in LAPACK's xgeqp3 routine. We show that the resulting algorithm, named QRDM, has rank-revealing properties similar to those of QP3 and better execution times. We present numerical test results on a wide data set of numerically singular matrices, which has become a reference in the recent literature.
Submitted 6 June, 2021;
originally announced June 2021.