Prototyping a ROOT-based distributed analysis workflow for HL-LHC: the CMS use case
Authors:
Tommaso Tedeschi,
Vincenzo Eduardo Padulano,
Daniele Spiga,
Diego Ciangottini,
Mirco Tracolli,
Enric Tejedor Saavedra,
Enrico Guiraud,
Massimo Biasotto
Abstract:
The challenges expected for the next era of the Large Hadron Collider (LHC), both in terms of storage and computing resources, provide LHC experiments with a strong motivation for rethinking their computing models at many levels. Great efforts have been put into optimizing computing resource utilization for data analysis, which leads both to lower hardware requirements and faster turnaround for physics analyses. In this scenario, the Compact Muon Solenoid (CMS) collaboration is involved in several activities aimed at benchmarking different solutions for running High Energy Physics (HEP) analysis workflows. A promising solution is evolving software towards more user-friendly approaches featuring a declarative programming model and interactive workflows. The computing infrastructure should keep up with this trend by offering modern interfaces on the one hand and hiding the complexity of the underlying environment on the other, while efficiently leveraging the already deployed grid infrastructure and scaling toward opportunistic resources like public cloud or HPC centers. This article presents the first example of using the ROOT RDataFrame technology to exploit such next-generation approaches for a production-grade CMS physics analysis. A new analysis facility is created to offer users a modern interactive web interface based on JupyterLab that can leverage HTCondor-based grid resources on different geographical sites. The physics analysis is converted from a legacy iterative approach to the modern declarative approach offered by RDataFrame and distributed over multiple computing nodes. The new scenario offers not only an overall improved programming experience, but also an order-of-magnitude speedup with respect to the previous approach.
Submitted 24 July, 2023;
originally announced July 2023.
Running CMS software on GRID Testbeds
Authors:
D. Bonacorsi,
P. Capiluppi,
A. Fanfani,
C. Grandi,
M. Corvo,
F. Fanzago,
M. Sgaravatto,
M. Verlato,
C. Charlot,
I. Semeniuok,
D. Colling,
B. MacEvoy,
H. Tallini,
M. Biasotto,
S. Fantinel,
E. Leonardi,
A. Sciabà,
O. Maroney,
I. Augustin,
E. Laure,
M. Schulz,
H. Stockinger,
V. Lefebure,
S. Burke,
J. J. Blaising, et al. (5 additional authors not shown)
Abstract:
Starting in the middle of November 2002, the CMS experiment undertook an evaluation of the European DataGrid Project (EDG) middleware using its event simulation programs. A joint CMS-EDG task force performed a "stress test" by submitting a large number of jobs to many distributed sites. The EDG testbed was complemented with additional CMS-dedicated resources. A total of ~10000 jobs consisting of two different computational types were submitted from four different locations in Europe over a period of about one month. Nine sites were active, providing integrated resources of more than 500 CPUs and about 5 TB of disk space (with the additional use of two Mass Storage Systems). Descriptions of the adopted procedures, the problems encountered and the corresponding solutions are reported. Results and evaluations of the test, both from the CMS and the EDG perspectives, are described.
Submitted 4 June, 2003;
originally announced June 2003.