-
Coffea-Casa: Building composable analysis facilities for the HL-LHC
Authors:
Sam Albin,
Garhan Attebury,
Kenneth Bloom,
Brian Bockelman,
Carl Lundstedt,
Oksana Shadura,
John Thiltges
Abstract:
The large data volumes expected from the High Luminosity LHC (HL-LHC) present challenges to existing paradigms and facilities for end-user data analysis. Modern cyberinfrastructure tools provide a diverse set of services that can be composed into a system that provides physicists with powerful tools that give them straightforward access to large computing resources, with low barriers to entry. The Coffea-Casa analysis facility (AF) provides an environment for end users enabling the execution of increasingly complex analyses such as those demonstrated by the Analysis Grand Challenge (AGC) and capturing the features that physicists will need for the HL-LHC.
We describe the development progress of the Coffea-Casa facility, featuring its modularity while demonstrating the ability to port and customize the facility software stack to other locations. The facility also facilitates the support of batch systems while staying Kubernetes-native. We present the evolved architecture of the facility, including the integration of advanced data delivery services (e.g. ServiceX) and the provision of data caching services (e.g. XCache) to end users of the facility. We also highlight the composability of modern cyberinfrastructure tools. To enable machine learning pipelines at Coffea-Casa analysis facilities, a set of industry ML solutions adopted for HEP columnar analysis were integrated on top of existing facility services. These services also feature transparent access for user workflows to GPUs available at a facility via inference servers while using Kubernetes as enabling technology.
Submitted 30 November, 2023;
originally announced December 2023.
-
Snowmass 2021 Computational Frontier CompF4 Topical Group Report: Storage and Processing Resource Access
Authors:
W. Bhimji,
D. Carder,
E. Dart,
J. Duarte,
I. Fisk,
R. Gardner,
C. Guok,
B. Jayatilaka,
T. Lehman,
M. Lin,
C. Maltzahn,
S. McKee,
M. S. Neubauer,
O. Rind,
O. Shadura,
N. V. Tran,
P. van Gemmeren,
G. Watts,
B. A. Weaver,
F. Würthwein
Abstract:
Computing plays a significant role in all areas of high energy physics. The Snowmass 2021 CompF4 topical group's scope is facilities R&D, where we consider "facilities" as the computing hardware and software infrastructure inside the data centers plus the networking between data centers, irrespective of who owns them, and what policies are applied for using them. In other words, it includes commercial clouds, federally funded High Performance Computing (HPC) systems for all of science, and systems funded explicitly for a given experimental or theoretical program. This topical group report summarizes the findings and recommendations for the storage, processing, networking and associated software service infrastructures for future high energy physics research, based on the discussions organized through the Snowmass 2021 community study.
Submitted 29 September, 2022; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Collaborative Computing Support for Analysis Facilities Exploiting Software as Infrastructure Techniques
Authors:
Maria Acosta Flechas,
Garhan Attebury,
Kenneth Bloom,
Brian Bockelman,
Lindsey Gray,
Burt Holzman,
Carl Lundstedt,
Oksana Shadura,
Nicholas Smith,
John Thiltges
Abstract:
Prior to the public release of Kubernetes, it was difficult to conduct joint development of elaborate analysis facilities due to the highly non-homogeneous nature of hardware and network topology across compute facilities. However, since the advent of systems like Kubernetes and OpenShift, which provide declarative interfaces for building fault-tolerant and self-healing deployments of networked software, it is possible for multiple institutes to collaborate more effectively, since resource details are abstracted away through various forms of hardware and software virtualization. In this whitepaper we outline the development of two analysis facilities, "Coffea-casa" at the University of Nebraska-Lincoln and the "Elastic Analysis Facility" at Fermilab, and describe how utilizing platform abstraction has improved the development of common software for each of these facilities, as well as the future development plans made possible by this methodology.
Submitted 22 March, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Coffea-casa: an analysis facility prototype
Authors:
Matous Adamec,
Garhan Attebury,
Kenneth Bloom,
Brian Bockelman,
Carl Lundstedt,
Oksana Shadura,
John Thiltges
Abstract:
Data analysis in HEP has often relied on batch systems and event loops; users are given a non-interactive interface to computing resources and consider data event-by-event. The "Coffea-casa" prototype analysis facility is an effort to provide users with alternate mechanisms to access computing resources and enable new programming paradigms. Instead of the command-line interface and asynchronous batch access, a notebook-based web interface and interactive computing is provided. Instead of writing event loops, the column-based Coffea library is used. In this paper, we describe the architectural components of the facility, the services offered to end-users, and how it integrates into a larger ecosystem for data access and authentication.
Submitted 29 June, 2021; v1 submitted 2 March, 2021;
originally announced March 2021.
-
ROOT I/O compression improvements for HEP analysis
Authors:
Oksana Shadura,
Brian Paul Bockelman,
Philippe Canal,
Danilo Piparo,
Zhe Zhang
Abstract:
We overview recent changes in the ROOT I/O system that increase its performance and improve its interaction with other data analysis ecosystems. The newly introduced compression algorithms, the much faster bulk I/O data path, and a few additional techniques have the potential to significantly improve experiments' software performance. The need for efficient lossless data compression has grown significantly as the amount of HEP data collected, transmitted, and stored has dramatically increased during the LHC era. While compression reduces storage space and, potentially, I/O bandwidth usage, it should not be applied blindly: there are significant trade-offs between the increased CPU cost for reading and writing files and the reduced storage space.
Submitted 8 April, 2020;
originally announced April 2020.
-
C++ Modules in ROOT and Beyond
Authors:
Vassil Vassilev,
David Lange,
Malik Shahzad Muzaffar,
Mircho Rodozov,
Oksana Shadura,
Alexander Penev
Abstract:
C++ Modules come in C++20 to fix the long-standing build scalability problems in the language. They provide an I/O-efficient, on-disk representation capable of reducing build times and peak memory usage. ROOT employs the C++ Modules technology further in the ROOT dictionary system to improve its performance and reduce the memory footprint.
ROOT with C++ Modules was released as a technology preview in fall 2018, after intensive development during the last few years. The current state is ready for production; however, there is still room for performance optimizations. In this talk, we show the roadmap for making the technology default in ROOT. We demonstrate a global module indexing optimization which allows reducing the memory footprint dramatically for many workflows. We will report user feedback on the migration to ROOT with C++ Modules.
Submitted 25 August, 2020; v1 submitted 11 April, 2020;
originally announced April 2020.
-
Automatic Differentiation in ROOT
Authors:
Vassil Vassilev,
Aleksandr Efremov,
Oksana Shadura
Abstract:
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.), elementary functions (exp, log, sin, cos, etc.) and control flow statements. AD takes source code of a function as input and produces source code of the derived function. By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.
This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions through implementing source code transformation and employing the chain rule of differential calculus in both forward mode and reverse mode. We explain its current integration for gradient computation in TFormula. We demonstrate the correctness and performance improvements in ROOT's fitting algorithms.
Submitted 9 April, 2020;
originally announced April 2020.
-
Speeding HEP Analysis with ROOT Bulk I/O
Authors:
Brian Bockelman,
Zhe Zhang,
Oksana Shadura
Abstract:
Distinct HEP workflows have distinct I/O needs; while ROOT I/O excels at serializing complex C++ objects common to reconstruction, analysis workflows typically have simpler objects and can sustain higher event rates. To meet the needs of these workflows, we have developed a "bulk I/O" interface, allowing the data of multiple events to be returned per library call. This reduces ROOT-related overheads and increases event rates; orders-of-magnitude improvements are shown in microbenchmarks. Unfortunately, this bulk interface is difficult to use, as it requires users to identify when it is applicable, and they still "think" in terms of events, not arrays of data. We have integrated the bulk I/O interface into the new RDataFrame analysis framework inside ROOT. As RDataFrame's interface can provide improved type information, the framework itself can determine what data is readable via the bulk I/O and automatically switch between interfaces. We demonstrate how this can improve event rates when reading analysis data formats, such as CMS's NanoAOD.
Submitted 11 June, 2019;
originally announced June 2019.
-
Migrating large codebases to C++ Modules
Authors:
Yuka Takahashi,
Oksana Shadura,
Vassil Vassilev
Abstract:
ROOT has several features which interact with libraries and require implicit header inclusion. This can be triggered by reading or writing data on disk, or user actions at the prompt. Often, the headers are immutable, and reparsing is redundant. C++ Modules are designed to minimize the reparsing of the same header content by providing an efficient on-disk representation of C++ Code. ROOT has released a C++ Modules-aware technology preview which intends to become the default for the next release.
In this paper, we summarize our experience with migrating LHC experiments' software code bases to C++ Modules. We outline the challenges in the C++ Modules migration of the CMS software, including the integration of C++ Modules support in the CMS build system. We also evaluate the performance benefits that experiments are expected to achieve.
Submitted 22 August, 2019; v1 submitted 12 June, 2019;
originally announced June 2019.
-
ROOT I/O compression algorithms and their performance impact within Run 3
Authors:
Oksana Shadura,
Brian Paul Bockelman
Abstract:
The LHC's Run 3 will push the envelope on data-intensive workflows and, since at the lowest level this data is managed using the ROOT software framework, preparations for managing this data are starting already. At the beginning of LHC Run 1, all ROOT data was compressed with the ZLIB algorithm; since then, ROOT has added support for additional algorithms such as LZMA and LZ4, each with unique strengths. This work must continue as industry introduces new techniques: by introducing better compression algorithms, ROOT can save disk space or reduce the I/O and bandwidth for the online and offline needs of experiments. In addition to alternate algorithms, we have been exploring alternate techniques to improve parallelism and apply pre-conditioners to the serialized data.
We have performed a survey of the performance of the new compression techniques. Our survey includes various use cases of data compression of ROOT files provided by different LHC experiments. We also provide insight into solutions applied to resolve bottlenecks in compression algorithms, resulting in improved ROOT performance.
Submitted 2 August, 2019; v1 submitted 11 June, 2019;
originally announced June 2019.
-
Evolution of ROOT package management
Authors:
Oksana Shadura,
Brian Paul Bockelman,
Vassil Vassilev
Abstract:
ROOT is a large code base with a complex set of build-time dependencies; there is a significant difference in compilation time between the "core" of ROOT and the full-fledged deployment. We present results on a "delayed build" for internal ROOT packages and external packages. This gives the ability to offer a "lightweight" core of ROOT, later extended by building additional modules to extend the functionality of ROOT. As a part of this work, we have improved the separation of ROOT code into distinct modules and packages with minimal dependencies. This approach gives users better flexibility and the possibility to combine various build features without rebuilding from scratch.
Dependency hell is a common problem in software, and particularly in the HEP software ecosystem. We would like to discuss an improved artifact management ("lazy-install") system as a solution to the "dependency hell" problem. The HEP software stack usually consists of multiple sub-projects with dependencies. The development model is often distributed, independent and non-coherent among the sub-projects. We believe that software should be designed to take advantage of other software components that are already available, or have already been designed and implemented for use elsewhere, rather than "reinventing the wheel". In our contribution, we will present our approach to an artifact management system for ROOT together with a set of examples and use cases.
Submitted 11 June, 2019;
originally announced June 2019.
-
Optimizing Frameworks Performance Using C++ Modules Aware ROOT
Authors:
Yuka Takahashi,
Vassil Vassilev,
Oksana Shadura,
Raphael Isemann
Abstract:
ROOT is a data analysis framework broadly used in and outside of High Energy Physics (HEP). Since HEP software frameworks always strive for performance improvements, ROOT was extended with experimental support for runtime C++ Modules. C++ Modules are designed to improve the performance of C++ code parsing. They offer a promising way to improve ROOT's runtime performance by saving the C++ header parsing time incurred during ROOT runtime. This paper presents the results and challenges of integrating C++ Modules into ROOT.
Submitted 17 May, 2019; v1 submitted 10 December, 2018;
originally announced December 2018.
-
Continuous Performance Benchmarking Framework for ROOT
Authors:
Oksana Shadura,
Vassil Vassilev,
Brian Paul Bockelman
Abstract:
Foundational software libraries such as ROOT are under intense pressure to avoid software regressions, including performance regressions. Continuous performance benchmarking, as a part of continuous integration and other code quality testing, is an industry best practice for understanding how the performance of a software product evolves over time. We present a framework, built from industry best practices and tools, to help understand ROOT code performance and monitor the efficiency of the code for several processor architectures. It additionally allows historical performance measurements for the ROOT I/O, vectorization and parallelization sub-systems.
Submitted 21 February, 2019; v1 submitted 7 December, 2018;
originally announced December 2018.
-
Extending ROOT through Modules
Authors:
Oksana Shadura,
Brian Paul Bockelman,
Vassil Vassilev
Abstract:
The ROOT software framework is foundational for the HEP ecosystem, providing capabilities such as I/O, a C++ interpreter, GUI, and math libraries. It uses object-oriented concepts and build-time components to layer between them. We believe additional layering formalisms will benefit ROOT and its users. We present the modularization strategy for ROOT, which aims to formalize the description of existing source components, make the dependencies and other metadata available externally from the build system, and allow post-install additions of functionality in the runtime environment. Components can then be grouped into packages, installable from external repositories, so that missing packages can be delivered in a post-install step. This provides a mechanism for the wider software ecosystem to interact with a minimalistic install. Reducing intra-component dependencies improves maintainability and code hygiene. We believe that helping maintain the smallest "base install" possible will help embedding use cases. The modularization effort draws inspiration from the Java, Python, and Swift ecosystems. Keeping aligned with modern C++, this strategy relies on forthcoming features such as C++ modules. We hope formalizing the component layer will provide simpler ROOT installs, improve extensibility, and decrease the complexity of embedding in other ecosystems.
Submitted 11 December, 2018; v1 submitted 7 December, 2018;
originally announced December 2018.