-
Unified System for Processing Real and Simulated Data in the ATLAS Experiment
Authors:
Mikhail Borodin,
Kaushik De,
Jose Garcia Navarro,
Dmitry Golubkov,
Alexei Klimentov,
Tadashi Maeno,
David South,
Alexandre Vaniachine
Abstract:
The physics goals of the next Large Hadron Collider run include high precision tests of the Standard Model and searches for new physics. These goals require detailed comparison of data with computational models simulating the expected data behavior. To highlight the role which modeling and simulation plays in future scientific discovery, we report on use cases and experience with a unified system…
▽ More
The physics goals of the next Large Hadron Collider run include high precision tests of the Standard Model and searches for new physics. These goals require detailed comparison of data with computational models simulating the expected data behavior. To highlight the role which modeling and simulation plays in future scientific discovery, we report on use cases and experience with a unified system built to process both real and simulated data of growing volume and variety.
△ Less
Submitted 28 August, 2015;
originally announced August 2015.
-
Advancements in Big Data Processing in the ATLAS and CMS Experiments
Authors:
A. V. Vaniachine
Abstract:
The ever-increasing volumes of scientific data present new challenges for distributed computing and Grid technologies. The emerging Big Data revolution drives exploration in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields enabling massive data analysis in new ways. In petascale data…
▽ More
The ever-increasing volumes of scientific data present new challenges for distributed computing and Grid technologies. The emerging Big Data revolution drives exploration in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields enabling massive data analysis in new ways. In petascale data processing scientists deal with datasets, not individual files. As a result, a task (comprised of many jobs) became a unit of petascale data processing on the Grid. Splitting of a large data processing task into jobs enabled fine-granularity checkpointing analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of the dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries to achieve reliable six sigma production quality in petascale data processing on the Grid. The computing experience of the ATLAS and CMS experiments provides foundation for reliability engineering scaling up Grid technologies for data processing beyond the petascale.
△ Less
Submitted 8 March, 2013;
originally announced March 2013.
-
LHC Databases on the Grid: Achievements and Open Issues
Authors:
A. V. Vaniachine
Abstract:
To extract physics results from the recorded data, the LHC experiments are using Grid computing infrastructure. The event data processing on the Grid requires scalable access to non-event data (detector conditions, calibrations, etc.) stored in relational databases. The database-resident data are critical for the event data reconstruction processing steps and often required for physics analysis. T…
▽ More
To extract physics results from the recorded data, the LHC experiments are using Grid computing infrastructure. The event data processing on the Grid requires scalable access to non-event data (detector conditions, calibrations, etc.) stored in relational databases. The database-resident data are critical for the event data reconstruction processing steps and often required for physics analysis. This paper reviews LHC experience with database technologies for the Grid computing. List of topics includes: database integration with Grid computing models of the LHC experiments; choice of database technologies; examples of database interfaces; distributed database applications (data complexity, update frequency, data volumes and access patterns); scalability of database access in the Grid computing environment of the LHC experiments. The review describes areas in which substantial progress was made and remaining open issues.
△ Less
Submitted 12 July, 2010;
originally announced July 2010.
-
Scalable Database Access Technologies for ATLAS Distributed Computing
Authors:
A. Vaniachine
Abstract:
ATLAS event data processing requires access to non-event data (detector conditions, calibrations, etc.) stored in relational databases. The database-resident data are crucial for the event data reconstruction processing steps and often required for user analysis. A main focus of ATLAS database operations is on the worldwide distribution of the Conditions DB data, which are necessary for every AT…
▽ More
ATLAS event data processing requires access to non-event data (detector conditions, calibrations, etc.) stored in relational databases. The database-resident data are crucial for the event data reconstruction processing steps and often required for user analysis. A main focus of ATLAS database operations is on the worldwide distribution of the Conditions DB data, which are necessary for every ATLAS data processing job. Since Conditions DB access is critical for operations with real data, we have developed the system where a different technology can be used as a redundant backup. Redundant database operations infrastructure fully satisfies the requirements of ATLAS reprocessing, which has been proven on a scale of one billion database queries during two reprocessing campaigns of 0.5 PB of single-beam and cosmics data on the Grid. To collect experience and provide input for a best choice of technologies, several promising options for efficient database access in user analysis were evaluated successfully. We present ATLAS experience with scalable database access technologies and describe our approach for prevention of database access bottlenecks in a Grid computing environment.
△ Less
Submitted 2 October, 2009; v1 submitted 1 October, 2009;
originally announced October 2009.
-
Primary Numbers Database for ATLAS Detector Description Parameters
Authors:
A. Vaniachine,
S. Eckmann,
D. Malon,
P. Nevski,
T. Wenaus
Abstract:
We present the design and the status of the database for detector description parameters in ATLAS experiment. The ATLAS Primary Numbers are the parameters defining the detector geometry and digitization in simulations, as well as certain reconstruction parameters. Since the detailed ATLAS detector description needs more than 10,000 such parameters, a preferred solution is to have a single verifi…
▽ More
We present the design and the status of the database for detector description parameters in ATLAS experiment. The ATLAS Primary Numbers are the parameters defining the detector geometry and digitization in simulations, as well as certain reconstruction parameters. Since the detailed ATLAS detector description needs more than 10,000 such parameters, a preferred solution is to have a single verified source for all these data. The database stores the data dictionary for each parameter collection object, providing schema evolution support for object-based retrieval of parameters. The same Primary Numbers are served to many different clients accessing the database: the ATLAS software framework Athena, the Geant3 heritage framework Atlsim, the Geant4 developers framework FADS/Goofy, the generator of XML output for detector description, and several end-user clients for interactive data navigation, including web-based browsers and ROOT. The choice of the MySQL database product for the implementation provides additional benefits: the Primary Numbers database can be used on the developers laptop when disconnected (using the MySQL embedded server technology), with data being updated when the laptop is connected (using the MySQL database replication).
△ Less
Submitted 16 June, 2003;
originally announced June 2003.
-
Prototy** Virtual Data Technologies in ATLAS Data Challenge 1 Production
Authors:
A. Vaniachine,
D. Malon,
P. Nevski,
K. De
Abstract:
For efficiency of the large production tasks distributed worldwide, it is essential to provide shared production management tools comprised of integratable and interoperable services. To enhance the ATLAS DC1 production toolkit, we introduced and tested a Virtual Data services component. For each major data transformation step identified in the ATLAS data processing pipeline (event generation, d…
▽ More
For efficiency of the large production tasks distributed worldwide, it is essential to provide shared production management tools comprised of integratable and interoperable services. To enhance the ATLAS DC1 production toolkit, we introduced and tested a Virtual Data services component. For each major data transformation step identified in the ATLAS data processing pipeline (event generation, detector simulation, background pile-up and digitization, etc) the Virtual Data Cookbook (VDC) catalogue encapsulates the specific data transformation knowledge and the validated parameters settings that must be provided before the data transformation invocation. To provide for local-remote transparency during DC1 production, the VDC database server delivered in a controlled way both the validated production parameters and the templated production recipes for thousands of the event generation and detector simulation jobs around the world, simplifying the production management solutions.
△ Less
Submitted 16 June, 2003;
originally announced June 2003.
-
POOL File Catalog, Collection and Metadata Components
Authors:
C. Cioffi,
S. Eckmann,
M. Girone,
J. Hrivnac,
D. Malon,
H. Schmuecker,
A. Vaniachine,
J. Wojcieszuk,
Z. Xie
Abstract:
The POOL project is the common persistency framework for the LHC experiments to store petabytes of experiment data and metadata in a distributed and grid enabled way. POOL is a hybrid event store consisting of a data streaming layer and a relational layer. This paper describes the design of file catalog, collection and metadata components which are not part of the data streaming layer of POOL an…
▽ More
The POOL project is the common persistency framework for the LHC experiments to store petabytes of experiment data and metadata in a distributed and grid enabled way. POOL is a hybrid event store consisting of a data streaming layer and a relational layer. This paper describes the design of file catalog, collection and metadata components which are not part of the data streaming layer of POOL and outlines how POOL aims to provide transparent and efficient data access for a wide range of environments and use cases - ranging from a large production site down to a single disconnected laptops. The file catalog is the central POOL component translating logical data references to physical data files in a grid environment. POOL collections with their associated metadata provide an abstract way of accessing experiment data via their logical grou** into sets of related data objects.
△ Less
Submitted 13 June, 2003;
originally announced June 2003.