-
Dashboard Task Monitor for Managing ATLAS User Analysis on the Grid
Authors:
L Sargsyan,
J Andreeva,
M Jha,
E Karavakis,
L Kokoszkiewicz,
P Saiz,
J Schovancova,
D Tuckett
Abstract:
The organization of the distributed user analysis on the Worldwide LHC Computing Grid (WLCG) infrastructure is one of the most challenging tasks among the computing activities at the Large Hadron Collider. The Experiment Dashboard offers a solution that not only monitors but also manages (kill, resubmit) user tasks and jobs via a web interface. The ATLAS Dashboard Task Monitor provides analysis users with a tool that is independent of the operating system and Grid environment. This contribution describes the functionality of the application and its implementation details, in particular authentication, authorization and audit of the management operations.
Submitted 30 May, 2019;
originally announced June 2019.
-
ATLAS job monitoring in the Dashboard Framework
Authors:
L Sargsyan,
J Andreeva,
S Campana,
E Karavakis,
L Kokoszkiewicz,
P Saiz,
J Schovancova,
D Tuckett
Abstract:
Monitoring of the large-scale data processing of the ATLAS experiment includes monitoring of production and user analysis jobs. The Experiment Dashboard provides a common job monitoring solution, which is shared by the ATLAS and CMS experiments. This includes an accounting portal as well as real-time monitoring. Dashboard job monitoring for ATLAS combines information from the PanDA job processing database, the Production system database, and monitoring information from jobs submitted through GANGA to the Workload Management System (WMS) or local batch systems. Usage of Dashboard-based job monitoring applications will decrease the load on the PanDA database and overcome scale limitations in PanDA monitoring caused by the short job rotation cycle in the PanDA database. Aggregation of the task/job metrics from different sources provides a complete view of job processing activity in the ATLAS scope.
Submitted 30 May, 2019;
originally announced May 2019.
-
Automating ATLAS Computing Operations using the Site Status Board
Authors:
Julia Andreeva,
Carlos Borrego Iglesias,
Simone Campana,
Alessandro Di Girolamo,
Ivan Dzhunov,
Xavier Espinal Curull,
Stavro Gayazov,
Erekle Magradze,
Michal Maciej Nowotka,
Lorenzo Rinaldi,
Pablo Saiz,
Jaroslava Schovancova,
Graeme Andrew Stewart,
Michael Wright
Abstract:
The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment intensively uses the SSB for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for automatically excluding sites from computing activities in case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, the usability of a site from the perspective of ATLAS is calculated. The paper will describe how the SSB is integrated in the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and alarm system, based on the information in the SSB. It will demonstrate the positive impact of the use of the SSB on the overall performance of ATLAS computing activities and will give an overview of future plans.
Submitted 28 January, 2013; v1 submitted 1 January, 2013;
originally announced January 2013.
-
AliEn - EDG Interoperability in ALICE
Authors:
S. Bagnasco,
R. Barbera,
P. Buncic,
F. Carminati,
P. Cerello,
P. Saiz
Abstract:
AliEn (ALICE Environment) is a GRID-like system for large-scale job submission and distributed data management developed and used in the context of ALICE, the CERN LHC heavy-ion experiment. With the aim of exploiting upcoming Grid resources to run AliEn-managed jobs and store the produced data, the problem of AliEn-EDG interoperability was addressed and an interface was designed. One or more EDG (European Data Grid) User Interface machines run the AliEn software suite (Cluster Monitor, Storage Element and Computing Element), and act as interface nodes between the systems. An EDG Resource Broker is seen by the AliEn server as a single Computing Element, while the EDG storage is seen by AliEn as a single, large Storage Element; files produced in EDG sites are registered in both the EDG Replica Catalogue and the AliEn Data Catalogue, thus ensuring accessibility from both worlds. In fact, both registrations are required: the AliEn one is used for data management, the EDG one to guarantee the integrity of and access to EDG-produced data. A prototype interface has been successfully deployed using the ALICE AliEn Server and the EDG and DataTAG Testbeds.
Submitted 13 June, 2003;
originally announced June 2003.
-
The MammoGrid Project Grids Architecture
Authors:
Richard McClatchey,
Predrag Buncic,
David Manset,
Tamas Hauer,
Florida Estrella,
Pablo Saiz,
Dmitri Rogulin
Abstract:
The aim of the recently EU-funded MammoGrid project is, in the light of emerging Grid technology, to develop a European-wide database of mammograms that will be used to develop a set of important healthcare applications and investigate the potential of this Grid to support effective co-working between healthcare professionals throughout the EU. The MammoGrid consortium intends to use a Grid model to enable distributed computing that spans national borders. This Grid infrastructure will be used for deploying novel algorithms as software directly developed or enhanced within the project. Using the MammoGrid, clinicians will be able to harness massive amounts of medical image data to perform epidemiological studies, advanced image processing, radiographic education and, ultimately, tele-diagnosis over communities of medical "virtual organisations". This is achieved through the use of Grid-compliant services [1] for managing (versions of) massively distributed files of mammograms, for handling the distributed execution of mammogram analysis software, for the development of Grid-aware algorithms and for the sharing of resources between multiple collaborating medical centres. All this is delivered via a novel software and hardware information infrastructure that, in addition, guarantees the integrity and security of the medical data. The MammoGrid implementation is based on AliEn, a Grid framework developed by the ALICE Collaboration. AliEn provides a virtual file catalogue that allows transparent access to distributed data-sets and provides a top-to-bottom implementation of a lightweight Grid applicable to cases where a large number of files must be handled. This paper details the architecture that will be implemented by the MammoGrid project.
Submitted 16 June, 2003;
originally announced June 2003.
-
AliEnFS - a Linux File System for the AliEn Grid Services
Authors:
Andreas J. Peters,
P. Saiz,
P. Buncic
Abstract:
Among the services offered by the AliEn (ALICE Environment, http://alien.cern.ch) Grid framework there is a virtual file catalogue to allow transparent access to distributed data-sets using various file transfer protocols. $alienfs$ (AliEn File System) integrates the AliEn file catalogue as a new file system type into the Linux kernel using LUFS, a hybrid user space file system framework (Open Source, http://lufs.sourceforge.net). LUFS uses a special kernel interface level called VFS (Virtual File System Switch) to communicate via a generalised file system interface to the AliEn file system daemon. The AliEn framework is used for authentication, catalogue browsing, file registration and read/write transfer operations. A C++ API implements the generic file system operations. The goal of AliEnFS is to allow users easy interactive access to a worldwide distributed virtual file system using familiar shell commands (e.g. cp, ls, rm, ...). The paper discusses general aspects of Grid File Systems, the AliEn implementation, and present and future developments for the AliEn Grid File System.
Submitted 13 June, 2003;
originally announced June 2003.
-
AliEn Resource Brokers
Authors:
Pablo Saiz,
Predrag Buncic,
Andreas J. Peters
Abstract:
AliEn (ALICE Environment) is a lightweight GRID framework developed by the ALICE Collaboration. When the experiment starts running, it will collect data at a rate of approximately 2 PB per year, producing $O(10^9)$ files per year. All these files, including all simulated events generated during the preparation phase of the experiment, must be accounted for and reliably tracked in the GRID environment. The backbone of AliEn is a distributed file catalogue, which associates a universal logical file name with the physical file names for each dataset and provides transparent access to datasets independently of physical location. File replication and transport are carried out under the control of the File Transport Broker. In addition, the file catalogue maintains information about every job running in the system. The jobs are distributed by the Job Resource Broker, which is implemented using a simplified pull (as opposed to traditional push) architecture. This paper describes the Job and File Transport Resource Brokers and shows that a similar architecture can be applied to solve both problems.
Submitted 13 June, 2003;
originally announced June 2003.
-
The AliEn system, status and perspectives
Authors:
P. Buncic,
P. Saiz,
A. J. Peters
Abstract:
AliEn is a production environment that implements several components of the Grid paradigm needed to simulate, reconstruct and analyse HEP data in a distributed way. The system is built around Open Source components, uses the Web Services model and standard network protocols to implement the computing platform that is currently being used to produce and analyse Monte Carlo data at over 30 sites on four continents. The aim of this paper is to present the current AliEn architecture and outline its future developments in the light of emerging standards.
Submitted 13 June, 2003;
originally announced June 2003.