Dashboard Task Monitor for Managing ATLAS User Analysis on the Grid
Authors: L Sargsyan, J Andreeva, M Jha, E Karavakis, L Kokoszkiewicz, P Saiz, J Schovancova, D Tuckett
Abstract:
The organization of the distributed user analysis on the Worldwide LHC Computing Grid (WLCG) infrastructure is one of the most challenging tasks among the computing activities at the Large Hadron Collider. The Experiment Dashboard offers a solution that not only monitors but also manages (kill, resubmit) user tasks and jobs via a web interface. The ATLAS Dashboard Task Monitor provides analysis users with a tool that is independent of the operating system and Grid environment. This contribution describes the functionality of the application and its implementation details, in particular authentication, authorization and audit of the management operations.
Submitted 30 May, 2019;
originally announced June 2019.
Automating ATLAS Computing Operations using the Site Status Board
Authors: Julia Andreeva, Carlos Borrego Iglesias, Simone Campana, Alessandro Di Girolamo, Ivan Dzhunov, Xavier Espinal Curull, Stavro Gayazov, Erekle Magradze, Michal Maciej Nowotka, Lorenzo Rinaldi, Pablo Saiz, Jaroslava Schovancova, Graeme Andrew Stewart, Michael Wright
Abstract:
The automation of operations is essential to reduce manpower costs and improve the reliability of the system. The Site Status Board (SSB) is a framework which allows Virtual Organizations to monitor their computing activities at distributed sites and to evaluate site performance. The ATLAS experiment uses the SSB intensively for the distributed computing shifts, for estimating data processing and data transfer efficiencies at a particular site, and for implementing automatic exclusion of sites from computing activities in case of potential problems. The ATLAS SSB provides a real-time aggregated monitoring view and keeps the history of the monitoring metrics. Based on this history, the usability of a site from the perspective of ATLAS is calculated. The paper will describe how the SSB is integrated into the ATLAS operations and computing infrastructure and will cover implementation details of the ATLAS SSB sensors and of the alarm system, which is based on the information in the SSB. It will demonstrate the positive impact of the use of the SSB on the overall performance of ATLAS computing activities and will give an overview of future plans.
Submitted 28 January, 2013; v1 submitted 1 January, 2013;
originally announced January 2013.