-
Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study
Authors:
Lukasz Lacinski,
Lee Liming,
Steven Turoscy,
Cameron Harr,
Kyle Chard,
Eli Dart,
Paul Durack,
Sasha Ames,
Forrest M. Hoffman,
Ian T. Foster
Abstract:
We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and OR…
▽ More
We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and ORNL, was performed largely automatically by a simple replication tool, a script that invoked Globus to transfer large bundles of files while tracking progress in a database. Under the covers, Globus organized transfers to make efficient use of the high-speed Energy Sciences network (ESnet) and the data transfer nodes deployed at participating sites, and also addressed security, integrity checking, and recovery from a variety of transient failures. This success demonstrates the considerable benefits that can accrue from the adoption of performant data replication infrastructure.
△ Less
Submitted 30 April, 2024;
originally announced April 2024.
-
A Checklist to Publish Collections as Data in GLAM Institutions
Authors:
Gustavo Candela,
Nele Gabriƫls,
Sally Chambers,
Thuy-An Pham,
Sarah Ames,
Neil Fitzgerald,
Katrine Hofmann,
Victor Harbo,
Abigail Potter,
Meghan Ferriter,
Eileen Manchester,
Alba Irollo,
Ellen Van Keer,
Mahendra Mahey,
Olga Holownia,
Milena Dobreva
Abstract:
Large-scale digitization in Galleries, Libraries, Archives and Museums (GLAM) created the conditions for providing access to collections as data. It opened new opportunities to explore, use and reuse digital collections. Strong proponents of collections as data are the Innovation Labs which provided numerous examples of publishing datasets under open licenses in order to reuse digital content in n…
▽ More
Large-scale digitization in Galleries, Libraries, Archives and Museums (GLAM) created the conditions for providing access to collections as data. It opened new opportunities to explore, use and reuse digital collections. Strong proponents of collections as data are the Innovation Labs which provided numerous examples of publishing datasets under open licenses in order to reuse digital content in novel and creative ways. Within the current transition to the emerging data spaces, clouds for cultural heritage and open science, the need to identify practices which support more GLAM institutions to offer datasets becomes a priority, especially within the smaller and medium-sized institutions.
This paper answers the need to support GLAM institutions in facilitating the transition into publishing their digital content and to introduce collections as data services; this will also help their future efficient contribution to data spaces and cultural heritage clouds. It offers a checklist that can be used for both creating and evaluating digital collections suitable for computational use. The main contributions of this paper are i) a methodology for devising a checklist to create and assess digital collections for computational use; ii) a checklist to create and assess digital collections suitable for use with computational methods; iii) the assessment of the checklist against the practice of institutions innovating in the Collections as data field; and iv) the results obtained after the application and recommendations for the use of the checklist in GLAM institutions.
△ Less
Submitted 13 November, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
-
Distributed Resources for the Earth System Grid Advanced Management (DREAM)
Authors:
Luca Cinquini,
Steve Petruzza,
Jason Jerome Boutte,
Sasha Ames,
Ghaleb Abdulla,
Venkatramani Balaji,
Robert Ferraro,
Aparna Radhakrishnan,
Laura Carriere,
Thomas Maxwell,
Giorgio Scorzelli,
Valerio Pascucci
Abstract:
The DREAM project was funded more than 3 years ago to design and implement a next-generation ESGF (Earth System Grid Federation [1]) architecture which would be suitable for managing and accessing data and services resources on a distributed and scalable environment. In particular, the project intended to focus on the computing and visualization capabilities of the stack, which at the time were ra…
▽ More
The DREAM project was funded more than 3 years ago to design and implement a next-generation ESGF (Earth System Grid Federation [1]) architecture which would be suitable for managing and accessing data and services resources on a distributed and scalable environment. In particular, the project intended to focus on the computing and visualization capabilities of the stack, which at the time were rather primitive. At the beginning, the team had the general notion that a better ESGF architecture could be built by modularizing each component, and redefining its interaction with other components by defining and exposing a well defined API. Although this was still the high level principle that guided the work, the DREAM project was able to accomplish its goals by leveraging new practices in IT that started just about 3 or 4 years ago: the advent of containerization technologies (specifically, Docker), the development of frameworks to manage containers at scale (Docker Swarm and Kubernetes), and their application to the commercial Cloud. Thanks to these new technologies, DREAM was able to improve the ESGF architecture (including its computing and visualization services) to a level of deployability and scalability beyond the original expectations.
△ Less
Submitted 13 April, 2020;
originally announced April 2020.