The LBNL Superfacility Project Report
Authors:
Deborah Bard,
Cory Snavely,
Lisa Gerhardt,
Jason Lee,
Becci Totzke,
Katie Antypas,
William Arndt,
Johannes Blaschke,
Suren Byna,
Ravi Cheema,
Shreyas Cholia,
Mark Day,
Bjoern Enders,
Aditi Gaur,
Annette Greiner,
Taylor Groves,
Mariam Kiran,
Quincey Koziol,
Tom Lehman,
Kelly Rowland,
Chris Samuel,
Ashwin Selvarajan,
Alex Sim,
David Skinner,
Laurie Stephey, et al. (2 additional authors not shown)
Abstract:
The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and expertise needed to make connected facilities easy to use. The three-year Lawrence Berkeley National Laboratory (LBNL) Superfacility project was initiated in 2019 to coordinate work being performed at LBNL to support this model, and to provide a coherent and comprehensive set of science requirements to drive existing and new work.
A key component of the project was in-depth engagement with eight science teams that represent challenging use cases across the DOE Office of Science. By the close of the project, we had met our project goal by enabling our science application engagements to demonstrate automated pipelines that analyze data from remote facilities at large scale, without routine human intervention. In several cases, we have gone beyond demonstrations and now provide production-level services. To achieve this goal, the Superfacility team developed tools, infrastructure, and policies for near-real-time computing support, dynamic high-performance networking, data management and movement tools, API-driven automation, HPC-scale notebooks via Jupyter, authentication using Federated Identity, and container-based edge services.
The lessons we learned during this project provide a valuable model for future large, complex, cross-disciplinary collaborations. There is a pressing need for a coherent computing infrastructure across national facilities, and LBNL's Superfacility project is a unique model for success in tackling the challenges that will be faced in hardware, software, policies, and services across multiple science domains.
Submitted 27 June, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
The Petascale DTN Project: High Performance Data Transfer for HPC Facilities
Authors:
Eli Dart,
William Allcock,
Wahid Bhimji,
Tim Boerner,
Ravinderjeet Cheema,
Andrew Cherry,
Brent Draney,
Salman Habib,
Damian Hazen,
Jason Hill,
Matt Kollross,
Suzanne Parete-Koon,
Daniel Pelfrey,
Adrian Pope,
Jeff Porter,
David Wheeler
Abstract:
The movement of large-scale (tens of Terabytes and larger) data sets between high performance computing (HPC) facilities is an important and increasingly critical capability. A growing number of scientific collaborations rely on HPC facilities for tasks which either require large-scale data sets as input or produce large-scale data sets as output. In order to enable the transfer of these data sets as needed by the scientific community, HPC facilities must design and deploy the appropriate data transfer capabilities to allow users to do data placement at scale.
This paper describes the Petascale DTN Project, an effort undertaken by four HPC facilities, which succeeded in achieving routine data transfer rates of over 1 PB/week between the facilities. We describe the design and configuration of the Data Transfer Node (DTN) clusters used for large-scale data transfers at these facilities, the software tools used, and the performance tuning that enabled this capability.
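For a sense of scale, the 1 PB/week figure can be converted into an average sustained network rate. The sketch below is illustrative arithmetic only and is not taken from the paper; it assumes a decimal petabyte (10^15 bytes) and a transfer spread evenly over the full week, and the function name sustained_gbps is hypothetical.

```python
# Illustrative arithmetic: average line rate implied by a weekly transfer volume.
# Assumptions (not from the paper): 1 PB = 10**15 bytes, and the transfer
# runs continuously and evenly for the whole week.

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def sustained_gbps(petabytes_per_week: float) -> float:
    """Return the average rate in gigabits per second for a weekly volume."""
    bits_per_week = petabytes_per_week * 1e15 * 8  # bytes -> bits
    return bits_per_week / SECONDS_PER_WEEK / 1e9  # bits/s -> Gb/s


if __name__ == "__main__":
    print(f"{sustained_gbps(1.0):.1f} Gb/s")  # prints "13.2 Gb/s"
```

Under these assumptions, 1 PB/week corresponds to roughly 13 Gb/s averaged around the clock, so any idle time during the week raises the rate needed while transfers are active.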
Submitted 8 September, 2021; v1 submitted 26 May, 2021;
originally announced May 2021.