-
Proceedings of the 2015 International Workshop on the Lustre Ecosystem: Challenges and Opportunities
Authors:
Neena Imam,
Michael Brim,
Sarp Oral
Abstract:
The Lustre parallel file system has been widely adopted by high-performance computing (HPC) centers as an effective system for managing large-scale storage resources. Lustre achieves unprecedented aggregate performance by parallelizing I/O over file system clients and storage targets at extreme scales. Today, 7 of the 10 fastest supercomputers in the world use Lustre for high-performance storage. To date, Lustre development has focused on improving the performance and scalability of large-scale scientific workloads. In particular, large-scale checkpoint storage and retrieval, which is characterized by bursty I/O from coordinated parallel clients, has been the primary driver of Lustre development over the last decade. With the advent of extreme-scale computing and Big Data computing, many HPC centers are seeing increased user interest in running diverse workloads that place new demands on Lustre. In March 2015, the International Workshop on the Lustre Ecosystem: Challenges and Opportunities was held in Annapolis, Maryland, at the Historic Inns of Annapolis Governor Calvert House. This workshop series is intended to help explore improvements in the performance and flexibility of Lustre for supporting diverse application workloads. The 2015 workshop was the inaugural edition, and its goal was to initiate a discussion of the open challenges associated with enhancing Lustre for diverse applications, the technological advances necessary, and the associated impacts on the Lustre ecosystem. The workshop program featured a day of tutorials and a day of technical paper presentations.
Submitted 17 June, 2015;
originally announced June 2015.
-
Monitoring Extreme-scale Lustre Toolkit
Authors:
Michael J. Brim,
Joshua K. Lothian
Abstract:
We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.
Submitted 26 April, 2015;
originally announced April 2015.
-
Evaluating Dynamic File Striping For Lustre
Authors:
Joel Reed,
Jeremy Archuleta,
Michael J. Brim,
Joshua Lothian
Abstract:
We define dynamic striping as the ability to assign different Lustre striping characteristics to contiguous segments of a file as it grows. In this paper, we evaluate the effects of dynamic striping using a watermark-based strategy where the stripe count or width is increased once a file's size exceeds one of the chosen watermarks. To measure the performance of this strategy, we used a modified version of the IOR benchmark, a netflow analysis workload, and the blastn algorithm from NCBI BLAST. The results indicate that dynamic striping is beneficial to tasks with unpredictable data file size and large sequential reads, but they are less conclusive for workloads with significant random read phases.
Submitted 26 April, 2015;
originally announced April 2015.
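
The watermark-based strategy described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation and does not call any Lustre API; the watermark sizes, the stripe counts, and the function names `stripe_count_at` and `segments` are all assumptions chosen for illustration. The sketch only shows how a growing file could be divided into contiguous segments, each with a larger stripe count once the file size passes a watermark.

```python
from bisect import bisect_right

# Hypothetical watermarks (in bytes) and stripe counts; these are
# illustrative values, not the ones evaluated in the paper.
WATERMARKS = [64 * 2**20, 1 * 2**30, 16 * 2**30]  # 64 MiB, 1 GiB, 16 GiB
STRIPE_COUNTS = [1, 4, 16, 64]  # one more entry than WATERMARKS


def stripe_count_at(offset, watermarks=WATERMARKS, counts=STRIPE_COUNTS):
    """Return the stripe count that applies to the byte at `offset`.

    A byte at offset W only exists once the file size exceeds W, so
    offsets at or beyond a watermark use the next stripe count.
    """
    return counts[bisect_right(watermarks, offset)]


def segments(file_size, watermarks=WATERMARKS, counts=STRIPE_COUNTS):
    """Split [0, file_size) into (start, end, stripe_count) segments."""
    result = []
    start = 0
    # The last stripe count has no upper watermark, so pad with infinity.
    limits = list(watermarks) + [float("inf")]
    for limit, count in zip(limits, counts):
        if start >= file_size:
            break
        end = min(limit, file_size)
        result.append((start, end, count))
        start = end
    return result


if __name__ == "__main__":
    # A 2 GiB file crosses the first two hypothetical watermarks.
    for start, end, count in segments(2 * 2**30):
        print(f"bytes [{start}, {end}) -> stripe count {count}")
```

Under these assumed values, a file that stays small keeps a single stripe, while a file that grows past each watermark has its later segments spread across more storage targets, which matches the abstract's motivation of handling data files whose final size is not known in advance.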