-
Understanding Power and Energy Utilization in Large Scale Production Physics Simulation Codes
Authors:
Brian S. Ryu**,
Arturo Vargas,
Ian Karlin,
Shawn A. Dawson,
Kenneth Weiss,
Adam Bertsch,
M. Scott McKinley,
Michael R. Collette,
Si D. Hammond,
Kevin Pedretti,
Robert N. Rieben
Abstract:
Power is an often-cited reason for moving to advanced architectures on the path to Exascale computing. This is due to the practical concern of delivering enough power to successfully site and operate these machines, as well as concerns over energy usage while running large simulations. Since accurate power measurements can be difficult to obtain, processor thermal design power (TDP) is a possible surrogate due to its simplicity and availability. However, TDP is not indicative of typical power usage while running simulations. Using commodity and advanced technology systems at Lawrence Livermore National Laboratory (LLNL) and Sandia National Laboratories, we performed a series of experiments to measure power and energy usage when running simulation codes. These experiments indicate that large-scale LLNL simulation codes are significantly more efficient than a simple processor TDP model might suggest.
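The gap between a TDP surrogate and measured usage can be made concrete with a small energy calculation. The sketch below assumes total processor power is sampled at a fixed interval and integrates it with the trapezoidal rule; the function names, the sampling scheme, and the numbers used are hypothetical and are not the paper's measurement method or data.

```python
def tdp_energy_j(sockets: int, tdp_w: float, runtime_s: float) -> float:
    """TDP surrogate: assume every socket draws its full TDP for the whole run."""
    return sockets * tdp_w * runtime_s


def measured_energy_j(power_samples_w: list[float], interval_s: float) -> float:
    """Integrate evenly spaced total-power samples (watts) with the trapezoidal rule."""
    return sum((a + b) / 2.0 * interval_s
               for a, b in zip(power_samples_w, power_samples_w[1:]))


def sampled_runtime_s(power_samples_w: list[float], interval_s: float) -> float:
    """Wall time covered by the samples: one interval between each adjacent pair."""
    return (len(power_samples_w) - 1) * interval_s


if __name__ == "__main__":
    # Hypothetical readings: total package power of 4 sockets, sampled once per second.
    samples = [500.0, 520.0, 480.0, 500.0]
    runtime = sampled_runtime_s(samples, 1.0)
    predicted = tdp_energy_j(sockets=4, tdp_w=200.0, runtime_s=runtime)
    measured = measured_energy_j(samples, 1.0)
    print(f"TDP model: {predicted} J, measured: {measured} J, "
          f"ratio: {measured / predicted:.3f}")
```

With these made-up values (four 200 W sockets, three seconds of samples), the TDP model predicts 2400 J while the integrated samples give 1500 J, so the surrogate overestimates energy by a factor of 1.6. Trapezoidal integration is one reasonable choice for evenly spaced samples; a rectangle rule would work similarly at fine sampling intervals.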
Submitted 4 January, 2022;
originally announced January 2022.
-
TabulaROSA: Tabular Operating System Architecture for Massively Parallel Heterogeneous Compute Engines
Authors:
Jeremy Kepner,
Ron Brightwell,
Alan Edelman,
Vijay Gadepally,
Hayden Jananthan,
Michael Jones,
Sam Madden,
Peter Michaleas,
Hamed Okhravi,
Kevin Pedretti,
Albert Reuther,
Thomas Sterling,
Mike Stonebraker
Abstract:
The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In this context, an operating system can be viewed as software that brokers and tracks the resources of the compute engines and is akin to a database management system. To explore the idea of using a database in an operating system role, this work defines key operating system functions in terms of rigorous mathematical semantics (associative array algebra) that are directly translatable into database operations. These operations possess a number of mathematical properties that are ideal for parallel operating systems by guaranteeing correctness over a wide range of parallel operations. The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabulaROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance than Linux while managing 2000x more processes in fully searchable tables.
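To illustrate what expressing an operating system function as associative array operations can look like, the sketch below stores a process table as an associative array keyed by (process, attribute) and writes fork as an array multiplication followed by an element-wise addition over the ordinary (+, ×) semiring. This is a hypothetical illustration that assumes numeric attribute values and a fresh child identifier; the names `add`, `matmul`, and `fork` are made up here and are not the paper's operating system equations.

```python
from collections import defaultdict

# Associative array: (row_key, col_key) -> value; absent keys are implicit zeros.
AssocArray = dict[tuple[str, str], float]


def add(a: AssocArray, b: AssocArray) -> AssocArray:
    """Element-wise sum over the union of keys (the array-level ⊕ with ⊕ = +)."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


def matmul(a: AssocArray, b: AssocArray) -> AssocArray:
    """Array product over (+, *): C(i, k) = sum_j A(i, j) * B(j, k)."""
    b_rows = defaultdict(list)
    for (j, k), value in b.items():
        b_rows[j].append((k, value))
    out: AssocArray = {}
    for (i, j), av in a.items():
        for k, bv in b_rows.get(j, []):
            out[(i, k)] = out.get((i, k), 0) + av * bv
    return out


def fork(procs: AssocArray, parent: str, child: str) -> AssocArray:
    """Hypothetical fork: F(child, parent) = 1, so F * procs copies the parent row
    under the child key, and adding it to procs inserts the new row."""
    if any(row == child for row, _ in procs):
        raise ValueError(f"process {child!r} already exists")
    f: AssocArray = {(child, parent): 1}
    return add(procs, matmul(f, procs))


if __name__ == "__main__":
    table = {("p1", "mem_mb"): 64.0, ("p1", "priority"): 1.0}
    print(fork(table, "p1", "p2"))
```

Because `add` is commutative and associative, many such fork updates could in principle be combined in any order, which is the kind of algebraic property the abstract points to for parallel correctness.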
Submitted 13 July, 2018;
originally announced July 2018.
-
Geometric Partitioning and Ordering Strategies for Task Mapping on Parallel Computers
Authors:
Mehmet Deveci,
Karen D. Devine,
Kevin Pedretti,
Mark A. Taylor,
Sivasankaran Rajamanickam,
Umit V. Catalyurek
Abstract:
We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that applications' communication time is reduced. We address the case of sparse node allocation, where the nodes assigned to a job are not necessarily located in a contiguous block nor within close proximity to each other in the network, although our methods generalize to contiguous allocations as well. The goal is to assign tasks to cores so that interdependent tasks are performed by "nearby" cores, thus lowering the distance messages must travel, the amount of congestion in the network, and the overall cost of communication. Our new method applies a geometric partitioning algorithm to both the tasks and the processors, and assigns task parts to the corresponding processor parts. We also present a number of algorithmic optimizations that exploit specific features of the network or application. We show that, for the structured finite difference mini-application MiniGhost, our mapping methods reduced communication time by up to 75% relative to MiniGhost's default mapping on 128K cores of a Cray XK7 with sparse allocation. For the atmospheric modeling code E3SM/HOMME, our methods reduced communication time by up to 31% on 32K cores of an IBM BlueGene/Q with contiguous allocation.
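The core idea of partitioning both tasks and processors geometrically and pairing corresponding parts can be sketched in a few lines. This sketch uses plain recursive coordinate bisection (RCB) as a stand-in partitioner, assumes exactly one task per core, and uses Manhattan distance between core coordinates as a rough proxy for network hops (ignoring torus wraparound). The function names and coordinates are hypothetical, and none of the paper's network- or application-specific optimizations are reproduced.

```python
def rcb(ids: list[int], coords: list[tuple[float, ...]], nparts: int) -> list[list[int]]:
    """Recursive coordinate bisection: split ids into nparts spatially compact parts.

    Assumes len(ids) >= nparts. Parts are returned in cut order, so neighboring
    parts in the list are neighbors along the cut sequence.
    """
    if nparts == 1:
        return [list(ids)]
    dims = len(coords[ids[0]])
    # Cut along the dimension with the largest extent.
    spans = [max(coords[i][d] for i in ids) - min(coords[i][d] for i in ids)
             for d in range(dims)]
    d = spans.index(max(spans))
    ordered = sorted(ids, key=lambda i: (coords[i][d], i))
    left_parts = nparts // 2
    cut = len(ordered) * left_parts // nparts  # proportional split point
    return (rcb(ordered[:cut], coords, left_parts)
            + rcb(ordered[cut:], coords, nparts - left_parts))


def geometric_map(task_coords, core_coords) -> dict[int, int]:
    """Partition tasks and cores into the same number of parts; pair part i with part i."""
    n = len(core_coords)
    if len(task_coords) != n:
        raise ValueError("this sketch assumes one task per core")
    task_parts = rcb(list(range(n)), task_coords, n)
    core_parts = rcb(list(range(n)), core_coords, n)
    return {tp[0]: cp[0] for tp, cp in zip(task_parts, core_parts)}


def comm_cost(edges, mapping, core_coords) -> float:
    """Sum of Manhattan distances between the cores of communicating task pairs."""
    total = 0.0
    for a, b in edges:
        ca, cb = core_coords[mapping[a]], core_coords[mapping[b]]
        total += sum(abs(x - y) for x, y in zip(ca, cb))
    return total


if __name__ == "__main__":
    tasks = [(0.0,), (1.0,), (2.0,), (3.0,)]      # a 1-D chain of tasks
    cores = [(30.0,), (10.0,), (0.0,), (20.0,)]   # scattered (sparse) allocation
    chain = [(0, 1), (1, 2), (2, 3)]
    mapping = geometric_map(tasks, cores)
    identity = {i: i for i in range(4)}
    print(mapping, comm_cost(chain, mapping, cores), comm_cost(chain, identity, cores))
```

On this toy input the geometric mapping sends neighboring tasks to neighboring cores, giving a total distance of 30 versus 50 for the identity mapping. RCB was chosen only because it is short; any geometric partitioner that returns consistently ordered parts could be substituted.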
Submitted 25 April, 2018;
originally announced April 2018.