-
Exploring the Vision Processing Unit as Co-processor for Inference
Authors:
Sergio Rivas-Gomez,
Antonio J. Peña,
David Moloney,
Erwin Laure,
Stefano Markidis
Abstract:
The success of the exascale supercomputer is largely debated to remain dependent on novel breakthroughs in technology that effectively reduce the power consumption and thermal dissipation requirements. In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations. In particular, we explore t…
▽ More
The success of the exascale supercomputer is largely debated to remain dependent on novel breakthroughs in technology that effectively reduce the power consumption and thermal dissipation requirements. In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations. In particular, we explore the so-called Vision Processing Unit (VPU), a highly-parallel vector processor with a power envelope of less than 1W. We evaluate this chip during inference using a pre-trained GoogLeNet convolutional network model and a large image dataset from the ImageNet ILSVRC challenge. Preliminary results indicate that a multi-VPU configuration provides similar performance compared to reference CPU and GPU implementations, while reducing the thermal-design power (TDP) up to 8x in comparison.
△ Less
Submitted 9 October, 2018;
originally announced October 2018.
-
Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks
Authors:
Sergio Rivas-Gomez,
Sai Narasimhamurthy,
Keeran Brabazon,
Oliver Perks,
Erwin Laure,
Stefano Markidis
Abstract:
In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase the performance in situations where the workload per process…
▽ More
In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase the performance in situations where the workload per process is unexpectedly unbalanced. Using a Word-Count implementation and a large dataset from the Purdue MapReduce Benchmarks Suite (PUMA), we demonstrate that our approach can provide up to 23% performance improvement on average compared to a reference MapReduce implementation that uses state-of-the-art MPI collective communication and I/O.
△ Less
Submitted 9 October, 2018;
originally announced October 2018.
-
MPI Windows on Storage for HPC Applications
Authors:
Sergio Rivas-Gomez,
Roberto Gioiosa,
Ivy Bo Peng,
Gokcen Kestor,
Sai Narasimhamurthy,
Erwin Laure,
Stefano Markidis
Abstract:
Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present its benefits for out-of-core execution, parallel I/O and fault-tolerance. In addition, we e…
▽ More
Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present its benefits for out-of-core execution, parallel I/O and fault-tolerance. In addition, we explore the integration of heterogeneous window allocations, where memory and storage share a unified virtual address space. When performing large, irregular memory operations, we verify that MPI windows on local storage incurs a 55% performance penalty on average. When using a Lustre parallel file system, asymmetric performance is observed with over 90% degradation in writing operations. Nonetheless, experimental results of a Distributed Hash Table, the HACC I/O kernel mini-application, and a novel MapReduce implementation based on the use of MPI one-sided communication, indicate that the overall penalty of MPI windows on storage can be negligible in most cases in real-world applications.
△ Less
Submitted 9 October, 2018;
originally announced October 2018.
-
The SAGE Project: a Storage Centric Approach for Exascale Computing
Authors:
Sai Narasimhamurthy,
Nikita Danilov,
Sining Wu,
Ganesan Umanesan,
Steven Wei-der Chien,
Sergio Rivas-Gomez,
Ivy Bo Peng,
Erwin Laure,
Shaun de Witt,
Dirk Pleiter,
Stefano Markidis
Abstract:
SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale r…
▽ More
SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale regime.
SAGE addresses the convergence of Big Data Analysis and HPC in an era of next-generation data centric computing. This convergence is driven by the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors where data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. A first prototype of the SAGE system has been been implemented and installed at the Julich Supercomputing Center. The SAGE storage system consists of multiple types of storage device technologies in a multi-tier I/O hierarchy, including flash, disk, and non-volatile memory technologies. The main SAGE software component is the Seagate Mero Object Storage that is accessible via the Clovis API and higher level interfaces. The SAGE project also includes scientific applications for the validation of the SAGE concepts.
The objective of this paper is to present the SAGE project concepts, the prototype of the SAGE platform and discuss the software architecture of the SAGE system.
△ Less
Submitted 6 July, 2018;
originally announced July 2018.
-
SAGE: Percipient Storage for Exascale Data Centric Computing
Authors:
Sai Narasimhamurthy,
Nikita Danilov,
Sining Wu,
Ganesan Umanesan,
Stefano Markidis,
Sergio Rivas-Gomez,
Ivy Bo Peng,
Erwin Laure,
Dirk Pleiter,
Shaun de Witt
Abstract:
We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infras…
▽ More
We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infrastructure. SAGE addresses the increasing overlaps between Big Data Analysis and HPC in an era of next-generation data centric computing that has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analyzed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, as a problem that has not been sufficiently dealt with for simulation codes, is appropriately addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and look at early results we have obtained employing some of its key methodologies, as the system continues to evolve.
△ Less
Submitted 1 May, 2018;
originally announced May 2018.
-
Extending Message Passing Interface Windows to Storage
Authors:
Sergio Rivas-Gomez,
Stefano Markidis,
Ivy Bo Peng,
Erwin Laure,
Gokcen Kestor,
Roberto Gioiosa
Abstract:
This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could poten…
▽ More
This work presents an extension to MPI supporting the one-sided communication model and window allocations in storage. Our design transparently integrates with the current MPI implementations, enabling applications to target MPI windows in storage, memory or both simultaneously, without major modifications. Initial performance results demonstrate that the presented MPI window extension could potentially be helpful for a wide-range of use-cases and with low-overhead.
△ Less
Submitted 27 April, 2017;
originally announced April 2017.