Crossing the Architectural Barrier: Evaluating Representative Regions of Parallel HPC Applications
Authors:
Alexandra Ferreron,
Radhika Jagtap,
Sascha Bischoff,
Roxana Rusitoru
Abstract:
Exascale computing will get mankind closer to solving important social, scientific and engineering problems. Due to high prototyping costs, High Performance Computing (HPC) system architects make use of simulation models for design space exploration and hardware-software co-design. However, as HPC systems reach exascale proportions, the cost of simulation increases, since simulators themselves are largely single-threaded. Tools for selecting representative parts of parallel applications to reduce running costs are widespread, e.g., BarrierPoint achieves this by analysing, in simulation, abstract characteristics such as basic blocks and reuse distances. However, architectures new to HPC have a limited set of tools available.
In this work, we provide an independent cross-architectural evaluation on real hardware - across Intel and ARM - of the BarrierPoint methodology, when applied to parallel HPC proxy applications. We present both cases: when the methodology can be applied and when it cannot. In the former case, results show that we can predict the performance of full application execution by running shorter representative sections. In the latter case, we dive into the underlying issues and suggest improvements. We demonstrate a total simulation time reduction of up to 178x, whilst keeping the error below 2.3% for both cycles and instructions.
Submitted 20 March, 2018;
originally announced March 2018.
AISC: Approximate Instruction Set Computer
Authors:
Alexandra Ferreron,
Jesus Alastruey-Benede,
Dario Suarez-Gracia,
Ulya R. Karpuzcu
Abstract:
This paper makes the case for a single-ISA heterogeneous computing platform, AISC, where each compute engine (be it a core or an accelerator) supports a different subset of the very same ISA. An ISA subset may not be functionally complete, but the union of the (per compute engine) subsets renders a functionally complete, platform-wide single ISA. Tailoring the microarchitecture of each compute engine to the subset of the ISA that it supports can easily reduce hardware complexity. At the same time, the energy efficiency of computing can improve by exploiting algorithmic noise tolerance: by mapping code sequences that can tolerate (any potential inaccuracy induced by) the incomplete ISA-subsets to the corresponding compute engines.
Submitted 19 March, 2018;
originally announced March 2018.