-
Software Development Vehicles to enable extended and early co-design: a RISC-V and HPC case of study
Authors:
Filippo Mantovani,
Pablo Vizcaino,
Fabio Banchelli,
Marta Garcia-Gasulla,
Roger Ferrer,
Giorgos Ieronymakis,
Nikos Dimou,
Vassilis Papaefstathiou,
Jesus Labarta
Abstract:
Prototy** HPC systems with low-to-mid technology readiness level (TRL) systems is critical for providing feedback to hardware designers, the system software team (e.g., compiler developers), and early adopters from the scientific community. The typical approach to hardware design and HPC system prototy** often limits feedback or only allows it at a late stage. In this paper, we present a set o…
▽ More
Prototy** HPC systems with low-to-mid technology readiness level (TRL) systems is critical for providing feedback to hardware designers, the system software team (e.g., compiler developers), and early adopters from the scientific community. The typical approach to hardware design and HPC system prototy** often limits feedback or only allows it at a late stage. In this paper, we present a set of tools for co-designing HPC systems, called software development vehicles (SDV). We use an innovative RISC-V design as a demonstrator, which includes a scalar CPU and a vector processing unit capable of operating large vectors up to 16 kbits. We provide an incremental methodology and early tangible evidence of the co-design process that provide feedback to improve both architecture and system software at a very early stage of system development.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
A portable coding strategy to exploit vectorization on combustion simulations
Authors:
Fabio Banchelli,
Guillermo Oyarzun,
Marta Garcia-Gasulla,
Filippo Mantovani,
Ambrus Both,
Guillaume Houzeaux,
Daniel Mira
Abstract:
The complexity of combustion simulations demands the latest high-performance computing tools to accelerate its time-to-solution results. A current trend on HPC systems is the utilization of CPUs with SIMD or vector extensions to exploit data parallelism. Our work proposes a strategy to improve the automatic vectorization of finite element-based scientific codes. The approach applies a parametric c…
▽ More
The complexity of combustion simulations demands the latest high-performance computing tools to accelerate its time-to-solution results. A current trend on HPC systems is the utilization of CPUs with SIMD or vector extensions to exploit data parallelism. Our work proposes a strategy to improve the automatic vectorization of finite element-based scientific codes. The approach applies a parametric configuration to the data structures to help the compiler detect the block of codes that can take advantage of vector computation while maintaining the code portable. A detailed analysis of the computational impact of this methodology on the different stages of a CFD solver is studied on the PRECCINSTA burner simulation. Our parametric implementation has proven to help the compiler generate more vector instructions in the assembly operation: this results in a reduction of up to 9.3 times of the total executed instruction maintaining constant the Instructions Per Cycle and the CPU frequency. The proposed strategy improves the performance of the CFD case under study up to 4.67 times on the MareNostrum 4 supercomputer.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Performance and energy consumption of HPC workloads on a cluster based on Arm ThunderX2 CPU
Authors:
Filippo Mantovani,
Marta Garcia-Gasulla,
José Gracia,
Esteban Stafford,
Fabio Banchelli,
Marc Josep-Fabrego,
Joel Criado-Ledesma,
Mathias Nachtmann
Abstract:
In this paper, we analyze the performance and energy consumption of an Arm-based high-performance computing (HPC) system developed within the European project Mont-Blanc 3. This system, called Dibona, has been integrated by ATOS/Bull, and it is powered by the latest Marvell's CPU, ThunderX2. This CPU is the same one that powers the Astra supercomputer, the first Arm-based supercomputer entering th…
▽ More
In this paper, we analyze the performance and energy consumption of an Arm-based high-performance computing (HPC) system developed within the European project Mont-Blanc 3. This system, called Dibona, has been integrated by ATOS/Bull, and it is powered by the latest Marvell's CPU, ThunderX2. This CPU is the same one that powers the Astra supercomputer, the first Arm-based supercomputer entering the Top500 in November 2018. We study from micro-benchmarks up to large production codes. We include an interdisciplinary evaluation of three scientific applications (a finite-element fluid dynamics code, a smoothed particle hydrodynamics code, and a lattice Boltzmann code) and the Graph 500 benchmark, focusing on parallel and energy efficiency as well as studying their scalability up to thousands of Armv8 cores. For comparison, we run the same tests on state-of-the-art x86 nodes included in Dibona and the Tier-0 supercomputer MareNostrum4. Our experiments show that the ThunderX2 has a 25% lower performance on average, mainly due to its small vector unit yet somewhat compensated by its 30% wider links between the CPU and the main memory. We found that the software ecosystem of the Armv8 architecture is comparable to the one available for Intel. Our results also show that ThunderX2 delivers similar or better energy-to-solution and scalability, proving that Arm-based chips are legitimate contenders in the market of next-generation HPC systems.
△ Less
Submitted 10 July, 2020; v1 submitted 9 July, 2020;
originally announced July 2020.