-
A System Development Kit for Big Data Applications on FPGA-based Clusters: The EVEREST Approach
Authors:
Christian Pilato,
Subhadeep Banik,
Jakub Beranek,
Fabien Brocheton,
Jeronimo Castrillon,
Riccardo Cevasco,
Radim Cmar,
Serena Curzel,
Fabrizio Ferrandi,
Karl F. A. Friebel,
Antonella Galizia,
Matteo Grasso,
Paulo Silva,
Jan Martinovic,
Gianluca Palermo,
Michele Paolino,
Andrea Parodi,
Antonio Parodi,
Fabio Pintus,
Raphael Polig,
David Poulet,
Francesco Regazzoni,
Burkhard Ringlein,
Roberto Rocco,
Katerina Slaninova,
et al. (6 additional authors not shown)
Abstract:
Modern big data workflows are characterized by computationally intensive kernels. The simulated results are often combined with knowledge extracted from AI models to ultimately support decision-making. These energy-hungry workflows are increasingly executed in data centers with energy-efficient hardware accelerators since FPGAs are well-suited for this task due to their inherent parallelism. We present the H2020 project EVEREST, which has developed a system development kit (SDK) to simplify the creation of FPGA-accelerated kernels and manage the execution at runtime through a virtualization environment. This paper describes the main components of the EVEREST SDK and the benefits that can be achieved in our use cases.
Submitted 19 February, 2024;
originally announced February 2024.
-
A Survey of FPGA Optimization Methods for Data Center Energy Efficiency
Authors:
Mattia Tibaldi,
Christian Pilato
Abstract:
This article provides a survey of academic literature about field-programmable gate arrays (FPGAs) and their utilization for energy-efficient acceleration in data centers. The goal is to critically present the existing FPGA energy optimization techniques and discuss how they can be applied to such systems. To do so, the article explores current energy trends and their projection to the future with particular attention to the requirements set out by the European Code of Conduct for Data Center Energy Efficiency. The article then proposes a complete analysis of over ten years of research in energy optimization techniques, classifying them by purpose, method of application, and impacts on the sources of consumption. Finally, we conclude with the challenges and possible innovations we expect for this sector.
Submitted 22 September, 2023;
originally announced September 2023.
-
Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics
Authors:
Stephanie Soldavini,
Karl F. A. Friebel,
Mattia Tibaldi,
Gerald Hempel,
Jeronimo Castrillon,
Christian Pilato
Abstract:
Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory technologies, but when applications are memory-bound, designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon in domain-specific experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) for tensor expressions to generate massively parallel accelerators on HBM-equipped FPGAs. Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to generate systems with parallel accelerators and a specialized memory architecture that moves data efficiently, aiming at fully exploiting the available CPU-FPGA bandwidth. We simulated applications with millions of elements, achieving up to 103 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is up to 25x more energy-efficient than expert-crafted Intel CPU implementations.
Submitted 8 November, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.