A System Development Kit for Big Data Applications on FPGA-based Clusters: The EVEREST Approach

Authors: Christian Pilato, Subhadeep Banik, Jakub Beranek, Fabien Brocheton, Jeronimo Castrillon, Riccardo Cevasco, Radim Cmar, Serena Curzel, Fabrizio Ferrandi, Karl F. A. Friebel, Antonella Galizia, Matteo Grasso, Paulo Silva, Jan Martinovic, Gianluca Palermo, Michele Paolino, Andrea Parodi, Antonio Parodi, Fabio Pintus, Raphael Polig, David Poulet, Francesco Regazzoni, Burkhard Ringlein, Roberto Rocco, Katerina Slaninova, et al. (6 additional authors not shown)

Abstract: Modern big data workflows are characterized by computationally intensive kernels. The simulated results are often combined with knowledge extracted from AI models to ultimately support decision-making. These energy-hungry workflows are increasingly executed in data centers with energy-efficient hardware accelerators since FPGAs are well-suited for this task due to their inherent parallelism. We present the H2020 project EVEREST, which has developed a system development kit (SDK) to simplify the creation of FPGA-accelerated kernels and manage the execution at runtime through a virtualization environment. This paper describes the main components of the EVEREST SDK and the benefits that can be achieved in our use cases.

Submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted for presentation at DATE 2024 (multi-partner project session)

arXiv:2301.07486

CINM (Cinnamon): A Compilation Infrastructure for Heterogeneous Compute In-Memory and Compute Near-Memory Paradigms

Authors: Asif Ali Khan, Hamid Farzaneh, Karl F. A. Friebel, Clément Fournier, Lorenzo Chelini, Jeronimo Castrillon

Abstract: The rise of data-intensive applications exposed the limitations of conventional processor-centric von-Neumann architectures that struggle to meet the off-chip memory bandwidth demand. Therefore, recent innovations in computer architecture advocate compute-in-memory (CIM) and compute-near-memory (CNM), non-von-Neumann paradigms achieving orders-of-magnitude improvements in performance and energy consumption. Despite significant technological breakthroughs in the last few years, the programmability of these systems is still a serious challenge. Their programming models are too low-level and specific to particular system implementations. Since such future architectures are predicted to be highly heterogeneous, developing novel compiler abstractions and frameworks becomes necessary. To this end, we present CINM (Cinnamon), a first end-to-end compilation flow that leverages hierarchical abstractions to generalize over different CIM and CNM devices and enable device-agnostic and device-aware optimizations. Cinnamon progressively lowers input programs and performs optimizations at each level in the lowering pipeline. To show its efficacy, we evaluate CINM on a set of benchmarks for the well-known UPMEM CNM system and memristor-based CIM accelerators. We show that Cinnamon, supporting multiple hardware targets, generates high-performance code comparable to or better than state-of-the-art implementations.

Submitted 24 May, 2024; v1 submitted 25 December, 2022; originally announced January 2023.

Comments: 16 pages, 12 figures

arXiv:2203.10850

doi 10.1145/3563553

Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics

Authors: Stephanie Soldavini, Karl F. A. Friebel, Mattia Tibaldi, Gerald Hempel, Jeronimo Castrillon, Christian Pilato

Abstract: Numerical simulations can help solve complex problems. Most of these algorithms are massively parallel and thus good candidates for FPGA acceleration thanks to spatial parallelism. Modern FPGA devices can leverage high-bandwidth memory technologies, but when applications are memory-bound designers must craft advanced communication and memory architectures for efficient data movement and on-chip storage. This development process requires hardware design skills that are uncommon in domain-specific experts. In this paper, we propose an automated tool flow from a domain-specific language (DSL) for tensor expressions to generate massively-parallel accelerators on HBM-equipped FPGAs. Designers can use this flow to integrate and evaluate various compiler or hardware optimizations. We use computational fluid dynamics (CFD) as a paradigmatic example. Our flow starts from the high-level specification of tensor operations and combines an MLIR-based compiler with an in-house hardware generation flow to generate systems with parallel accelerators and a specialized memory architecture that moves data efficiently, aiming at fully exploiting the available CPU-FPGA bandwidth. We simulated applications with millions of elements, achieving up to 103 GFLOPS with one compute unit and custom precision when targeting a Xilinx Alveo U280. Our FPGA implementation is up to 25x more energy-efficient than expert-crafted Intel CPU implementations.

Submitted 8 November, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: Accepted for publication in ACM Transactions on Reconfigurable Technology and Systems (TRETS)

arXiv:2108.03326

doi 10.1109/Cluster48925.2021.00112

From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics

Authors: Karl F. A. Friebel, Stephanie Soldavini, Gerald Hempel, Christian Pilato, Jeronimo Castrillon

Abstract: Many applications are increasingly requiring numerical simulations for solving complex problems. Most of these numerical algorithms are massively parallel and often implemented on parallel high-performance computers. However, classic CPU-based platforms suffer due to the demand for higher resolutions and the exponential growth of data. FPGAs offer a powerful and flexible alternative that can host accelerators to complement such platforms. Developing such application-specific accelerators is still challenging because it is hard to provide efficient code for hardware synthesis. In this paper, we study the challenges of porting a numerical simulation kernel onto FPGA. We propose an automated tool flow from a domain-specific language (DSL) to generate accelerators for computational fluid dynamics on FPGA. Our DSL-based flow simplifies the exploration of parameters and constraints such as on-chip memory usage. We also propose a decoupled optimization of memory and logic resources, which allows us to better use the limited FPGA resources. In our preliminary evaluation, this enabled doubling the number of parallel kernels, increasing the accelerator speedup versus ARM execution from 7 to 12 times.

Submitted 6 August, 2021; originally announced August 2021.

Comments: Accepted for presentation at the FPGA for HPC Workshop 2021

arXiv:2005.07662

Guided interactive image segmentation using machine learning and color based data set clustering

Authors: Adrian Friebel, Tim Johann, Dirk Drasdo, Stefan Hoehme

Abstract: We present a novel approach that combines machine learning based interactive image segmentation using supervoxels with a clustering method for the automated identification of similarly colored images in large data sets which enables a guided reuse of classifiers. Our approach solves the problem of significant color variability prevalent and often unavoidable in biological and medical images which typically leads to deteriorated segmentation and quantification accuracy thereby greatly reducing the necessary training effort. This increase in efficiency facilitates the quantification of much larger numbers of images thereby enabling interactive image analysis for recent new technological advances in high-throughput imaging. The presented methods are applicable for almost any image type and represent a useful tool for image analysis tasks in general.

Submitted 21 June, 2022; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:1410.4598

TiQuant: Software for tissue analysis, quantification and surface reconstruction

Authors: Adrian Friebel, Johannes Neitsch, Tim Johann, Seddik Hammad, Jan G. Hengstler, Dirk Drasdo, Stefan Hoehme

Abstract: Motivation: TiQuant is a modular software tool for efficient quantification of biological tissues based on volume data obtained by biomedical image modalities. It includes a number of versatile image and volume processing chains tailored to the analysis of different tissue types which have been experimentally verified. TiQuant implements a novel method for the reconstruction of three-dimensional surfaces of biological systems, data that often cannot be obtained experimentally but which is of utmost importance for tissue modelling in systems biology. Availability: TiQuant is freely available for non-commercial use at msysbio.com/tiquant. Windows, OSX and Linux are supported.

Submitted 16 October, 2014; originally announced October 2014.

Comments: 6 pages, 4 figures

Showing 1–6 of 6 results for author: Friebel, A