
A System Development Kit for Big Data Applications on FPGA-based Clusters: The EVEREST Approach

Authors: Christian Pilato, Subhadeep Banik, Jakub Beranek, Fabien Brocheton, Jeronimo Castrillon, Riccardo Cevasco, Radim Cmar, Serena Curzel, Fabrizio Ferrandi, Karl F. A. Friebel, Antonella Galizia, Matteo Grasso, Paulo Silva, Jan Martinovic, Gianluca Palermo, Michele Paolino, Andrea Parodi, Antonio Parodi, Fabio Pintus, Raphael Polig, David Poulet, Francesco Regazzoni, Burkhard Ringlein, Roberto Rocco, Katerina Slaninova, et al. (6 additional authors not shown)

Abstract: Modern big data workflows are characterized by computationally intensive kernels. The simulated results are often combined with knowledge extracted from AI models to ultimately support decision-making. These energy-hungry workflows are increasingly executed in data centers with energy-efficient hardware accelerators since FPGAs are well-suited for this task due to their inherent parallelism. We present the H2020 project EVEREST, which has developed a system development kit (SDK) to simplify the creation of FPGA-accelerated kernels and manage the execution at runtime through a virtualization environment. This paper describes the main components of the EVEREST SDK and the benefits that can be achieved in our use cases.

Submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted for presentation at DATE 2024 (multi-partner project session)

arXiv:2107.08716 [pdf, other]

doi 10.1145/3501804

Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric

Authors: Gagandeep Singh, Dionysios Diamantopoulos, Juan Gómez-Luna, Christoph Hagleitner, Sander Stuijk, Henk Corporaal, Onur Mutlu

Abstract: Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration. To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth memory (HBM). We focus on compound stencils that are fundamental kernels in weather prediction models. By using high-level synthesis techniques, we develop NERO, an FPGA+HBM-based accelerator connected through OCAPI (Open Coherent Accelerator Processor Interface) to an IBM POWER9 host system. Our experimental results show that NERO outperforms a 16-core POWER9 system by 5.3x and 12.7x when running two different compound stencil kernels. NERO reduces the energy consumption by 12x and 35x for the same two kernels over the POWER9 system with an energy efficiency of 1.61 GFLOPS/Watt and 21.01 GFLOPS/Watt. We conclude that employing near-memory acceleration solutions for weather prediction modeling is promising as a means to achieve both high performance and high energy efficiency.

Submitted 21 December, 2021; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2009.08241, arXiv:2106.06433

arXiv:2106.06293 [pdf, other]

Acceleration-as-a-μService: A Cloud-native Monte-Carlo Option Pricing Engine on CPUs, GPUs and Disaggregated FPGAs

Authors: Dionysios Diamantopoulos, Raphael Polig, Burkhard Ringlein, Mitra Purandare, Beat Weiss, Christoph Hagleitner, Mark Lantz, Francois Abel

Abstract: The evolution of cloud applications into loosely-coupled microservices opens new opportunities for hardware accelerators to improve workload performance. Existing accelerator techniques for cloud sacrifice the consolidation benefits of microservices. This paper presents CloudiFi, a framework to deploy and compare accelerators as a cloud service. We evaluate our framework in the context of a financial workload and present early results indicating up to 485x gains in microservice response time.

Submitted 11 June, 2021; originally announced June 2021.

Comments: 3 pages, 6 figures

arXiv:2103.04185 [pdf, other]

doi 10.23919/DATE51398.2021.9473940

EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms

Authors: Christian Pilato, Stanislav Bohm, Fabien Brocheton, Jeronimo Castrillon, Riccardo Cevasco, Vojtech Cima, Radim Cmar, Dionysios Diamantopoulos, Fabrizio Ferrandi, Jan Martinovic, Gianluca Palermo, Michele Paolino, Antonio Parodi, Lorenzo Pittaluga, Daniel Raho, Francesco Regazzoni, Katerina Slaninova, Christoph Hagleitner

Abstract: High-Performance Big Data Analytics (HPDA) applications are characterized by huge volumes of distributed and heterogeneous data that require efficient computation for knowledge extraction and decision making. Designers are moving towards a tight integration of computing systems combining HPC, Cloud, and IoT solutions with artificial intelligence (AI). Matching the application and data requirements with the characteristics of the underlying hardware is a key element to improve the predictions thanks to high performance and better use of resources. We present EVEREST, a novel H2020 project started on October 1st, 2020 that aims at developing a holistic environment for the co-design of HPDA applications on heterogeneous, distributed, and secure platforms. EVEREST focuses on programmability issues through a data-driven design approach, the use of hardware-accelerated AI, and an efficient runtime monitoring with virtualization support. In the different stages, EVEREST combines state-of-the-art programming models, emerging communication standards, and novel domain-specific extensions. We describe the EVEREST approach and the use cases that drive our research.

Submitted 6 March, 2021; originally announced March 2021.

Comments: Paper accepted for presentation at the IEEE/EDAC/ACM Design, Automation and Test in Europe Conference and Exhibition (DATE 2021)

arXiv:2009.08241 [pdf, other]

NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling

Authors: Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan Gomez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal

Abstract: Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration. To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth memory (HBM). We focus on compound stencils that are fundamental kernels in weather prediction models. By using high-level synthesis techniques, we develop NERO, an FPGA+HBM-based accelerator connected through IBM CAPI2 (Coherent Accelerator Processor Interface) to an IBM POWER9 host system. Our experimental results show that NERO outperforms a 16-core POWER9 system by 4.2x and 8.3x when running two different compound stencil kernels. NERO reduces the energy consumption by 22x and 29x for the same two kernels over the POWER9 system with an energy efficiency of 1.5 GFLOPS/Watt and 17.3 GFLOPS/Watt. We conclude that employing near-memory acceleration solutions for weather prediction modeling is promising as a means to achieve both high performance and high energy efficiency.

Submitted 17 September, 2020; originally announced September 2020.

Comments: This paper appears in FPL 2020

arXiv:2004.10854 [pdf, other]

Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

Authors: Dionysios Diamantopoulos, Burkhard Ringlein, Mitra Purandare, Gagandeep Singh, Christoph Hagleitner

Abstract: Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance Deep-Learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators since the adaptation to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays on top of the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than the state of the art.

Submitted 20 April, 2020; originally announced April 2020.

Comments: 9 pages, 7 figures

arXiv:2004.01635 [pdf, other]

High Bandwidth Memory on FPGAs: A Data Analytics Perspective

Authors: Kaan Kara, Christoph Hagleitner, Dionysios Diamantopoulos, Dimitris Syrivelis, Gustavo Alonso

Abstract: FPGA-based data processing in datacenters is increasing in popularity due to the demands of modern workloads and the ensuing necessity for specialization in hardware. Driven by this trend, vendors are rapidly adapting reconfigurable devices to suit data and compute intensive workloads. Inclusion of High Bandwidth Memory (HBM) in FPGA devices is a recent example. HBM promises overcoming the bandwidth bottleneck, faced often by FPGA-based accelerators due to their throughput oriented design. In this paper, we study the usage and benefits of HBM on FPGAs from a data analytics perspective. We consider three workloads that are often performed in analytics oriented databases and implement them on FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. In certain cases, FPGA+HBM based solutions are able to surpass the highest performance provided by either a 2-socket POWER9 system or a 14-core XeonE5 by up to 1.8x (selection), 12.9x (join), and 3.2x (SGD).

Submitted 2 April, 2020; originally announced April 2020.

arXiv:1806.01103 [pdf, ps, other]

doi 10.1109/MM.2014.69

Giving Text Analytics a Boost

Authors: Raphael Polig, Kubilay Atasu, Laura Chiticariu, Christoph Hagleitner, H. Peter Hofstee, Frederick R. Reiss, Eva Sitaridi, Huaiyu Zhu

Abstract: The amount of textual data has reached a new scale and continues to grow at an unprecedented rate. IBM's SystemT software is a powerful text analytics system, which offers a query-based interface to reveal the valuable information that lies within these mounds of data. However, traditional server architectures are not capable of analyzing the so-called "Big Data" in an efficient way, despite the high memory bandwidth that is available. We show that by using a streaming hardware accelerator implemented in reconfigurable logic, the throughput rates of the SystemT's information extraction queries can be improved by an order of magnitude. We present how such a system can be deployed by extending SystemT's existing compilation flow and by using a multi-threaded communication interface that can efficiently use the bandwidth of the accelerator.

Submitted 25 April, 2018; originally announced June 2018.

Journal ref: IEEE Micro ( Volume: 34, Issue: 4, July-Aug. 2014 ) p. 6-14

arXiv:0710.4835 [pdf]

A CMOS-Based Tactile Sensor for Continuous Blood Pressure Monitoring

Authors: K. -U. Kirstein, J. Sedivy, T. Salo, C. Hagleitner, T. Vancura, A. Hierlemann

Abstract: A monolithic integrated tactile sensor array is presented, which is used to perform non-invasive blood pressure monitoring of a patient. The advantage of this device compared to a hand cuff based approach is the capability of recording continuous blood pressure data. The capacitive, membrane-based sensor device is fabricated in an industrial CMOS-technology combined with post-CMOS micromachining. The capacitance change is detected by a ΣΔ-modulator. The modulator is operated at a sampling rate of 128 kS/s and achieves a resolution of 12 bit with an external decimation filter and an OSR of 128.

Submitted 25 October, 2007; originally announced October 2007.

Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

Journal ref: In Design, Automation and Test in Europe | Designers' Forum - DATE'05, Munich, Germany (2005)

Showing 1–9 of 9 results for author: Hagleitner, C