-
WannaLaugh: A Configurable Ransomware Emulator -- Learning to Mimic Malicious Storage Traces
Authors:
Dionysios Diamantopoulos,
Roman Pletka,
Slavisa Sarafijanovic,
A. L. Narasimha Reddy,
Haris Pozidis
Abstract:
Ransomware, a fearsome and rapidly evolving cybersecurity threat, continues to inflict severe consequences on individuals and organizations worldwide. Traditional detection methods, reliant on static signatures and application behavioral patterns, are challenged by the dynamic nature of these threats. This paper introduces three primary contributions to address this challenge. First, we introduce…
▽ More
Ransomware, a fearsome and rapidly evolving cybersecurity threat, continues to inflict severe consequences on individuals and organizations worldwide. Traditional detection methods, reliant on static signatures and application behavioral patterns, are challenged by the dynamic nature of these threats. This paper introduces three primary contributions to address this challenge. First, we introduce a ransomware emulator. This tool is designed to safely mimic ransomware attacks without causing actual harm or spreading malware, making it a unique solution for studying ransomware behavior. Second, we demonstrate how we use this emulator to create storage I/O traces. These traces are then utilized to train machine-learning models. Our results show that these models are effective in detecting ransomware, highlighting the practical application of our emulator in develo** responsible cybersecurity tools. Third, we show how our emulator can be used to mimic the I/O behavior of existing ransomware thereby enabling safe trace collection. Both the emulator and its application represent significant steps forward in ransomware detection in the era of machine-learning-driven cybersecurity.
△ Less
Submitted 12 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
LEAPER: Fast and Accurate FPGA-based System Performance Prediction via Transfer Learning
Authors:
Gagandeep Singh,
Dionysios Diamantopoulos,
Juan Gómez-Luna,
Sander Stuijk,
Henk Corporaal,
Onur Mutlu
Abstract:
Machine learning has recently gained traction as a way to overcome the slow accelerator generation and implementation process on an FPGA. It can be used to build performance and resource usage models that enable fast early-stage design space exploration. First, training requires large amounts of data (features extracted from design synthesis and implementation tools), which is cost-inefficient bec…
▽ More
Machine learning has recently gained traction as a way to overcome the slow accelerator generation and implementation process on an FPGA. It can be used to build performance and resource usage models that enable fast early-stage design space exploration. First, training requires large amounts of data (features extracted from design synthesis and implementation tools), which is cost-inefficient because of the time-consuming accelerator design and implementation process. Second, a model trained for a specific environment cannot predict performance or resource usage for a new, unknown environment. In a cloud system, renting a platform for data collection to build an ML model can significantly increase the total-cost-ownership (TCO) of a system. Third, ML-based models trained using a limited number of samples are prone to overfitting. To overcome these limitations, we propose LEAPER, a transfer learning-based approach for prediction of performance and resource usage in FPGA-based systems. The key idea of LEAPER is to transfer an ML-based performance and resource usage model trained for a low-end edge environment to a new, high-end cloud environment to provide fast and accurate predictions for accelerator implementation. Experimental results show that LEAPER (1) provides, on average across six workloads and five FPGAs, 85% accuracy when we use our transferred model for prediction in a cloud environment with 5-shot learning and (2) reduces design-space exploration time for accelerator implementation on an FPGA by 10x, from days to only a few hours.
△ Less
Submitted 2 October, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Accelerating Weather Prediction using Near-Memory Reconfigurable Fabric
Authors:
Gagandeep Singh,
Dionysios Diamantopoulos,
Juan Gómez-Luna,
Christoph Hagleitner,
Sander Stuijk,
Henk Corporaal,
Onur Mutlu
Abstract:
Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to ac…
▽ More
Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration. To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth memory (HBM). We focus on compound stencils that are fundamental kernels in weather prediction models. By using high-level synthesis techniques, we develop NERO, an FPGA+HBM-based accelerator connected through OCAPI (Open Coherent Accelerator Processor Interface) to an IBM POWER9 host system. Our experimental results show that NERO outperforms a 16-core POWER9 system by 5.3x and 12.7x when running two different compound stencil kernels. NERO reduces the energy consumption by 12x and 35x for the same two kernels over the POWER9 system with an energy efficiency of 1.61 GFLOPS/Watt and 21.01 GFLOPS/Watt. We conclude that employing near-memory acceleration solutions for weather prediction modeling is promising as a means to achieve both high performance and high energy efficiency.
△ Less
Submitted 21 December, 2021; v1 submitted 19 July, 2021;
originally announced July 2021.
-
FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications
Authors:
Gagandeep Singh,
Mohammed Alser,
Damla Senol Cali,
Dionysios Diamantopoulos,
Juan Gómez-Luna,
Henk Corporaal,
Onur Mutlu
Abstract:
Modern data-intensive applications demand high computation capabilities with strict power constraints. Unfortunately, such applications suffer from a significant waste of both execution cycles and energy in current computing systems due to the costly data movement between the computation units and the memory units. Genome analysis and weather prediction are two examples of such applications. Recen…
▽ More
Modern data-intensive applications demand high computation capabilities with strict power constraints. Unfortunately, such applications suffer from a significant waste of both execution cycles and energy in current computing systems due to the costly data movement between the computation units and the memory units. Genome analysis and weather prediction are two examples of such applications. Recent FPGAs couple a reconfigurable fabric with high-bandwidth memory (HBM) to enable more efficient data movement and improve overall performance and energy efficiency. This trend is an example of a paradigm shift to near-memory computing. We leverage such an FPGA with high-bandwidth memory (HBM) for improving the pre-alignment filtering step of genome analysis and representative kernels from a weather prediction model. Our evaluation demonstrates large speedups and energy savings over a high-end IBM POWER9 system and a conventional FPGA board with DDR4 memory. We conclude that FPGA-based near-memory computing has the potential to alleviate the data movement bottleneck for modern data-intensive applications.
△ Less
Submitted 3 July, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
Acceleration-as-a-μService: A Cloud-native Monte-Carlo Option Pricing Engine on CPUs, GPUs and Disaggregated FPGAs
Authors:
Dionysios Diamantopoulos,
Raphael Polig,
Burkhard Ringlein,
Mitra Purandare,
Beat Weiss,
Christoph Hagleitner,
Mark Lantz,
Francois Abel
Abstract:
The evolution of cloud applications into loosely-coupled microservices opens new opportunities for hardware accelerators to improve workload performance. Existing accelerator techniques for cloud sacrifice the consolidation benefits of microservices. This paper presents CloudiFi, a framework to deploy and compare accelerators as a cloud service. We evaluate our framework in the context of a financ…
▽ More
The evolution of cloud applications into loosely-coupled microservices opens new opportunities for hardware accelerators to improve workload performance. Existing accelerator techniques for cloud sacrifice the consolidation benefits of microservices. This paper presents CloudiFi, a framework to deploy and compare accelerators as a cloud service. We evaluate our framework in the context of a financial workload and present early results indicating up to 485x gains in microservice response time.
△ Less
Submitted 11 June, 2021;
originally announced June 2021.
-
EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms
Authors:
Christian Pilato,
Stanislav Bohm,
Fabien Brocheton,
Jeronimo Castrillon,
Riccardo Cevasco,
Vojtech Cima,
Radim Cmar,
Dionysios Diamantopoulos,
Fabrizio Ferrandi,
Jan Martinovic,
Gianluca Palermo,
Michele Paolino,
Antonio Parodi,
Lorenzo Pittaluga,
Daniel Raho,
Francesco Regazzoni,
Katerina Slaninova,
Christoph Hagleitner
Abstract:
High-Performance Big Data Analytics (HPDA) applications are characterized by huge volumes of distributed and heterogeneous data that require efficient computation for knowledge extraction and decision making. Designers are moving towards a tight integration of computing systems combining HPC, Cloud, and IoT solutions with artificial intelligence (AI). Matching the application and data requirements…
▽ More
High-Performance Big Data Analytics (HPDA) applications are characterized by huge volumes of distributed and heterogeneous data that require efficient computation for knowledge extraction and decision making. Designers are moving towards a tight integration of computing systems combining HPC, Cloud, and IoT solutions with artificial intelligence (AI). Matching the application and data requirements with the characteristics of the underlying hardware is a key element to improve the predictions thanks to high performance and better use of resources.
We present EVEREST, a novel H2020 project started on October 1st, 2020 that aims at develo** a holistic environment for the co-design of HPDA applications on heterogeneous, distributed, and secure platforms. EVEREST focuses on programmability issues through a data-driven design approach, the use of hardware-accelerated AI, and an efficient runtime monitoring with virtualization support. In the different stages, EVEREST combines state-of-the-art programming models, emerging communication standards, and novel domain-specific extensions. We describe the EVEREST approach and the use cases that drive our research.
△ Less
Submitted 6 March, 2021;
originally announced March 2021.
-
NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling
Authors:
Gagandeep Singh,
Dionysios Diamantopoulos,
Christoph Hagleitner,
Juan Gomez-Luna,
Sander Stuijk,
Onur Mutlu,
Henk Corporaal
Abstract:
Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to ac…
▽ More
Ongoing climate change calls for fast and accurate weather and climate modeling. However, when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU implementations suffer from limited performance and high energy consumption. These implementations are dominated by complex irregular memory access patterns and low arithmetic intensity that pose fundamental challenges to acceleration. To overcome these challenges, we propose and evaluate the use of near-memory acceleration using a reconfigurable fabric with high-bandwidth memory (HBM). We focus on compound stencils that are fundamental kernels in weather prediction models. By using high-level synthesis techniques, we develop NERO, an FPGA+HBM-based accelerator connected through IBM CAPI2 (Coherent Accelerator Processor Interface) to an IBM POWER9 host system. Our experimental results show that NERO outperforms a 16-core POWER9 system by 4.2x and 8.3x when running two different compound stencil kernels. NERO reduces the energy consumption by 22x and 29x for the same two kernels over the POWER9 system with an energy efficiency of 1.5 GFLOPS/Watt and 17.3 GFLOPS/Watt. We conclude that employing near-memory acceleration solutions for weather prediction modeling is promising as a means to achieve both high performance and high energy efficiency.
△ Less
Submitted 17 September, 2020;
originally announced September 2020.
-
Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack
Authors:
Dionysios Diamantopoulos,
Burkhard Ringlein,
Mitra Purandare,
Gagandeep Singh,
Christoph Hagleitner
Abstract:
Specialized accelerators for tensor-operations, such as blocked-matrix operations and multi-dimensional convolutions, have been emerged as powerful architecture choices for high-performance Deep-Learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor-accelerators since the adaptation to new requirements incurs significant en…
▽ More
Specialized accelerators for tensor-operations, such as blocked-matrix operations and multi-dimensional convolutions, have been emerged as powerful architecture choices for high-performance Deep-Learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor-accelerators since the adaptation to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays on top of the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than state-of-art.
△ Less
Submitted 20 April, 2020;
originally announced April 2020.
-
High Bandwidth Memory on FPGAs: A Data Analytics Perspective
Authors:
Kaan Kara,
Christoph Hagleitner,
Dionysios Diamantopoulos,
Dimitris Syrivelis,
Gustavo Alonso
Abstract:
FPGA-based data processing in datacenters is increasing in popularity due to the demands of modern workloads and the ensuing necessity for specialization in hardware. Driven by this trend, vendors are rapidly adapting reconfigurable devices to suit data and compute intensive workloads. Inclusion of High Bandwidth Memory (HBM) in FPGA devices is a recent example. HBM promises overcoming the bandwid…
▽ More
FPGA-based data processing in datacenters is increasing in popularity due to the demands of modern workloads and the ensuing necessity for specialization in hardware. Driven by this trend, vendors are rapidly adapting reconfigurable devices to suit data and compute intensive workloads. Inclusion of High Bandwidth Memory (HBM) in FPGA devices is a recent example. HBM promises overcoming the bandwidth bottleneck, faced often by FPGA-based accelerators due to their throughput oriented design. In this paper, we study the usage and benefits of HBM on FPGAs from a data analytics perspective. We consider three workloads that are often performed in analytics oriented databases and implement them on FPGA showing in which cases they benefit from HBM: range selection, hash join, and stochastic gradient descent for linear model training. We integrate our designs into a columnar database (MonetDB) and show the trade-offs arising from the integration related to data movement and partitioning. In certain cases, FPGA+HBM based solutions are able to surpass the highest performance provided by either a 2-socket POWER9 system or a 14-core XeonE5 by up to 1.8x (selection), 12.9x (join), and 3.2x (SGD).
△ Less
Submitted 2 April, 2020;
originally announced April 2020.