-
Attention-Based Deep Reinforcement Learning for Qubit Allocation in Modular Quantum Architectures
Authors:
Enrico Russo,
Maurizio Palesi,
Davide Patti,
Giuseppe Ascia,
Vincenzo Catania
Abstract:
Modular, distributed and multi-core architectures are currently considered a promising approach for scalability of quantum computing systems. The integration of multiple Quantum Processing Units necessitates classical and quantum-coherent communication, introducing challenges related to noise and quantum decoherence in quantum state transfers between cores. Optimizing communication becomes imperative, and the compilation and mapping of quantum circuits onto physical qubits must minimize state transfers while adhering to architectural constraints. The compilation process, inherently an NP-hard problem, demands extensive search times even with a small number of qubits to be solved to optimality. To address this challenge efficiently, we advocate for the utilization of heuristic mappers that can rapidly generate solutions. In this work, we propose a novel approach employing Deep Reinforcement Learning (DRL) methods to learn these heuristics for a specific multi-core architecture. Our DRL agent incorporates a Transformer encoder and Graph Neural Networks. It encodes quantum circuits using self-attention mechanisms and produces outputs through an attention-based pointer mechanism that directly signifies the probability of matching logical qubits with physical cores. This enables the selection of optimal cores for logical qubits efficiently. Experimental evaluations show that the proposed method can outperform baseline approaches in terms of reducing inter-core communications and minimizing online time-to-solution. This research contributes to the advancement of scalable quantum computing systems by introducing a novel learning-based heuristic approach for efficient quantum circuit compilation and mapping.
Submitted 17 June, 2024;
originally announced June 2024.
-
Deep Reinforcement Learning based Online Scheduling Policy for Deep Neural Network Multi-Tenant Multi-Accelerator Systems
Authors:
Francesco G. Blanco,
Enrico Russo,
Maurizio Palesi,
Davide Patti,
Giuseppe Ascia,
Vincenzo Catania
Abstract:
Currently, there is a growing trend of outsourcing the execution of DNNs to cloud services. For service providers, managing multi-tenancy and ensuring high-quality service delivery, particularly in meeting stringent execution time constraints, assumes paramount importance, all while endeavoring to maintain cost-effectiveness. In this context, the utilization of heterogeneous multi-accelerator systems becomes increasingly relevant. This paper presents RELMAS, a low-overhead deep reinforcement learning algorithm designed for the online scheduling of DNNs in multi-tenant environments, taking into account the dataflow heterogeneity of accelerators and memory bandwidth contention. By doing so, service providers can employ the most efficient scheduling policy for user requests, optimizing Service-Level-Agreement (SLA) satisfaction rates and enhancing hardware utilization. The application of RELMAS to a heterogeneous multi-accelerator system composed of various instances of Simba and Eyeriss sub-accelerators resulted in up to a 173% improvement in SLA satisfaction rate compared to state-of-the-art scheduling techniques across different workload scenarios, with less than a 1.5% energy overhead.
Submitted 13 April, 2024;
originally announced April 2024.
-
Towards Fair and Firm Real-Time Scheduling in DNN Multi-Tenant Multi-Accelerator Systems via Reinforcement Learning
Authors:
Enrico Russo,
Francesco Giulio Blanco,
Maurizio Palesi,
Giuseppe Ascia,
Davide Patti,
Vincenzo Catania
Abstract:
This paper addresses the critical challenge of managing Quality of Service (QoS) in cloud services, focusing on the nuances of individual tenant expectations and varying Service Level Indicators (SLIs). It introduces a novel approach utilizing Deep Reinforcement Learning for tenant-specific QoS management in multi-tenant, multi-accelerator cloud environments. The chosen SLI, deadline hit rate, allows clients to tailor QoS for each service request. A novel online scheduling algorithm for Deep Neural Networks in multi-accelerator systems is proposed, with a focus on guaranteeing tenant-wise, model-specific QoS levels while considering real-time constraints.
Submitted 9 February, 2024;
originally announced March 2024.
-
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
Authors:
Fabrizio Ferrandi,
Serena Curzel,
Leandro Fiorin,
Daniele Ielmini,
Cristina Silvano,
Francesco Conti,
Alessio Burrello,
Francesco Barchi,
Luca Benini,
Luciano Lavagno,
Teodoro Urso,
Enrico Calore,
Sebastiano Fabio Schifano,
Cristian Zambelli,
Maurizio Palesi,
Giuseppe Ascia,
Enrico Russo,
Nicola Petra,
Davide De Caro,
Gennaro Di Meo,
Valeria Cardellini,
Salvatore Filippone,
Francesco Lo Presti,
Francesco Silvestri,
Paolo Palazzari,
et al. (1 additional author not shown)
Abstract:
In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, specific customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field. In particular, this work complements the previous survey proposed by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms.
Submitted 29 November, 2023;
originally announced November 2023.
-
A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms
Authors:
Cristina Silvano,
Daniele Ielmini,
Fabrizio Ferrandi,
Leandro Fiorin,
Serena Curzel,
Luca Benini,
Francesco Conti,
Angelo Garofalo,
Cristian Zambelli,
Enrico Calore,
Sebastiano Fabio Schifano,
Maurizio Palesi,
Giuseppe Ascia,
Davide Patti,
Nicola Petra,
Davide De Caro,
Luciano Lavagno,
Teodoro Urso,
Valeria Cardellini,
Gian Carlo Cardarilli,
Robert Birke,
Stefania Perri
Abstract:
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable solution for several classes of high-performance computing (HPC) applications such as image classification, computer vision, and speech recognition. This survey summarizes and classifies the most recent advances in designing DL accelerators suitable to reach the performance requirements of HPC applications. In particular, it highlights the most advanced approaches to support deep learning accelerations including not only GPU and TPU-based accelerators but also design-specific hardware accelerators such as FPGA-based and ASIC-based accelerators, Neural Processing Units, open hardware RISC-V-based accelerators and co-processors. The survey also describes accelerators based on emerging memory technologies and computing paradigms, such as 3D-stacked Processor-In-Memory, non-volatile memories (mainly, Resistive RAM and Phase Change Memories) to implement in-memory computing, Neuromorphic Processing Units, and accelerators based on Multi-Chip Modules. Among emerging technologies, we also include some insights into quantum-based accelerators and photonics. To conclude, the survey classifies the most influential architectures and technologies proposed in the last years, with the purpose of offering the reader a comprehensive perspective in the rapidly evolving field of deep learning.
Submitted 12 July, 2024; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Scalable multi-chip quantum architectures enabled by cryogenic hybrid wireless/quantum-coherent network-in-package
Authors:
Eduard Alarcón,
Sergi Abadal,
Fabio Sebastiano,
Masoud Babaie,
Edoardo Charbon,
Peter Haring Bolívar,
Maurizio Palesi,
Elena Blokhina,
Dirk Leipold,
Bogdan Staszewski,
Artur Garcia-Sáez,
Carmen G. Almudever
Abstract:
The grand challenge of scaling up quantum computers requires a full-stack architectural standpoint. In this position paper, we will present the vision of a new generation of scalable quantum computing architectures featuring distributed quantum cores (Qcores) interconnected via quantum-coherent qubit state transfer links and orchestrated via an integrated wireless interconnect.
Submitted 8 April, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Multi-Objective Hardware-Mapping Co-Optimisation for Multi-Tenant DNN Accelerators
Authors:
Abhijit Das,
Enrico Russo,
Maurizio Palesi
Abstract:
To meet the ever-increasing computation demand from emerging workloads, a scalable design paradigm combines multiple Deep Neural Network (DNN) accelerators to build a large multi-accelerator system. They are mainly proposed for data centers, where workload varies across vision, language, recommendation, etc. Existing works independently explore their hardware configuration and mapping strategies due to the extremely large cross-coupled design space. However, hardware and mapping are interdependent and, if not explored together, may lead to sub-optimal performance when workload changes. Moreover, even though a data center accelerator has multiple objectives, almost all the existing works prefer aggregating them into one (mono-objective). But aggregation does not help if the objectives are conflicting, as improving one will worsen the other.
This work proposes MOHaM, a multi-objective hardware-mapping co-optimisation framework for multi-tenant DNN accelerators. Specifically, given an application model and a library of heterogeneous, parameterised and reconfigurable sub-accelerator templates, MOHaM returns a Pareto-optimal set of multi-accelerator systems with an optimal schedule for each one of them to minimise the overall system latency, energy and area. MOHaM is evaluated for diverse workload scenarios with state-of-the-art sub-accelerators. The Pareto-optimal set of competitive design choices enables selecting the best one as per the requirement.
Submitted 26 October, 2022;
originally announced October 2022.