A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
Authors:
Fabrizio Ferrandi,
Serena Curzel,
Leandro Fiorin,
Daniele Ielmini,
Cristina Silvano,
Francesco Conti,
Alessio Burrello,
Francesco Barchi,
Luca Benini,
Luciano Lavagno,
Teodoro Urso,
Enrico Calore,
Sebastiano Fabio Schifano,
Cristian Zambelli,
Maurizio Palesi,
Giuseppe Ascia,
Enrico Russo,
Nicola Petra,
Davide De Caro,
Gennaro Di Meo,
Valeria Cardellini,
Salvatore Filippone,
Francesco Lo Presti,
Francesco Silvestri,
Paolo Palazzari,
et al. (1 additional author not shown)
Abstract:
In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, specific customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field. In particular, this work complements the previous survey proposed by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms.
Submitted 29 November, 2023;
originally announced November 2023.
Hyper-Systolic Matrix Multiplication
Authors:
Thomas Lippert,
Nikolay Petkov,
Paolo Palazzari,
Klaus Schilling
Abstract:
A novel parallel algorithm for matrix multiplication is presented. The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems. It can handle matrix-vector multiplications as well as transposed matrix products.
Submitted 24 September, 1998;
originally announced September 1998.