Search | arXiv e-print repository

Scale up your In-Memory Accelerator: Leveraging Wireless-on-Chip Communication for AIMC-based CNN Inference

Authors: Nazareno Bruschi, Giuseppe Tagliavini, Francesco Conti, Sergi Abadal, Alberto Cabellos-Aparicio, Eduard Alarcón, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi

Abstract: Analog In-Memory Computing (AIMC) is emerging as a disruptive paradigm for heterogeneous computing, potentially delivering orders of magnitude better peak performance and efficiency over traditional digital signal processing architectures on Matrix-Vector multiplication. However, to sustain this throughput in real-world applications, AIMC tiles must be supplied with data at very high bandwidth and… ▽ More Analog In-Memory Computing (AIMC) is emerging as a disruptive paradigm for heterogeneous computing, potentially delivering orders of magnitude better peak performance and efficiency over traditional digital signal processing architectures on Matrix-Vector multiplication. However, to sustain this throughput in real-world applications, AIMC tiles must be supplied with data at very high bandwidth and low latency; this poses an unprecedented pressure on the on-chip communication infrastructure, which becomes the system's performance and efficiency bottleneck. In this context, the performance and plasticity of emerging on-chip wireless communication paradigms provide the required breakthrough to up-scale on-chip communication in large AIMC devices. This work presents a many-tile AIMC architecture with inter-tile wireless communication that integrates multiple heterogeneous computing clusters, embedding a mix of parallel RISC-V cores and AIMC tiles. We perform an extensive design space exploration of the proposed architecture and discuss the benefits of exploiting emerging on-chip communication technologies such as wireless transceivers in the millimeter-wave and terahertz bands. △ Less

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2005.07137 [pdf, other]

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

Authors: Renzo Andri, Geethan Karunaratne, Lukas Cavigelli, Luca Benini

Abstract: Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining a high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm$^2$ sized binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering… ▽ More Binary Neural Networks enable smart IoT devices, as they significantly reduce the required memory footprint and computational complexity while retaining a high network performance and flexibility. This paper presents ChewBaccaNN, a 0.7 mm$^2$ sized binary convolutional neural network (CNN) accelerator designed in GlobalFoundries 22 nm technology. By exploiting efficient data re-use, data buffering, latch-based memories, and voltage scaling, a throughput of 241 GOPS is achieved while consuming just 1.1 mW at 0.4V/154MHz during inference of binary CNNs with up to 7x7 kernels, leading to a peak core energy efficiency of 223 TOPS/W. ChewBaccaNN's flexibility allows to run a much wider range of binary CNNs than other accelerators, drastically improving the accuracy-energy trade-off beyond what can be captured by the TOPS/W metric. In fact, it can perform CIFAR-10 inference at 86.8% accuracy with merely 1.3 $μJ$, thus exceeding the accuracy while at the same time lowering the energy cost by 2.8x compared to even the most efficient and much larger analog processing-in-memory devices, while kee** the flexibility of running larger CNNs for higher accuracy when needed. It also runs a binary ResNet-18 trained on the 1000-class ILSVRC dataset and improves the energy efficiency by 4.4x over accelerators of similar flexibility. Furthermore, it can perform inference on a binarized ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy with only 3.0 mJ/frame -- at an accuracy drop of merely 1.8% from the full-precision ResNet-18. △ Less

Submitted 26 February, 2021; v1 submitted 12 May, 2020; originally announced May 2020.

Comments: Accepted at IEEE ISCAS 2021, Daegu, South Korea, 23-26 May 2021

arXiv:1803.05849 [pdf, other]

XNORBIN: A 95 TOp/s/W Hardware Accelerator for Binary Convolutional Neural Networks

Authors: Andrawes Al Bahou, Geethan Karunaratne, Renzo Andri, Lukas Cavigelli, Luca Benini

Abstract: Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory. This precludes the implementation of CNNs in low-power embedded systems. Recent research shows CNNs sustain extreme quantization, binarizing their weights and intermediate feature maps, thereby saving 8-32\x memory and collapsing energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNO… ▽ More Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory. This precludes the implementation of CNNs in low-power embedded systems. Recent research shows CNNs sustain extreme quantization, binarizing their weights and intermediate feature maps, thereby saving 8-32\x memory and collapsing energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNORBIN, an accelerator for binary CNNs with computation tightly coupled to memory for aggressive data reuse. Implemented in UMC 65nm technology XNORBIN achieves an energy efficiency of 95 TOp/s/W and an area efficiency of 2.0 TOp/s/MGE at 0.8 V. △ Less

Submitted 5 March, 2018; originally announced March 2018.

Showing 1–3 of 3 results for author: Karunaratne, G