Search | arXiv e-print repository

doi 10.1109/ISCAS58744.2024.10558286

A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit for Analog In-Memory Computing

Authors: Elena Ferro, Athanasios Vasilopoulos, Corey Lammie, Manuel Le Gallo, Luca Benini, Irem Boybat, Abu Sebastian

Abstract: Analog In-Memory Computing (AIMC) is an emerging technology for fast and energy-efficient Deep Learning (DL) inference. However, a certain amount of digital post-processing is required to deal with circuit mismatches and non-idealities associated with the memory devices. Efficient near-memory digital logic is critical to retain the high area/energy efficiency and low latency of AIMC. Existing syst… ▽ More Analog In-Memory Computing (AIMC) is an emerging technology for fast and energy-efficient Deep Learning (DL) inference. However, a certain amount of digital post-processing is required to deal with circuit mismatches and non-idealities associated with the memory devices. Efficient near-memory digital logic is critical to retain the high area/energy efficiency and low latency of AIMC. Existing systems adopt Floating Point 16 (FP16) arithmetic with limited parallelization capability and high latency. To overcome these limitations, we propose a Near-Memory digital Processing Unit (NMPU) based on fixed-point arithmetic. It achieves competitive accuracy and higher computing throughput than previous approaches while minimizing the area overhead. Moreover, the NMPU supports standard DL activation steps, such as ReLU and Batch Normalization. We perform a physical implementation of the NMPU design in a 14 nm CMOS technology and provide detailed performance, power, and area assessments. We validate the efficacy of the NMPU by using data from an AIMC chip and demonstrate that a simulated AIMC system with the proposed NMPU outperforms existing FP16-based implementations, providing 139$\times$ speed-up, 7.8$\times$ smaller area, and a competitive power consumption. Additionally, our approach achieves an inference accuracy of 86.65 %/65.06 %, with an accuracy drop of just 0.12 %/0.4 % compared to the FP16 baseline when benchmarked with ResNet9/ResNet32 networks trained on the CIFAR10/CIFAR100 datasets, respectively. △ Less

Submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted at ISCAS2024

arXiv:2209.10481 [pdf, other]

doi 10.1063/5.0116699

Benchmarking energy consumption and latency for neuromorphic computing in condensed matter and particle physics

Authors: Dominique J. Kösters, Bryan A. Kortman, Irem Boybat, Elena Ferro, Sagar Dolas, Roberto de Austri, Johan Kwisthout, Hans Hilgenkamp, Theo Rasing, Heike Riel, Abu Sebastian, Sascha Caron, Johan H. Mentink

Abstract: The massive use of artificial neural networks (ANNs), increasingly popular in many areas of scientific computing, rapidly increases the energy consumption of modern high-performance computing systems. An appealing and possibly more sustainable alternative is provided by novel neuromorphic paradigms, which directly implement ANNs in hardware. However, little is known about the actual benefits of ru… ▽ More The massive use of artificial neural networks (ANNs), increasingly popular in many areas of scientific computing, rapidly increases the energy consumption of modern high-performance computing systems. An appealing and possibly more sustainable alternative is provided by novel neuromorphic paradigms, which directly implement ANNs in hardware. However, little is known about the actual benefits of running ANNs on neuromorphic hardware for use cases in scientific computing. Here we present a methodology for measuring the energy cost and compute time for inference tasks with ANNs on conventional hardware. In addition, we have designed an architecture for these tasks and estimate the same metrics based on a state-of-the-art analog in-memory computing (AIMC) platform, one of the key paradigms in neuromorphic computing. Both methodologies are compared for a use case in quantum many-body physics in two dimensional condensed matter systems and for anomaly detection at 40 MHz rates at the Large Hadron Collider in particle physics. We find that AIMC can achieve up to one order of magnitude shorter computation times than conventional hardware, at an energy cost that is up to three orders of magnitude smaller. This suggests great potential for faster and more sustainable scientific computing with neuromorphic hardware. △ Less

Submitted 21 February, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

Comments: 7 pages, 4 figures, submitted to and accepted by APL Machine Learning

Journal ref: APL Machine Learning 1, 016101 (2023)

arXiv:1510.06175 [pdf]

Wireless communication, identification and sensing technologies enabling integrated logistics: a study in the harbor environment

Authors: Mario G. C. A. Cimino, Nedo Celandroni, Erina Ferro, Davide La Rosa, Filippo Palumbo, Gigliola Vaglini

Abstract: In the last decade, integrated logistics has become an important challenge in the development of wireless communication, identification and sensing technology, due to the growing complexity of logistics processes and the increasing demand for adapting systems to new requirements. The advancement of wireless technology provides a wide range of options for the maritime container terminals. Electroni… ▽ More In the last decade, integrated logistics has become an important challenge in the development of wireless communication, identification and sensing technology, due to the growing complexity of logistics processes and the increasing demand for adapting systems to new requirements. The advancement of wireless technology provides a wide range of options for the maritime container terminals. Electronic devices employed in container terminals reduce the manual effort, facilitating timely information flow and enhancing control and quality of service and decision made. In this paper, we examine the technology that can be used to support integration in harbor's logistics. In the literature, most systems have been developed to address specific needs of particular harbors, but a systematic study is missing. The purpose is to provide an overview to the reader about which technology of integrated logistics can be implemented and what remains to be addressed in the future. △ Less

Submitted 21 October, 2015; originally announced October 2015.

Journal ref: http://esjournals.org//journaloftechnology/Download_Sep_2015_pdf_5.php

Showing 1–3 of 3 results for author: Ferro, E