Search | arXiv e-print repository

Lightator: An Optical Near-Sensor Accelerator with Compressive Acquisition Enabling Versatile Image Processing

Authors: Mehrdad Morsali, Brendan Reidy, Deniz Najafi, Sepehr Tabrizchi, Mohsen Imani, Mahdi Nikdast, Arman Roohi, Ramtin Zand, Shaahin Angizi

Abstract: This paper proposes a high-performance and energy-efficient optical near-sensor accelerator for vision applications, called Lightator. Harnessing the promising efficiency offered by photonic devices, Lightator features innovative compressive acquisition of input frames and fine-grained convolution operations for low-power and versatile image processing at the edge for the first time. This will sub… ▽ More This paper proposes a high-performance and energy-efficient optical near-sensor accelerator for vision applications, called Lightator. Harnessing the promising efficiency offered by photonic devices, Lightator features innovative compressive acquisition of input frames and fine-grained convolution operations for low-power and versatile image processing at the edge for the first time. This will substantially diminish the energy consumption and latency of conversion, transmission, and processing within the established cloud-centric architecture as well as recently designed edge accelerators. Our device-to-architecture simulation results show that with favorable accuracy, Lightator achieves 84.4 Kilo FPS/W and reduces power consumption by a factor of ~24x and 73x on average compared with existing photonic accelerators and GPU baseline. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 6 pages, 10 figures

arXiv:2311.18655 [pdf, other]

OISA: Architecting an Optical In-Sensor Accelerator for Efficient Visual Computing

Authors: Mehrdad Morsali, Sepehr Tabrizchi, Deniz Najafi, Mohsen Imani, Mahdi Nikdast, Arman Roohi, Shaahin Angizi

Abstract: Targeting vision applications at the edge, in this work, we systematically explore and propose a high-performance and energy-efficient Optical In-Sensor Accelerator architecture called OISA for the first time. Taking advantage of the promising efficiency of photonic devices, the OISA intrinsically implements a coarse-grained convolution operation on the input frames in an innovative minimum-conver… ▽ More Targeting vision applications at the edge, in this work, we systematically explore and propose a high-performance and energy-efficient Optical In-Sensor Accelerator architecture called OISA for the first time. Taking advantage of the promising efficiency of photonic devices, the OISA intrinsically implements a coarse-grained convolution operation on the input frames in an innovative minimum-conversion fashion in low-bit-width neural networks. Such a design remarkably reduces the power consumption of data conversion, transmission, and processing in the conventional cloud-centric architecture as well as recently-presented edge accelerators. Our device-to-architecture simulation results on various image data-sets demonstrate acceptable accuracy while OISA achieves 6.68 TOp/s/W efficiency. OISA reduces power consumption by a factor of 7.9 and 18.4 on average compared with existing electronic in-/near-sensor and ASIC accelerators. △ Less

Submitted 30 November, 2023; originally announced November 2023.

Comments: 7 pages

arXiv:2311.16406 [pdf, other]

DIAC: Design Exploration of Intermittent-Aware Computing Realizing Batteryless Systems

Authors: Sepehr Tabrizchi, Shaahin Angizi, Arman Roohi

Abstract: Battery-powered IoT devices face challenges like cost, maintenance, and environmental sustainability, prompting the emergence of batteryless energy-harvesting systems that harness ambient sources. However, their intermittent behavior can disrupt program execution and cause data loss, leading to unpredictable outcomes. Despite exhaustive studies employing conventional checkpoint methods and intrica… ▽ More Battery-powered IoT devices face challenges like cost, maintenance, and environmental sustainability, prompting the emergence of batteryless energy-harvesting systems that harness ambient sources. However, their intermittent behavior can disrupt program execution and cause data loss, leading to unpredictable outcomes. Despite exhaustive studies employing conventional checkpoint methods and intricate programming paradigms to address these pitfalls, this paper proposes an innovative systematic methodology, namely DIAC. The DIAC synthesis procedure enhances the performance and efficiency of intermittent computing systems, with a focus on maximizing forward progress and minimizing the energy overhead imposed by distinct memory arrays for backup. Then, a finite-state machine is delineated, encapsulating the core operations of an IoT node, sense, compute, transmit, and sleep states. First, we validate the robustness and functionalities of a DIAC-based design in the presence of power disruptions. DIAC is then applied to a wide range of benchmarks, including ISCAS-89, MCNS, and ITC-99. The simulation results substantiate the power-delay-product (PDP) benefits. For example, results for complex MCNC benchmarks indicate a PDP improvement of 61%, 56%, and 38% on average compared to three alternative techniques, evaluated at 45 nm. △ Less

Submitted 27 November, 2023; originally announced November 2023.

Comments: 6 pages, will be appeared in Design, Automation and Test in Europe Conference 2024

arXiv:2210.06698 [pdf, other]

A Near-Sensor Processing Accelerator for Approximate Local Binary Pattern Networks

Authors: Shaahin Angizi, Mehrdad Morsali, Sepehr Tabrizchi, Arman Roohi

Abstract: In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the… ▽ More In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the computation complexity. Then, we develop NS-LBP as a processing-in-SRAM unit and a parallel in-memory LBP algorithm to process images near the sensor in a cache, remarkably reducing the power consumption of data transmission to an off-chip processor. Our circuit-to-application co-simulation results on MNIST and SVHN data-sets demonstrate minor accuracy degradation compared to baseline CNN and LBP-network models, while NS-LBP achieves 1.25 GHz and energy-efficiency of 37.4 TOPS/W. NS-LBP reduces energy consumption by 2.2x and execution time by a factor of 4x compared to the best recent LBP-based networks. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: 10 pages, 11 figures, 4 tables

arXiv:2202.09035 [pdf, other]

PISA: A Binary-Weight Processing-In-Sensor Accelerator for Edge Image Processing

Authors: Shaahin Angizi, Sepehr Tabrizchi, Arman Roohi

Abstract: This work proposes a Processing-In-Sensor Accelerator, namely PISA, as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing in AI devices. PISA intrinsically implements a coarse-grained convolution operation in Binarized-Weight Neural Networks (BWNNs) leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. This remarkabl… ▽ More This work proposes a Processing-In-Sensor Accelerator, namely PISA, as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing in AI devices. PISA intrinsically implements a coarse-grained convolution operation in Binarized-Weight Neural Networks (BWNNs) leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. This remarkably reduces the power consumption of data conversion and transmission to an off-chip processor. The design is completed with a bit-wise near-sensor processing-in-DRAM computing unit to process the remaining network layers. Once the object is detected, PISA switches to typical sensing mode to capture the image for a fine-grained convolution using only the near-sensor processing unit. Our circuit-to-application co-simulation results on a BWNN acceleration demonstrate acceptable accuracy on various image datasets in coarse-grained evaluation compared to baseline BWNN models, while PISA achieves a frame rate of 1000 and efficiency of ~1.74 TOp/s/W. Lastly, PISA substantially reduces data conversion and transmission energy by ~84% compared to a baseline CPU-sensor design. △ Less

Submitted 18 February, 2022; originally announced February 2022.

Comments: 11 pages, 16 figures

arXiv:1806.07570 [pdf]

Energy Efficient Tri-State CNFET Ternary Logic Gates

Authors: Sepher Tabrizchi, Fazel Sharifi, Abdel-Hameed A. Badawy

Abstract: Traditional silicon binary circuits continue to face challenges such as high leakage power dissipation and large area of interconnections. Multiple-Valued Logic (MVL) and nano devices are two feasible solutions to overcome these problems. In this paper, a novel method is presented to design ternary logic circuits based on Carbon Nanotube Field Effect Transistors (CNFETs). The proposed designs use… ▽ More Traditional silicon binary circuits continue to face challenges such as high leakage power dissipation and large area of interconnections. Multiple-Valued Logic (MVL) and nano devices are two feasible solutions to overcome these problems. In this paper, a novel method is presented to design ternary logic circuits based on Carbon Nanotube Field Effect Transistors (CNFETs). The proposed designs use the unique properties of CNFETs, for example, adjusting the Carbon Nanontube (CNT) diameters to have the desired threshold voltage and have the same mobility of P-FET and N-FET transistors. Each of our designed logic circuits implements a logic function and its complementary via a control signal. Also, these circuits have a high impedance state which saves power while the circuits are not in use. In an effort to show a more detailed application of our approach, we design a 2-digit adder-subtractor circuit. We simulate the proposed ternary circuits using HSPICE via standard 32nm CNFET technology. The simulation results indicate the correct operation of the designs under different process, voltage and temperature (PVT) variations. Moreover, a power efficient ternary logic ALU has been design based on the proposed gates. △ Less

Submitted 20 June, 2018; originally announced June 2018.

Showing 1–6 of 6 results for author: Tabrizchi, S