-
Lightator: An Optical Near-Sensor Accelerator with Compressive Acquisition Enabling Versatile Image Processing
Authors:
Mehrdad Morsali,
Brendan Reidy,
Deniz Najafi,
Sepehr Tabrizchi,
Mohsen Imani,
Mahdi Nikdast,
Arman Roohi,
Ramtin Zand,
Shaahin Angizi
Abstract:
This paper proposes a high-performance and energy-efficient optical near-sensor accelerator for vision applications, called Lightator. Harnessing the promising efficiency offered by photonic devices, Lightator features innovative compressive acquisition of input frames and fine-grained convolution operations for low-power and versatile image processing at the edge for the first time. This will substantially diminish the energy consumption and latency of conversion, transmission, and processing within the established cloud-centric architecture as well as recently designed edge accelerators. Our device-to-architecture simulation results show that with favorable accuracy, Lightator achieves 84.4 Kilo FPS/W and reduces power consumption by a factor of ~24x and 73x on average compared with existing photonic accelerators and GPU baseline.
Submitted 7 March, 2024;
originally announced March 2024.
-
OISA: Architecting an Optical In-Sensor Accelerator for Efficient Visual Computing
Authors:
Mehrdad Morsali,
Sepehr Tabrizchi,
Deniz Najafi,
Mohsen Imani,
Mahdi Nikdast,
Arman Roohi,
Shaahin Angizi
Abstract:
Targeting vision applications at the edge, in this work, we systematically explore and propose a high-performance and energy-efficient Optical In-Sensor Accelerator architecture called OISA for the first time. Taking advantage of the promising efficiency of photonic devices, OISA intrinsically implements a coarse-grained convolution operation on the input frames in an innovative minimum-conversion fashion in low-bit-width neural networks. Such a design remarkably reduces the power consumption of data conversion, transmission, and processing in the conventional cloud-centric architecture as well as recently presented edge accelerators. Our device-to-architecture simulation results on various image datasets demonstrate acceptable accuracy while OISA achieves 6.68 TOp/s/W efficiency. OISA reduces power consumption by a factor of 7.9 and 18.4 on average compared with existing electronic in-/near-sensor and ASIC accelerators.
Submitted 30 November, 2023;
originally announced November 2023.
-
DIAC: Design Exploration of Intermittent-Aware Computing Realizing Batteryless Systems
Authors:
Sepehr Tabrizchi,
Shaahin Angizi,
Arman Roohi
Abstract:
Battery-powered IoT devices face challenges like cost, maintenance, and environmental sustainability, prompting the emergence of batteryless energy-harvesting systems that harness ambient sources. However, their intermittent behavior can disrupt program execution and cause data loss, leading to unpredictable outcomes. Despite exhaustive studies employing conventional checkpoint methods and intricate programming paradigms to address these pitfalls, this paper proposes an innovative systematic methodology, namely DIAC. The DIAC synthesis procedure enhances the performance and efficiency of intermittent computing systems, with a focus on maximizing forward progress and minimizing the energy overhead imposed by distinct memory arrays for backup. A finite-state machine is then delineated, encapsulating the core operations of an IoT node: the sense, compute, transmit, and sleep states. We first validate the robustness and functionality of a DIAC-based design in the presence of power disruptions. DIAC is then applied to a wide range of benchmarks, including ISCAS-89, MCNC, and ITC-99. The simulation results substantiate the power-delay-product (PDP) benefits. For example, results for complex MCNC benchmarks indicate a PDP improvement of 61%, 56%, and 38% on average compared to three alternative techniques, evaluated at 45 nm.
Submitted 27 November, 2023;
originally announced November 2023.
-
A Near-Sensor Processing Accelerator for Approximate Local Binary Pattern Networks
Authors:
Shaahin Angizi,
Mehrdad Morsali,
Sepehr Tabrizchi,
Arman Roohi
Abstract:
In this work, a high-speed and energy-efficient comparator-based Near-Sensor Local Binary Pattern accelerator architecture (NS-LBP) is proposed to execute a novel local binary pattern deep neural network. First, inspired by recent LBP networks, we design an approximate, hardware-oriented, and multiply-accumulate (MAC)-free network named Ap-LBP for efficient feature extraction, further reducing the computation complexity. Then, we develop NS-LBP as a processing-in-SRAM unit and a parallel in-memory LBP algorithm to process images near the sensor in a cache, remarkably reducing the power consumption of data transmission to an off-chip processor. Our circuit-to-application co-simulation results on MNIST and SVHN data-sets demonstrate minor accuracy degradation compared to baseline CNN and LBP-network models, while NS-LBP achieves 1.25 GHz and energy-efficiency of 37.4 TOPS/W. NS-LBP reduces energy consumption by 2.2x and execution time by a factor of 4x compared to the best recent LBP-based networks.
Submitted 12 October, 2022;
originally announced October 2022.
-
PISA: A Binary-Weight Processing-In-Sensor Accelerator for Edge Image Processing
Authors:
Shaahin Angizi,
Sepehr Tabrizchi,
Arman Roohi
Abstract:
This work proposes a Processing-In-Sensor Accelerator, namely PISA, as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing in AI devices. PISA intrinsically implements a coarse-grained convolution operation in Binarized-Weight Neural Networks (BWNNs) leveraging a novel compute-pixel with non-volatile weight storage at the sensor side. This remarkably reduces the power consumption of data conversion and transmission to an off-chip processor. The design is completed with a bit-wise near-sensor processing-in-DRAM computing unit to process the remaining network layers. Once the object is detected, PISA switches to typical sensing mode to capture the image for a fine-grained convolution using only the near-sensor processing unit. Our circuit-to-application co-simulation results on a BWNN acceleration demonstrate acceptable accuracy on various image datasets in coarse-grained evaluation compared to baseline BWNN models, while PISA achieves a frame rate of 1000 and efficiency of ~1.74 TOp/s/W. Lastly, PISA substantially reduces data conversion and transmission energy by ~84% compared to a baseline CPU-sensor design.
Submitted 18 February, 2022;
originally announced February 2022.
-
Energy Efficient Tri-State CNFET Ternary Logic Gates
Authors:
Sepher Tabrizchi,
Fazel Sharifi,
Abdel-Hameed A. Badawy
Abstract:
Traditional silicon binary circuits continue to face challenges such as high leakage power dissipation and large interconnect area. Multiple-Valued Logic (MVL) and nano devices are two feasible solutions to overcome these problems. In this paper, a novel method is presented to design ternary logic circuits based on Carbon Nanotube Field Effect Transistors (CNFETs). The proposed designs use the unique properties of CNFETs, for example, the ability to adjust the Carbon Nanotube (CNT) diameter to obtain the desired threshold voltage, and the equal mobility of P-FET and N-FET transistors. Each of our designed logic circuits implements a logic function and its complement via a control signal. Also, these circuits have a high-impedance state, which saves power while the circuits are not in use. To show a more detailed application of our approach, we design a 2-digit adder-subtractor circuit. We simulate the proposed ternary circuits using HSPICE with standard 32 nm CNFET technology. The simulation results indicate the correct operation of the designs under different process, voltage, and temperature (PVT) variations. Moreover, a power-efficient ternary logic ALU has been designed based on the proposed gates.
Submitted 20 June, 2018;
originally announced June 2018.