-
AdAM: Adaptive Fault-Tolerant Approximate Multiplier for Edge DNN Accelerators
Authors:
Mahdi Taheri,
Natalia Cherezova,
Samira Nazari,
Ahsan Rafiq,
Ali Azarpeyvand,
Tara Ghasempouri,
Masoud Daneshtalab,
Jaan Raik,
Maksim Jenihhin
Abstract:
In this paper, we propose an architecture of a novel adaptive fault-tolerant approximate multiplier tailored for ASIC-based DNN accelerators.
Submitted 5 March, 2024;
originally announced March 2024.
-
Exploration of Activation Fault Reliability in Quantized Systolic Array-Based DNN Accelerators
Authors:
Mahdi Taheri,
Natalia Cherezova,
Mohammad Saeed Ansari,
Maksim Jenihhin,
Ali Mahani,
Masoud Daneshtalab,
Jaan Raik
Abstract:
The stringent reliability requirements for Deep Neural Network (DNN) accelerators go hand in hand with the need to reduce the computational burden on hardware platforms, i.e., to reduce energy consumption and execution time and to increase the efficiency of DNN accelerators. Moreover, the growing demand for specialized DNN accelerators with tailored requirements, particularly for safety-critical applications, necessitates a comprehensive design space exploration to enable the development of efficient and robust accelerators that meet those requirements. Therefore, the trade-off between hardware performance, i.e., area and delay, and the reliability of the DNN accelerator implementation becomes critical and requires tools for analysis. This paper presents a comprehensive methodology for exploring and holistically assessing the trilateral impact of quantization on model accuracy, activation fault reliability, and hardware efficiency. A fully automated framework is introduced that is capable of applying various quantization-aware techniques, fault injection, and hardware implementation, thus enabling the measurement of hardware parameters. Moreover, this paper proposes a novel lightweight protection technique integrated within the framework to ensure the dependable deployment of the final systolic-array-based FPGA implementation. The experiments on established benchmarks demonstrate the analysis flow and the profound implications of quantization on reliability, hardware performance, and network accuracy, particularly concerning transient faults in the network's activations.
Submitted 17 January, 2024;
originally announced January 2024.
-
HLS-based Optimization of Tau Triggering Algorithm for LHC: a case study
Authors:
Natalia Cherezova,
Dmitri Mihhailov,
Sergei Devadze,
Artur Jutman
Abstract:
With the current increase in the data produced by the Large Hadron Collider (LHC) at CERN, it becomes important to process this data accordingly. The first step is to efficiently select events that contain relevant information from a massive flow of data. This is the task of the tau lepton decay triggering algorithm. The implementation is based on the High-Level Synthesis (HLS) approach, which allows generating a hardware description of the design from an algorithm written in a high-level programming language such as C++. HLS tools are intended to decrease the time and complexity of hardware design development; however, their capabilities are limited. The development of an efficient application requires substantial knowledge of hardware design and HLS specifics. This paper presents the optimizations introduced to the algorithm that improved latency and area and, more importantly, solved the routing problems, making it possible to implement the algorithm on the FPGA fabric.
Submitted 8 December, 2022;
originally announced December 2022.