-
AdaQAT: Adaptive Bit-Width Quantization-Aware Training
Authors:
Cédric Gernigon,
Silviu-Ioan Filip,
Olivier Sentieys,
Clément Coggiola,
Mickael Bruno
Abstract:
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-…
▽ More
Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation signal bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule, but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Compared to other methods that are generally designed to be run on a pretrained network, AdaQAT works well in both training from scratch and fine-tuning scenarios.Initial results on the CIFAR-10 and ImageNet datasets using ResNet20 and ResNet18 models, respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
A Stochastic Rounding-Enabled Low-Precision Floating-Point MAC for DNN Training
Authors:
Sami Ben Ali,
Silviu-Ioan Filip,
Olivier Sentieys
Abstract:
Training Deep Neural Networks (DNNs) can be computationally demanding, particularly when dealing with large models. Recent work has aimed to mitigate this computational challenge by introducing 8-bit floating-point (FP8) formats for multiplication. However, accumulations are still done in either half (16-bit) or single (32-bit) precision arithmetic. In this paper, we investigate lowering accumulat…
▽ More
Training Deep Neural Networks (DNNs) can be computationally demanding, particularly when dealing with large models. Recent work has aimed to mitigate this computational challenge by introducing 8-bit floating-point (FP8) formats for multiplication. However, accumulations are still done in either half (16-bit) or single (32-bit) precision arithmetic. In this paper, we investigate lowering accumulator word length while maintaining the same model accuracy. We present a multiply-accumulate (MAC) unit with FP8 multiplier inputs and FP12 accumulations, which leverages an optimized stochastic rounding (SR) implementation to mitigate swam** errors that commonly arise during low precision accumulations. We investigate the hardware implications and accuracy impact associated with varying the number of random bits used for rounding operations. We additionally attempt to reduce MAC area and power by proposing a new scheme to support SR in floating-point MAC and by removing support for subnormal values. Our optimized eager SR unit significantly reduces delay and area when compared to a classic lazy SR design. Moreover, when compared to MACs utilizing single-or half-precision adders, our design showcases notable savings in all metrics. Furthermore, our approach consistently maintains near baseline accuracy across a diverse range of computer vision tasks, making it a promising alternative for low-precision DNN training.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
When Side-Channel Attacks Break the Black-Box Property of Embedded Artificial Intelligence
Authors:
Benoit Coqueret,
Mathieu Carbone,
Olivier Sentieys,
Gabriel Zaid
Abstract:
Artificial intelligence, and specifically deep neural networks (DNNs), has rapidly emerged in the past decade as the standard for several tasks from specific advertising to object detection. The performance offered has led DNN algorithms to become a part of critical embedded systems, requiring both efficiency and reliability. In particular, DNNs are subject to malicious examples designed in a way…
▽ More
Artificial intelligence, and specifically deep neural networks (DNNs), has rapidly emerged in the past decade as the standard for several tasks from specific advertising to object detection. The performance offered has led DNN algorithms to become a part of critical embedded systems, requiring both efficiency and reliability. In particular, DNNs are subject to malicious examples designed in a way to fool the network while being undetectable to the human observer: the adversarial examples. While previous studies propose frameworks to implement such attacks in black box settings, those often rely on the hypothesis that the attacker has access to the logits of the neural network, breaking the assumption of the traditional black box. In this paper, we investigate a real black box scenario where the attacker has no access to the logits. In particular, we propose an architecture-agnostic attack which solve this constraint by extracting the logits. Our method combines hardware and software attacks, by performing a side-channel attack that exploits electromagnetic leakages to extract the logits for a given input, allowing an attacker to estimate the gradients and produce state-of-the-art adversarial examples to fool the targeted neural network. Through this example of adversarial attack, we demonstrate the effectiveness of logits extraction using side-channel as a first step for more general attack frameworks requiring either the logits or the confidence scores.
△ Less
Submitted 23 November, 2023;
originally announced November 2023.
-
Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing
Authors:
Cédric Gernigon,
Silviu-Ioan Filip,
Olivier Sentieys,
Clément Coggiola,
Mickaël Bruno
Abstract:
One of the major bottlenecks in high-resolution Earth Observation (EO) space systems is the downlink between the satellite and the ground. Due to hardware limitations, on-board power limitations or ground-station operation costs, there is a strong need to reduce the amount of data transmitted. Various processing methods can be used to compress the data. One of them is the use of on-board deep lear…
▽ More
One of the major bottlenecks in high-resolution Earth Observation (EO) space systems is the downlink between the satellite and the ground. Due to hardware limitations, on-board power limitations or ground-station operation costs, there is a strong need to reduce the amount of data transmitted. Various processing methods can be used to compress the data. One of them is the use of on-board deep learning to extract relevant information in the data. However, most ground-based deep neural network parameters and computations are performed using single-precision floating-point arithmetic, which is not adapted to the context of on-board processing. We propose to rely on quantized neural networks and study how to combine low precision (mini) floating-point arithmetic with a Quantization-Aware Training methodology. We evaluate our approach with a semantic segmentation task for ship detection using satellite images from the Airbus Ship dataset. Our results show that 6-bit floating-point quantization for both weights and activations can compete with single-precision without significant accuracy degradation. Using a Thin U-Net 32 model, only a 0.3% accuracy degradation is observed with 6-bit minifloat quantization (a 6-bit equivalent integer-based approach leads to a 0.5% degradation). An initial hardware study also confirms the potential impact of such low-precision floating-point designs, but further investigation at the scale of a full inference accelerator is needed before concluding whether they are relevant in a practical on-board scenario.
△ Less
Submitted 18 November, 2023;
originally announced November 2023.
-
Approximations in Deep Learning
Authors:
Etienne Dupuis,
Silviu-Ioan Filip,
Olivier Sentieys,
David Novo,
Ian O'Connor,
Alberto Bosio
Abstract:
The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industrials and academics. However, the computational workload associated with DL is often out of reach for low-power embedded devices and is still costly when run on datacenters. By relaxing the need for fully precise operations, Approximate Computing (AxC) substantially improves perform…
▽ More
The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industrials and academics. However, the computational workload associated with DL is often out of reach for low-power embedded devices and is still costly when run on datacenters. By relaxing the need for fully precise operations, Approximate Computing (AxC) substantially improves performance and energy efficiency. DL is extremely relevant in this context, since playing with the accuracy needed to do adequate computations will significantly enhance performance, while kee** the quality of results in a user-constrained range. This chapter will explore how AxC can improve the performance and energy efficiency of hardware accelerators in DL applications during inference and training.
△ Less
Submitted 8 December, 2022;
originally announced December 2022.
-
Customizing Number Representation and Precision
Authors:
Olivier Sentieys,
Daniel Menard
Abstract:
There is a growing interest in the use of reduced-precision arithmetic, exacerbated by the recent interest in artificial intelligence, especially with deep learning. Most architectures already provide reduced-precision capabilities (e.g., 8-bit integer, 16-bit floating point). In the context of FPGAs, any number format and bit-width can even be considered.In computer arithmetic, the representation…
▽ More
There is a growing interest in the use of reduced-precision arithmetic, exacerbated by the recent interest in artificial intelligence, especially with deep learning. Most architectures already provide reduced-precision capabilities (e.g., 8-bit integer, 16-bit floating point). In the context of FPGAs, any number format and bit-width can even be considered.In computer arithmetic, the representation of real numbers is a major issue. Fixed-point (FxP) and floating-point (FlP) are the main options to represent reals, both with their advantages and drawbacks. This chapter presents both FxP and FlP number representations, and draws a fair a comparison between their cost, performance and energy, as well as their impact on accuracy during computations.It is shown that the choice between FxP and FlP is not obvious and strongly depends on the application considered. In some cases, low-precision floating-point arithmetic can be the most effective and provides some benefits over the classical fixed-point choice for energy-constrained applications.
△ Less
Submitted 8 December, 2022;
originally announced December 2022.
-
Characterizing a Neutron-Induced Fault Model for Deep Neural Networks
Authors:
Fernando Fernandes dos Santos,
Angeliki Kritikakou,
Josie Esteban Rodriguez Condia,
Juan David Guerrero Balaguera,
Matteo Sonza Reorda,
Olivier Sentieys,
Paolo Rech
Abstract:
The reliability evaluation of Deep Neural Networks (DNNs) executed on Graphic Processing Units (GPUs) is a challenging problem since the hardware architecture is highly complex and the software frameworks are composed of many layers of abstraction. While software-level fault injection is a common and fast way to evaluate the reliability of complex applications, it may produce unrealistic results s…
▽ More
The reliability evaluation of Deep Neural Networks (DNNs) executed on Graphic Processing Units (GPUs) is a challenging problem since the hardware architecture is highly complex and the software frameworks are composed of many layers of abstraction. While software-level fault injection is a common and fast way to evaluate the reliability of complex applications, it may produce unrealistic results since it has limited access to the hardware resources and the adopted fault models may be too naive (i.e., single and double bit flip). Contrarily, physical fault injection with neutron beam provides realistic error rates but lacks fault propagation visibility. This paper proposes a characterization of the DNN fault model combining both neutron beam experiments and fault injection at software level. We exposed GPUs running General Matrix Multiplication (GEMM) and DNNs to beam neutrons to measure their error rate. On DNNs, we observe that the percentage of critical errors can be up to 61%, and show that ECC is ineffective in reducing critical errors. We then performed a complementary software-level fault injection, using fault models derived from RTL simulations. Our results show that by injecting complex fault models, the YOLOv3 misdetection rate is validated to be very close to the rate measured with beam experiments, which is 8.66x higher than the one measured with fault injection using only single-bit flips.
△ Less
Submitted 23 November, 2022;
originally announced November 2022.
-
Experimental evaluation of neutron-induced errors on a multicore RISC-V platform
Authors:
Fernando Fernandes dos Santos,
Angeliki Kritikakou,
Olivier Sentieys
Abstract:
RISC-V architectures have gained importance in the last years due to their flexibility and open-source Instruction Set Architecture (ISA), allowing developers to efficiently adopt RISC-V processors in several domains with a reduced cost. For application domains, such as safety-critical and mission-critical, the execution must be reliable as a fault can compromise the system's ability to operate co…
▽ More
RISC-V architectures have gained importance in the last years due to their flexibility and open-source Instruction Set Architecture (ISA), allowing developers to efficiently adopt RISC-V processors in several domains with a reduced cost. For application domains, such as safety-critical and mission-critical, the execution must be reliable as a fault can compromise the system's ability to operate correctly. However, the application's error rate on RISC-V processors is not significantly evaluated, as it has been done for standard x86 processors. In this work, we investigate the error rate of a commercial RISC-V ASIC platform, the GAP8, exposed to a neutron beam. We show that for computing-intensive applications, such as classification Convolutional Neural Networks (CNN), the error rate can be 3.2x higher than the average error rate. Additionally, we find that the majority (96.12%) of the errors on the CNN do not generate misclassifications. Finally, we also evaluate the events that cause application interruption on GAP8 and show that the major source of incorrect interruptions is application hangs (i.g., due to an infinite loop or a racing condition).
△ Less
Submitted 17 June, 2022;
originally announced June 2022.
-
Freezer: A Specialized NVM Backup Controller for Intermittently-Powered Systems
Authors:
Davide Pala,
Ivan Miro-Panades,
Olivier Sentieys
Abstract:
The explosion of IoT and wearable devices determined a rising attention towards energy harvesting as source for powering these systems. In this context, many applications cannot afford the presence of a battery because of size, weight and cost issues. Therefore, due to the intermittent nature of ambient energy sources, these systems must be able to save and restore their state, in order to guarant…
▽ More
The explosion of IoT and wearable devices determined a rising attention towards energy harvesting as source for powering these systems. In this context, many applications cannot afford the presence of a battery because of size, weight and cost issues. Therefore, due to the intermittent nature of ambient energy sources, these systems must be able to save and restore their state, in order to guarantee progress across power interruptions. In this work, we propose a specialized backup/restore controller that dynamically tracks the memory accesses during the execution of the program. The controller then commits the changes to a snapshot in a Non-Volatile Memory (NVM) when a power failure is detected. Our approach does not require complex hybrid memories and can be implemented with standard components. % and integrated in any MCU with Results on a set of benchmarks show an average $8\times$ reduction in backup size. Thanks to our dedicated controller, the backup time is further reduced by more than $100\times$, with an area and power overhead of only 0.4\% and 0.8\%, respectively, w.r.t. a low-end IoT node.
△ Less
Submitted 25 January, 2021;
originally announced January 2021.