Search | arXiv e-print repository

AdaQAT: Adaptive Bit-Width Quantization-Aware Training

Authors: Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickael Bruno

Abstract: Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-… ▽ More Large-scale deep neural networks (DNNs) have achieved remarkable success in many application scenarios. However, high computational complexity and energy costs of modern DNNs make their deployment on edge devices challenging. Model quantization is a common approach to deal with deployment constraints, but searching for optimized bit-widths can be challenging. In this work, we present Adaptive Bit-Width Quantization Aware Training (AdaQAT), a learning-based method that automatically optimizes weight and activation signal bit-widths during training for more efficient DNN inference. We use relaxed real-valued bit-widths that are updated using a gradient descent rule, but are otherwise discretized for all quantization operations. The result is a simple and flexible QAT approach for mixed-precision uniform quantization problems. Compared to other methods that are generally designed to be run on a pretrained network, AdaQAT works well in both training from scratch and fine-tuning scenarios.Initial results on the CIFAR-10 and ImageNet datasets using ResNet20 and ResNet18 models, respectively, indicate that our method is competitive with other state-of-the-art mixed-precision quantization approaches. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14010 [pdf, other]

A Stochastic Rounding-Enabled Low-Precision Floating-Point MAC for DNN Training

Authors: Sami Ben Ali, Silviu-Ioan Filip, Olivier Sentieys

Abstract: Training Deep Neural Networks (DNNs) can be computationally demanding, particularly when dealing with large models. Recent work has aimed to mitigate this computational challenge by introducing 8-bit floating-point (FP8) formats for multiplication. However, accumulations are still done in either half (16-bit) or single (32-bit) precision arithmetic. In this paper, we investigate lowering accumulat… ▽ More Training Deep Neural Networks (DNNs) can be computationally demanding, particularly when dealing with large models. Recent work has aimed to mitigate this computational challenge by introducing 8-bit floating-point (FP8) formats for multiplication. However, accumulations are still done in either half (16-bit) or single (32-bit) precision arithmetic. In this paper, we investigate lowering accumulator word length while maintaining the same model accuracy. We present a multiply-accumulate (MAC) unit with FP8 multiplier inputs and FP12 accumulations, which leverages an optimized stochastic rounding (SR) implementation to mitigate swam** errors that commonly arise during low precision accumulations. We investigate the hardware implications and accuracy impact associated with varying the number of random bits used for rounding operations. We additionally attempt to reduce MAC area and power by proposing a new scheme to support SR in floating-point MAC and by removing support for subnormal values. Our optimized eager SR unit significantly reduces delay and area when compared to a classic lazy SR design. Moreover, when compared to MACs utilizing single-or half-precision adders, our design showcases notable savings in all metrics. Furthermore, our approach consistently maintains near baseline accuracy across a diverse range of computer vision tasks, making it a promising alternative for low-precision DNN training. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Journal ref: DATE 2024 - 27th IEEE/ACM Design, Automation and Test in Europe, Mar 2024, Valencia, Spain. pp.1-6

arXiv:2311.11172 [pdf, other]

Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing

Authors: Cédric Gernigon, Silviu-Ioan Filip, Olivier Sentieys, Clément Coggiola, Mickaël Bruno

Abstract: One of the major bottlenecks in high-resolution Earth Observation (EO) space systems is the downlink between the satellite and the ground. Due to hardware limitations, on-board power limitations or ground-station operation costs, there is a strong need to reduce the amount of data transmitted. Various processing methods can be used to compress the data. One of them is the use of on-board deep lear… ▽ More One of the major bottlenecks in high-resolution Earth Observation (EO) space systems is the downlink between the satellite and the ground. Due to hardware limitations, on-board power limitations or ground-station operation costs, there is a strong need to reduce the amount of data transmitted. Various processing methods can be used to compress the data. One of them is the use of on-board deep learning to extract relevant information in the data. However, most ground-based deep neural network parameters and computations are performed using single-precision floating-point arithmetic, which is not adapted to the context of on-board processing. We propose to rely on quantized neural networks and study how to combine low precision (mini) floating-point arithmetic with a Quantization-Aware Training methodology. We evaluate our approach with a semantic segmentation task for ship detection using satellite images from the Airbus Ship dataset. Our results show that 6-bit floating-point quantization for both weights and activations can compete with single-precision without significant accuracy degradation. Using a Thin U-Net 32 model, only a 0.3% accuracy degradation is observed with 6-bit minifloat quantization (a 6-bit equivalent integer-based approach leads to a 0.5% degradation). An initial hardware study also confirms the potential impact of such low-precision floating-point designs, but further investigation at the scale of a full inference accelerator is needed before concluding whether they are relevant in a practical on-board scenario. △ Less

Submitted 18 November, 2023; originally announced November 2023.

Comments: 7 pages, 7 figures, EDHPC 2023

arXiv:2212.04297 [pdf, other]

doi 10.1007/978-3-030-94705-7_15

Approximations in Deep Learning

Authors: Etienne Dupuis, Silviu-Ioan Filip, Olivier Sentieys, David Novo, Ian O'Connor, Alberto Bosio

Abstract: The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industrials and academics. However, the computational workload associated with DL is often out of reach for low-power embedded devices and is still costly when run on datacenters. By relaxing the need for fully precise operations, Approximate Computing (AxC) substantially improves perform… ▽ More The design and implementation of Deep Learning (DL) models is currently receiving a lot of attention from both industrials and academics. However, the computational workload associated with DL is often out of reach for low-power embedded devices and is still costly when run on datacenters. By relaxing the need for fully precise operations, Approximate Computing (AxC) substantially improves performance and energy efficiency. DL is extremely relevant in this context, since playing with the accuracy needed to do adequate computations will significantly enhance performance, while kee** the quality of results in a user-constrained range. This chapter will explore how AxC can improve the performance and energy efficiency of hardware accelerators in DL applications during inference and training. △ Less

Submitted 8 December, 2022; originally announced December 2022.

Comments: Approximate Computing Techniques - From Component- to Application-Level, pp.467-512, 2022, 978-3-030-94704-0

arXiv:1511.02117 [pdf]

Introducing SKYSET - a Quintuple Approach for Improving Instructions

Authors: Kerry Fultz, Seth Filip

Abstract: A new approach called SKYSET (Synthetic Knowledge Yield Social Entities Translation) is proposed to validate completeness and to reduce ambiguity from written instructional documentation. SKYSET utilizes a quintuple set of standardized categories, which differs from traditional approaches that typically use triples. The SKYSET System defines the categories required to form a standard template for… ▽ More A new approach called SKYSET (Synthetic Knowledge Yield Social Entities Translation) is proposed to validate completeness and to reduce ambiguity from written instructional documentation. SKYSET utilizes a quintuple set of standardized categories, which differs from traditional approaches that typically use triples. The SKYSET System defines the categories required to form a standard template for representing information that is portable across different domains. It provides a standardized framework that enables sentences from written instructions to be translated into sets of category typed entities on a table or database. The SKYSET entities contain conceptual units or phrases that represent information from the original source documentation. SKYSET enables information concatenation where multiple documents from different domains can be translated and combined into a single common filterable and searchable table of entities. △ Less

Submitted 6 November, 2015; originally announced November 2015.

Comments: 11 pages

Showing 1–5 of 5 results for author: Filip, S