-
Compressed Real Numbers for AI: a case-study using a RISC-V CPU
Authors:
Federico Rossi,
Marco Cococcioni,
Roger Ferrer Ibàñez,
Jesùs Labarta,
Filippo Mantovani,
Marc Casas,
Emanuele Ruffaldi,
Sergio Saponara
Abstract:
As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed format have attracted considerable attention. In this paper, we focused on two families of formats that have already achieved interesting results in compressing binary32 numbers in machin…
▽ More
As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed format have attracted considerable attention. In this paper, we focused on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications, without sensible degradation of the accuracy: bfloat and posit. Even if 16-bit and 8-bit bfloat/posit are routinely used for reducing the storage of the weights/biases of trained DNNs, the inference still often happens on the 32-bit FPU of the CPU (especially if GPUs are not available). In this paper we propose a way to decompress a tensor of bfloat/posits just before computations, i.e., after the compressed operands have been loaded within the vector registers of a vector capable CPU, in order to save bandwidth usage and increase cache efficiency. Finally, we show the architectural parameters and considerations under which this solution is advantageous with respect to the uncompressed one.
△ Less
Submitted 11 September, 2023;
originally announced September 2023.
-
PPU: Design and Implementation of a Pipelined Full Posit Processing Unit
Authors:
Federico Rossi,
Francesco Urbani,
Marco Cococcioni,
Emanuele Ruffaldi,
Sergio Saponara
Abstract:
By exploiting the modular RISC-V ISA this paper presents the customization of instruction set with posit\textsuperscript{\texttrademark} arithmetic instructions to provide improved numerical accuracy, well-defined behavior and increased range of representable numbers while kee** the flexibility and benefits of open-source ISA, like no licensing and royalty fee and community development. In this…
▽ More
By exploiting the modular RISC-V ISA this paper presents the customization of instruction set with posit\textsuperscript{\texttrademark} arithmetic instructions to provide improved numerical accuracy, well-defined behavior and increased range of representable numbers while kee** the flexibility and benefits of open-source ISA, like no licensing and royalty fee and community development. In this work we present the design, implementation and integration into the low-power Ibex RISC-V core of a full posit processing unit capable to directly implement in hardware the four arithmetic operations (add, sub, mul, div and fma), the inversion, the float-to-posit and posit-to-float conversions. We evaluate speed, power and area of this unit (that we have called Full Posit Processing Unit). The FPPU has been prototyped on Alveo and Kintex FPGAs, and its impact on the metrics of the full-RISC-V core have been evaluated, showing that we can provide real number processing capabilities to the mentioned core with an increase in area limited to $7\%$ for 8-bit posits and to $15\%$ for 16-bit posits. Finally we present tests one the use of posits for deep neural networks with different network models and datasets, showing minimal drop in accuracy when using 16-bit posits instead of 32-bit IEEE floats.
△ Less
Submitted 8 April, 2024; v1 submitted 7 August, 2023;
originally announced August 2023.
-
VALUE: Large Scale Voting-based Automatic Labelling for Urban Environments
Authors:
Giacomo Dabisias,
Emanuele Ruffaldi,
Hugo Grimmett,
Peter Ondruska
Abstract:
This paper presents a simple and robust method for the automatic localisation of static 3D objects in large-scale urban environments. By exploiting the potential to merge a large volume of noisy but accurately localised 2D image data, we achieve superior performance in terms of both robustness and accuracy of the recovered 3D information. The method is based on a simple distributed voting schema w…
▽ More
This paper presents a simple and robust method for the automatic localisation of static 3D objects in large-scale urban environments. By exploiting the potential to merge a large volume of noisy but accurately localised 2D image data, we achieve superior performance in terms of both robustness and accuracy of the recovered 3D information. The method is based on a simple distributed voting schema which can be fully distributed and parallelised to scale to large-scale scenarios. To evaluate the method we collected city-scale data sets from New York City and San Francisco consisting of almost 400k images spanning the area of 40 km$^2$ and used it to accurately recover the 3D positions of traffic lights. We demonstrate a robust performance and also show that the solution improves in quality over time as the amount of data increases.
△ Less
Submitted 5 June, 2020;
originally announced June 2020.
-
Robust and Subject-Independent Driving Manoeuvre Anticipation through Domain-Adversarial Recurrent Neural Networks
Authors:
Michele Tonutti,
Emanuele Ruffaldi,
Alessandro Cattaneo,
Carlo Alberto Avizzano
Abstract:
Through deep learning and computer vision techniques, driving manoeuvres can be predicted accurately a few seconds in advance. Even though adapting a learned model to new drivers and different vehicles is key for robust driver-assistance systems, this problem has received little attention so far. This work proposes to tackle this challenge through domain adaptation, a technique closely related to…
▽ More
Through deep learning and computer vision techniques, driving manoeuvres can be predicted accurately a few seconds in advance. Even though adapting a learned model to new drivers and different vehicles is key for robust driver-assistance systems, this problem has received little attention so far. This work proposes to tackle this challenge through domain adaptation, a technique closely related to transfer learning. A proof of concept for the application of a Domain-Adversarial Recurrent Neural Network (DA-RNN) to multi-modal time series driving data is presented, in which domain-invariant features are learned by maximizing the loss of an auxiliary domain classifier. Our implementation is evaluated using a leave-one-driver-out approach on individual drivers from the Brain4Cars dataset, as well as using a new dataset acquired through driving simulations, yielding an average increase in performance of 30% and 114% respectively compared to no adaptation. We also show the importance of fine-tuning sections of the network to optimise the extraction of domain-independent features. The results demonstrate the applicability of the approach to driver-assistance systems as well as training and simulation environments.
△ Less
Submitted 26 February, 2019;
originally announced February 2019.