Showing 1–2 of 2 results for author: Zarif, N E

Search v0.5.6 released 2020-02-24

arXiv:2306.09905 [pdf, other]

cs.AR

Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference

Authors: Théo Dupuis, Yoan Fournier, MohammadHossein AskariHemmat, Nizar El Zarif, François Leduc-Primeau, Jean Pierre David, Yvon Savaria

Abstract: Convolutional Neural Networks (CNNs) are used in a wide range of applications, with full-precision CNNs achieving high accuracy at the expense of portability. Recent progress in quantization techniques has demonstrated that sub-byte Quantized Neural Networks (QNNs) achieve comparable or superior accuracy while significantly reducing the computational cost and memory footprint. However, sub-byte co… ▽ More Convolutional Neural Networks (CNNs) are used in a wide range of applications, with full-precision CNNs achieving high accuracy at the expense of portability. Recent progress in quantization techniques has demonstrated that sub-byte Quantized Neural Networks (QNNs) achieve comparable or superior accuracy while significantly reducing the computational cost and memory footprint. However, sub-byte computation on commodity hardware is sub-optimal due to the lack of support for such precision. In this paper, we introduce Sparq, a Sub-byte vector Processor designed for the AcceleRation of QNN inference. This processor is based on a modified version of Ara, an open-source 64-bit RISC-V ``V'' compliant processor. Sparq is implemented in GLOBAL FOUNDRIES 22FDX FD-SOI technology and extends the Instruction Set Architecture (ISA) by adding a new multiply-shift-accumulate instruction to improve sub-byte computation effciency. The floating-point unit is also removed to minimize area and power usage. To demonstrate Sparq performance, we implement an ultra-low-precision (1-bit to 4-bit) vectorized conv2d operation taking advantage of the dedicated hardware. We show that Sparq can significantly accelerate sub-byte computations with respectively 3.2 times, and 1.7 times acceleration over an optimized 16-bit 2D convolution for 2-bit and 4-bit quantization. △ Less

Submitted 16 June, 2023; originally announced June 2023.

Comments: 5 pages, Accepted for publication in the 21st IEEE Interregional NEWCAS Conference (2023)
arXiv:2302.05996 [pdf, other]

cs.AR

Quark: An Integer RISC-V Vector Processor for Sub-Byte Quantized DNN Inference

Authors: MohammadHossein AskariHemmat, Theo Dupuis, Yoan Fournier, Nizar El Zarif, Matheus Cavalcante, Matteo Perotti, Frank Gurkaynak, Luca Benini, Francois Leduc-Primeau, Yvon Savaria, Jean-Pierre David

Abstract: In this paper, we present Quark, an integer RISC-V vector processor specifically tailored for sub-byte DNN inference. Quark is implemented in GlobalFoundries' 22FDX FD-SOI technology. It is designed on top of Ara, an open-source 64-bit RISC-V vector processor. To accommodate sub-byte DNN inference, Quark extends Ara by adding specialized vector instructions to perform sub-byte quantized operations… ▽ More In this paper, we present Quark, an integer RISC-V vector processor specifically tailored for sub-byte DNN inference. Quark is implemented in GlobalFoundries' 22FDX FD-SOI technology. It is designed on top of Ara, an open-source 64-bit RISC-V vector processor. To accommodate sub-byte DNN inference, Quark extends Ara by adding specialized vector instructions to perform sub-byte quantized operations. We also remove the floating-point unit from Quarks' lanes and use the CVA6 RISC-V scalar core for the re-scaling operations that are required in quantized neural network inference. This makes each lane of Quark 2 times smaller and 1.9 times more power efficient compared to the ones of Ara. In this paper we show that Quark can run quantized models at sub-byte precision. Notably we show that for 1-bit and 2-bit quantized models, Quark can accelerate computation of Conv2d over various ranges of inputs and kernel sizes. △ Less

Submitted 12 February, 2023; originally announced February 2023.

Comments: 5 pages. Accepted for publication in the 56th International Symposium on Circuits and Systems (ISCAS 2023)

ACM Class: C.1.3; C.3

Search v0.5.6 released 2020-02-24