QUENN: QUantization Engine for low-power Neural Networks

de Prado, Miguel; Denna, Maurizio; Benini, Luca; Pazos, Nuria

doi:10.1145/3203217.3203282


arXiv:1811.05896 (cs)

[Submitted on 14 Nov 2018]


Abstract: Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI). The high computational demands of deep neural networks may be alleviated by approximate computing techniques, most notably reduced-precision arithmetic with coarsely quantized numerical representations. In this context, Bonseyes is an initiative that enables stakeholders to bring AI to low-power and autonomous environments such as automotive, medical healthcare and consumer electronics. To achieve this, we introduce LPDNN, a framework for optimized deployment of Deep Neural Networks on heterogeneous embedded devices. In this work, we detail the quantization engine that is integrated in LPDNN. The engine relies on a fine-grained workflow that enables Neural Network Design Exploration and a per-layer sensitivity analysis for quantization. We demonstrate the engine with a case study on AlexNet and VGG16 using three techniques for direct quantization (standard fixed-point, dynamic fixed-point and k-means clustering) and show the potential of the latter. We argue that using a Gaussian quantizer with k-means clustering can achieve better performance than linear quantizers. Without retraining, we achieve over 55.64% savings in weight storage and 69.17% in run-time memory accesses with less than a 1% drop in top-5 accuracy on ImageNet.
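The three direct quantization techniques named in the abstract can be illustrated with a minimal sketch. It is not LPDNN's or QUENN's actual code: the function names, the bit widths and the rule for choosing the fractional length are assumptions made for illustration, and the k-means variant is plain one-dimensional Lloyd's algorithm with evenly spaced initial centroids, not the Gaussian quantizer the authors argue for.

```python
import math

def fixed_point_quantize(values, total_bits, frac_bits):
    """Standard fixed-point: one (total_bits, frac_bits) format for all data."""
    scale = 2.0 ** frac_bits
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    # Round to the nearest integer code, saturate, then map back to real values.
    return [min(max(round(v * scale), q_min), q_max) / scale for v in values]

def dynamic_fixed_point_quantize(values, total_bits):
    """Dynamic fixed-point: the fractional length is chosen per group of values
    (assumed rule: just enough integer bits to cover the largest magnitude)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    int_bits = math.ceil(math.log2(max_abs)) + 1  # +1 for the sign bit
    return fixed_point_quantize(values, total_bits, total_bits - int_bits)

def kmeans_quantize(values, n_clusters, n_iter=20):
    """Non-linear quantization: replace each value by its nearest centroid."""
    lo, hi = min(values), max(values)
    # Assumed initialisation: centroids evenly spaced over the value range.
    centroids = [lo + (hi - lo) * k / max(n_clusters - 1, 1)
                 for k in range(n_clusters)]

    def nearest(v):
        return min(range(n_clusters), key=lambda k: abs(v - centroids[k]))

    for _ in range(n_iter):
        assign = [nearest(v) for v in values]
        for k in range(n_clusters):
            members = [v for v, a in zip(values, assign) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = sum(members) / len(members)
    return [centroids[nearest(v)] for v in values]

weights = [0.1, -0.4, 0.05]
print(fixed_point_quantize(weights, 8, 4))
print(dynamic_fixed_point_quantize(weights, 8))
print(kmeans_quantize(weights, 2))
```

With small weights such as these, the per-group fractional length spends all 8 bits on the fraction, so the rounding error is much smaller than with a fixed 4-bit fraction. The k-means variant stores only an index per weight plus a small table of centroids.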

Comments:	Computing Frontiers 2018
Subjects:	Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Cite as:	arXiv:1811.05896 [cs.NE]
	(or arXiv:1811.05896v1 [cs.NE] for this version)
	https://doi.org/10.48550/arXiv.1811.05896
Related DOI:	https://doi.org/10.1145/3203217.3203282

Submission history

From: Miguel De Prado
[v1] Wed, 14 Nov 2018 16:38:42 UTC (2,136 KB)
