Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation
Authors:
Mykhailo Uss,
Ruslan Yermolenko,
Olena Kolodiazhna,
Oleksii Shashko,
Ivan Safonov,
Volodymyr Savin,
Yoonjae Yeo,
Seowon Ji,
Jaeyun Jeong
Abstract:
Quantization is widely used to improve the memory, computation, and power efficiency of deep neural networks (DNNs). Various techniques, such as post-training quantization and quantization-aware training, have been proposed to improve quantization quality. We introduce a novel approach to DNN quantization that uses a redundant representation of the DNN's output. We represent the target quantity as a point on a 2D parametric curve. The DNN model is modified to predict 2D points that are mapped back to the target quantity at a post-processing stage. We demonstrate that this mapping can reduce quantization error. For the low-order parametric Hilbert curve, the Depth-From-Stereo task, and two models represented by a U-Net architecture and a vision transformer, we reduced the quantization error of the INT8 model by about 5 times on both CPU and DSP delegates. This gain comes with a minimal increase in inference time (less than 7%). Our approach can be applied to other tasks, including segmentation, object detection, and keypoint prediction.
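The idea of representing a scalar target as a point on a 2D Hilbert curve and mapping a predicted point back to a scalar can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`hilbert_vertices`, `encode`, `decode`), the choice of curve order 2, the normalization of the target to [0, 1], and the nearest-point projection used for decoding are all assumptions made for illustration.

```python
import math


def _d2xy(n, d):
    """Standard index-to-cell conversion for a Hilbert curve on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_vertices(order):
    """Vertices of the Hilbert polyline, scaled to the unit square (order >= 1)."""
    n = 2 ** order
    return [tuple(c / (n - 1) for c in _d2xy(n, d)) for d in range(n * n)]


def encode(t, verts):
    """Map a scalar t in [0, 1] to a 2D point on the polyline (hypothetical helper)."""
    segs = len(verts) - 1
    s = min(max(t, 0.0), 1.0) * segs
    i = min(int(s), segs - 1)
    f = s - i
    (x0, y0), (x1, y1) = verts[i], verts[i + 1]
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


def decode(p, verts):
    """Map a 2D point back to a scalar by projecting onto the nearest segment.

    Assumption: nearest-point projection; the paper's post-processing may differ.
    """
    px, py = p
    segs = len(verts) - 1
    best_t, best_d2 = 0.0, math.inf
    for i in range(segs):
        (x0, y0), (x1, y1) = verts[i], verts[i + 1]
        dx, dy = x1 - x0, y1 - y0
        f = ((px - x0) * dx + (py - y0) * dy) / (dx * dx + dy * dy)
        f = min(max(f, 0.0), 1.0)
        qx, qy = x0 + f * dx, y0 + f * dy
        d2 = (px - qx) ** 2 + (py - qy) ** 2
        if d2 < best_d2:
            best_t, best_d2 = (i + f) / segs, d2
    return best_t


verts = hilbert_vertices(2)       # order 2 is an illustrative choice
point = encode(0.5, verts)        # 2D target a two-output model would regress
recovered = decode(point, verts)  # post-processing back to the scalar
```

In this sketch, a model would be trained to output the two coordinates returned by `encode`, and `decode` would run as a post-processing step on the (possibly quantized) prediction.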
Submitted 22 May, 2024;
originally announced May 2024.