Network Memory Footprint Compression Through Jointly Learnable Codebooks and Mappings

Yvinec, Edouard; Dapogny, Arnaud; Bailly, Kevin

Abstract: The massive interest in deep neural networks (DNNs) for both computer vision and natural language processing has been sparked by the growth in computational power. However, this has also increased the memory footprint of these models, to the point where it can be challenging to simply load a model on commodity devices such as mobile phones. To address this limitation, quantization is a favored solution, as it maps high-precision tensors to a low-precision, memory-efficient format. In terms of memory footprint reduction, its most effective variants are based on codebooks. These methods, however, suffer from two limitations. First, they either define a single codebook for each tensor or use a memory-expensive mapping to multiple codebooks. Second, gradient descent optimization of the mapping favors jumps toward extreme values and hence does not define a proximal search. In this work, we propose to address these two limitations. First, we group similarly distributed neurons and leverage the re-ordered structure either to apply different scale factors to the different groups or to map weights that fall in these groups to several codebooks, without any mapping overhead. Second, stemming from this initialization, we propose a joint learning of the codebooks and weight mappings that bears similarities with recent gradient-based post-training quantization techniques. Third, drawing inspiration from straight-through estimation techniques, we introduce a novel gradient update definition to enable a proximal search of the codebooks and their mappings. The proposed jointly learnable codebooks and mappings (JLCM) method allows a very efficient approximation of any DNN: as such, a Llama 7B can be compressed down to 2 GB and loaded on 5-year-old smartphones.
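Codebook quantization stores each weight as a small integer index into a short table of shared values, so the memory cost is roughly log2(k) bits per weight plus the codebooks themselves. The sketch below illustrates only the grouping idea and is not the authors' JLCM implementation. It assumes neurons are the rows of a weight matrix, sorts them by weight standard deviation (a hypothetical stand-in for "similarly distributed"), splits them into contiguous groups, and fits one scalar codebook per group with plain 1-D k-means (Lloyd's algorithm) instead of the joint gradient-based learning described above. The function names (`fit_codebook`, `assign`, `grouped_quantize`, `dequantize`, `footprint_bits`) and the 16-bit codebook entry size are assumptions made for illustration.

```python
import math
import random
import statistics


def assign(values, codebook):
    """Return, for each value, the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda j: abs(v - codebook[j])) for v in values]


def fit_codebook(values, k, iters=25):
    """Fit a k-entry scalar codebook with Lloyd's (1-D k-means) algorithm."""
    lo, hi = min(values), max(values)
    # Start from k evenly spaced centroids spanning the value range.
    codebook = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v, j in zip(values, assign(values, codebook)):
            buckets[j].append(v)
        # Move each centroid to its bucket mean; empty buckets keep their centroid.
        codebook = [statistics.fmean(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


def grouped_quantize(rows, n_groups, k):
    """Sort neurons (rows) by weight spread, split them into contiguous groups
    and fit one codebook per group. A weight's codebook follows from its row's
    group, so no per-weight codebook selector is stored (the row permutation is
    kept here simply as the original row indices)."""
    order = sorted(range(len(rows)), key=lambda r: statistics.pstdev(rows[r]))
    size = math.ceil(len(rows) / n_groups)
    groups = []
    for start in range(0, len(order), size):
        members = order[start:start + size]
        codebook = fit_codebook([w for r in members for w in rows[r]], k)
        codes = {r: assign(rows[r], codebook) for r in members}
        groups.append((codebook, codes))
    return groups


def dequantize(groups, n_rows):
    """Rebuild the approximated weight matrix from codebooks and integer codes."""
    out = [None] * n_rows
    for codebook, codes in groups:
        for r, idx in codes.items():
            out[r] = [codebook[i] for i in idx]
    return out


def footprint_bits(n_weights, k, n_codebooks, entry_bits=16):
    """Storage cost: a log2(k)-bit code per weight plus the codebook entries."""
    return n_weights * math.ceil(math.log2(k)) + n_codebooks * k * entry_bits


random.seed(0)
# Toy layer: 8 neurons of 64 weights each, with very different weight scales.
rows = [[random.gauss(0.0, 0.05 * (r + 1)) for _ in range(64)] for r in range(8)]
groups = grouped_quantize(rows, n_groups=2, k=4)
approx = dequantize(groups, len(rows))
mse = statistics.fmean(
    (w - a) ** 2 for row, arow in zip(rows, approx) for w, a in zip(row, arow)
)
print(f"MSE={mse:.6f}, bits={footprint_bits(8 * 64, k=4, n_codebooks=2)} vs fp16={8 * 64 * 16}")
```

With 8 neurons of 64 weights, 2 groups and 4-entry codebooks, each code takes 2 bits, so `footprint_bits` gives 512 × 2 + 2 × 4 × 16 = 1152 bits versus 8192 bits in fp16. For scale, the 2 GB figure for a Llama 7B corresponds to roughly (16 × 10^9 bits) / (7 × 10^9 weights) ≈ 2.3 bits per weight, assuming 1 GB = 10^9 bytes and counting 7 × 10^9 weights.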

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.17361 [cs.CV]
	(or arXiv:2309.17361v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2309.17361
