Effective Interplay between Sparsity and Quantization: From Theory to Practice

Harma, Simla Burcu; Chakraborty, Ayan; Kostenok, Elizaveta; Mishin, Danila; Ha, Dongho; Falsafi, Babak; Jaggi, Martin; Liu, Ming; Oh, Yunho; Subramanian, Suvinay; Yazdanbakhsh, Amir

arXiv:2405.20935 (cs)

[Submitted on 31 May 2024]

Abstract: The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated significant reductions in computational and memory footprints while preserving model accuracy. Although each method is effective on its own, the interplay between the two remains an open question. In this paper, we investigate the interaction between these two methods and assess whether their combination impacts final model accuracy. We mathematically prove that applying sparsity before quantization is the optimal sequence for these operations, minimizing error in computation. Our empirical studies across a wide range of models, including the OPT and Llama model families (125M-8B) and ViT, corroborate these theoretical findings. In addition, through rigorous analysis, we demonstrate that sparsity and quantization are not orthogonal; their interaction can significantly harm model accuracy, with quantization error playing a dominant role in this degradation. Our findings extend to the efficient deployment of large models on resource-limited compute platforms and to reducing serving cost, offering insights into best practices for applying these compression methods to maximize efficacy without compromising accuracy.
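The ordering claim in the abstract can be illustrated with a small numerical sketch. The setup below is an assumption chosen for illustration, not the paper's experimental configuration: 2:4 magnitude-based sparsity, symmetric per-tensor 4-bit uniform quantization, and a random Gaussian tensor. The function names (sparsify_2_4, quantize_int, order_errors) are hypothetical helpers, not from the paper or any library.

```python
import numpy as np

def sparsify_2_4(x):
    """Keep the 2 largest-magnitude values in each group of 4; zero the rest.
    Assumes x.size is a multiple of 4."""
    groups = x.reshape(-1, 4)
    # Indices of the two smallest magnitudes in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    out = groups.copy()
    np.put_along_axis(out, drop, 0.0, axis=1)
    return out.reshape(x.shape)

def quantize_int(x, bits=4):
    """Symmetric per-tensor uniform quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    scale = max_abs / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def order_errors(x, bits=4):
    """L2 error against the original tensor for both orderings."""
    s_then_q = quantize_int(sparsify_2_4(x), bits)   # sparsity -> quantization
    q_then_s = sparsify_2_4(quantize_int(x, bits))   # quantization -> sparsity
    err_sq = float(np.linalg.norm(s_then_q - x))
    err_qs = float(np.linalg.norm(q_then_s - x))
    return err_sq, err_qs

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
err_sq, err_qs = order_errors(x)
print(f"sparsity -> quantization error: {err_sq:.4f}")
print(f"quantization -> sparsity error: {err_qs:.4f}")
```

In this simplified setup the two orderings differ only where quantization collapses distinct magnitudes onto the same level, so that pruning after quantization can discard the larger of two original values. The size of the gap depends on the bit width and the data distribution; the sketch says nothing about end-to-end model accuracy.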

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.20935 [cs.LG]
	(or arXiv:2405.20935v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2405.20935

Submission history

From: Simla Burcu Harma
[v1] Fri, 31 May 2024 15:34:13 UTC (499 KB)
