Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

Zhu, Miaoxi; Zhong, Qihuang; Shen, Li; Ding, Liang; Liu, Juhua; Du, Bo; Tao, Dacheng

[Submitted on 20 Oct 2023]

Abstract: Quantization is a promising approach for reducing memory overhead and accelerating inference, especially in large pre-trained language model (PLM) scenarios. However, the lack of access to the original training data, due to security and privacy concerns, has created a demand for zero-shot quantization. Most cutting-edge zero-shot quantization methods 1) apply primarily to computer vision tasks, and 2) neglect the overfitting problem in the generative adversarial learning process, leading to sub-optimal performance. Motivated by this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs. The key algorithm for solving ZSAQ is the SAM-SGA optimization, which aims to improve quantization accuracy and model generalization by optimizing a minimax problem. We theoretically prove the convergence rate for the minimax optimization problem, and this result can be applied to other nonconvex-PL minimax optimization frameworks. Extensive experiments on 11 tasks demonstrate that our method brings consistent and significant performance gains on both discriminative and generative PLMs, with gains of up to +6.98 in average score. Furthermore, we empirically validate that our method can effectively improve model generalization.
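
As a rough illustration of the kind of update the abstract describes (a sharpness-aware descent step for one player alternating with a gradient-ascent step for the other), the sketch below runs such a loop on a toy saddle objective. It is not the authors' SAM-SGA implementation: the objective `f(x, y) = 0.5*x**2 + x*y - 0.5*y**2`, the function names, the step size `lr`, the perturbation radius `rho`, and the use of exact rather than stochastic gradients are all assumptions made for the example.

```python
# Toy minimax problem (an assumption, not from the paper):
#   min over x, max over y of f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
# The unique saddle point is (0, 0).

def grad_x(x, y):
    # Partial derivative of f with respect to x.
    return x + y

def grad_y(x, y):
    # Partial derivative of f with respect to y.
    return x - y

def sam_gda(x, y, lr=0.1, rho=0.01, steps=500):
    """Sharpness-aware descent on x combined with gradient ascent on y."""
    for _ in range(steps):
        # SAM-style step for the min player: move to a nearby
        # "worst-case" point along the normalized gradient, then
        # descend using the gradient evaluated at that point.
        g = grad_x(x, y)
        x_adv = x + rho * g / (abs(g) + 1e-12)
        x_new = x - lr * grad_x(x_adv, y)

        # Plain gradient-ascent step for the max player.
        y_new = y + lr * grad_y(x, y)

        # Apply both updates simultaneously.
        x, y = x_new, y_new
    return x, y

if __name__ == "__main__":
    x_final, y_final = sam_gda(1.0, 1.0)
    print(f"x = {x_final:.4f}, y = {y_final:.4f}")
```

Because the perturbation has fixed radius `rho`, the iterates in this toy setting settle in a small neighborhood of the saddle point rather than converging to it exactly; shrinking `rho` shrinks that neighborhood.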

Comments:	Accepted to EMNLP 2023 (Main). Miaoxi Zhu and Qihuang Zhong contributed equally to this work.
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.13315 [cs.CL]
	(or arXiv:2310.13315v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.13315

Submission history

From: Qihuang Zhong
[v1] Fri, 20 Oct 2023 07:09:56 UTC (1,974 KB)