Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Kim, Junhan; Park, Kyungphil; Lee, Chungman; Kim, Ho-young; Kim, Joonyoung; Jeon, Yongkweon

Computer Science > Machine Learning

arXiv:2402.08958 (cs)

[Submitted on 14 Feb 2024]


Abstract: With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile devices and TVs. Existing PTQ schemes, however, consume considerable time and resources, which can become a bottleneck in practice when frequent model updates and repeated hyper-parameter tuning are required. One-shot PTQ schemes have been proposed as a cost-effective alternative. Their performance, however, is limited because they cannot account for the inter-layer dependency within the attention module, a key feature of Transformers. In this paper, we therefore propose a novel PTQ algorithm that balances accuracy and efficiency. The key idea of the proposed algorithm, called aespa, is to perform quantization layer-wise for efficiency while considering cross-layer dependency to preserve the attention score. Through extensive experiments on various language models and a complexity analysis, we demonstrate that aespa quantizes Transformer models accurately and efficiently.
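The abstract contrasts a purely layer-wise objective with one that also considers cross-layer dependency in order to preserve the attention score. The sketch below is only an illustration of that distinction under stated assumptions, not the aespa algorithm or its objective. It assumes uniform round-to-nearest quantization per output channel, random Gaussian inputs and weights, quantization of the query projection W_Q alone, and the attention score S = (X W_Q)(X W_K)^T / sqrt(d_head). All function names (`quantize_rtn`, `layer_wise_error`, `attention_score_error`, etc.) are hypothetical.

```python
import random


def quantize_rtn(values, bits):
    """Uniform asymmetric round-to-nearest quantization of one channel.

    Illustrative assumption: returns de-quantized floats so that the
    quantization error can be measured directly.
    """
    qmax = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)  # integer zero-point
    # Map each value to an integer in [0, qmax], then back to the float grid.
    return [(min(max(round(v / scale) + zero, 0), qmax) - zero) * scale
            for v in values]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def frob_sq_diff(a, b):
    """Squared Frobenius norm of (a - b)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def quantize_matrix(w, bits):
    """Quantize each output channel (column of W) separately."""
    return transpose([quantize_rtn(col, bits) for col in transpose(w)])


def layer_wise_error(x, w, w_hat):
    """||X W - X W_hat||_F^2: error on this layer's own output only."""
    return frob_sq_diff(matmul(x, w), matmul(x, w_hat))


def attention_score_error(x, w_q, w_k, w_q_hat):
    """||S - S_hat||_F^2 with S = (X W_Q)(X W_K)^T / sqrt(d_head).

    Only W_Q is quantized here; W_K enters the error as a weighting term,
    which is the cross-layer dependency a layer-wise error ignores.
    """
    d_head = len(w_q[0])
    k_t = transpose(matmul(x, w_k))
    s = matmul(matmul(x, w_q), k_t)
    s_hat = matmul(matmul(x, w_q_hat), k_t)
    # Dividing S by sqrt(d_head) divides the squared error by d_head.
    return frob_sq_diff(s, s_hat) / d_head


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n_tokens, d_model, d_head = 8, 16, 4
x = random_matrix(n_tokens, d_model)
w_q = random_matrix(d_model, d_head)
w_k = random_matrix(d_model, d_head)

for bits in (2, 4, 8):
    w_q_hat = quantize_matrix(w_q, bits)
    e_layer = layer_wise_error(x, w_q, w_q_hat)
    e_attn = attention_score_error(x, w_q, w_k, w_q_hat)
    print(f"{bits}-bit  layer-wise: {e_layer:.4f}  attention-score: {e_attn:.4f}")
```

The two printed columns score the same quantized W_Q differently because the second one depends on W_K. A method that aims to preserve the attention score would, in this simplified picture, choose quantized weights to reduce the second quantity rather than the first; how aespa does this efficiently is described in the paper itself.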

Comments:	17 pages, under review
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.08958 [cs.LG]
	(or arXiv:2402.08958v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.08958

Submission history

From: Yongkweon Jeon
[v1] Wed, 14 Feb 2024 05:58:43 UTC (342 KB)
