Is It a Free Lunch for Removing Outliers during Pretraining?

Liao, Baohao; Monz, Christof

[Submitted on 19 Feb 2024]

Abstract: With the growing size of large language models, quantization plays an increasingly important role. However, outliers in weights or activations notably degrade the performance of quantized models. Recently, Bondarenko et al. (2023) introduced a novel softmax function aimed at pretraining models in an outlier-free manner, thereby making them better suited to quantization. Interestingly, we observe that this approach leads to performance degradation in full precision. Building on this insight, we enhance the method by making its normalization invariant to sequence length, a crucial factor for bridging the gap between pretraining and fine-tuning. Moreover, the improved method also enables successful pretraining of causal language models.

Comments:	5 pages, 3 figures, 1 table
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2402.12102 [cs.CL]
	(or arXiv:2402.12102v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.12102

Submission history

From: Baohao Liao
[v1] Mon, 19 Feb 2024 12:45:52 UTC (1,617 KB)
