GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values

Javadi, Farnoosh; Ahmed, Walid; Hajimolahoseini, Habib; Ataiefard, Foozhan; Hassanpour, Mohammad; Asani, Saina; Wen, Austin; Awad, Omar Mohamed; Liu, Kangling; Liu, Yang

arXiv:2311.03426 (cs)

[Submitted on 6 Nov 2023 (v1), last revised 13 Dec 2023 (this version, v2)]

Abstract: Massive transformer-based models face several challenges, including slow and computationally intensive pre-training and over-parametrization. This paper addresses these challenges by proposing a versatile method called GQKVA, which generalizes query, key, and value grouping techniques. GQKVA is designed to speed up transformer pre-training while reducing the model size. Our experiments with various GQKVA variants highlight a clear trade-off between performance and model size, allowing for customized choices based on resource and time limitations. Our findings also indicate that the conventional multi-head attention approach is not always the best choice, as there are lighter and faster alternatives available. We tested our method on ViT, which achieved an approximate 0.3% increase in accuracy while reducing the model size by about 4% in the task of image classification. Additionally, our most aggressive model reduction experiment resulted in a reduction of approximately 15% in model size, with only around a 1% drop in accuracy.
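
The abstract does not spell out how the grouping is arranged, so the following is only a minimal NumPy sketch of one plausible arrangement, not the authors' implementation. It assumes `g_q` shared query projections and `g_kv` shared key/value projections, with every (query group, key/value group) pair acting as one attention head. The function names, shapes, and pairing scheme are hypothetical choices made for illustration. The sketch also compares the number of Q/K/V projection weights against standard multi-head attention with the same number of heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_qkv_attention(x, w_q, w_k, w_v):
    """Attention with grouped query and key/value projections (illustrative).

    x:   (seq, d_model) input tokens
    w_q: (g_q, d_model, d_head) one projection per query group
    w_k: (g_kv, d_model, d_head) one projection per key/value group
    w_v: (g_kv, d_model, d_head)
    Returns (seq, g_q * g_kv * d_head): one head per (query, key/value) pair.
    """
    q = np.einsum("sd,gdh->gsh", x, w_q)  # (g_q, seq, d_head)
    k = np.einsum("sd,gdh->gsh", x, w_k)  # (g_kv, seq, d_head)
    v = np.einsum("sd,gdh->gsh", x, w_v)  # (g_kv, seq, d_head)
    d_head = q.shape[-1]
    # Scaled dot-product scores for every (query group i, key/value group j) pair.
    scores = np.einsum("ish,jth->ijst", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)             # (g_q, g_kv, seq, seq)
    out = np.einsum("ijst,jth->ijsh", weights, v)  # (g_q, g_kv, seq, d_head)
    g_q, g_kv, seq, _ = out.shape
    # Concatenate all heads along the feature axis.
    return out.transpose(2, 0, 1, 3).reshape(seq, g_q * g_kv * d_head)

def grouped_qkv_param_count(d_model, d_head, g_q, g_kv):
    # Weights in the Q, K, and V projections only (biases and output projection ignored).
    return (g_q + 2 * g_kv) * d_model * d_head

rng = np.random.default_rng(0)
seq, d_model, d_head = 5, 16, 4
g_q, g_kv = 2, 2           # assumed grouping: 2 x 2 = 4 heads
n_heads = g_q * g_kv

x = rng.normal(size=(seq, d_model))
w_q = rng.normal(size=(g_q, d_model, d_head))
w_k = rng.normal(size=(g_kv, d_model, d_head))
w_v = rng.normal(size=(g_kv, d_model, d_head))

out = grouped_qkv_attention(x, w_q, w_k, w_v)
# Standard multi-head attention uses separate Q, K, and V projections per head.
mha_params = 3 * n_heads * d_model * d_head
grouped_params = grouped_qkv_param_count(d_model, d_head, g_q, g_kv)
print(out.shape, mha_params, grouped_params)
```

In this toy configuration the grouped variant keeps four heads while using fewer Q/K/V projection weights than multi-head attention, which illustrates the kind of size reduction the abstract refers to. The figures reported in the abstract come from the authors' ViT experiments and are not reproduced by this sketch.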

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2311.03426 [cs.LG]
	(or arXiv:2311.03426v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.03426

Submission history

From: Farnoosh Javadi
[v1] Mon, 6 Nov 2023 17:29:24 UTC (814 KB)
[v2] Wed, 13 Dec 2023 16:57:19 UTC (814 KB)
