Frustratingly Simple Memory Efficiency for Pre-trained Language Models via Dynamic Embedding Pruning

Williams, Miles; Aletras, Nikolaos

arXiv:2309.08708 (cs)

[Submitted on 15 Sep 2023]

Abstract: The extensive memory footprint of pre-trained language models (PLMs) can hinder deployment in memory-constrained settings, such as cloud environments or on-device. PLMs use embedding matrices to represent extensive vocabularies, forming a large proportion of the model parameters. While previous work towards parameter-efficient PLM development has considered pruning parameters within the transformer layers, pruning the embedding matrix as part of fine-tuning or inference has yet to be explored. We first demonstrate that a significant proportion of the vocabulary remains unused in these scenarios. We then propose a simple yet effective approach that leverages this finding to minimize the memory footprint of the embedding matrix. We show that this approach provides substantial reductions in memory usage across a wide range of models and tasks. Notably, our approach maintains equivalent downstream task performance while allowing a more efficient use of compute resources.
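The abstract does not spell out the pruning procedure, so the following is only a minimal sketch of one plausible reading: keep the embedding rows for token ids that actually occur in the fine-tuning or inference data, remap the ids to the smaller matrix, and write the rows back afterwards. The function names `prune_embeddings` and `restore_embeddings`, the list-of-lists matrix, and the toy data are all hypothetical and are not taken from the paper.

```python
def prune_embeddings(embedding_matrix, token_id_sequences):
    """Keep only the embedding rows for token ids that occur in the data.

    Assumption: rows of unused tokens can be dropped for the duration of
    fine-tuning or inference. This is an illustration, not the authors' code.
    """
    # Collect the ids that actually appear; sort them for a stable layout.
    used_ids = sorted({tid for seq in token_id_sequences for tid in seq})
    # Map each original id to its row index in the reduced matrix.
    old_to_new = {old: new for new, old in enumerate(used_ids)}
    # Copy the surviving rows so the full matrix is left untouched.
    pruned = [embedding_matrix[old][:] for old in used_ids]
    # Rewrite the dataset so it indexes into the reduced matrix.
    remapped = [[old_to_new[tid] for tid in seq] for seq in token_id_sequences]
    return pruned, remapped, used_ids


def restore_embeddings(full_matrix, pruned_matrix, used_ids):
    """Write the (possibly updated) pruned rows back into a copy of the full matrix."""
    restored = [row[:] for row in full_matrix]
    for new, old in enumerate(used_ids):
        restored[old] = pruned_matrix[new][:]
    return restored


# Toy vocabulary of 6 tokens with 2-dimensional embeddings (made-up values).
full = [[float(i), float(i) + 0.5] for i in range(6)]
dataset = [[0, 3, 3], [5, 0]]

pruned, remapped, used_ids = prune_embeddings(full, dataset)
restored = restore_embeddings(full, pruned, used_ids)
```

In this toy case only three of the six rows are kept, and every lookup through the remapped ids returns the same vector as a lookup in the full matrix, which is consistent with the abstract's claim that task performance is unaffected.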

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2309.08708 [cs.CL]
	(or arXiv:2309.08708v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2309.08708

Submission history

From: Miles Williams
[v1] Fri, 15 Sep 2023 19:00:00 UTC (7,679 KB)
