SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Ashkboos, Saleh; Croci, Maximilian L.; Nascimento, Marcelo Gennari do; Hoefler, Torsten; Hensman, James

arXiv:2401.15024 (cs)

[Submitted on 26 Jan 2024 (v1), last revised 9 Feb 2024 (this version, v2)]

Abstract:Large language models have become the cornerstone of natural language processing, but their use comes with substantial costs in terms of compute and memory resources. Sparsification provides a solution to alleviate these resource constraints, and recent works have shown that trained models can be sparsified post-hoc. Existing sparsification techniques face challenges as they need additional data structures and offer constrained speedup with current hardware. In this paper we present SliceGPT, a new post-training sparsification scheme which replaces each weight matrix with a smaller (dense) matrix, reducing the embedding dimension of the network. Through extensive experimentation, we show that SliceGPT can remove up to 25% of the model parameters (including embeddings) for LLAMA2-70B, OPT 66B and Phi-2 models while maintaining 99%, 99% and 90% zero-shot task performance of the dense model respectively. Our sliced models run on fewer GPUs and run faster without any additional code optimization: on 24GB consumer GPUs we reduce the total compute for inference on LLAMA2-70B to 64% of that of the dense model; on 40GB A100 GPUs we reduce it to 66%. We offer a new insight, computational invariance in transformer networks, which enables SliceGPT and we hope it will inspire and enable future avenues to reduce memory and computation demands for pre-trained models. Code is available at: this https URL
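The two ideas the abstract names, computational invariance and replacing a weight matrix with a smaller dense one, can be illustrated with a minimal NumPy sketch. This is not the authors' code: the helper names (`rms_norm`, `pca_rotation`, `slice_layer`), the synthetic data, the scale-free RMSNorm, and the choice of an orthogonal matrix taken from a PCA of the input signal are all assumptions made for illustration. The sketch covers a single linear layer rather than a full transformer block.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned scale: divide each row by its root mean square."""
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def pca_rotation(x):
    """Orthogonal matrix whose columns are eigenvectors of x.T @ x, largest eigenvalue first."""
    eigvals, eigvecs = np.linalg.eigh(x.T @ x)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]

def slice_layer(x, w, q, keep):
    """Rotate by q, then delete trailing columns of the signal and matching rows of the weight."""
    x_small = (x @ q)[:, :keep]
    w_small = (q.T @ w)[:keep, :]
    return x_small, w_small

# Synthetic data (an assumption): a nearly rank-8 signal in a 16-dimensional embedding space.
rng = np.random.default_rng(0)
n, d, r, m = 256, 16, 8, 32
x = rng.normal(size=(n, r)) @ rng.normal(size=(r, d)) + 0.01 * rng.normal(size=(n, d))
w = rng.normal(size=(d, m))
q = pca_rotation(x)

# Invariance: rotating the signal by q and the next weight by q.T leaves the output unchanged,
# because an orthogonal rotation preserves each row's norm and so commutes with RMSNorm.
full = rms_norm(x) @ w
rotated = rms_norm(x @ q) @ (q.T @ w)
invariance_gap = np.max(np.abs(full - rotated))

# Slicing: keep only the first r principal directions, giving smaller dense matrices.
x_small, w_small = slice_layer(x, w, q, keep=r)
rel_err = np.linalg.norm(x @ w - x_small @ w_small) / np.linalg.norm(x @ w)

print("invariance gap:", invariance_gap)
print("sliced shapes:", x_small.shape, w_small.shape)
print("relative error after slicing:", rel_err)
```

In this toy setting, half of the embedding dimension is deleted and the sliced product stays close to the original because the signal is nearly low-rank by construction. How well that holds for real model activations is what the paper's experiments measure.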

Comments:	22 pages, 8 figures, accepted at ICLR24
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2401.15024 [cs.LG]
	(or arXiv:2401.15024v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.15024

Submission history

From: Maximilian Croci
[v1] Fri, 26 Jan 2024 17:35:45 UTC (176 KB)
[v2] Fri, 9 Feb 2024 17:59:40 UTC (176 KB)
