LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Li, Yixiao; Yu, Yifan; Zhang, Qingru; Liang, Chen; He, Pengcheng; Chen, Weizhu; Zhao, Tuo

arXiv:2306.11222 (cs)

[Submitted on 20 Jun 2023 (v1), last revised 26 Jun 2023 (this version, v2)]

Abstract: Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low-rank approximations and pruning, while avoiding their limitations. Low-rank approximation compresses the coherent and expressive parts in neurons, while pruning removes the incoherent and non-expressive parts in neurons. Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons. We evaluate our method on natural language understanding, question answering, and natural language generation tasks. We show that it significantly outperforms existing compression methods.
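
The core idea in the abstract, approximating a weight matrix by the sum of a low-rank matrix and a sparse matrix, can be illustrated with a small NumPy sketch. The abstract does not say how either component is computed, so everything below is an assumption made for illustration: the low-rank part comes from a truncated SVD, the sparse part keeps only the residual columns with the largest L2 norm (a simple stand-in for structured pruning), and the names `low_rank_plus_sparse`, `rank`, and `keep_ratio` are hypothetical. It is not the authors' training procedure.

```python
import numpy as np


def low_rank_plus_sparse(W, rank, keep_ratio):
    """Approximate W by U @ V + S (illustrative sketch, not the paper's algorithm).

    Assumptions: truncated SVD for the low-rank factors, and column-norm
    pruning of the residual for the structured-sparse part.
    """
    # Truncated SVD: the best rank-`rank` approximation in Frobenius norm.
    U_full, sigma, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :rank] * sigma[:rank]  # fold singular values into U
    V = Vt[:rank, :]

    # Whatever the low-rank factors fail to capture.
    residual = W - U @ V

    # Keep whole columns of the residual with the largest L2 norm and
    # zero out the rest, so S is sparse in a structured (column-wise) way.
    n_cols = W.shape[1]
    n_keep = int(round(keep_ratio * n_cols))
    col_norms = np.linalg.norm(residual, axis=0)
    keep = np.argsort(col_norms)[n_cols - n_keep:]
    S = np.zeros_like(residual)
    S[:, keep] = residual[:, keep]
    return U, V, S


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))  # stand-in for one weight matrix
U, V, S = low_rank_plus_sparse(W, rank=8, keep_ratio=0.25)

err_low_rank = np.linalg.norm(W - U @ V)
err_combined = np.linalg.norm(W - (U @ V + S))
print(f"low-rank only: {err_low_rank:.3f}, low-rank + sparse: {err_combined:.3f}")
```

In this sketch, storing `U`, `V`, and the kept columns of `S` takes fewer numbers than storing `W` whenever `rank` and `keep_ratio` are small, and adding `S` can only reduce the approximation error relative to the low-rank part alone.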

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2306.11222 [cs.LG]
	(or arXiv:2306.11222v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.11222

Submission history

From: Yixiao Li
[v1] Tue, 20 Jun 2023 01:16:11 UTC (489 KB)
[v2] Mon, 26 Jun 2023 15:34:57 UTC (419 KB)
