EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Chen, Yanxi; Pan, Xuchen; Li, Yaliang; Ding, Bolin; Zhou, Jingren

arXiv:2312.04916 (cs)

[Submitted on 8 Dec 2023 (v1), last revised 16 Jun 2024 (this version, v3)]

Title: EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

Authors: Yanxi Chen, Xuchen Pan, Yaliang Li, Bolin Ding, Jingren Zhou

Abstract: We present EE-LLM, a framework for large-scale training and inference of early-exit large language models (LLMs). While recent works have shown preliminary evidence for the efficacy of early exiting in accelerating LLM inference, EE-LLM makes a foundational step towards scaling up early-exit LLMs by supporting their training and inference with massive 3D parallelism. Built upon Megatron-LM, EE-LLM implements a variety of algorithmic innovations and performance optimizations tailored to early exiting, including a lightweight method that facilitates backpropagation for the early-exit training objective with pipeline parallelism, techniques of leveraging idle resources in the original pipeline schedule for computation related to early-exit layers, and two approaches of early-exit inference that are compatible with KV caching for autoregressive generation. Our analytical and empirical study shows that EE-LLM achieves great training efficiency with negligible computational overhead compared to standard LLM training, as well as outstanding inference speedup without compromising output quality. To facilitate further research and adoption, we release EE-LLM at this https URL.
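
As a rough illustration of what early exiting means at inference time, the sketch below picks a token from the first exit whose top probability clears a threshold, and falls back to the final exit otherwise. The abstract does not state the exit criterion EE-LLM uses, so the confidence-threshold rule, the function name `early_exit_predict`, and the `threshold` parameter are all assumptions made for illustration. The sketch takes precomputed per-exit logits as input and ignores KV caching and 3D parallelism entirely.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Hypothetical confidence-threshold early exit (not EE-LLM's actual API).

    exit_logits: list of logit vectors, one per exit, ordered from the
                 shallowest exit to the final output layer.
    Returns (exit_index, token_id) for the first exit whose top
    probability is at least `threshold`; the final exit always answers.
    """
    if not exit_logits:
        raise ValueError("need at least one exit")
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        token = max(range(len(probs)), key=probs.__getitem__)
        if probs[token] >= threshold or i == last:
            return i, token


# Toy logits for a 3-token vocabulary at three exits (made-up numbers).
exit_logits = [
    [1.0, 1.2, 0.9],  # shallow exit: low confidence
    [0.1, 5.0, 0.2],  # middle exit: fairly confident
    [0.0, 9.0, 0.0],  # final exit
]
exit_index, token_id = early_exit_predict(exit_logits, threshold=0.9)
```

In a real model the deeper exits would only be computed when the shallower ones fail the check, which is where the inference speedup would come from; here all logits are given up front to keep the example self-contained.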

Comments:	ICML 2024 camera-ready version
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2312.04916 [cs.LG]
	(or arXiv:2312.04916v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.04916

Submission history

From: Yanxi Chen
[v1] Fri, 8 Dec 2023 09:31:50 UTC (1,625 KB)
[v2] Thu, 1 Feb 2024 11:58:27 UTC (1,561 KB)
[v3] Sun, 16 Jun 2024 08:37:25 UTC (1,267 KB)
