A Study of Optimizations for Fine-tuning Large Language Models

Singh, Arjun; Pandey, Nikhil; Shirgaonkar, Anup; Manoj, Pavan; Aski, Vijay

arXiv:2406.02290 (cs)

[Submitted on 4 Jun 2024 (v1), last revised 6 Jun 2024 (this version, v2)]

Abstract: Fine-tuning large language models is a popular choice among users trying to adapt them for specific applications. However, fine-tuning these models is a demanding task because the user has to weigh several factors, such as resource budget, runtime, model size, and context length, among others. A specific challenge is that fine-tuning is memory intensive, which constrains both the hardware memory required and the context length of the training data that can be handled. In this work, we share a detailed study of a variety of fine-tuning optimizations across different fine-tuning scenarios. In particular, we assess Gradient Checkpointing, Low-Rank Adaptation, DeepSpeed's Zero Redundancy Optimizer, and FlashAttention. With a focus on memory and runtime, we examine the impact of different optimization combinations on GPU memory usage and execution runtime during the fine-tuning phase. We provide our recommendation on the best default optimization for balancing memory and runtime across diverse model sizes. We share effective strategies for fine-tuning very large models with tens or hundreds of billions of parameters and for enabling large context lengths during fine-tuning. Furthermore, we propose appropriate optimization mixtures for fine-tuning under GPU resource limitations.
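
To make the memory pressure described in the abstract concrete, the following is a minimal back-of-the-envelope sketch and is not taken from the paper. It assumes the commonly cited accounting for mixed-precision Adam training (16 bytes of model state per parameter), an even split of those states across GPUs under ZeRO stage 3, and the standard LoRA parameter count for a single weight matrix. The function names, the 7-billion-parameter model, the 8-GPU setup, and the 4096 x 4096 matrix are all hypothetical illustrations; activation memory, which Gradient Checkpointing and FlashAttention target, is deliberately left out.

```python
def full_finetune_state_bytes(num_params: int) -> int:
    """Model-state bytes for mixed-precision Adam, ignoring activations.

    Assumed accounting per parameter: 2 (fp16 weight) + 2 (fp16 gradient)
    + 12 (fp32 master weight, Adam momentum, Adam variance) = 16 bytes.
    """
    return 16 * num_params


def zero_stage3_state_bytes(num_params: int, num_gpus: int) -> float:
    """Per-GPU model-state bytes if weights, gradients and optimizer
    states are partitioned evenly across GPUs (idealized ZeRO stage 3)."""
    return full_finetune_state_bytes(num_params) / num_gpus


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix:
    B has shape (d_out, rank) and A has shape (rank, d_in)."""
    return rank * (d_in + d_out)


# Hypothetical example values, chosen only for illustration.
num_params = 7_000_000_000
print(f"{full_finetune_state_bytes(num_params) / 1e9:.1f} GB")   # 112.0 GB
print(f"{zero_stage3_state_bytes(num_params, 8) / 1e9:.1f} GB")  # 14.0 GB

lora = lora_trainable_params(4096, 4096, rank=8)
full = 4096 * 4096
print(lora, f"{lora / full:.2%}")  # 65536 0.39%
```

Under these assumptions, model states alone exceed the memory of a single 80 GB GPU for a 7-billion-parameter model, which is why partitioning states across devices or training only low-rank adapters changes what hardware is feasible.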

Comments: 10 pages, 4 figures. Revised text for clarity, updated references
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2406.02290 [cs.LG] (or arXiv:2406.02290v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2406.02290

Submission history

From: Nikhil Pandey
[v1] Tue, 4 Jun 2024 13:05:47 UTC (477 KB)
[v2] Thu, 6 Jun 2024 16:09:31 UTC (479 KB)
