nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales

Yao, Yiqun; Fan, Siqi; Huang, Xiusheng; Fang, Xuezhi; Li, Xiang; Ni, Ziyi; Jiang, Xin; Meng, Xuying; Han, Peng; Shang, Shuo; Liu, Kang; Sun, Aixin; Wang, Yequan

arXiv:2304.06875v4 (cs)

[Submitted on 14 Apr 2023 (v1), last revised 6 Apr 2024 (this version, v4)]

Abstract: As language models scale up, verifying research ideas becomes increasingly expensive, because conclusions drawn on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that accurately predicts certain metrics for large models without training them. Existing scaling laws require hyperparameter search on the largest models, limiting their predictive capability. In this paper, we present an approach (namely μScaling) to predict the pre-training loss, based on our observation that Maximal Update Parametrization (μP) enables accurate fitting of scaling laws close to common loss basins in hyperparameter space. With μScaling, different model designs can be compared at large scales by training only their smaller counterparts. Further, we introduce nanoLM: an affordable LLM pre-training benchmark that facilitates this new research paradigm. With around 14% of the one-time pre-training cost, we can accurately forecast the loss for models up to 52B. Our goal with nanoLM is to empower researchers with limited resources to reach meaningful conclusions on large models. We also aspire for our benchmark to serve as a bridge between the academic community and the industry. Code for μScaling is available at this https URL. Code for nanoLM will be available later.
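The abstract's core idea, fitting a scaling law on small models and extrapolating the loss to a larger one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the functional form loss = a * size^(-b) + c, the use of parameter count as the scale variable, the helper names `fit_power_law` and `predict_loss`, and the synthetic data points, which are generated from a known law and are not results from the paper. The sketch does not implement μP; it only shows the extrapolation step that μP is said to make reliable.

```python
import numpy as np

def fit_power_law(sizes, losses, c_grid):
    """Fit loss = a * size**(-b) + c (assumed form, for illustration only).

    For each candidate irreducible loss c, (loss - c) is a pure power law,
    so a and b follow from a straight-line fit in log-log space. The c with
    the smallest squared error is kept.
    """
    sizes = np.asarray(sizes, dtype=float)
    losses = np.asarray(losses, dtype=float)
    best = None
    for c in c_grid:
        reducible = losses - c
        if np.any(reducible <= 0):
            continue  # the logarithm is undefined for this candidate c
        slope, intercept = np.polyfit(np.log(sizes), np.log(reducible), 1)
        a, b = np.exp(intercept), -slope
        err = np.sum((a * sizes ** (-b) + c - losses) ** 2)
        if best is None or err < best[0]:
            best = (err, a, b, c)
    if best is None:
        raise ValueError("no candidate c left all reducible losses positive")
    return best[1], best[2], best[3]

def predict_loss(size, a, b, c):
    """Evaluate the fitted law at a (possibly much larger) model size."""
    return a * size ** (-b) + c

# Synthetic "small model" measurements generated from a known law.
# These numbers are hypothetical and are NOT taken from the paper.
true_a, true_b, true_c = 20.0, 0.3, 1.5
small_sizes = np.array([1e6, 4e6, 1.6e7, 6.4e7, 2.56e8])
small_losses = true_a * small_sizes ** (-true_b) + true_c

a, b, c = fit_power_law(small_sizes, small_losses, np.linspace(0.0, 2.0, 201))
predicted = predict_loss(5.2e10, a, b, c)   # extrapolate to a 52B-scale model
expected = true_a * 5.2e10 ** (-true_b) + true_c
print(f"fitted a={a:.3f}, b={b:.3f}, c={c:.3f}")
print(f"predicted loss at 5.2e10 params: {predicted:.4f} (true: {expected:.4f})")
```

On noise-free synthetic data the fit recovers the generating law; with real training runs the quality of the extrapolation depends on how well each small model's hyperparameters are tuned, which is the gap the abstract says μP addresses.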

Comments:	This is a modified and extended version of our previous Mu-scaling work released in April 2023 (see v1)
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2304.06875 [cs.CL]
	(or arXiv:2304.06875v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2304.06875

Submission history

From: Yiqun Yao
[v1] Fri, 14 Apr 2023 00:45:01 UTC (538 KB)
[v2] Sat, 29 Apr 2023 03:14:58 UTC (242 KB)
[v3] Sun, 3 Sep 2023 06:55:28 UTC (259 KB)
[v4] Sat, 6 Apr 2024 05:50:39 UTC (928 KB)
