Selecting Informative Contexts Improves Language Model Finetuning

Antonello, Richard; Beckage, Nicole; Turek, Javier; Huth, Alexander

Computer Science > Computation and Language

arXiv:2005.00175 (cs)

[Submitted on 1 May 2020 (v1), last revised 19 May 2022 (this version, v3)]

Title: Selecting Informative Contexts Improves Language Model Finetuning

Authors: Richard Antonello, Nicole Beckage, Javier Turek, Alexander Huth

Abstract: Language model fine-tuning is essential for modern natural language processing, but it is computationally expensive and time-consuming. Further, the effectiveness of fine-tuning is limited by the inclusion of training examples that negatively affect performance. Here we present a general fine-tuning method, which we call information gain filtration, that improves both the training efficiency and the final performance of language model fine-tuning. We define the information gain of an example as the improvement on a test metric after training on that example. A secondary learner is then trained to approximate this quantity. During fine-tuning, this learner selects informative examples and skips uninformative ones. We show that our method yields consistent improvements across datasets, fine-tuning tasks, and language model architectures. For example, we achieve a median perplexity of 54.0 on a books dataset, compared to 57.3 for standard fine-tuning. We present statistical evidence that offers insight into the improvements of our method over standard fine-tuning. The generality of our method leads us to propose a new paradigm for language model fine-tuning: we encourage researchers to release pretrained secondary learners on common corpora to promote efficient and effective fine-tuning, thereby improving performance and reducing the overall energy footprint of language model fine-tuning.
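The abstract's three steps (measure the information gain of individual examples, train a secondary learner to approximate it, then skip examples the learner scores as uninformative) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "language model" is a one-parameter linear regressor, the test metric is mean squared error on a held-out objective set rather than perplexity, information gain is measured after a single SGD step from the starting parameters, the secondary learner is a k-nearest-neighbour regressor swapped in as a stand-in, and the filter uses a fixed threshold of 0. The names `information_gain`, `KNNSecondaryLearner`, `make_example`, and `fine_tune` are hypothetical.

```python
import random

# Toy sketch of information gain filtration. ASSUMPTION: all names and the
# one-weight "model" are hypothetical stand-ins, not the paper's code.

def mse(w, data):
    """Mean squared error of the model y_hat = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def sgd_step(w, example, lr):
    """One gradient step on a single example; d/dw (w*x - y)^2 = 2(w*x - y)x."""
    x, y = example
    return w - lr * 2.0 * (w * x - y) * x

def information_gain(w0, example, objective, lr):
    """Drop in objective-set loss after one step on `example`, starting from w0."""
    return mse(w0, objective) - mse(sgd_step(w0, example, lr), objective)

class KNNSecondaryLearner:
    """Stand-in secondary learner: k-nearest-neighbour regression on (x, y)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []

    def fit(self, examples, gains):
        self.points = list(zip(examples, gains))
        return self

    def predict(self, example):
        x, y = example
        nearest = sorted(
            self.points,
            key=lambda p: (p[0][0] - x) ** 2 + (p[0][1] - y) ** 2,
        )[: self.k]
        # Predicted IG is the mean measured IG of the k closest examples.
        return sum(g for _, g in nearest) / len(nearest)

def make_example(x, corrupted):
    """Clean examples follow y = 2x; corrupted ones carry a flipped label."""
    return (x, -2.0 * x if corrupted else 2.0 * x)

def fine_tune(w0, stream, lr, learner=None, threshold=0.0):
    """Fine-tune on `stream`; with a learner, skip examples at or below threshold."""
    w, skipped = w0, 0
    for example in stream:
        if learner is not None and learner.predict(example) <= threshold:
            skipped += 1
            continue
        w = sgd_step(w, example, lr)
    return w, skipped

if __name__ == "__main__":
    lr, w0 = 0.1, 0.0  # w0 plays the role of the pretrained model
    grid = [0.5 + i / 29 for i in range(30)]
    objective = [make_example(x, False) for x in grid]  # clean held-out set

    # 1. Measure IG directly on a small labeled set of candidate examples.
    labeled = [make_example(x, i % 3 == 0) for i, x in enumerate(grid)]
    gains = [information_gain(w0, ex, objective, lr) for ex in labeled]

    # 2. Train the secondary learner to approximate IG.
    learner = KNNSecondaryLearner(k=3).fit(labeled, gains)

    # 3. Fine-tune on a fresh stream, with and without filtration.
    rng = random.Random(0)
    stream = [make_example(rng.uniform(0.5, 1.5), i % 3 == 0) for i in range(60)]
    w_std, _ = fine_tune(w0, stream, lr)
    w_igf, skipped = fine_tune(w0, stream, lr, learner=learner)
    print(f"standard MSE: {mse(w_std, objective):.4f}")
    print(f"filtered MSE: {mse(w_igf, objective):.4f} (skipped {skipped})")
```

In this construction every third example in the stream has a flipped label, so training on it moves the weight away from the objective set's optimum and its measured information gain is negative. The filtered run skips exactly those 20 examples and ends with a lower objective-set MSE than the standard run.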

Comments:	Accepted submission at the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2005.00175 [cs.CL]
	(or arXiv:2005.00175v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00175

Submission history

From: Richard Antonello
[v1] Fri, 1 May 2020 02:01:18 UTC (837 KB)
[v2] Sat, 8 May 2021 20:22:21 UTC (7,378 KB)
[v3] Thu, 19 May 2022 22:49:00 UTC (11,473 KB)
