Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation

Authors: Shamane Siriwardhana, Mark McQuade, Thomas Gauthier, Lucas Atkins, Fernando Fernandes Neto, Luke Meyers, Anneketh Vij, Tyler Odenthal, Charles Goddard, Mary MacCarthy, Jacob Solawetz

Abstract: We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of integrating financial regulatory data into a robust language model and examined the effectiveness of our model merging techniques in preserving and improving the model's instructive abilities. The model is available on Hugging Face: https://huggingface.co/arcee-ai/Llama-3-SEC-Base. This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training. This is a preprint technical report with thorough evaluations to understand the entire process.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 8 pages, 6 figures

arXiv:2403.13257 [pdf, other]

Arcee's MergeKit: A Toolkit for Merging Large Language Models

Authors: Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vlad Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz

Abstract: The rapid expansion of the open-source language model landscape presents an opportunity to merge the competencies of these model checkpoints by combining their parameters. Advances in transfer learning, the process of fine-tuning pretrained models for specific tasks, have resulted in the development of a vast number of task-specific models, typically specialized in individual tasks and unable to utilize each other's strengths. Model merging facilitates the creation of multitask models without the need for additional training, offering a promising avenue for enhancing model performance and versatility. By preserving the intrinsic capabilities of the original models, model merging addresses complex challenges in AI, including the difficulties of catastrophic forgetting and multitask learning. To support this expanding area of research, we introduce MergeKit, a comprehensive, open-source library designed to facilitate the application of model merging strategies. MergeKit offers an extensible framework to efficiently merge models on any hardware, providing utility to researchers and practitioners. To date, thousands of models have been merged by the open-source community, leading to the creation of some of the world's most powerful open-source model checkpoints, as assessed by the Open LLM Leaderboard. The library is accessible at https://github.com/arcee-ai/MergeKit.

Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 11 pages, 4 figures

arXiv:1912.00803 [pdf, other]

doi 10.13140/RG.2.2.22720.28166

Policies for constraining the behaviour of coalitions of agents in the context of algebraic information theory

Authors: Christopher Goddard

Abstract: This article takes an oblique sidestep from two previous papers, wherein an approach to reformulation of game theory in terms of information theory, topology, as well as a few other notions was indicated. In this document a description is provided as to how one might determine an approach for an agent to choose a policy concerning which actions to take in a game that constrains behaviour of subsidiary agents. It is then demonstrated how these results in algebraic information theory, together with previous investigations in geometric and topological information theory, can be unified into a single cohesive framework.

Submitted 28 November, 2019; originally announced December 2019.

Comments: 27 pages

arXiv:1608.06697 [pdf]

Semantic descriptions of 24 evaluational adjectives, for application in sentiment analysis

Authors: Cliff Goddard, Maite Taboada, Radoslava Trnavac

Abstract: We apply the Natural Semantic Metalanguage (NSM) approach (Goddard and Wierzbicka 2014) to the lexical-semantic analysis of English evaluational adjectives and compare the results with the picture developed in the Appraisal Framework (Martin and White 2005). The analysis is corpus-assisted, with examples mainly drawn from film and book reviews, and supported by collocational and statistical information from WordBanks Online. We propose NSM explications for 24 evaluational adjectives, arguing that they fall into five groups, each of which corresponds to a distinct semantic template. The groups can be sketched as follows: "First-person thought-plus-affect", e.g. wonderful; "Experiential", e.g. entertaining; "Experiential with bodily reaction", e.g. gripping; "Lasting impact", e.g. memorable; "Cognitive evaluation", e.g. complex, excellent. These groupings and semantic templates are compared with the classifications in the Appraisal Framework's system of Appreciation. In addition, we are particularly interested in sentiment analysis, the automatic identification of evaluation and subjectivity in text. We discuss the relevance of the two frameworks for sentiment analysis and other language technology applications.

Submitted 23 August, 2016; originally announced August 2016.

Report number: SFU-CMPT TR 2016-42-1

Showing 1–4 of 4 results for author: Goddard, C