-
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
Authors:
Shamane Siriwardhana,
Mark McQuade,
Thomas Gauthier,
Lucas Atkins,
Fernando Fernandes Neto,
Luke Meyers,
Anneketh Vij,
Tyler Odenthal,
Charles Goddard,
Mary MacCarthy,
Jacob Solawetz
Abstract:
We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of int…
▽ More
We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of integrating financial regulatory data into a robust language model and examined the effectiveness of our model merging techniques in preserving and improving the model's instructive abilities. The model is accessible at hugging face: https://huggingface.co/arcee-ai/Llama-3-SEC-Base, arcee-ai/Llama-3-SEC-Base. This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training. This is a preprint technical report with thorough evaluations to understand the entire process.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Spectrum: Targeted Training on Signal to Noise Ratio
Authors:
Eric Hartford,
Lucas Atkins,
Fernando Fernandes Neto,
David Golchinfar
Abstract:
Efficiently post-training large language models remains a challenging task due to the vast computational resources required. We present Spectrum, a method that accelerates LLM training by selectively targeting layer modules based on their signal-to-noise ratio (SNR), and freezing the remaining modules. Our approach, which utilizes an algorithm to compute module SNRs prior to training, has shown to…
▽ More
Efficiently post-training large language models remains a challenging task due to the vast computational resources required. We present Spectrum, a method that accelerates LLM training by selectively targeting layer modules based on their signal-to-noise ratio (SNR), and freezing the remaining modules. Our approach, which utilizes an algorithm to compute module SNRs prior to training, has shown to effectively match the performance of full fine-tuning while reducing GPU memory usage. Experiments comparing Spectrum to existing methods such as QLoRA demonstrate its effectiveness in terms of model quality and VRAM efficiency in distributed environments.
△ Less
Submitted 7 June, 2024;
originally announced June 2024.
-
Deep Haar Scattering Networks in Pattern Recognition: A promising approach
Authors:
Fernando Fernandes Neto,
Alemayehu Admasu Solomon,
Rodrigo de Losso,
Claudio Garcia,
Pedro Delano Cavalcanti
Abstract:
The aim of this paper is to discuss the use of Haar scattering networks, which is a very simple architecture that naturally supports a large number of stacked layers, yet with very few parameters, in a relatively broad set of pattern recognition problems, including regression and classification tasks. This architecture, basically, consists of stacking convolutional filters, that can be thought as…
▽ More
The aim of this paper is to discuss the use of Haar scattering networks, which is a very simple architecture that naturally supports a large number of stacked layers, yet with very few parameters, in a relatively broad set of pattern recognition problems, including regression and classification tasks. This architecture, basically, consists of stacking convolutional filters, that can be thought as a generalization of Haar wavelets, followed by non-linear operators which aim to extract symmetries and invariances that are later fed in a classification/regression algorithm. We show that good results can be obtained with the proposed method for both kind of tasks. We have outperformed the best available algorithms in 4 out of 18 important data classification problems, and have obtained a more robust performance than ARIMA and ETS time series methods in regression problems for data with strong periodicities.
△ Less
Submitted 29 November, 2018;
originally announced November 2018.
-
Building Function Approximators on top of Haar Scattering Networks
Authors:
Fernando Fernandes Neto
Abstract:
In this article we propose building general-purpose function approximators on top of Haar Scattering Networks. We advocate that this architecture enables a better comprehension of feature extraction, in addition to its implementation simplicity and low computational costs. We show its approximation and feature extraction capabilities in a wide range of different problems, which can be applied on s…
▽ More
In this article we propose building general-purpose function approximators on top of Haar Scattering Networks. We advocate that this architecture enables a better comprehension of feature extraction, in addition to its implementation simplicity and low computational costs. We show its approximation and feature extraction capabilities in a wide range of different problems, which can be applied on several phenomena in signal processing, system identification, econometrics and other potential fields.
△ Less
Submitted 9 April, 2018;
originally announced April 2018.
-
Generative Models for Stochastic Processes Using Convolutional Neural Networks
Authors:
Fernando Fernandes Neto
Abstract:
The present paper aims to demonstrate the usage of Convolutional Neural Networks as a generative model for stochastic processes, enabling researchers from a wide range of fields (such as quantitative finance and physics) to develop a general tool for forecasts and simulations without the need to identify/assume a specific system structure or estimate its parameters.
The present paper aims to demonstrate the usage of Convolutional Neural Networks as a generative model for stochastic processes, enabling researchers from a wide range of fields (such as quantitative finance and physics) to develop a general tool for forecasts and simulations without the need to identify/assume a specific system structure or estimate its parameters.
△ Less
Submitted 8 January, 2018;
originally announced January 2018.