-
Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning
Authors:
Nick Mecklenburg,
Yiyou Lin,
Xiaoxiao Li,
Daniel Holstein,
Leonardo Nunes,
Sara Malvar,
Bruno Silva,
Ranveer Chandra,
Vijay Aski,
Pavan Kumar Reddy Yannam,
Tolga Aktas,
Todd Hendry
Abstract:
In recent years, Large Language Models (LLMs) have shown remarkable performance in generating human-like text, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model's knowledge cutoff date. This paper investigates the effectiveness of Supervised Fine-Tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on the domain of recent sporting events. We compare different dataset generation strategies (token-based and fact-based scaling) to create training data that helps the model learn new information. Our experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, offers a more systematic approach to ensure even coverage across all facts. We present a novel dataset generation process that leads to more effective knowledge ingestion through SFT, and our results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge. This study contributes to the understanding of domain adaptation for LLMs and highlights the potential of SFT in enhancing the factuality of LLM responses in specific knowledge domains.
Submitted 2 April, 2024; v1 submitted 29 March, 2024;
originally announced April 2024.
-
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Authors:
Angels Balaguer,
Vinamra Benara,
Renato Luiz de Freitas Cunha,
Roberto de M. Estevão Filho,
Todd Hendry,
Daniel Holstein,
Jennifer Marsman,
Nick Mecklenburg,
Sara Malvar,
Leonardo O. Nunes,
Rafael Padilha,
Morris Sharp,
Bruno Silva,
Swati Sharma,
Vijay Aski,
Ranveer Chandra
Abstract:
There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and Fine-Tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth study on an agricultural dataset. Agriculture as an industry has not seen much penetration of AI, and we study a potentially disruptive application: what if we could provide location-specific insights to a farmer? Our results show the effectiveness of our dataset generation pipeline in capturing geographic-specific knowledge, and the quantitative and qualitative benefits of RAG and fine-tuning. We see an accuracy increase of over 6 percentage points (p.p.) when fine-tuning the model, and this is cumulative with RAG, which increases accuracy by a further 5 p.p. In one particular experiment, we also demonstrate that the fine-tuned model leverages information from across geographies to answer specific questions, increasing answer similarity from 47% to 72%. Overall, the results point to how systems built using LLMs can be adapted to respond to and incorporate knowledge across a dimension that is critical for a specific industry, paving the way for further applications of LLMs in other industrial domains.
Submitted 30 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Impact of noise and damage on collective dynamics of scale-free neuronal networks
Authors:
D. Holstein,
A. V. Goltsev,
J. F. F. Mendes
Abstract:
We study the role of scale-free structure and noise in the collective dynamics of neuronal networks. For this purpose, we simulate and study analytically a cortical circuit model with stochastic neurons. We compare the collective neuronal activity of networks with different topologies: classical random graphs and scale-free networks. We show that, in scale-free networks with a divergent second moment of the degree distribution, the influence of noise on neuronal activity is strongly enhanced in comparison with networks with a finite second moment. A very small noise level can stimulate spontaneous activity of a finite fraction of neurons and sustained network oscillations. We demonstrate the tolerance of the collective dynamics of scale-free networks to random damage over a broad range of the number of randomly removed excitatory and inhibitory neurons. Random removal of neurons leads to a gradual decrease in the frequency of network oscillations, similar to the slowing-down of the alpha rhythm in Alzheimer's disease. However, the networks are vulnerable to targeted attacks: removal of a few excitatory or inhibitory hubs can impair sustained network oscillations.
Submitted 29 November, 2012;
originally announced November 2012.
-
Optimal Markov Approximations and Generalized Embeddings
Authors:
Detlef Holstein,
Holger Kantz
Abstract:
Based on information theory, we present a method to determine an optimal Markov approximation for modelling and prediction from time series data. The method finds a balance between minimal modelling errors, achieved by taking as much memory as possible into account, and minimal statistical errors, achieved by working in embedding spaces of rather small dimension. A key ingredient is an estimate of the statistical error of entropy estimates. The method is illustrated with several examples, and the consequences for prediction are evaluated by means of the root mean squared prediction error for point prediction.
Submitted 11 August, 2008;
originally announced August 2008.
-
Entropies in case of continuous time
Authors:
Detlef Holstein
Abstract:
Information theory in a time-discrete setting, in the framework of time series analysis, is generalized to the time-continuous case. Considerations of the Roessler and Lorenz dynamics as well as the Ornstein-Uhlenbeck process yield, for time-continuous entropies, a new possibility for distinguishing chaos from noise. In the deterministic case, an upper threshold of the joint uncertainty in the limit of infinitely high sampling rate can be found, and the entropy rate can be calculated as a usual time derivative of the entropy. In a three-dimensional representation, the dependence of the joint entropy on space resolution, discretization time step length and uncertainty-assessed time is shown in a unified manner. Hence the dimension and the Kolmogorov-Sinai entropy rate of any dynamics can be read off as limiting cases from one single graph.
Submitted 3 June, 2008;
originally announced June 2008.
-
Precursors of extreme increments
Authors:
Sarah Hallerberg,
Eduardo G. Altmann,
Detlef Holstein,
Holger Kantz
Abstract:
We investigate precursors and predictability of extreme increments in a time series. The events we are focusing on consist of large increments within successive time steps. We are especially interested in understanding how the quality of the predictions depends on the strategy for choosing precursors, on the size of the event and on the correlation strength. We study the prediction of extreme increments analytically in an AR(1) process, and numerically in wind speed recordings and long-range correlated ARMA data. We evaluate the success of predictions via receiver operating characteristic (ROC) curves. Furthermore, we observe an increase in the quality of predictions with increasing event size and with decreasing correlation in all examples. Both effects can be understood by using the likelihood ratio as a summary index for smooth ROC curves.
Submitted 12 September, 2006; v1 submitted 20 April, 2006;
originally announced April 2006.