-
Enhancing Large Language Model Performance To Answer Questions and Extract Information More Accurately
Authors:
Liang Zhang,
Katherine Jijo,
Spurthi Setty,
Eden Chung,
Fatima Javid,
Natan Vidra,
Tommy Clifford
Abstract:
Large Language Models (LLMs) generate responses to questions; however, their effectiveness is often hindered by sub-optimal answer quality and occasional failures to respond accurately. To address these challenges, a fine-tuning process is employed, using feedback and examples to refine the models. The objective is to enhance AI models through continuous feedback loops, using metrics such as cosine similarity, LLM evaluation, and ROUGE-L scores to evaluate the models. Using LLMs such as GPT-3.5, GPT4ALL, LLaMA2, and Claude, this approach is benchmarked on financial datasets, including FinanceBench and the RAG Instruct Benchmark Tester Dataset, illustrating the necessity of fine-tuning. The results show that fine-tuned models surpass the accuracy of zero-shot LLMs, providing superior question-answering capabilities. Notably, combining fine-tuning of the LLM with Retrieval Augmented Generation (RAG) yields responses with improved accuracy.
Submitted 26 January, 2024;
originally announced February 2024.
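The first abstract evaluates answers with cosine similarity and ROUGE-L. As a rough illustration of what those two metrics measure, the Python sketch below computes a token-level ROUGE-L F1 (based on the longest common subsequence) and a cosine similarity between a model answer and a reference answer. It assumes lowercase whitespace tokenization and uses bag-of-words count vectors in place of the sentence embeddings a real evaluation would more likely use; the function names are hypothetical, and this is not the authors' evaluation code.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: simple lowercase whitespace tokenization (no stemming).
    return text.lower().split()


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence over tokens.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(candidate: str, reference: str) -> float:
    # ROUGE-L: precision and recall of the LCS, combined as an F1 score.
    c, r = tokenize(candidate), tokenize(reference)
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(candidate: str, reference: str) -> float:
    # Assumption: bag-of-words count vectors stand in for real embeddings.
    a, b = Counter(tokenize(candidate)), Counter(tokenize(reference))
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    answer = "revenue rose 5 percent"
    gold = "net revenue rose 5 percent in 2022"
    print(round(rouge_l_f1(answer, gold), 3))  # 0.727
    print(round(cosine_similarity(answer, gold), 3))  # 0.756
```

Here the answer is a subsequence of the reference, so precision is 1 while recall is 4/7, giving an F1 of 8/11.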
-
Improving Classification Performance With Human Feedback: Label a few, we label the rest
Authors:
Natan Vidra,
Thomas Clifford,
Katherine Jijo,
Eden Chung,
Liang Zhang
Abstract:
In the realm of artificial intelligence, where the vast majority of data is unstructured, obtaining substantial amounts of labeled data to train supervised machine learning models poses a significant challenge. To address this, we delve into few-shot and active learning, where our goal is to improve AI models with human feedback on a few labeled examples. This paper focuses on understanding how a continuous feedback loop can refine models, thereby enhancing their accuracy, recall, and precision through incremental human input. By employing Large Language Models (LLMs) such as GPT-3.5, BERT, and SetFit, we aim to analyze the efficacy of using a limited number of labeled examples to substantially improve model accuracy. We benchmark this approach on the Financial Phrasebank, Banking, Craigslist, TREC, and Amazon Reviews datasets to show that with just a few labeled examples, we are able to surpass the accuracy of zero-shot large language models and provide enhanced text classification performance. We demonstrate that rather than needing to manually label millions of rows of data, we only need to label a few, and the model can effectively predict the rest.
Submitted 17 January, 2024;
originally announced January 2024.
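The second abstract describes a feedback loop in which humans label a few examples and the model predicts the rest. One common way to decide which examples to send to a human is least-confidence uncertainty sampling. The sketch below assumes that query strategy and swaps in a toy nearest-centroid classifier for GPT-3.5, BERT, or SetFit; the `oracle` callback is a hypothetical stand-in for the human labeler. All names are illustrative, and the paper does not state that it uses this particular strategy.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def centroids(labeled):
    # labeled: list of (feature_tuple, label); returns label -> mean vector.
    sums, counts = {}, {}
    for x, y in labeled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_proba(x, cents):
    # Toy classifier (assumption): negative squared distance to each centroid.
    labels = sorted(cents)
    scores = [-sum((a - b) ** 2 for a, b in zip(x, cents[y])) for y in labels]
    return labels, softmax(scores)


def least_confident(pool, cents, k):
    # Rank unlabeled items by top-class probability, least confident first.
    def confidence(x):
        return max(predict_proba(x, cents)[1])
    return sorted(pool, key=confidence)[:k]


def active_learning_loop(labeled, pool, oracle, rounds, k):
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        cents = centroids(labeled)
        for x in least_confident(pool, cents, k):
            labeled.append((x, oracle(x)))  # human feedback step (simulated)
            pool.remove(x)
    return centroids(labeled), pool
```

For example, with one labeled point per class at 0.0 and 10.0, the unlabeled points nearest the midpoint (4.8 and 5.2) are the ones queried first, since the model is least sure about them.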
-
MBW: Multi-view Bootstrapping in the Wild
Authors:
Mosam Dabhi,
Chaoyang Wang,
Tim Clifford,
Laszlo Attila Jeni,
Ian R. Fasel,
Simon Lucey
Abstract:
Labeling articulated objects in unconstrained settings has a wide variety of applications, including entertainment, neuroscience, psychology, ethology, and many fields of medicine. Large offline labeled datasets do not exist for all but the most common articulated object categories (e.g., humans). Hand labeling these landmarks within a video sequence is a laborious task. Learned landmark detectors can help, but can be error-prone when trained from only a few examples. Multi-camera systems that train fine-grained detectors have shown significant promise in detecting such errors, allowing for self-supervised solutions that only need a small percentage of the video sequence to be hand-labeled. The approach, however, relies on calibrated cameras and rigid geometry, making it expensive, difficult to manage, and impractical in real-world scenarios. In this paper, we address these bottlenecks by combining a non-rigid 3D neural prior with deep flow to obtain high-fidelity landmark estimates from videos with only two or three uncalibrated, handheld cameras. With just a few annotations (representing 1-2% of the frames), we are able to produce 2D results comparable to state-of-the-art fully supervised methods, along with 3D reconstructions that are impossible with other existing approaches. Our Multi-view Bootstrapping in the Wild (MBW) approach demonstrates impressive results on standard human datasets, as well as tigers, cheetahs, fish, colobus monkeys, chimpanzees, and flamingos from videos captured casually in a zoo. We release the codebase for MBW as well as this challenging zoo dataset, consisting of image frames of tail-end distribution categories with their corresponding 2D and 3D labels generated with minimal human intervention.
Submitted 4 October, 2022;
originally announced October 2022.
-
RHIC Power Supply Ramp Diagnostics*
Authors:
J. T. Morris,
T. S. Clifford,
B. Frak,
J. Laster,
A. Marusic,
J. van Zeijts
Abstract:
Reliable and reproducible performance of the more than 800 Relativistic Heavy Ion Collider (RHIC) magnet power supplies is essential to successful RHIC operation. In order to support power supply commissioning, a system was developed to capture detailed power supply measurements from all the RHIC ring power supplies during acceleration ramps. Diagnostic tools were developed to allow experts to assess ramp reproducibility and rapidly identify problems. The system has now become a routine part of RHIC operations, with data captured for every acceleration ramp. This paper describes the RHIC power supply ramp diagnostic system and considers its impact on RHIC operations.
Submitted 21 November, 2001;
originally announced November 2001.
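The ramp diagnostics abstract mentions tools that let experts assess ramp reproducibility. One simple way to express such a check, assuming each supply's captured ramp and a reference ramp are sampled at the same instants, is to compute the maximum absolute deviation and compare it with a tolerance. The supply names, the units (amperes), and the tolerance below are hypothetical; the paper does not describe its comparison method.

```python
from dataclasses import dataclass


@dataclass
class RampCheck:
    supply: str
    max_deviation: float
    within_tolerance: bool


def check_ramps(reference, captured, tolerance):
    """Compare each supply's captured ramp with its reference ramp.

    reference, captured: dict mapping a (hypothetical) supply name to a list
    of current samples in amperes, assumed to share the same sample times.
    """
    results = []
    for supply, ref in reference.items():
        meas = captured.get(supply)
        if meas is None or len(meas) != len(ref):
            # Missing or truncated capture: flag it rather than guess.
            results.append(RampCheck(supply, float("inf"), False))
            continue
        dev = max((abs(m - r) for m, r in zip(meas, ref)), default=0.0)
        results.append(RampCheck(supply, dev, dev <= tolerance))
    return results
```

Flagging missing captures as failures keeps an incomplete acquisition from silently passing as a reproducible ramp.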
-
Post Mortem System - Playback of the RHIC Collider
Authors:
J. S. Laster,
T. Clifford,
T. D'Ottavio,
A. Marusic,
J. F. Skelly
Abstract:
A Post Mortem System was developed for the Relativistic Heavy Ion Collider at Brookhaven National Laboratory to provide a playback of the collider state at the time of a beam abort, quench, or other failure event. Post Mortem data is used to provide diagnostics about the failure and to improve future stores. This data is read from hardware buffers and is written directly to the main file system by Accelerator Device Objects in the front-end computers. The Post Mortem System has facilitated analysis of loss monitor and power supply data, such as beam loss during magnet quenches, dump kicker misfires and power supply malfunctions. System details and recent operating experience will be discussed.
Submitted 21 November, 2001;
originally announced November 2001.
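The Post Mortem System reads data that hardware buffers hold at the time of a beam abort or other failure event. The general idea of such a post-mortem buffer, a fixed-depth ring of recent samples that is frozen when an event fires, can be sketched in Python with `collections.deque(maxlen=...)`. This is an illustrative software analogue under that assumption, not the RHIC front-end implementation, and the class and method names are hypothetical.

```python
from collections import deque


class PostMortemBuffer:
    """Hypothetical ring buffer that keeps recent samples until a trigger."""

    def __init__(self, depth):
        # Once full, appending drops the oldest sample automatically.
        self._samples = deque(maxlen=depth)
        self.snapshot = None

    def record(self, timestamp, value):
        # Stop accepting new samples once the buffer has been frozen.
        if self.snapshot is None:
            self._samples.append((timestamp, value))

    def trigger(self):
        # Freeze the contents at the failure event; later triggers return
        # the same snapshot so the pre-event history is preserved.
        if self.snapshot is None:
            self.snapshot = list(self._samples)
        return self.snapshot
```

Freezing on the first trigger mirrors the playback goal: the data of interest is the history leading up to the event, not what arrives afterward.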