Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning
Authors:
Meysam Alizadeh,
Maël Kubli,
Zeynab Samei,
Shirin Dehghani,
Mohammadmasiha Zahedivafa,
Juan Diego Bermeo,
Maria Korobeynikova,
Fabrizio Gilardi
Abstract:
This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a range of text annotation tasks using news articles and tweets datasets. Our analysis shows that fine-tuning improves the performance of open-source LLMs, allowing them to match or even surpass zero-shot GPT-3.5 and GPT-4, though still lagging behind fine-tuned GPT-3.5. We further establish that fine-tuning is preferable to few-shot training with a relatively modest quantity of annotated text. Our findings show that fine-tuned open-source LLMs can be effectively deployed in a broad spectrum of text annotation applications. We provide a Python notebook facilitating the application of LLMs in text annotation for other researchers.
Submitted 29 May, 2024; v1 submitted 5 July, 2023;
originally announced July 2023.
NEMO5: Achieving High-end Internode Communication for Performance Projection Beyond Moore's Law
Authors:
Robert Andrawis,
Jose David Bermeo,
James Charles,
Jianbin Fang,
Jim Fonseca,
Yu He,
Gerhard Klimeck,
Zhengping Jiang,
Tillmann Kubis,
Daniel Mejia,
Daniel Lemus,
Michael Povolotskyi,
Santiago Alonso Perez Rubiano,
Prasad Sarangapani,
Lang Zeng
Abstract:
Electronic performance predictions of modern nanotransistors require nonequilibrium Green's functions including incoherent scattering on phonons as well as inclusion of random alloy disorder and surface roughness effects. The solution of all these effects is numerically extremely expensive and has to be done on the world's largest supercomputers due to the large memory requirement and the high performance demands on the communication network between the compute nodes. In this work, it is shown that NEMO5 covers all required physical effects and their combination. Furthermore, it is also shown that NEMO5's implementation of the algorithm scales very well up to about 178,176 CPUs with a sustained performance of about 857 TFLOPS. Therefore, NEMO5 is ready to simulate future nanotransistors.
Submitted 15 October, 2015;
originally announced October 2015.