Showing 1–4 of 4 results for author: Gilardi, F

Searching in archive cs.
  1. arXiv:2402.04879

    cs.SI

    Comparing Methods for Creating a National Random Sample of Twitter Users

    Authors: Meysam Alizadeh, Darya Zare, Zeynab Samei, Mohammadamin Alizadeh, Mael Kubli, Mohammadhadi Aliahmadi, Sarvenaz Ebrahimi, Fabrizio Gilardi

    Abstract: Twitter data has been widely used by researchers across various social and computer science disciplines. A common aim when working with Twitter data is the construction of a random sample of users from a given country. However, while several methods have been proposed in the literature, their comparative performance is mostly unexplored. In this paper, we implement four common methods to collect a…

    Submitted 11 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  2. arXiv:2307.02179

    cs.CL

    Open-Source LLMs for Text Annotation: A Practical Guide for Model Setting and Fine-Tuning

    Authors: Meysam Alizadeh, Maël Kubli, Zeynab Samei, Shirin Dehghani, Mohammadmasiha Zahedivafa, Juan Diego Bermeo, Maria Korobeynikova, Fabrizio Gilardi

    Abstract: This paper studies the performance of open-source Large Language Models (LLMs) in text classification tasks typical for political science research. By examining tasks like stance, topic, and relevance classification, we aim to guide scholars in making informed decisions about their use of LLMs for text analysis. Specifically, we conduct an assessment of both zero-shot and fine-tuned LLMs across a…

    Submitted 29 May, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

  3. ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks

    Authors: Fabrizio Gilardi, Meysam Alizadeh, Maël Kubli

    Abstract: Many NLP applications require manual data annotations for a variety of tasks, notably to train classifiers or evaluate the performance of unsupervised models. Depending on the size and degree of complexity, the tasks may be conducted by crowd-workers on platforms such as MTurk as well as trained annotators, such as research assistants. Using a sample of 2,382 tweets, we demonstrate that ChatGPT ou…

    Submitted 19 July, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: Gilardi, Fabrizio, Meysam Alizadeh, and Maël Kubli. 2023. "ChatGPT Outperforms Crowd Workers for Text-Annotation Tasks". Proceedings of the National Academy of Sciences 120(30): e2305016120

  4. arXiv:2212.02108

    cs.CL cs.LG

    Human-in-the-Loop Hate Speech Classification in a Multilingual Context

    Authors: Ana Kotarcic, Dominik Hangartner, Fabrizio Gilardi, Selina Kurer, Karsten Donnay

    Abstract: The shift of public debate to the digital sphere has been accompanied by a rise in online hate speech. While many promising approaches for hate speech classification have been proposed, studies often focus only on a single language, usually English, and do not address three key concerns: post-deployment performance, classifier maintenance and infrastructural limitations. In this paper, we introduc…

    Submitted 1 January, 2023; v1 submitted 5 December, 2022; originally announced December 2022.

    Comments: Findings of EMNLP 2022