-
Prompt Stability Scoring for Text Annotation with Large Language Models
Authors:
Christopher Barrie,
Elli Palaiologou,
Petter Törnberg
Abstract:
Researchers are increasingly using language models (LMs) for text annotation. These approaches rely only on a prompt telling the model to return a given output according to a set of instructions. The reproducibility of LM outputs may nonetheless be vulnerable to small changes in the prompt design. This calls into question the replicability of classification routines. To tackle this problem, researchers have typically tested a variety of semantically similar prompts to determine what we call "prompt stability." These approaches remain ad hoc and task-specific. In this article, we propose a general framework for diagnosing prompt stability by adapting traditional approaches to intra- and inter-coder reliability scoring. We call the resulting metric the Prompt Stability Score (PSS) and provide a Python package, PromptStability, for its estimation. Using six different datasets and twelve outcomes, we classify >150k rows of data to: a) diagnose when prompt stability is low; and b) demonstrate the functionality of the package. We conclude by providing best practice recommendations for applied researchers.
Submitted 2 July, 2024;
originally announced July 2024.
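As a rough illustration of how a reliability coefficient can be repurposed to score prompt stability, the sketch below computes Krippendorff's alpha for nominal labels, treating each text as a "unit" and each repeated run of a prompt (or each paraphrased prompt) as a "coder". The choice of Krippendorff's alpha and the function name `krippendorff_alpha_nominal` are assumptions for illustration: the abstract does not name the coefficient behind the PSS, and this is not the PromptStability package's API.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists: each inner list holds the labels assigned
    to one text across repeated runs (or across paraphrased prompts).
    Texts with fewer than two labels are not pairable and are skipped.
    """
    # Coincidence matrix: each ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in that unit.
    coincidence = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue
        for i, j in permutations(range(m), 2):
            coincidence[(values[i], values[j])] += 1.0 / (m - 1)

    # Marginal totals per category and the total number of pairable values.
    n_c = Counter()
    for (c, _k), weight in coincidence.items():
        n_c[c] += weight
    n = sum(n_c.values())

    # Observed and expected disagreement (nominal metric: any c != k counts).
    observed = sum(w for (c, k), w in coincidence.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:
        # Only one category was ever used: alpha is undefined.
        return float("nan")
    return 1.0 - (n - 1) * observed / expected


# Hypothetical labels for three texts, each annotated in three runs of one prompt.
runs = [["pos", "pos", "pos"], ["pos", "neg", "neg"], ["neg", "neg", "neg"]]
print(round(krippendorff_alpha_nominal(runs), 3))  # → 0.6
```

For an intra-prompt score, each inner list would hold the labels one prompt returned for the same text across repeated runs; for an inter-prompt score, the labels returned by different paraphrases of that prompt.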
-
Satyrn: A Platform for Analytics Augmented Generation
Authors:
Marko Sterbentz,
Cameron Barrie,
Shubham Shahi,
Abhratanu Dutta,
Donna Hooshmand,
Harper Pack,
Kristian J. Hammond
Abstract:
Large language models (LLMs) are capable of producing documents, and retrieval augmented generation (RAG) has shown itself to be a powerful method for improving accuracy without sacrificing fluency. However, not all information can be retrieved from text. We propose an approach that uses the analysis of structured data to generate fact sets that are used to guide generation in much the same way that retrieved documents are used in RAG. This analytics augmented generation (AAG) approach makes it possible to use standard analytic techniques to generate facts that are then converted to text and passed to an LLM. We present a neurosymbolic platform, Satyrn, that leverages AAG to produce accurate, fluent, and coherent reports grounded in large-scale databases. In our experiments, we find that Satyrn generates reports in which over 86% of claims are accurate while maintaining high levels of fluency and coherence, even when using smaller language models such as Mistral-7B, compared with GPT-4 Code Interpreter, for which just 57% of claims are accurate.
Submitted 17 June, 2024;
originally announced June 2024.
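A minimal sketch of the AAG pattern described in the Satyrn abstract, under simplifying assumptions: the "analytics" are just a per-group count, mean, and maximum computed in plain Python, each result is verbalized with a fixed template, and the resulting facts are placed in the prompt where RAG would place retrieved passages. The functions `analytic_facts` and `build_prompt` and the toy table are hypothetical and do not reflect Satyrn's actual implementation; the LLM call itself is omitted.

```python
from collections import defaultdict
from statistics import mean


def analytic_facts(rows, group_key, value_key):
    """Group rows, compute count/mean/max per group, and verbalize each result as a fact."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    facts = []
    for group in sorted(groups):
        values = groups[group]
        facts.append(
            f"{group}: n={len(values)}, mean {value_key} = {mean(values):.1f}, "
            f"max {value_key} = {max(values)}."
        )
    return facts


def build_prompt(question, facts):
    """Insert the generated facts into the prompt, much as RAG inserts retrieved passages."""
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Using only the facts below, write a short report that answers the question.\n"
        f"Facts:\n{fact_lines}\n"
        f"Question: {question}\n"
    )


# Hypothetical toy table standing in for a structured database.
rows = [
    {"region": "North", "revenue": 30},
    {"region": "North", "revenue": 50},
    {"region": "South", "revenue": 20},
]
facts = analytic_facts(rows, "region", "revenue")
prompt = build_prompt("Which region earns the most?", facts)
print(prompt)  # this string would then be passed to an LLM (not shown)
```

The design point the sketch tries to capture is that the numbers in the prompt come from deterministic computation over the data rather than from the model, so the LLM's role is limited to turning verified facts into fluent text.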
-
Lightweight Knowledge Representations for Automating Data Analysis
Authors:
Marko Sterbentz,
Cameron Barrie,
Donna Hooshmand,
Shubham Shahi,
Abhratanu Dutta,
Harper Pack,
Andong Li Zhao,
Andrew Paley,
Alexander Einarsson,
Kristian Hammond
Abstract:
The principal goal of data science is to derive meaningful information from data. To do this, data scientists develop a space of analytic possibilities and from it reach their information goals by using their knowledge of the domain, the available data, the operations that can be performed on those data, the algorithms/models that are fed the data, and how all of these facets interweave. In this work, we take the first steps towards automating a key aspect of the data science pipeline: data analysis. We present an extensible taxonomy of data analytic operations that scopes across domains and data, as well as a method for codifying domain-specific knowledge that links this analytics taxonomy to actual data. We validate the functionality of our analytics taxonomy by implementing a system that leverages it, alongside domain labelings for 8 distinct domains, to automatically generate a space of answerable questions and associated analytic plans. In this way, we produce information spaces over data that enable complex analyses and search over this data and pave the way for fully automated data analysis.
Submitted 15 October, 2023;
originally announced November 2023.
-
Did the Musk Takeover Boost Contentious Actors on Twitter?
Authors:
Christopher Barrie
Abstract:
Twitter has been accused of a liberal bias in its account verification and content moderation policies. Elon Musk pledged, after his acquisition of the company, to promote free speech on the platform by overhauling verification and moderation policies. These events sparked fears of a rise in influence of contentious actors -- notably from the political right. In this article, I use a publicly released list of 138k Twitter accounts that purchased blue check verification during the open window of November 9-November 11, 2022. I retrieve 4.9m tweets from a sample of politically contentious accounts. I then compare engagement on contentious user posts before and after the Musk acquisition. I find that the period following the Musk acquisition saw a substantive increase in post engagement. There is no additional increase following blue tick verification. I explain the findings with reference to an increase in activity by a newly sympathetic user base.
Submitted 20 December, 2022;
originally announced December 2022.
-
Requirements for Open Political Information: Transparency Beyond Open Data
Authors:
Andong Luis Li Zhao,
Andrew Paley,
Rachel Adler,
Harper Pack,
Sergio Servantez,
Alexander Einarsson,
Cameron Barrie,
Marko Sterbentz,
Kristian Hammond
Abstract:
A politically informed citizenry is imperative for a well-developed democracy. While the US government has pursued policies for open data, these efforts have been insufficient in achieving an open government because only people with technical and domain knowledge can access information in the data. In this work, we conduct user interviews to identify wants and needs among stakeholders. We further use this information to sketch out the foundational requirements for a functional political information technical system.
Submitted 6 December, 2021;
originally announced December 2021.
-
Explaining Recruitment to Extremism: A Bayesian Case-Control Approach
Authors:
Roberto Cerina,
Christopher Barrie,
Neil Ketchley,
Aaron Zelin
Abstract:
Who joins extremist movements? Answering this question poses considerable methodological challenges. Survey techniques are infeasible and selective samples provide no counterfactual. Recruits can be assigned to contextual units, but this is vulnerable to problems of ecological inference. In this article, we take inspiration from epidemiology and elaborate a technique that combines survey and ecological approaches. The multilevel Bayesian case-control design that we propose allows us to identify individual-level and contextual factors patterning the incidence of recruitment, while accounting for rare events, contamination, and spatial autocorrelation. We validate our approach by matching a sample of Islamic State (ISIS) fighters from nine MENA countries with representative population surveys enumerated shortly before recruits joined the movement. High status individuals in their early twenties with university education were more likely to join ISIS. There is more mixed evidence for relative deprivation. We provide software for applied researchers to implement our method.
Submitted 17 May, 2022; v1 submitted 3 June, 2021;
originally announced June 2021.