Showing 1–27 of 27 results for author: Kirk, H R

  1. arXiv:2406.06196  [pdf, other]

    cs.CL

    LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages

    Authors: Andrew M. Bean, Simi Hellsten, Harry Mayne, Jabez Magomere, Ethan A. Chi, Ryan Chi, Scott A. Hale, Hannah Rose Kirk

    Abstract: In this paper, we present the LingOly benchmark, a novel benchmark for advanced reasoning abilities in large language models. Using challenging Linguistic Olympiad puzzles, we evaluate (i) capabilities for in-context identification and generalisation of linguistic patterns in very low-resource or extinct languages, and (ii) abilities to follow complex task instructions. The LingOly benchmark cover…

    Submitted 11 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages, 5 figures, 16 pages supplemental materials

  2. arXiv:2405.13058  [pdf, other]

    cs.SE cs.AI cs.CY cs.LG

    The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub

    Authors: Cailean Osborne, Jennifer Ding, Hannah Rose Kirk

    Abstract: Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. Firs…

    Submitted 5 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 27 pages, 5 figures, 9 tables

    ACM Class: K.4.1

  3. arXiv:2404.16019  [pdf, other]

    cs.CL

    The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

    Authors: Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

    Abstract: Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, t…

    Submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2404.12241  [pdf, other]

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  5. arXiv:2403.12075  [pdf, other]

    cs.CY cs.AI cs.CR cs.CV cs.LG

    Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

    Authors: Jessica Quaye, Alicia Parrish, Oana Inel, Charvi Rastogi, Hannah Rose Kirk, Minsuk Kahng, Erin van Liemt, Max Bartolo, Jess Tsang, Justin White, Nathan Clement, Rafael Mosquera, Juan Ciro, Vijay Janapa Reddi, Lora Aroyo

    Abstract: With the rise of text-to-image (T2I) generative AI models reaching wide audiences, it is critical to evaluate model robustness against non-obvious attacks to mitigate the generation of offensive images. By focusing on "implicitly adversarial" prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a set of difficult safety issues that human creativit…

    Submitted 13 May, 2024; v1 submitted 14 February, 2024; originally announced March 2024.

    Comments: 10 pages, 6 figures

  6. arXiv:2402.16786  [pdf, other]

    cs.CL cs.AI

    Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

    Authors: Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy

    Abstract: Much recent work seeks to evaluate values and opinions in large language models (LLMs) using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically-biased LLMs may subtly influence society when they are used by millions of people. Such real-world concerns, however, stand in stark contrast to the artificial…

    Submitted 5 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024 (Main Conference)

  7. arXiv:2401.12295  [pdf, other]

    cs.CL

    Cheap Learning: Maximising Performance of Language Models for Social Data Science Using Minimal Data

    Authors: Leonardo Castro-Gonzalez, Yi-Ling Chung, Hannah Rose Kirk, John Francis, Angus R. Williams, Pica Johansson, Jonathan Bright

    Abstract: The field of machine learning has recently made significant progress in reducing the requirements for labelled training data when building new models. These 'cheaper' learning techniques hold significant potential for the social sciences, where development of large labelled training datasets is often a significant practical impediment to the use of machine learning for analytical tasks. In this ar…

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 39 pages, 10 figures, 6 tables

    ACM Class: I.2.7; J.4

  8. arXiv:2311.08370  [pdf, other]

    cs.CL

    SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

    Authors: Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger

    Abstract: The past year has seen rapid acceleration in the development of large language models (LLMs). However, without proper steering and safeguards, LLMs will readily follow malicious instructions, provide unsafe advice, and generate toxic content. We introduce SimpleSafetyTests (SST) as a new test suite for rapidly and systematically identifying such critical safety risks. The test suite comprises 100…

    Submitted 16 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  9. arXiv:2310.07629  [pdf, other]

    cs.CL cs.CY

    The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

    Authors: Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale

    Abstract: Human feedback is increasingly used to steer the behaviours of Large Language Models (LLMs). However, it is unclear how to collect and incorporate feedback in a way that is efficient, effective and unbiased, especially for highly subjective human preferences and values. In this paper, we survey existing approaches for learning from human feedback, drawing on 95 papers primarily from the ACL and ar…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted for the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP, Main)

  10. arXiv:2310.02457  [pdf, other]

    cs.CL cs.CY

    The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

    Authors: Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

    Abstract: In this paper, we address the concept of "alignment" in large language models (LLMs) through the lens of post-structuralist socio-political theory, specifically examining its parallels to empty signifiers. To establish a shared vocabulary around how abstract concepts of alignment are operationalised in empirical datasets, we propose a framework that demarcates: 1) which dimensions of model behavio…

    Submitted 15 November, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Socially Responsible Language Modelling Research (SoLaR) @ NeurIPS 2023

  11. arXiv:2309.08573  [pdf, other]

    cs.CL cs.CY

    Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West

    Authors: Khyati Khandelwal, Manuel Tonneau, Andrew M. Bean, Hannah Rose Kirk, Scott A. Hale

    Abstract: Large Language Models (LLMs), now used daily by millions of users, can encode societal biases, exposing their users to representational harms. A large body of scholarship on LLM bias exists but it predominantly adopts a Western-centric frame and attends comparatively less to bias levels and potential harms in the Global South. In this paper, we quantify stereotypical bias in popular LLMs according…

    Submitted 15 September, 2023; originally announced September 2023.

  12. arXiv:2308.01263  [pdf, other]

    cs.CL cs.AI

    XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

    Authors: Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

    Abstract: Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This risk motivates safety efforts such as red-teaming and large-scale feedback learning, which aim to make models both helpful and harmless. However, there is a tension between these two objectives, since harmlessness requires models to refuse to comply with unsafe prompts, and…

    Submitted 1 April, 2024; v1 submitted 2 August, 2023; originally announced August 2023.

    Comments: Accepted at NAACL 2024 (Main Conference)

  13. arXiv:2307.16811  [pdf, other]

    cs.CL cs.CY

    DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures

    Authors: Angus R. Williams, Hannah Rose Kirk, Liam Burke, Yi-Ling Chung, Ivan Debono, Pica Johansson, Francesca Stevens, Jonathan Bright, Scott A. Hale

    Abstract: Public figures receive a disproportionate amount of abuse on social media, impacting their active participation in public life. Automated systems can identify abuse at scale but labelling training data is expensive, complex and potentially harmful. So, it is desirable that systems are efficient and generalisable, handling both shared and specific aspects of online abuse. We explore the dynamics of…

    Submitted 25 April, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: 15 pages, 7 figures, 4 tables

  14. arXiv:2306.12424  [pdf, other]

    cs.CV cs.CL

    VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

    Authors: Siobhan Mackenzie Hall, Fernanda Gonçalves Abrantes, Hanwen Zhu, Grace Sodunke, Aleksandar Shtedritski, Hannah Rose Kirk

    Abstract: We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models. We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas, where each image is associated with a caption containing a pronoun relationship of subjects and objects in the scene. VisoGender is balanced by gender representation in profess…

    Submitted 12 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: NeurIPS Datasets and Benchmarks 2023. Data and code available at https://github.com/oxai/visogender

  15. arXiv:2305.15407  [pdf, other]

    cs.CV

    Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets

    Authors: Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

    Abstract: Vision-language models are growing in popularity and public visibility to generate, edit, and caption images at scale; but their outputs can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet. Although debiasing methods have been proposed, we argue that these measurements of model bias lack validity due to dataset bias. We demonstrate…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Github: https://github.com/oxai/debias-gensynth

  16. arXiv:2305.14384  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models

    Authors: Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, Oana Inel, Juan Ciro, Rafael Mosquera, Addison Howard, Will Cukierski, D. Sculley, Vijay Janapa Reddi, Lora Aroyo

    Abstract: The generative AI revolution in recent years has been spurred by an expansion in compute power and data quantity, which together enable extensive pre-training of powerful text-to-image (T2I) models. With their greater capabilities to generate realistic and creative content, these T2I models like DALL-E, MidJourney, Imagen or Stable Diffusion are reaching ever wider audiences. Any unsafe behaviors…

    Submitted 22 May, 2023; originally announced May 2023.

    MSC Class: 14J68 (Primary)

  17. arXiv:2303.18190  [pdf, other]

    cs.CL

    Assessing Language Model Deployment with Risk Cards

    Authors: Leon Derczynski, Hannah Rose Kirk, Vidhisha Balachandran, Sachin Kumar, Yulia Tsvetkov, M. R. Leiser, Saif Mohammad

    Abstract: This paper introduces RiskCards, a framework for structured assessment and documentation of risks associated with an application of language models. As with all language, text generated by language models can be harmful, or used to bring about harm. Automating language generation adds both an element of scale and also more subtle or emergent undesirable tendencies to the generated text. Prior work…

    Submitted 31 March, 2023; originally announced March 2023.

  18. arXiv:2303.05453  [pdf, ps, other]

    cs.CL cs.CY

    Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

    Authors: Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

    Abstract: Large language models (LLMs) are used to generate content for a wide range of tasks, and are set to reach a growing audience in coming years due to integration in product interfaces like ChatGPT or search engines like Bing. This intensifies the need to ensure that models are aligned with human preferences and do not produce unsafe, inaccurate or toxic outputs. While alignment techniques like reinf…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: 19 pages, 1 table

  19. arXiv:2303.04222  [pdf, other]

    cs.CL cs.CY

    SemEval-2023 Task 10: Explainable Detection of Online Sexism

    Authors: Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger

    Abstract: Online sexism is a widespread and harmful phenomenon. Automated tools can assist the detection of sexism at scale. Binary detection, however, disregards the diversity of sexist content, and fails to provide clear explanations for why something is sexist. To address this issue, we introduce SemEval Task 10 on the Explainable Detection of Online Sexism (EDOS). We make three main contributions: i) a…

    Submitted 8 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: SemEval-2023 Task 10 (ACL 2023)

  20. Auditing large language models: a three-layered approach

    Authors: Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi

    Abstract: Large language models (LLMs) represent a major advance in artificial intelligence (AI) research. However, the widespread use of LLMs is also coupled with significant ethical and social challenges. Previous research has pointed towards auditing as a promising governance mechanism to help ensure that AI systems are designed and deployed in ways that are ethical, legal, and technically robust. Howeve…

    Submitted 27 June, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: 22 pages, 2 figures. AI Ethics (2023)

    ACM Class: K.4; K.6

  21. arXiv:2209.10193  [pdf, other]

    cs.CL

    Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning

    Authors: Hannah Rose Kirk, Bertie Vidgen, Scott A. Hale

    Abstract: Annotating abusive language is expensive, logistically complex and creates a risk of psychological harm. However, most machine learning research has prioritized maximizing effectiveness (i.e., F1 or accuracy score) rather than data efficiency (i.e., minimizing the amount of data that is annotated). In this paper, we use simulated experiments over two datasets at varying percentages of abuse to dem…

    Submitted 21 September, 2022; originally announced September 2022.

    Comments: Third Workshop on Threat, Aggression and Cyberbullying (COLING 2022)

  22. arXiv:2207.10062  [pdf, other]

    cs.LG

    DataPerf: Benchmarks for Data-Centric AI Development

    Authors: Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, et al. (20 additional authors not shown)

    Abstract: Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems. Neglecting the fundamental importance of data has given rise to inaccuracy, bias, and fragility in real-world applications, and research is hindered by saturation across existing datase…

    Submitted 13 October, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  23. arXiv:2205.11374  [pdf, other]

    cs.CL cs.AI

    Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements

    Authors: Conrad Borchers, Dalia Sara Gala, Benjamin Gilburt, Eduard Oravkin, Wilfried Bounsi, Yuki M. Asano, Hannah Rose Kirk

    Abstract: The growing capability and availability of generative language models has enabled a wide range of new downstream tasks. Academic research has identified, quantified and mitigated biases present in language models but is rarely tailored to downstream tasks where wider impact on individuals and society can be felt. In this work, we leverage one popular generative language model, GPT-3, with the goal…

    Submitted 23 May, 2022; originally announced May 2022.

    Comments: Accepted for the 4th Workshop on Gender Bias in Natural Language Processing at NAACL 2022

  24. arXiv:2204.14256  [pdf, other]

    cs.CL

    Handling and Presenting Harmful Text in NLP Research

    Authors: Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen, Leon Derczynski

    Abstract: Text data can pose a risk of harm. However, the risks are not fully understood, and how to handle, present, and discuss harmful text in a safe way remains an unresolved issue in the NLP community. We provide an analytical framework categorising harms on three axes: (1) the harm type (e.g., misinformation, hate speech or racial stereotypes); (2) whether a harm is sought as a feature of the…

    Submitted 24 February, 2023; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: in Findings of EMNLP 2022

  25. arXiv:2203.11933  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

    Authors: Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

    Abstract: Vision-language models can encode societal biases and stereotypes, but there are challenges to measuring and mitigating these multimodal harms due to lacking measurement robustness and feature degradation. To address these challenges, we investigate bias measures and apply ranking metrics for image-text representations. We then investigate debiasing methods and show that prepending learned embeddi…

    Submitted 25 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: 17 pages, 4 figures, 7 tables. For code and trained token embeddings, see https://github.com/oxai/debias-vision-lang; Changed to use ACL layout, added joint training with comparison figure, corrected spelling and formatting errors; This paper is accepted for publication at AACL 2022, the official version of record is in the ACL Anthology

  26. arXiv:2108.05921  [pdf, other]

    cs.CL cs.CY

    Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-based Hate

    Authors: Hannah Rose Kirk, Bertram Vidgen, Paul Röttger, Tristan Thrush, Scott A. Hale

    Abstract: Detecting online hate is a complex task, and low-performing models have harmful consequences when used for sensitive applications such as content moderation. Emoji-based hate is an emerging challenge for automated detection. We present HatemojiCheck, a test suite of 3,930 short-form statements that allows us to evaluate performance on hateful language expressed with emoji. Using the test suite, we…

    Submitted 6 May, 2022; v1 submitted 12 August, 2021; originally announced August 2021.

    Journal ref: 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022)

  27. arXiv:2107.04313  [pdf, other]

    cs.CV

    Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

    Authors: Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki M. Asano

    Abstract: Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text- and visual-modalities. To this effect, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to 'memes in the wild'. In this paper, we collect hateful and non-hateful m…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted paper at ACL WOAH 2021