
Showing 1–10 of 10 results for author: Shtedritski, A

  1. arXiv:2406.03428

    cs.LG

    HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

    Authors: Tim Franzmeyer, Aleksandar Shtedritski, Samuel Albanie, Philip Torr, João F. Henriques, Jakob N. Foerster

    Abstract: Benchmarks have been essential for driving progress in machine learning. A better understanding of LLM capabilities on real world tasks is vital for safe development. Designing adequate LLM benchmarks is challenging: Data from real-world tasks is hard to collect, public availability of static evaluation data results in test data contamination and benchmark overfitting, and periodically generating…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  2. arXiv:2312.07559

    cs.CL cs.AI cs.LG

    PaperQA: Retrieval-Augmented Generative Agent for Scientific Research

    Authors: Jakub Lála, Odhran O'Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G. Rodriques, Andrew D. White

    Abstract: Large Language Models (LLMs) generalize well across language tasks, but suffer from hallucinations and uninterpretability, making it difficult to assess their accuracy without ground-truth. Retrieval-Augmented Generation (RAG) models have been proposed to reduce hallucinations and provide provenance for how an answer was generated. Applying such models to the scientific literature may enable large…

    Submitted 14 December, 2023; v1 submitted 8 December, 2023; originally announced December 2023.

  3. arXiv:2310.10632

    cs.CL cs.AI cs.RO

    BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology

    Authors: Odhran O'Donoghue, Aleksandar Shtedritski, John Ginger, Ralph Abboud, Ali Essa Ghareeb, Justin Booth, Samuel G Rodriques

    Abstract: The ability to automatically generate accurate protocols for scientific experiments would represent a major step towards the automation of science. Large Language Models (LLMs) have impressive capabilities on a wide range of tasks, such as question answering and the generation of coherent text and code. However, LLMs can struggle with multi-step problems and long-term planning, which are crucial f…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023. Dataset and code: https://github.com/bioplanner/bioplanner

  4. arXiv:2306.12424

    cs.CV cs.CL

    VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

    Authors: Siobhan Mackenzie Hall, Fernanda Gonçalves Abrantes, Hanwen Zhu, Grace Sodunke, Aleksandar Shtedritski, Hannah Rose Kirk

    Abstract: We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models. We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas, where each image is associated with a caption containing a pronoun relationship of subjects and objects in the scene. VisoGender is balanced by gender representation in profess…

    Submitted 12 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: NeurIPS Datasets and Benchmarks 2023. Data and code available at https://github.com/oxai/visogender

  5. arXiv:2305.15407

    cs.CV

    Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets

    Authors: Brandon Smith, Miguel Farinha, Siobhan Mackenzie Hall, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

    Abstract: Vision-language models are growing in popularity and public visibility to generate, edit, and caption images at scale; but their outputs can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet. Although debiasing methods have been proposed, we argue that these measurements of model bias lack validity due to dataset bias. We demonstrate…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Github: https://github.com/oxai/debias-gensynth

  6. arXiv:2304.06712

    cs.CV

    What does CLIP know about a red circle? Visual prompt engineering for VLMs

    Authors: Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi

    Abstract: Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation. Despite that, their capabilities for solving novel discriminative tasks via prompting fall behind those of large language models, such as GPT-3. Here we explore the idea of visual prompt engineering for solving…

    Submitted 18 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: ICCV 2023 Oral

  7. arXiv:2203.11933

    cs.LG cs.CL cs.CV cs.CY

    A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning

    Authors: Hugo Berg, Siobhan Mackenzie Hall, Yash Bhalgat, Wonsuk Yang, Hannah Rose Kirk, Aleksandar Shtedritski, Max Bain

    Abstract: Vision-language models can encode societal biases and stereotypes, but there are challenges to measuring and mitigating these multimodal harms due to lacking measurement robustness and feature degradation. To address these challenges, we investigate bias measures and apply ranking metrics for image-text representations. We then investigate debiasing methods and show that prepending learned embeddi…

    Submitted 25 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

    Comments: 17 pages, 4 figures, 7 tables. For code and trained token embeddings, see https://github.com/oxai/debias-vision-lang; Changed to use ACL layout, added joint training with comparison figure, corrected spelling and formatting errors; This paper is accepted for publication at AACL 2022, the official version of record is in the ACL Anthology

  8. arXiv:2107.04313

    cs.CV

    Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset

    Authors: Hannah Rose Kirk, Yennie Jun, Paulius Rauba, Gal Wachtel, Ruining Li, Xingjian Bai, Noah Broestl, Martin Doff-Sotta, Aleksandar Shtedritski, Yuki M. Asano

    Abstract: Hateful memes pose a unique challenge for current machine learning systems because their message is derived from both text- and visual-modalities. To this effect, Facebook released the Hateful Memes Challenge, a dataset of memes with pre-extracted text captions, but it is unclear whether these synthetic examples generalize to `memes in the wild'. In this paper, we collect hateful and non-hateful m…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted paper at ACL WOAH 2021

  9. arXiv:2103.06587

    cs.CV

    Privacy-preserving Object Detection

    Authors: Peiyang He, Charlie Griffin, Krzysztof Kacprzyk, Artjom Joosen, Michael Collyer, Aleksandar Shtedritski, Yuki M. Asano

    Abstract: Privacy considerations and bias in datasets are quickly becoming high-priority issues that the computer vision community needs to face. So far, little attention has been given to practical solutions that do not involve collection of new datasets. In this work, we show that for object detection on COCO, both anonymizing the dataset by blurring faces, as well as swapping faces in a balanced manner a…

    Submitted 11 March, 2021; originally announced March 2021.

  10. arXiv:2102.04130

    cs.CL cs.AI

    Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models

    Authors: Hannah Kirk, Yennie Jun, Haider Iqbal, Elias Benussi, Filippo Volpin, Frederic A. Dreyer, Aleksandar Shtedritski, Yuki M. Asano

    Abstract: The capabilities of natural language models trained on large-scale data have increased immensely over the past few years. Open source libraries such as HuggingFace have made these models easily available and accessible. While prior research has identified biases in large language models, this paper considers biases contained in the most popular versions of these models when applied `out-of-the-box…

    Submitted 27 October, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Accepted to NeurIPS 2021. Code and data at https://github.com/oxai/intersectional_gpt2