Showing 1–2 of 2 results for author: Sodunke, G

  1. arXiv:2406.06613  [pdf, other]

    cs.CL cs.AI

    GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents

    Authors: Anthony Costarelli, Mat Allen, Roman Hauksson, Grace Sodunke, Suhas Hariharan, Carlson Cheng, Wenjie Li, Arjun Yadav

    Abstract: Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Despite several demonstrations of using large language models in complex, strategic scenarios, there lacks a comprehensive framework for evaluating agents' performance across various types of reasoning found in games. To address this gap, we introduce GameBench, a cross-domain benc…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2306.12424  [pdf, other]

    cs.CV cs.CL

    VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

    Authors: Siobhan Mackenzie Hall, Fernanda Gonçalves Abrantes, Hanwen Zhu, Grace Sodunke, Aleksandar Shtedritski, Hannah Rose Kirk

    Abstract: We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language models. We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas, where each image is associated with a caption containing a pronoun relationship of subjects and objects in the scene. VisoGender is balanced by gender representation in profess…

    Submitted 12 December, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

    Comments: NeurIPS Datasets and Benchmarks 2023. Data and code available at https://github.com/oxai/visogender