Showing 1–1 of 1 results for author: Hauksson, R

  1. arXiv:2406.06613  [pdf, other]

    cs.CL cs.AI

    GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents

    Authors: Anthony Costarelli, Mat Allen, Roman Hauksson, Grace Sodunke, Suhas Hariharan, Carlson Cheng, Wenjie Li, Arjun Yadav

    Abstract: Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Despite several demonstrations of using large language models in complex, strategic scenarios, there lacks a comprehensive framework for evaluating agents' performance across various types of reasoning found in games. To address this gap, we introduce GameBench, a cross-domain benc…

    Submitted 6 June, 2024; originally announced June 2024.