Showing 1–1 of 1 results for author: Vidas, P

Searching in archive cs.
  1. arXiv:2307.13692 [pdf, other]

    cs.CL cs.LG

    ARB: Advanced Reasoning Benchmark for Large Language Models

    Authors: Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reasoning and knowledge benchmarks. However, many of these benchmarks are losing utility as LLMs get increasingly high scores, despite not yet reaching expert performance in these domains. We introduce ARB, a novel benchmark composed of advanced reasoning problems in multiple fields. ARB presents a more c…

    Submitted 27 July, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Submitted to NeurIPS Datasets and Benchmarks Track