Showing 1–1 of 1 results for author: Alyahya, H A

  1. arXiv:2402.01781  [pdf, other]

    cs.CL cs.AI cs.LG

    When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards

    Authors: Norah Alzahrani, Hisham Abdullah Alyahya, Yazeed Alnumay, Sultan Alrashed, Shaykhah Alsubaie, Yusef Almushaykeh, Faisal Mirza, Nouf Alotaibi, Nora Altwairesh, Areeb Alowisheq, M Saiful Bari, Haidar Khan

    Abstract: Large Language Model (LLM) leaderboards based on benchmark rankings are regularly used to guide practitioners in model selection. Often, the published leaderboard rankings are taken at face value - we show this is a (potentially costly) mistake. Under existing leaderboards, the relative performance of LLMs is highly sensitive to (often minute) details. We show that for popular multiple-choice ques…

    Submitted 3 July, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: updated with ACL 2024 camera ready version