Showing 1–1 of 1 results for author: Ewaleifoh, O

  1. arXiv:2406.11634  [pdf, other]

    cs.CL cs.AI

    The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance

    Authors: Kyle Moore, Jesse Roberts, Thao Pham, Oseremhen Ewaleifoh, Doug Fisher

    Abstract: Cloze testing is a common method for measuring the behavior of large language models on a number of benchmark tasks. Using the MMLU dataset, we show that the base-rate probability (BRP) differences across answer tokens are significant and affect task performance, i.e., guess A if uncertain. We find that counterfactual prompting does sufficiently mitigate the BRP effect. The BRP effect is found to hav…

    Submitted 17 June, 2024; originally announced June 2024.