Skip to main content

Showing 1–2 of 2 results for author: Dirani, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2311.12022  [pdf, other

    cs.AI cs.CL

    GPQA: A Graduate-Level Google-Proof Q&A Benchmark

    Authors: David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

    Abstract: We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert v… ▽ More

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 28 pages, 5 figures, 7 tables

  2. arXiv:2311.08702  [pdf, other

    cs.AI cs.CL

    Debate Helps Supervise Unreliable Experts

    Authors: Julian Michael, Salsabila Mahdi, David Rein, Jackson Petty, Julien Dirani, Vishakh Padmakumar, Samuel R. Bowman

    Abstract: As AI systems are used to answer more difficult questions and potentially help create new knowledge, judging the truthfulness of their outputs becomes more difficult and more important. How can we supervise unreliable experts, which have access to the truth but may not accurately report it, to give answers that are systematically true and don't just superficially seem true, when the supervisor can… ▽ More

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 84 pages, 13 footnotes, 5 figures, 4 tables, 28 debate transcripts; data and code at https://github.com/julianmichael/debate/tree/2023-nyu-experiments

    ACM Class: I.2.0