Showing 1–5 of 5 results for author: McHardy, R

  1. arXiv:2406.04127

    cs.CL cs.AI

    Are We Done with MMLU?

    Authors: Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, Claire Barale, Robert McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini

    Abstract: Maybe not. We identify and analyse errors in the popular Massive Multitask Language Understanding (MMLU) benchmark. Even though MMLU is widely adopted, our analysis demonstrates numerous ground truth errors that obscure the true capabilities of LLMs. For example, we find that 57% of the analysed questions in the Virology subset contain errors. To address this issue, we introduce a comprehensive fr…

    Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2404.09841

    eess.AS cs.CL cs.LG cs.SD

    Anatomy of Industrial Scale Multilingual ASR

    Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

    Abstract: This paper describes AssemblyAI's industrial-scale automatic speech recognition (ASR) system, designed to meet the requirements of large-scale, multilingual ASR serving various application needs. Our system leverages a diverse training dataset comprising unsupervised (12.5M hours), supervised (188k hours), and pseudo-labeled (1.6M hours) data across four languages. We provide a detailed descriptio…

    Submitted 16 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  3. arXiv:2307.10169

    cs.CL cs.AI cs.LG

    Challenges and Applications of Large Language Models

    Authors: Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy

    Abstract: Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 72 pages. v01. Work in progress. Feedback and comments are highly appreciated!

  4. arXiv:1902.11145

    cs.CL

    Adversarial Training for Satire Detection: Controlling for Confounding Variables

    Authors: Robert McHardy, Heike Adel, Roman Klinger

    Abstract: The automatic detection of satire vs. regular news is relevant for downstream applications (for instance, knowledge base population) and to improve the understanding of linguistic characteristics of satire. Recent approaches build upon corpora which have been labeled automatically based on article sources. We hypothesize that this encourages the models to learn characteristics for different public…

    Submitted 1 March, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

    Comments: Accepted for publication at NAACL 2019

  5. arXiv:1307.2189

    cs.SI physics.soc-ph

    On the Topology of the Facebook Page Network

    Authors: R. E. Slattery, R. R. McHardy, R. Bairathi

    Abstract: The Facebook Page Network (FPN) is a platform for Businesses, Public Figures and Organizations (BPOs) to connect with individuals and other BPOs in the digital space. For over a decade scale-free networks have most appropriately described a variety of seemingly disparate physical, biological and social real-world systems unified by similar network properties such as scale-invariance, growth via a…

    Submitted 8 July, 2013; originally announced July 2013.

    Comments: 3 pages, 1 figure