Earnings-21: A Practical Benchmark for ASR in the Wild
Authors:
Miguel Del Rio,
Natalie Delworth,
Ryan Westerman,
Michelle Huang,
Nishchal Bhandari,
Joseph Palakapilly,
Quinten McNamara,
Joshua Dong,
Piotr Zelasko,
Miguel Jette
Abstract:
Commonly used speech corpora inadequately challenge academic and commercial ASR systems. In particular, speech corpora lack metadata needed for detailed analysis and WER measurement. In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark ASR systems in the wild with special attention towards named entity recognition. We benchmark four commercial ASR models, two internal models built with open-source tools, and an open-source LibriSpeech model and discuss their differences in performance on Earnings-21. Using our recently released fstalign tool, we provide a candid analysis of each model's recognition capabilities under different partitions. Our analysis finds that ASR accuracy for certain NER categories is poor, presenting a significant impediment to transcript comprehension and usage. Earnings-21 bridges academic and commercial ASR system evaluation and enables further research on entity modeling and WER on real-world audio.
Submitted 15 June, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
Accented Speech Recognition: A Survey
Authors:
Arthur Hinsvark,
Natalie Delworth,
Miguel Del Rio,
Quinten McNamara,
Joshua Dong,
Ryan Westerman,
Michelle Huang,
Joseph Palakapilly,
Jennifer Drexler,
Ilya Pirkin,
Nishchal Bhandari,
Miguel Jette
Abstract:
Automatic Speech Recognition (ASR) systems generalize poorly on accented speech. The phonetic and linguistic variability of accents presents hard challenges for ASR systems today in both data collection and modeling strategies. The resulting bias in ASR performance across accents comes at a cost to both users and providers of ASR.
We present a survey of current promising approaches to accented speech recognition and highlight the key challenges in the space. Approaches mostly focus on single model generalization and accent feature engineering. Among the challenges, lack of a standard benchmark makes research and comparison especially difficult.
Submitted 2 June, 2021; v1 submitted 21 April, 2021;
originally announced April 2021.