Showing 1–7 of 7 results for author: Del Rio, M

Searching in archive cs.
  1. arXiv:2203.15591  [pdf, other]

    cs.CL

    Earnings-22: A Practical Benchmark for Accents in the Wild

    Authors: Miguel Del Rio, Peter Ha, Quinten McNamara, Corey Miller, Shipra Chandra

    Abstract: Modern automatic speech recognition (ASR) systems have achieved superhuman Word Error Rate (WER) on many common corpora despite lacking adequate performance on speech in the wild. Beyond that, there is a lack of real-world, accented corpora to properly benchmark academic and commercial models. To ensure this type of speech is represented in ASR benchmarking, we present Earnings-22, a 125 file, 119…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

  2. Earnings-21: A Practical Benchmark for ASR in the Wild

    Authors: Miguel Del Rio, Natalie Delworth, Ryan Westerman, Michelle Huang, Nishchal Bhandari, Joseph Palakapilly, Quinten McNamara, Joshua Dong, Piotr Zelasko, Miguel Jette

    Abstract: Commonly used speech corpora inadequately challenge academic and commercial ASR systems. In particular, speech corpora lack metadata needed for detailed analysis and WER measurement. In response, we present Earnings-21, a 39-hour corpus of earnings calls containing entity-dense speech from nine different financial sectors. This corpus is intended to benchmark ASR systems in the wild with special a…

    Submitted 15 June, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: Accepted to INTERSPEECH 2021. June 15 2021: Addressing the comments of reviewers and updating the results of our internal ESPNet model. The results do not change our conclusions. April 28th, 2021: We found and resolved an issue in our experimental evaluation that scored the LibriSpeech model at ~20% worse relative WER than the actual WER. The updated results do not affect our conclusions

  3. arXiv:2104.10747  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Accented Speech Recognition: A Survey

    Authors: Arthur Hinsvark, Natalie Delworth, Miguel Del Rio, Quinten McNamara, Joshua Dong, Ryan Westerman, Michelle Huang, Joseph Palakapilly, Jennifer Drexler, Ilya Pirkin, Nishchal Bhandari, Miguel Jette

    Abstract: Automatic Speech Recognition (ASR) systems generalize poorly on accented speech. The phonetic and linguistic variability of accents present hard challenges for ASR systems today in both data collection and modeling strategies. The resulting bias in ASR performance across accents comes at a cost to both users and providers of ASR. We present a survey of current promising approaches to accented sp…

    Submitted 2 June, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

  4. arXiv:2104.07578  [pdf, other]

    cs.CL

    Syntactic Perturbations Reveal Representational Correlates of Hierarchical Phrase Structure in Pretrained Language Models

    Authors: Matteo Alleman, Jonathan Mamou, Miguel A Del Rio, Hanlin Tang, Yoon Kim, SueYeon Chung

    Abstract: While vector-based language representations from pretrained language models have set a new standard for many NLP tasks, there is not yet a complete accounting of their inner workings. In particular, it is not entirely clear what aspects of sentence-level syntax are captured by these representations, nor how (if at all) they are built along the stacked layers of the network. In this paper, we aim t…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: 12 pages, 7 figures

  5. arXiv:2006.01095  [pdf, other]

    cs.CL cs.NE

    Emergence of Separable Manifolds in Deep Language Representations

    Authors: Jonathan Mamou, Hang Le, Miguel Del Rio, Cory Stephenson, Hanlin Tang, Yoon Kim, SueYeon Chung

    Abstract: Deep neural networks (DNNs) have shown much empirical success in solving perceptual tasks across various cognitive modalities. While they are only loosely inspired by the biological brain, recent studies report considerable similarities between representations extracted from task-optimized DNNs and neural populations in the brain. DNNs have subsequently become a popular model class to infer comput…

    Submitted 8 July, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: 9 pages. 10 figures. Accepted to ICML 2020. Included supplemental materials

  6. arXiv:1908.10546  [pdf, other]

    cs.CV cs.CL

    Fingerspelling recognition in the wild with iterative visual attention

    Authors: Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Diane Brentari, Greg Shakhnarovich, Karen Livescu

    Abstract: Sign language recognition is a challenging gesture sequence recognition problem, characterized by quick and highly coarticulated motion. In this paper we focus on recognition of fingerspelling sequences in American Sign Language (ASL) videos collected in the wild, mainly from YouTube and Deaf social media. Most previous work on sign language recognition has focused on controlled settings where the…

    Submitted 28 August, 2019; originally announced August 2019.

    Comments: ICCV 2019

  7. arXiv:1810.11438  [pdf, other]

    cs.CV cs.CL

    American Sign Language fingerspelling recognition in the wild

    Authors: Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu

    Abstract: We address the problem of American Sign Language fingerspelling recognition in the wild, using videos collected from websites. We introduce the largest data set available so far for the problem of fingerspelling recognition, and the first using naturally occurring video data. Using this data set, we present the first attempt to recognize fingerspelling sequences in this challenging setting. Unlike…

    Submitted 17 February, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: accepted in SLT 2018