Showing 1–2 of 2 results for author: McDonnell, B

  1. arXiv:2305.10951  [pdf, other]

    cs.CL eess.AS

    Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

    Authors: Martijn Bartelds, Nay San, Bradley McDonnell, Dan Jurafsky, Martijn Wieling

    Abstract: The performance of automatic speech recognition (ASR) systems has advanced substantially in recent years, particularly for languages for which a large amount of transcribed speech is available. Unfortunately, for low-resource languages, such as minority languages, regional languages or dialects, ASR performance generally remains much lower. In this study, we investigate whether data augmentation t…

    Submitted 18 May, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

  2. arXiv:2302.04975  [pdf, other]

    cs.CL

    Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions

    Authors: Nay San, Martijn Bartelds, Blaine Billings, Ella de Falco, Hendi Feriza, Johan Safri, Wawan Sahrozi, Ben Foley, Bradley McDonnell, Dan Jurafsky

    Abstract: Recent research using pre-trained transformer models suggests that just 10 minutes of transcribed speech may be enough to fine-tune such a model for automatic speech recognition (ASR) -- at least if we can also leverage vast amounts of text data (803 million tokens). But is that much text data necessary? We study the use of different amounts of text data, both for creating a lexicon that constrain…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Accepted for ComputEL-6