Showing 1–3 of 3 results for author: Keith, F

  1. arXiv:2210.15135  [pdf, other]

    cs.CL cs.SD eess.AS

    Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

    Authors: Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover

    Abstract: Advances in self-supervised learning have significantly reduced the amount of transcribed audio required for training. However, the majority of work in this area is focused on read speech. We explore limited supervision in the domain of conversational speech. While we assume the amount of in-domain data is limited, we augment the model with open source read speech data. The XLS-R model has been sh…

    Submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to IEEE ICASSP 2023

  2. arXiv:2110.15836  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition

    Authors: Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover

    Abstract: Recent advances in unsupervised representation learning have demonstrated the impact of pretraining on large amounts of read speech. We adapt these techniques for domain adaptation in low-resource -- both in terms of data and compute -- conversational and broadcast domains. Moving beyond CTC, we pretrain state-of-the-art Conformer models in an unsupervised manner. While the unsupervised approach o…

    Submitted 11 February, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

    Comments: 5 pages, minor changes for camera ready version, to be published in IEEE ICASSP 2022

  3. arXiv:2106.07716  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Overcoming Domain Mismatch in Low Resource Sequence-to-Sequence ASR Models using Hybrid Generated Pseudotranscripts

    Authors: Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover, Owen Kimball

    Abstract: Sequence-to-sequence (seq2seq) models are competitive with hybrid models for automatic speech recognition (ASR) tasks when large amounts of training data are available. However, data sparsity and domain adaptation are more problematic for seq2seq models than their hybrid counterparts. We examine corpora of five languages from the IARPA MATERIAL program where the transcribed data is conversational…

    Submitted 14 June, 2021; originally announced June 2021.

    Comments: 5 pages