Showing 1–4 of 4 results for author: Bakhturina, E

Searching in archive eess.
  1. arXiv:2306.02317  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

    Authors: Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

    Abstract: Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition (ASR) quality given user vocabulary. To deal with large user vocabularies, most of these models include candidate retrieval mechanisms, usually based on minimum edit distance between fragments of ASR hypothesis and user phrases. However, the edit-distance approach is slow, non-trainab…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH 2023

  2. arXiv:2104.05055  [pdf, other]

    cs.CL cs.SD eess.AS

    NeMo Inverse Text Normalization: From Development To Production

    Authors: Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

    Abstract: Inverse text normalization (ITN) converts spoken-domain automatic speech recognition (ASR) output into written-domain text to improve the readability of the ASR output. Many state-of-the-art ITN systems use hand-written weighted finite-state transducer (WFST) grammars since this task has extremely low tolerance to unrecoverable errors. We introduce an open-source Python WFST-based library for ITN w…

    Submitted 17 May, 2021; v1 submitted 11 April, 2021; originally announced April 2021.

  3. arXiv:2104.04896  [pdf]

    eess.AS cs.CL cs.SD

    A Toolbox for Construction and Analysis of Speech Datasets

    Authors: Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: Automatic Speech Recognition and Text-to-Speech systems are primarily trained in a supervised fashion and require high-quality, accurately labeled speech datasets. In this work, we examine common problems with speech data and introduce a toolbox for the construction and interactive error analysis of speech datasets. The construction tool is based on Kürzinger et al. work, and, to the best of our k…

    Submitted 6 January, 2022; v1 submitted 10 April, 2021; originally announced April 2021.

  4. arXiv:2104.01497  [pdf, other]

    eess.AS

    Hi-Fi Multi-Speaker English TTS Dataset

    Authors: Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang

    Abstract: This paper introduces a new multi-speaker English dataset for training text-to-speech models. The dataset is based on LibriVox audiobooks and Project Gutenberg texts, both in the public domain. The new dataset contains about 292 hours of speech from 10 speakers with at least 17 hours per speaker sampled at 44.1 kHz. To select speech samples with high quality, we considered audio recordings with a…

    Submitted 14 June, 2021; v1 submitted 3 April, 2021; originally announced April 2021.