Showing 1–2 of 2 results for author: Korostik, R

  1. arXiv:2302.14036  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

    Authors: Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg

    Abstract: We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both. The proposed model uses an integrated auxiliary block for text-based training. This block combines a non-autoregressive multi-speaker text-to-mel-spectrogram generator with a GAN-based enhancer to improve the spectrogram quality. The proposed syst…

    Submitted 16 August, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted to INTERSPEECH 2023

  2. arXiv:2005.07157  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

    Authors: Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin

    Abstract: Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (text-to-speech, or TTS), we build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition mo…

    Submitted 30 July, 2020; v1 submitted 14 May, 2020; originally announced May 2020.