Skip to main content

Showing 1–7 of 7 results for author: Medennikov, I

Searching in archive eess. Search in all archives.
.
  1. arXiv:2104.02526  [pdf, ps, other

    eess.AS cs.CL cs.LG

    LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring

    Authors: Anton Mitrofanov, Mariya Korenevskaya, Ivan Podluzhny, Yuri Khokhlov, Aleksandr Laptev, Andrei Andrusenko, Aleksei Ilin, Maxim Korenevsky, Ivan Medennikov, Aleksei Romanenko

    Abstract: Neural network-based language models are commonly used in rescoring approaches to improve the quality of modern automatic speech recognition (ASR) systems. Most of the existing methods are computationally expensive since they use autoregressive language models. We propose a novel rescoring approach, which processes the entire lattice in a single call to the model. The key feature of our rescoring… ▽ More

    Submitted 6 April, 2021; originally announced April 2021.

    Comments: Submitted to InterSpeech 2021

  2. arXiv:2103.07186  [pdf, ps, other

    eess.AS cs.CL cs.LG cs.SD

    Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition

    Authors: Aleksandr Laptev, Andrei Andrusenko, Ivan Podluzhny, Anton Mitrofanov, Ivan Medennikov, Yuri Matveev

    Abstract: With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. Researchers and industry prefer to use end-to-end ASR systems for on-device speech recognition tasks. This is because end-to-end systems can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, bu… ▽ More

    Submitted 12 March, 2021; originally announced March 2021.

    Comments: 16 pages, 7 figures

  3. arXiv:2006.08274  [pdf, ps, other

    eess.AS cs.CL cs.LG cs.SD

    Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset

    Authors: Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov

    Abstract: This paper presents an exploration of end-to-end automatic speech recognition systems (ASR) for the largest open-source Russian language data set -- OpenSTT. We evaluate different existing end-to-end approaches such as joint CTC/Attention, RNN-Transducer, and Transformer. All of them are compared with the strong hybrid ASR system based on LF-MMI TDNN-F acoustic model. For the three available valid… ▽ More

    Submitted 26 July, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted by SPECOM 2020

  4. Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario

    Authors: Ivan Medennikov, Maxim Korenevsky, Tatiana Prisyach, Yuri Khokhlov, Mariya Korenevskaya, Ivan Sorokin, Tatiana Timofeeva, Anton Mitrofanov, Andrei Andrusenko, Ivan Podluzhny, Aleksandr Laptev, Aleksei Romanenko

    Abstract: Speaker diarization for real-life scenarios is an extremely challenging problem. Widely used clustering-based diarization approaches perform rather poorly in such conditions, mainly due to the limited ability to handle overlap** speech. We propose a novel Target-Speaker Voice Activity Detection (TS-VAD) approach, which directly predicts an activity of each speaker on each time frame. TS-VAD mode… ▽ More

    Submitted 27 July, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

    Comments: Accepted to Interspeech 2020

  5. arXiv:2005.07157  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation

    Authors: Aleksandr Laptev, Roman Korostik, Aleksey Svischev, Andrei Andrusenko, Ivan Medennikov, Sergey Rybin

    Abstract: Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (text-to-speech, or TTS), we build our TTS system on an ASR training database and then extend the data with synthesized speech to train a recognition mo… ▽ More

    Submitted 30 July, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

  6. arXiv:2004.10799  [pdf, other

    eess.AS cs.CL cs.LG cs.SD

    Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

    Authors: Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov

    Abstract: While end-to-end ASR systems have proven competitive with the conventional hybrid approach, they are prone to accuracy degradation when it comes to noisy and low-resource conditions. In this paper, we argue that, even in such difficult cases, some end-to-end approaches show performance close to the hybrid baseline. To demonstrate this, we use the CHiME-6 Challenge data as an example of challenging… ▽ More

    Submitted 7 August, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

    Comments: Accepted by Interspeech 2020

  7. arXiv:1807.00868  [pdf, other

    cs.SD cs.CL eess.AS

    Exploring End-to-End Techniques for Low-Resource Speech Recognition

    Authors: Vladimir Bataev, Maxim Korenevsky, Ivan Medennikov, Alexander Zatvornitskiy

    Abstract: In this work we present simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We have investigated different neural network architectures performance, including fully-convolutional, recurrent and ResNet with GRU. Different features and normalization techniques are compared as well. We also proposed CTC-loss modification using s… ▽ More

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: Accepted for Specom 2018, 20th International Conference on Speech and Computer