Showing 1–3 of 3 results for author: Bedyakin, R

Searching in archive eess.
  1. arXiv:2106.00052

    eess.AS cs.CL cs.LG cs.SD

    Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions

    Authors: Roman Bedyakin, Nikolay Mikhaylovskiy

    Abstract: This memo describes NTR/TSU winning submission for Low Resource ASR challenge at Dialog2021 conference, language identification track. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. Traditionally, the ASR task requires large volumes of labeled data that are unattainable for most of the world's languages, including m…

    Submitted 31 May, 2021; originally announced June 2021.

    Comments: Accepted to Dialog2021. arXiv admin note: text overlap with arXiv:2104.11985

  2. Language ID Prediction from Speech Using Self-Attentive Pooling and 1D-Convolutions

    Authors: Roman Bedyakin, Nikolay Mikhaylovskiy

    Abstract: This memo describes NTR-TSU submission for SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, demanding a need for domain and speaker-invariant language ID syst…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: Accepted to SIGTYP-2021

  3. arXiv:2103.16193

    eess.AS cs.SD

    MediaSpeech: Multilanguage ASR Benchmark and Dataset

    Authors: Rostislav Kolobov, Olga Okhapkina, Olga Omelchishina, Andrey Platunov, Roman Bedyakin, Vyacheslav Moshkin, Dmitry Menshikov, Nikolay Mikhaylovskiy

    Abstract: The performance of automated speech recognition (ASR) systems is well known to differ for varied application domains. At the same time, vendors and research groups typically report ASR quality results either for limited use simplistic domains (audiobooks, TED talks), or proprietary datasets. To fill this gap, we provide an open-source 10-hour ASR system evaluation dataset NTR MediaSpeech for 4 lan…

    Submitted 30 March, 2021; originally announced March 2021.