Showing 1–4 of 4 results for author: Karpov, N

  1. arXiv:2406.01446  [pdf]

    cs.CL cs.LG eess.AS eess.SP

    Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach

    Authors: Ara Yeroyan, Nikolay Karpov

    Abstract: In recent years, automatic speech recognition (ASR) systems have significantly improved, especially in languages with a vast amount of transcribed speech data. However, ASR systems tend to perform poorly for low-resource languages with fewer resources, such as minority and regional languages. This study introduces a novel pipeline designed to generate ASR training datasets from audiobooks, which t…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 13 pages, 10 figures (including ablation studies), to be published in 2024 IEEE Spoken Language Technology Workshop. Additionally, the associated software package can be accessed at (https://pypi.org/project/vac-aligner/) for practical applications and further development

  2. arXiv:2310.12378  [pdf, other]

    eess.AS cs.SD

    The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

    Authors: Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

    Abstract: We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays. The system predominantly comprises of the following integral modules: the Spea…

    Submitted 18 October, 2023; originally announced October 2023.

    Journal ref: CHiME-7 Workshop 2023

  3. arXiv:2212.12266  [pdf, other]

    eess.AS

    Large Raw Emotional Dataset with Aggregation Mechanism

    Authors: Vladimir Kondratenko, Artem Sokolov, Nikolay Karpov, Oleg Kutuzov, Nikita Savushkin, Fyodor Minkin

    Abstract: We present a new data set for speech emotion recognition (SER) tasks called Dusha. The corpus contains approximately 350 hours of data, more than 300 000 audio recordings with Russian speech and their transcripts. Therefore it is the biggest open bi-modal data collection for SER task nowadays. It is annotated using a crowd-sourcing platform and includes two subsets: acted and real-life. Acted subs…

    Submitted 23 December, 2022; originally announced December 2022.

    Comments: 6 pages, 1 figure, submitted to ICASSP 2023

    MSC Class: 62-07 ACM Class: I.2.7

  4. arXiv:2106.10161  [pdf, other]

    eess.AS

    Golos: Russian Dataset for Speech Research

    Authors: Nikolay Karpov, Alexander Denisenko, Fedor Minkin

    Abstract: This paper introduces a novel Russian speech dataset called Golos, a large corpus suitable for speech research. The dataset mainly consists of recorded audio files manually annotated on the crowd-sourcing platform. The total duration of the audio is about 1240 hours. We have made the corpus freely available to download, along with the acoustic model with CTC loss prepared on this corpus. Additiona…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: 5 pages, 3 figures, accepted to Interspeech 2021

    ACM Class: E.m; I.5.1