Showing 1–7 of 7 results for author: Mussakhojayeva, S

Searching in archive eess.
  1. arXiv:2404.01033  [pdf, other]

    eess.AS

    KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis

    Authors: Adal Abilbekov, Saida Mussakhojayeva, Rustem Yeshpanov, Huseyin Atakan Varol

    Abstract: This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications. KazEmoTTS is a collection of 54,760 audio-text pairs, with a total duration of 74.85 hours, featuring 34.23 hours delivered by a female narrator and 40.62 hours by two male narrators. The list of the emotions considered include "neutral", "angry", "happy", "sad", "scared",…

    Submitted 9 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: To appear in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  2. arXiv:2305.15749  [pdf, other]

    eess.AS cs.CL

    Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

    Authors: Rustem Yeshpanov, Saida Mussakhojayeva, Yerbolat Khassanov

    Abstract: This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek. We specifically target the zero-shot learning scenario, where a TTS model trained using the data of one language is applied to synthesise speech for other, unseen languages. An end-to-end TTS…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 5 pages, 1 figure, 3 tables, accepted to Interspeech

  3. arXiv:2201.05771  [pdf, other]

    eess.AS cs.CL cs.SD

    KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics

    Authors: Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol

    Abstract: We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In the new KazakhTTS2 corpus, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three females and two males), and the topic coverage has been diversified with the help of new sources, including a book and Wikipedia articles. This…

    Submitted 20 April, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: 8 pages, 2 figures, 5 tables, accepted to LREC 2022

  4. arXiv:2108.01280  [pdf, ps, other]

    eess.AS cs.CL

    A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

    Authors: Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol

    Abstract: We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of multilingual E2E ASR based on Transformer networks and then perform an extensive assessment on the aforementioned languages. We also compare two variants of output grapheme set construction: combined and inde…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: 12 pages, 3 tables, accepted to SPECOM 2021

  5. arXiv:2107.14419  [pdf, other]

    eess.AS cs.CL

    USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

    Authors: Muhammadjon Musaev, Saida Mussakhojayeva, Ilyos Khujayorov, Yerbolat Khassanov, Mannon Ochilov, Huseyin Atakan Varol

    Abstract: We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures. The Uzbek speech corpus (USC) comprises 958 different speakers with a total of 105 hours of transcribed audio recordings. To the best of our knowledge, this is the fi…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: 11 pages, 2 figures, 2 tables, accepted to SPECOM 2021

  6. KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset

    Authors: Saida Mussakhojayeva, Aigerim Janaliyeva, Almas Mirzakhmetov, Yerbolat Khassanov, Huseyin Atakan Varol

    Abstract: This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide. The dataset consists of about 93 hours of transcribed audio recordings spoken by two professional speakers (female and male). It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech (TTS) applications in…

    Submitted 16 June, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: 5 pages, 4 tables, 2 figures, accepted to INTERSPEECH 2021

  7. arXiv:2009.10334  [pdf, other]

    eess.AS cs.CL cs.SD

    A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

    Authors: Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol

    Abstract: We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as well as both genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database develop…

    Submitted 13 January, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 10 pages, 5 figures, 4 tables, accepted by EACL 2021

    Journal ref: https://aclanthology.org/2021.eacl-main.58