Showing 1–10 of 10 results for author: Varol, H A

  1. arXiv:2404.01033  [pdf, other]

    eess.AS

    KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis

    Authors: Adal Abilbekov, Saida Mussakhojayeva, Rustem Yeshpanov, Huseyin Atakan Varol

    Abstract: This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications. KazEmoTTS is a collection of 54,760 audio-text pairs, with a total duration of 74.85 hours, featuring 34.23 hours delivered by a female narrator and 40.62 hours by two male narrators. The list of the emotions considered include "neutral", "angry", "happy", "sad", "scared",…

    Submitted 9 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: To appear in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

  2. arXiv:2201.05771  [pdf, other]

    eess.AS cs.CL cs.SD

    KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics

    Authors: Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol

    Abstract: We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In the new KazakhTTS2 corpus, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (three females and two males), and the topic coverage has been diversified with the help of new sources, including a book and Wikipedia articles. This…

    Submitted 20 April, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

    Comments: 8 pages, 2 figures, 5 tables, accepted to LREC 2022

  3. arXiv:2110.12136  [pdf, other]

    cs.CV cs.SD eess.AS eess.IV

    A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data

    Authors: Madina Abdrakhmanova, Saniya Abushakimova, Yerbolat Khassanov, Huseyin Atakan Varol

    Abstract: In this paper, we study an approach to multimodal person verification using audio, visual, and thermal modalities. The combination of audio and visual modalities has already been shown to be effective for robust person verification. From this perspective, we investigate the impact of further increasing the number of modalities by adding thermal images. In particular, we implemented unimodal, bimod…

    Submitted 4 March, 2022; v1 submitted 23 October, 2021; originally announced October 2021.

    Comments: 7 pages, 4 figures, 4 tables

  4. arXiv:2108.01280  [pdf, ps, other]

    eess.AS cs.CL

    A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English

    Authors: Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol

    Abstract: We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of multilingual E2E ASR based on Transformer networks and then perform an extensive assessment on the aforementioned languages. We also compare two variants of output grapheme set construction: combined and inde…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: 12 pages, 3 tables, accepted to SPECOM 2021

  5. arXiv:2107.14419  [pdf, other]

    eess.AS cs.CL

    USC: An Open-Source Uzbek Speech Corpus and Initial Speech Recognition Experiments

    Authors: Muhammadjon Musaev, Saida Mussakhojayeva, Ilyos Khujayorov, Yerbolat Khassanov, Mannon Ochilov, Huseyin Atakan Varol

    Abstract: We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures. The Uzbek speech corpus (USC) comprises 958 different speakers with a total of 105 hours of transcribed audio recordings. To the best of our knowledge, this is the fi…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: 11 pages, 2 figures, 2 tables, accepted to SPECOM 2021

  6. arXiv:2107.08673  [pdf, other]

    eess.IV cs.CV cs.LG

    Input Agnostic Deep Learning for Alzheimer's Disease Classification Using Multimodal MRI Images

    Authors: Aidana Massalimova, Huseyin Atakan Varol

    Abstract: Alzheimer's disease (AD) is a progressive brain disorder that causes memory and functional impairments. The advances in machine learning and publicly available medical datasets initiated multiple studies in AD diagnosis. In this work, we utilize a multi-modal deep learning approach in classifying normal cognition, mild cognitive impairment and AD classes on the basis of structural MRI and diffusio…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: 4 pages, submitted to EMBC 2021

  7. End-to-End Deep Fault Tolerant Control

    Authors: Daulet Baimukashev, Bexultan Rakhim, Matteo Rubagotti, Huseyin Atakan Varol

    Abstract: Published in IEEE/ASME Transactions on Mechatronics, DOI: 10.1109/TMECH.2021.3100150. Ideally, accurate sensor measurements are needed to achieve a good performance in the closed-loop control of mechatronic systems. As a consequence, sensor faults will prevent the system from working correctly, unless a fault-tolerant control (FTC) architecture is adopted. As model-based FTC algorithms for nonline…

    Submitted 30 November, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: 11 pages, 7 figures

  8. KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset

    Authors: Saida Mussakhojayeva, Aigerim Janaliyeva, Almas Mirzakhmetov, Yerbolat Khassanov, Huseyin Atakan Varol

    Abstract: This paper introduces a high-quality open-source speech synthesis dataset for Kazakh, a low-resource language spoken by over 13 million people worldwide. The dataset consists of about 93 hours of transcribed audio recordings spoken by two professional speakers (female and male). It is the first publicly available large-scale dataset developed to promote Kazakh text-to-speech (TTS) applications in…

    Submitted 16 June, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: 5 pages, 4 tables, 2 figures, accepted to INTERSPEECH 2021

  9. arXiv:2009.10334  [pdf, other]

    eess.AS cs.CL cs.SD

    A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

    Authors: Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol

    Abstract: We present an open-source speech corpus for the Kazakh language. The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as well as both genders. It was carefully inspected by native Kazakh speakers to ensure high quality. The KSC is the largest publicly available database develop…

    Submitted 13 January, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

    Comments: 10 pages, 5 figures, 4 tables, accepted by EACL 2021

    Journal ref: https://aclanthology.org/2021.eacl-main.58

  10. arXiv:2003.08605  [pdf]

    eess.IV cs.CV

    End-to-End Deep Diagnosis of X-ray Images

    Authors: Kudaibergen Urinbayev, Yerassyl Orazbek, Yernur Nurambek, Almas Mirzakhmetov, Huseyin Atakan Varol

    Abstract: In this work, we present an end-to-end deep learning framework for X-ray image diagnosis. As the first step, our system determines whether a submitted image is an X-ray or not. After it classifies the type of the X-ray, it runs the dedicated abnormality classification network. In this work, we only focus on the chest X-rays for abnormality classification. However, the system can be extended to oth…

    Submitted 19 March, 2020; originally announced March 2020.

    Comments: 4 pages, 5 figures