Showing 1–4 of 4 results for author: Ogun, S

Searching in archive eess.
  1. arXiv:2406.12387  [pdf, other]

    eess.AS cs.CL cs.SD

    Performant ASR Models for Medical Entities in Accented Speech

    Authors: Tejumade Afonja, Tobi Olatunji, Sewade Ogun, Naome A. Etori, Abraham Owodunni, Moshood Yekini

    Abstract: Recent strides in automatic speech recognition (ASR) have accelerated their application in the medical domain where their performance on accented medical named entities (NE) such as drug names, diagnoses, and lab results, is largely unknown. We rigorously evaluate multiple ASR models on a clinical English dataset of 93 African accents. Our analysis reveals that despite some models achieving low ov…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2406.11727  [pdf, ps, other]

    eess.AS cs.CL

    1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis

    Authors: Sewade Ogun, Abraham T. Owodunni, Tobi Olatunji, Eniola Alese, Babatunde Oladimeji, Tejumade Afonja, Kayode Olaleye, Naome A. Etori, Tosin Adewumi

    Abstract: Recent advances in speech synthesis have enabled many useful applications like audio directions in Google Maps, screen readers, and automated content generation on platforms like TikTok. However, these systems are mostly dominated by voices sourced from data-rich geographies with personas representative of their source data. Although 3000 of the world's languages are domiciled in Africa, African v…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  3. arXiv:2305.17724  [pdf, other]

    eess.AS cs.SD

    Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS

    Authors: Sewade Ogun, Vincent Colotte, Emmanuel Vincent

    Abstract: Flow-based generative models are widely used in text-to-speech (TTS) systems to learn the distribution of audio features (e.g., Mel-spectrograms) given the input tokens and to sample from this distribution to generate diverse utterances. However, in the zero-shot multi-speaker TTS scenario, the generated utterances lack diversity and naturalness. In this paper, we propose to improve the diversity…

    Submitted 28 May, 2023; originally announced May 2023.

    Comments: 5 pages with 3 figures, InterSpeech 2023

  4. arXiv:2210.06370  [pdf, other]

    eess.AS cs.SD

    Can we use Common Voice to train a Multi-Speaker TTS system?

    Authors: Sewade Ogun, Vincent Colotte, Emmanuel Vincent

    Abstract: Training of multi-speaker text-to-speech (TTS) systems relies on curated datasets based on high-quality recordings or audiobooks. Such datasets often lack speaker diversity and are expensive to collect. As an alternative, recent studies have leveraged the availability of large, crowdsourced automatic speech recognition (ASR) datasets. A major problem with such datasets is the presence of noisy and…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: To appear in Proc. SLT 2022, Jan 9-12, 2023, Doha, Qatar