Showing 1–6 of 6 results for author: Olaleye, K

  1. arXiv:2406.11727  [pdf, ps, other]

    eess.AS cs.CL

    1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis

    Authors: Sewade Ogun, Abraham T. Owodunni, Tobi Olatunji, Eniola Alese, Babatunde Oladimeji, Tejumade Afonja, Kayode Olaleye, Naome A. Etori, Tosin Adewumi

    Abstract: Recent advances in speech synthesis have enabled many useful applications like audio directions in Google Maps, screen readers, and automated content generation on platforms like TikTok. However, these systems are mostly dominated by voices sourced from data-rich geographies with personas representative of their source data. Although 3000 of the world's languages are domiciled in Africa, African v…

    Submitted 27 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2302.00765  [pdf, other]

    cs.CL cs.SD eess.AS

    Visually Grounded Keyword Detection and Localisation for Low-Resource Languages

    Authors: Kayode Kolawole Olaleye

    Abstract: This study investigates the use of Visually Grounded Speech (VGS) models for keyword localisation in speech. The study focusses on two main research questions: (1) Is keyword localisation possible with VGS models and (2) Can keyword localisation be done cross-lingually in a real low-resource setting? Four methods for localisation are proposed and evaluated on an English dataset, with the best-perf…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: PhD dissertation, University of Stellenbosch, 108 pages, submitted and accepted 2023

  3. arXiv:2210.04600  [pdf, other]

    cs.CL eess.AS

    YFACC: A Yorùbá speech-image dataset for cross-lingual keyword localisation through visual grounding

    Authors: Kayode Olaleye, Dan Oneata, Herman Kamper

    Abstract: Visually grounded speech (VGS) models are trained on images paired with unlabelled spoken captions. Such models could be used to build speech systems in settings where it is impossible to get labelled data, e.g. for documenting unwritten languages. However, most VGS studies are in English or other high-resource languages. This paper attempts to address this shortcoming. We collect and release a ne…

    Submitted 12 October, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE SLT 2022

  4. arXiv:2202.01107  [pdf, other]

    cs.CL cs.SD eess.AS

    Keyword localisation in untranscribed speech using visually grounded speech models

    Authors: Kayode Olaleye, Dan Oneata, Herman Kamper

    Abstract: Keyword localisation is the task of finding where in a speech utterance a given query keyword occurs. We investigate to what extent keyword localisation is possible using a visually grounded speech (VGS) model. VGS models are trained on unlabelled images paired with spoken captions. These models are therefore self-supervised -- trained without any explicit textual label or location information. To…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: 10 figures, 5 tables

  5. arXiv:2106.08859  [pdf, other]

    cs.CL cs.SD eess.AS

    Attention-Based Keyword Localisation in Speech using Visual Grounding

    Authors: Kayode Olaleye, Herman Kamper

    Abstract: Visually grounded speech models learn from images paired with spoken captions. By tagging images with soft text labels using a trained visual classifier with a fixed vocabulary, previous work has shown that it is possible to train a model that can detect whether a particular text keyword occurs in speech utterances or not. Here we investigate whether visually grounded speech models can also do key…

    Submitted 23 June, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021

  6. arXiv:2012.07396  [pdf, other]

    cs.CL eess.AS

    Towards localisation of keywords in speech using weak supervision

    Authors: Kayode Olaleye, Benjamin van Niekerk, Herman Kamper

    Abstract: Developments in weakly supervised and self-supervised models could enable speech technology in low-resource settings where full transcriptions are not available. We consider whether keyword localisation is possible using two forms of weak supervision where location information is not provided explicitly. In the first, only the presence or absence of a word is indicated, i.e. a bag-of-words (BoW) l…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: Accepted to NeurIPS-SAS