Showing 1–17 of 17 results for author: Orife, I

Searching in archive cs.
  1. arXiv:2309.02539  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

    Authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott

    Abstract: Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue, music, and effects stems from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psychoacoustically motivated frequency scales were used to inform the band definitions whic…

    Submitted 1 December, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted to the IEEE Open Journal of Signal Processing (ICASSP 2024 Track)

  2. arXiv:2307.16071  [pdf, other]

    cs.CL cs.SD eess.AS

    ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus

    Authors: Tolulope Ogunremi, Kola Tubosun, Anuoluwapo Aremu, Iroro Orife, David Ifeoluwa Adelani

    Abstract: We introduce ÌròyìnSpeech, a new corpus influenced by the desire to increase the amount of high quality, contemporary Yorùbá speech data, which can be used for both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) tasks. We curated about 23000 text sentences from news and creative writing domains with the open license CC-BY-4.0. To encourage a participatory approach to data creation, we…

    Submitted 27 March, 2024; v1 submitted 29 July, 2023; originally announced July 2023.

    Comments: Accepted to LREC-COLING 2024

  3. arXiv:2304.05600  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

    Authors: Nikhil Singh, Chih-Wei Wu, Iroro Orife, Mahdi Kalayeh

    Abstract: Audiovisual representation learning typically relies on the correspondence between sight and sound. However, there are often multiple audio tracks that can correspond with a visual scene. Consider, for example, different conversations on the same crowded street. The effect of such counterfactual pairs on audiovisual representation learning has not been previously explored. To investigate this, we…

    Submitted 8 June, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Accepted to CVPR 2024

  4. arXiv:2207.03546  [pdf, other]

    eess.AS cs.CL cs.SD

    BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus

    Authors: Josh Meyer, David Ifeoluwa Adelani, Edresson Casanova, Alp Öktem, Daniel Whitenack, Julian Weber, Salomon Kabongo, Elizabeth Salesky, Iroro Orife, Colin Leong, Perez Ogayo, Chris Emezue, Jonathan Mukiibi, Salomey Osei, Apelete Agbolo, Victor Akinode, Bernard Opoku, Samuel Olanrewaju, Jesujoba Alabi, Shamsuddeen Muhammad

    Abstract: BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio quality 48kHz single speaker recordings per language, enabling the development of high-quality text-to-speech models. The ten languages represented are: Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, and Yoruba.…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted to INTERSPEECH 2022

  5. arXiv:2112.06199  [pdf, other]

    cs.CL cs.SD eess.AS

    Learning Nigerian accent embeddings from speech: preliminary results based on SautiDB-Naija corpus

    Authors: Tejumade Afonja, Oladimeji Mudele, Iroro Orife, Kenechi Dukor, Lawrence Francis, Duru Goodness, Oluwafemi Azeez, Ademola Malomo, Clinton Mbataku

    Abstract: This paper describes foundational efforts with SautiDB-Naija, a novel corpus of non-native (L2) Nigerian English speech. We describe how the corpus was created and curated as well as preliminary experiments with accent classification and learning Nigerian accent embeddings. The initial version of the corpus includes over 900 recordings from L2 English speakers of Nigerian languages, such as Yoruba…

    Submitted 12 December, 2021; originally announced December 2021.

  6. arXiv:2111.01320  [pdf, other]

    eess.AS cs.SD

    AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence

    Authors: Yun-Ning Hung, Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife, Kelian Li, Pavan Seshadri, Junyoung Lee

    Abstract: We propose a dataset, AVASpeech-SMAD, to assist speech and music activity detection research. With frame-level music labels, the proposed dataset extends the existing AVASpeech dataset, which originally consists of 45 hours of audio and speech activity labels. To the best of our knowledge, the proposed AVASpeech-SMAD is the first open-source dataset that features strong polyphonic labels for both…

    Submitted 1 November, 2021; originally announced November 2021.

  7. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    Authors: Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, et al. (27 additional authors not shown)

    Abstract: With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have system…

    Submitted 21 February, 2022; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted at TACL; pre-MIT Press publication version

    Journal ref: Transactions of the Association for Computational Linguistics (2022) 10: 50-72

  8. arXiv:2103.11811  [pdf]

    cs.CL cs.AI

    MasakhaNER: Named Entity Recognition for African Languages

    Authors: David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel D'souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, et al. (36 additional authors not shown)

    Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by creating the first large publicly available high-quality dataset for named entity recognition (NER) in ten African languages, bringing together a variety of stakeholders. We detail characteristics of the languages to help researchers understand the challenges that these languages pose for NER. We…

    Submitted 5 July, 2021; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted to TACL 2021, pre-MIT Press publication version

  9. arXiv:2010.02353  [pdf, other]

    cs.CL cs.AI cs.LG

    Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

    Authors: Wilhelmina Nekoto, Vukosi Marivate, Tshinondiwa Matsila, Timi Fasubaa, Tajudeen Kolawole, Taiwo Fagbohungbe, Solomon Oluwole Akinola, Shamsuddeen Hassan Muhammad, Salomon Kabongo, Salomey Osei, Sackey Freshia, Rubungo Andre Niyongabo, Ricky Macharm, Perez Ogayo, Orevaoghene Ahia, Musie Meressa, Mofe Adeyemi, Masabata Mokgesi-Selinga, Lawrence Okegbemi, Laura Jane Martinus, Kolawole Tajudeen, Kevin Degila, Kelechi Ogueji, Kathleen Siminyu, Julia Kreutzer, et al. (23 additional authors not shown)

    Abstract: Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability and reflects systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), that plays a crucial role for information accessibility and communicat…

    Submitted 6 November, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: Findings of EMNLP 2020; updated benchmarks

  10. arXiv:2003.11529  [pdf, other]

    cs.CL

    Masakhane -- Machine Translation For Africa

    Authors: Iroro Orife, Julia Kreutzer, Blessing Sibanda, Daniel Whitenack, Kathleen Siminyu, Laura Martinus, Jamiil Toure Ali, Jade Abbott, Vukosi Marivate, Salomon Kabongo, Musie Meressa, Espoir Murhabazi, Orevaoghene Ahia, Elan van Biljon, Arshath Ramkilowan, Adewale Akinfaderin, Alp Öktem, Wole Akin, Ghollah Kioko, Kevin Degila, Herman Kamper, Bonaventure Dossou, Chris Emezue, Kelechi Ogueji, Abdallah Bashir

    Abstract: Africa has over 2000 languages. Despite this, African languages account for a small portion of available resources and publications in Natural Language Processing (NLP). This is due to multiple factors, including: a lack of focus from government and funding, discoverability, a lack of community, sheer language complexity, difficulty in reproducing papers and no benchmarks to compare techniques. To…

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: Accepted for the AfricaNLP Workshop, ICLR 2020

  11. arXiv:2003.10704  [pdf, ps, other]

    cs.CL

    Towards Neural Machine Translation for Edoid Languages

    Authors: Iroro Orife

    Abstract: Many Nigerian languages have relinquished their previous prestige and purpose in modern society to English and Nigerian Pidgin. For the millions of L1 speakers of indigenous languages, there are inequalities that manifest themselves as unequal access to information, communications, health care, security as well as attenuated participation in political and civic life. To minimize exclusion and prom…

    Submitted 24 March, 2020; originally announced March 2020.

    Comments: Accepted to ICLR 2020 AfricaNLP workshop

  12. arXiv:2003.10564  [pdf, ps, other]

    cs.CL

    Improving Yorùbá Diacritic Restoration

    Authors: Iroro Orife, David I. Adelani, Timi Fasubaa, Victor Williamson, Wuraola Fisayo Oyewusi, Olamilekan Wahab, Kola Tubosun

    Abstract: Yorùbá is a widely spoken West African language with a writing system rich in orthographic and tonal diacritics. They provide morphological information, are crucial for lexical disambiguation, pronunciation and are vital for any computational Speech or Natural Language Processing tasks. However diacritic marks are commonly excluded from electronic texts due to limited device and application suppor…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: Accepted to ICLR 2020 AfricaNLP workshop

  13. arXiv:1811.04139  [pdf, other]

    cs.SD eess.AS

    Audio Spectrogram Factorization for Classification of Telephony Signals below the Auditory Threshold

    Authors: Iroro Orife, Shane Walker, Jason Flaks

    Abstract: Traffic Pumping attacks are a form of high-volume SPAM that target telephone networks, defraud customers and squander telephony resources. One type of call in these attacks is characterized by very low-amplitude signal levels, notably below the auditory threshold. We propose a technique to classify so-called "dead air" or "silent" SPAM calls based on features derived from factorizing the caller au…

    Submitted 9 November, 2018; originally announced November 2018.

    Comments: 7 pages, 4 figures. Marchex Technical Report on VoIP SPAM classification

  14. arXiv:1811.02058  [pdf, other]

    cs.CL

    The Marchex 2018 English Conversational Telephone Speech Recognition System

    Authors: Seongjun Hahm, Iroro Orife, Shane Walker, Jason Flaks

    Abstract: In this paper, we describe recent performance improvements to the production Marchex speech recognition system for our spontaneous customer-to-business telephone conversations. In our previous work, we focused on in-domain language and acoustic model training. In this work we employ state-of-the-art semi-supervised lattice-free maximum mutual information (LF-MMI) training process which can supervi…

    Submitted 1 May, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: 5 pages, 1 figure. Submitted to INTERSPEECH 2019

  15. arXiv:1804.00832  [pdf, other]

    cs.CL

    Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text

    Authors: Iroro Orife

    Abstract: Yorùbá is a widely spoken West African language with a writing system rich in tonal and orthographic diacritics. With very few exceptions, diacritics are omitted from electronic texts, due to limited device and application support. Diacritics provide morphological information, are crucial for lexical disambiguation, pronunciation and are vital for any Yorùbá text-to-speech (TTS), automatic speech…

    Submitted 29 October, 2018; v1 submitted 3 April, 2018; originally announced April 2018.

    Comments: 6 pages, 3 figures. Interspeech 2018 preprint with extra figures and reviewer comments addressed

  16. arXiv:1705.09724  [pdf, other]

    cs.CL

    Semi-Supervised Model Training for Unbounded Conversational Speech Recognition

    Authors: Shane Walker, Morten Pedersen, Iroro Orife, Jason Flaks

    Abstract: For conversational large-vocabulary continuous speech recognition (LVCSR) tasks, up to about two thousand hours of audio is commonly used to train state of the art models. Collection of labeled conversational audio however, is prohibitively expensive, laborious and error-prone. Furthermore, academic corpora like Fisher English (2004) or Switchboard (1992) are inadequate to train models with suffic…

    Submitted 26 May, 2017; originally announced May 2017.

  17. arXiv:1705.04792  [pdf, other]

    cs.SD

    Riddim: A Rhythm Analysis and Decomposition Tool Based On Independent Subspace Analysis

    Authors: Iroro Orife

    Abstract: The goal of this thesis was to implement a tool that, given a digital audio input, can extract and represent rhythm and musical time. The purpose of the tool is to help develop better models of rhythm for real-time computer based performance and composition. This analysis tool, Riddim, uses Independent Subspace Analysis (ISA) and a robust onset detection scheme to separate and detect salient rhyth…

    Submitted 13 May, 2017; originally announced May 2017.