Showing 1–3 of 3 results for author: Ferrand, É L

  1. arXiv:2106.06160  [pdf, other]

    cs.CL cs.SD eess.AS

    Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

    Authors: Éric Le Ferrand, Steven Bird, Laurent Besacier

    Abstract: We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust ASR system. This work is grounded in a very low-resource language documentation scenario where only a few minutes of recording have been transcribed for a given language so far. Experiments on two oral languages show that a pretrained universal…

    Submitted 11 June, 2021; originally announced June 2021.

  2. arXiv:2011.06198  [pdf, other]

    cs.CL

    Enabling Interactive Transcription in an Indigenous Community

    Authors: Éric Le Ferrand, Steven Bird, Laurent Besacier

    Abstract: We propose a novel transcription workflow which combines spoken term detection and human-in-the-loop, together with a pilot experiment. This work is grounded in an almost zero-resource scenario where only a few terms have so far been identified, involving two endangered languages. We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR syste…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: In Proceedings of COLING 2020

  3. arXiv:1907.12895  [pdf, other]

    cs.CL

    MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible

    Authors: Marcely Zanon Boito, William N. Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier

    Abstract: The CMU Wilderness Multilingual Speech Dataset (Black, 2019) is a newly published multilingual speech dataset based on recorded readings of the New Testament. It provides data to build Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models for potentially 700 languages. However, the fact that the source content (the Bible) is the same for all the languages is not exploited to date. Ther…

    Submitted 26 February, 2020; v1 submitted 30 July, 2019; originally announced July 2019.

    Comments: Accepted to LREC 2020