Showing 1–6 of 6 results for author: Godard, P

  1. arXiv:1910.08418  [pdf, other]

    cs.CL

    Controlling Utterance Length in NMT-based Word Segmentation with Attention

    Authors: Pierre Godard, Laurent Besacier, Francois Yvon

    Abstract: One of the basic tasks of computational language documentation (CLD) is to identify word boundaries in an unsegmented phonemic stream. While several unsupervised monolingual word segmentation algorithms exist in the literature, they are challenged in real-world CLD settings by the small amount of available data. A possible remedy is to take advantage of glosses or translation in a foreign, well-re…

    Submitted 18 October, 2019; originally announced October 2019.

    Comments: Accepted to IWSLT 2019 (Hong-Kong)

  2. arXiv:1806.06734  [pdf, other]

    cs.CL cs.AI

    Unsupervised Word Segmentation from Speech with Attention

    Authors: Pierre Godard, Marcely Zanon-Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier

    Abstract: We present a first attempt to perform attentional word segmentation directly from the speech signal, with the final goal to automatically identify lexical units in a low-resource, unwritten language (UL). Our methodology assumes a pairing between recordings in the UL with translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-ph…

    Submitted 18 June, 2018; originally announced June 2018.

    Comments: Interspeech 2018

  3. arXiv:1803.00188  [pdf, ps, other]

    cs.CL

    XNMT: The eXtensible Neural Machine Translation Toolkit

    Authors: Graham Neubig, Matthias Sperber, Xinyi Wang, Matthieu Felix, Austin Matthews, Sarguna Padmanabhan, Ye Qi, Devendra Singh Sachan, Philip Arthur, Pierre Godard, John Hewitt, Rachid Riad, Liming Wang

    Abstract: This paper describes XNMT, the eXtensible Neural Machine Translation toolkit. XNMT distinguishes itself from other open-source NMT toolkits by its focus on modular code design, with the purpose of enabling fast iteration in research and replicable, reliable results. In this paper we describe the design of XNMT and its experiment configuration system, and demonstrate its utility on the tasks of m…

    Submitted 28 February, 2018; originally announced March 2018.

    Comments: To be presented at AMTA 2018 Open Source Software Showcase

  4. arXiv:1802.06053  [pdf, ps, other]

    cs.CL

    Bayesian Models for Unit Discovery on a Very Low Resource Language

    Authors: Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, François Yvon, Sanjeev Khudanpur

    Abstract: Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack of in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show tha…

    Submitted 20 February, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: Accepted to ICASSP 2018

  5. arXiv:1802.05092  [pdf, other]

    cs.CL

    Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

    Authors: Odette Scharenborg, Laurent Besacier, Alan Black, Mark Hasegawa-Johnson, Florian Metze, Graham Neubig, Sebastian Stueker, Pierre Godard, Markus Mueller, Lucas Ondel, Shruti Palaskar, Philip Arthur, Francesco Ciannella, Mingxing Du, Elin Larsen, Danny Merkx, Rachid Riad, Liming Wang, Emmanuel Dupoux

    Abstract: We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech.

    Submitted 14 February, 2018; originally announced February 2018.

    Comments: Accepted to ICASSP 2018

  6. arXiv:1710.03501  [pdf, ps, other]

    cs.CL

    A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments

    Authors: P. Godard, G. Adda, M. Adda-Decker, J. Benjumea, L. Besacier, J. Cooper-Leavitt, G-N. Kouarata, L. Lamel, H. Maynard, M. Mueller, A. Rialland, S. Stueker, F. Yvon, M. Zanon-Boito

    Abstract: Most speech and language technologies are trained with massive amounts of speech and text information. However, most of the world languages do not have such resources or stable orthography. Systems constructed under these almost zero resource conditions are not only promising for speech technology but also for computational language documentation. The goal of computational language documentation i…

    Submitted 15 February, 2018; v1 submitted 10 October, 2017; originally announced October 2017.

    Comments: accepted to LREC 2018