
Showing 1–50 of 75 results for author: Besacier, L

Searching in archive cs.
  1. arXiv:2406.06371 [pdf, other]

    cs.CL cs.SD eess.AS

    mHuBERT-147: A Compact Multilingual HuBERT Model

    Authors: Marcely Zanon Boito, Vivek Iyer, Nikolaos Lagos, Laurent Besacier, Ioan Calapodescu

    Abstract: We present mHuBERT-147, the first general-purpose massively multilingual HuBERT speech representation model trained on 90K hours of clean, open-license data. To scale up the multi-iteration HuBERT approach, we use faiss-based clustering, achieving 5.2x faster label assignment than the original method. We also apply a new multilingual batching up-sampling strategy, leveraging both language and data…

    Submitted 27 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Extended version of the Interspeech 2024 paper of the same name
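HuBERT-style pretraining derives its training targets by clustering frame-level features and labelling each frame with its nearest cluster centroid; the abstract above reports that faiss-based clustering makes this label-assignment step 5.2x faster. As an illustration only, the brute-force pure-Python sketch below performs that nearest-centroid assignment with squared Euclidean distance, i.e. the O(frames × clusters) search that a faiss index is used to accelerate. It does not use faiss, and `assign_labels` and the toy data are hypothetical.

```python
import math


def assign_labels(features, centroids):
    """Assign each feature vector the index of its nearest centroid (squared L2 distance)."""
    labels = []
    for x in features:
        best_idx, best_dist = 0, math.inf
        for idx, c in enumerate(centroids):
            dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        labels.append(best_idx)
    return labels


# Toy 2-D "frame features" and three centroids (illustrative values only).
frames = [[0.1, 0.2], [0.9, 1.1], [5.0, 4.8], [1.0, 0.9]]
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(assign_labels(frames, centroids))  # [0, 1, 2, 1]
```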

  2. arXiv:2403.20262 [pdf, other]

    cs.CL cs.AI cs.LG

    ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models

    Authors: Thibaut Thonet, Jos Rozen, Laurent Besacier

    Abstract: Research on Large Language Models (LLMs) has recently witnessed an increasing interest in extending models' context size to better capture dependencies within long documents. While benchmarks have been proposed to assess long-range abilities, existing efforts primarily considered generic tasks that are not necessarily aligned with real-world applications. In contrast, our work proposes a new bench…

    Submitted 29 March, 2024; originally announced March 2024.

  3. arXiv:2309.05472 [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing has drastically benefited from SSL, as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Speech and Language. Preprint allowed

  4. arXiv:2302.06459 [pdf, other]

    cs.CL

    Encoding Sentence Position in Context-Aware Neural Machine Translation with Concatenation

    Authors: Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

    Abstract: Context-aware translation can be achieved by processing a concatenation of consecutive sentences with the standard Transformer architecture. This paper investigates the intuitive idea of providing the model with explicit information about the position of the sentences contained in the concatenation window. We compare various methods to encode sentence positions into token representations, includin…

    Submitted 4 April, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: Insights 2023 camera-ready
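A minimal sketch, assuming whitespace-tokenized sentences, a hypothetical `<sep>` separator token, and sentence positions counted backwards from the current sentence (0 = current, 1 = previous, and so on). It builds the concatenation window described in the abstract above and tags every token with the index of the sentence it came from; one simple way to use these indices would be to look them up in a small learned embedding table and add the result to the token embeddings. The paper compares several encoding methods, and this sketch does not claim to reproduce any of them; `build_window` is a hypothetical name.

```python
def build_window(sentences, current_idx, context_size, sep="<sep>"):
    """Concatenate up to `context_size` preceding sentences with the current one.

    Returns the token sequence and, for each token, the position of its sentence
    counted backwards from the current sentence (0 = current, 1 = previous, ...).
    """
    start = max(0, current_idx - context_size)
    tokens, positions = [], []
    for i in range(start, current_idx + 1):
        pos = current_idx - i
        for tok in sentences[i].split():
            tokens.append(tok)
            positions.append(pos)
        if i < current_idx:
            # Separator between sentences, tagged with the preceding sentence's position.
            tokens.append(sep)
            positions.append(pos)
    return tokens, positions


doc = ["Hello there .", "How are you ?", "I am fine ."]
toks, pos = build_window(doc, current_idx=2, context_size=1)
print(toks)  # ['How', 'are', 'you', '?', '<sep>', 'I', 'am', 'fine', '.']
print(pos)   # [1, 1, 1, 1, 1, 0, 0, 0, 0]
```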

  5. arXiv:2210.13388 [pdf, other]

    cs.CL

    Focused Concatenation for Context-Aware Neural Machine Translation

    Authors: Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

    Abstract: A straightforward approach to context-aware neural machine translation consists of feeding the standard encoder-decoder architecture with a window of consecutive sentences, formed by the current sentence and a number of sentences from its context concatenated to it. In this work, we propose an improved concatenation approach that encourages the model to focus on the translation of the current sent…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: WMT 2022 (camera ready)

  6. arXiv:2210.11835 [pdf, other]

    cs.CL cs.SD eess.AS

    A Textless Metric for Speech-to-Speech Comparison

    Authors: Laurent Besacier, Swen Ribeiro, Olivier Galibert, Ioan Calapodescu

    Abstract: In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric utilizes state-of-the-art speech2unit encoders like HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely correspo…

    Submitted 20 July, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: link to supplementary material: https://github.com/besacier/textless-metric

  7. arXiv:2210.11621 [pdf, other]

    cs.CL cs.AI cs.LG

    SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages

    Authors: Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier

    Abstract: In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the "curse of multilinguality", these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introd…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022

    Journal ref: https://aclanthology.org/2022.emnlp-main.571

  8. arXiv:2207.01893 [pdf, other]

    cs.CL

    ASR-Generated Text for Language Model Pre-training Applied to Speech Tasks

    Authors: Valentin Pelloin, Franck Dary, Nicolas Herve, Benoit Favre, Nathalie Camelin, Antoine Laurent, Laurent Besacier

    Abstract: We aim to improve spoken language modeling (LM) using a very large amount of automatically transcribed speech. We leverage the INA (French National Audiovisual Institute) collection and obtain 19GB of text after applying ASR on 350,000 hours of diverse TV shows. From this, spoken language models are trained either by fine-tuning an existing LM (FlauBERT) or by training an LM from scratch. New…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: Interspeech 2022 (Camera Ready)

  9. arXiv:2207.01718 [pdf, other]

    cs.CL eess.AS

    BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model

    Authors: Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

    Abstract: Several recent studies have tested the use of transformer language model representations to infer prosodic features for text-to-speech synthesis (TTS). While these studies have explored prosody in general, in this work, we look specifically at the prediction of contrastive focus on personal pronouns. This is a particularly challenging task as it often requires semantic, discursive and/or pragmatic…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: 5 pages

  10. arXiv:2205.10828 [pdf, other]

    cs.CL cs.AI cs.LG

    What Do Compressed Multilingual Machine Translation Models Forget?

    Authors: Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier

    Abstract: Recently, very large pre-trained models have achieved state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques make it possible to drastically reduce the size of the models, and therefore their inference time, with negligible impact on top-tier metrics. However, the general performa…

    Submitted 27 June, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

    Comments: Accepted to Findings of EMNLP 2022, presented at WMT 2022

    Journal ref: https://aclanthology.org/2022.findings-emnlp.317/

  11. arXiv:2204.01397 [pdf, ps, other]

    cs.CL cs.SD eess.AS

    A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

    Authors: Marcely Zanon Boito, Laurent Besacier, Natalia Tomashenko, Yannick Estève

    Abstract: Self-supervised models for speech processing have recently emerged as popular foundation blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in speech processing downstream tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it becomes necessary to und…

    Submitted 5 July, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)

  12. arXiv:2110.10472 [pdf, other]

    cs.CL

    Multilingual Unsupervised Neural Machine Translation with Denoising Adapters

    Authors: Ahmet Üstün, Alexandre Bérard, Laurent Besacier, Matthias Gallé

    Abstract: We consider the problem of multilingual unsupervised machine translation, translating to and from languages that only have monolingual data by using auxiliary parallel language pairs. For this problem the standard procedure so far to leverage the monolingual data is back-translation, which is computationally costly and hard to tune. In this paper we propose instead to use denoising adapters, ada…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted as a long paper to EMNLP 2021

  13. arXiv:2106.11891 [pdf, other]

    cs.CL

    On the Evaluation of Machine Translation for Terminology Consistency

    Authors: Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina

    Abstract: As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies. In many scenarios and particularly in cases of domain adaptation, one expects the MT output to adhere to the constraints provided by a terminology. In this work, we propose metrics to measure the consistency of MT output with…

    Submitted 24 June, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: preprint

  14. arXiv:2106.06160 [pdf, other]

    cs.CL cs.SD eess.AS

    Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

    Authors: Éric Le Ferrand, Steven Bird, Laurent Besacier

    Abstract: We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust ASR system. This work is grounded in a very low-resource language documentation scenario where only a few minutes of recording have been transcribed for a given language so far. Experiments on two oral languages show that a pretrained universal…

    Submitted 11 June, 2021; originally announced June 2021.

  15. arXiv:2106.04298 [pdf, other]

    cs.CL cs.SD eess.AS

    Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings

    Authors: Marcely Zanon Boito, Bolaji Yusuf, Lucas Ondel, Aline Villavicencio, Laurent Besacier

    Abstract: Documenting languages helps to prevent the extinction of endangered dialects, many of which are otherwise expected to disappear by the end of the century. When documenting oral languages, unsupervised word segmentation (UWS) from speech is a useful, yet challenging, task. It consists of producing time-stamps for slicing utterances into smaller segments corresponding to words, being performed from…

    Submitted 18 May, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted to SIGUL 2022

  16. arXiv:2106.01463 [pdf, other]

    cs.CL

    Lightweight Adapter Tuning for Multilingual Speech Translation

    Authors: Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

    Abstract: Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists of freezing pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning was investigated for multilingual neural machine translation, this paper propose…

    Submitted 12 July, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL-IJCNLP 2021
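The abstract above describes adapters only as lightweight modules injected between frozen pretrained layers. The sketch below assumes the widely used residual bottleneck form (down-projection, ReLU, up-projection, skip connection) with biases omitted; the exact adapter configuration in the paper may differ, and all function names are hypothetical. It also counts the trainable weights one adapter adds, which shows why adapter tuning adds only a small number of parameters.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def adapter_forward(h, W_down, W_up):
    """Residual bottleneck adapter: h + W_up @ ReLU(W_down @ h), biases omitted.

    Only W_down (b x d) and W_up (d x b) would be trained; the frozen
    pretrained layer that produced h is left untouched.
    """
    z = relu(matvec(W_down, h))                   # project d -> b, then ReLU
    delta = matvec(W_up, z)                       # project b -> d
    return [hi + di for hi, di in zip(h, delta)]  # skip connection


def adapter_param_count(d_model, bottleneck):
    """Trainable weights added by one adapter (two d x b projections, no biases)."""
    return 2 * d_model * bottleneck


# Toy example with d = 3 and b = 2 (illustrative values only).
h = [1.0, -2.0, 0.5]
W_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_up = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]]
print(adapter_forward(h, W_down, W_up))  # [1.5, -2.0, 0.5]
print(adapter_param_count(512, 64))      # 65536
```

For example, with a hypothetical model width of 512 and a bottleneck of 64, each adapter adds 2 × 512 × 64 = 65,536 weights, while the pretrained layer it wraps stays frozen.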

  17. arXiv:2105.14940 [pdf, other]

    cs.CL cs.AI cs.LG

    Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?

    Authors: Zae Myung Kim, Laurent Besacier, Vassilina Nikoulina, Didier Schwab

    Abstract: Recent studies on the analysis of the multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages. While most such work has been conducted in a "black-box" manner, this paper aims to analyze individual components of a multilingual neural translation (NMT)…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: 10 pages, accepted at Findings of ACL 2021 (short)

  18. arXiv:2104.14470 [pdf, other]

    cs.CL cs.SD eess.AS

    Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation

    Authors: Ha Nguyen, Yannick Estève, Laurent Besacier

    Abstract: Boosted by the simultaneous translation shared task at IWSLT 2020, promising end-to-end online speech translation approaches were recently proposed. They consist of incrementally encoding a speech input (in a source language) and decoding the corresponding text (in a target language) with the best possible trade-off between latency and translation quality. This paper investigates two key aspects o…

    Submitted 14 June, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted for presentation at Interspeech 2021

  19. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech. They notably succeeded in improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee…

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021

  20. Divide and Rule: Effective Pre-Training for Context-Aware Multi-Encoder Translation Models

    Authors: Lorenzo Lupo, Marco Dinarelli, Laurent Besacier

    Abstract: Multi-encoder models are a broad family of context-aware neural machine translation systems that aim to improve translation quality by encoding document-level contextual information alongside the current sentence. The context encoding is undertaken by contextual parameters, trained on document-level data. In this work, we discuss the difficulty of training these parameters effectively, due to the…

    Submitted 15 March, 2022; v1 submitted 31 March, 2021; originally announced March 2021.

    Comments: ACL 2022 (camera ready)

  21. arXiv:2103.08993 [pdf, other]

    cs.SD cs.CL eess.AS

    Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning

    Authors: Jama Hussein Mohamud, Lloyd Acquaye Thompson, Aissatou Ndoye, Laurent Besacier

    Abstract: This paper describes the results of an informal collaboration launched during the African Master of Machine Intelligence (AMMI) in June 2020. After a series of lectures and labs on speech data collection using mobile applications and on self-supervised representation learning from speech, a small group of students and the lecturer continued working on an automatic speech recognition (ASR) project for…

    Submitted 16 March, 2021; originally announced March 2021.

    Comments: Accepted at AfricaNLP2021 workshop at EACL 2021

  22. arXiv:2103.03233 [pdf, other]

    cs.CL

    An Empirical Study of End-to-end Simultaneous Speech Translation Decoding Strategies

    Authors: Ha Nguyen, Yannick Estève, Laurent Besacier

    Abstract: This paper proposes a decoding strategy for end-to-end simultaneous speech translation. We leverage end-to-end models trained in offline mode and conduct an empirical study for two language pairs (English-to-German and English-to-Portuguese). We also investigate different output token granularities including characters and Byte Pair Encoding (BPE) units. The results show that the proposed decoding…

    Submitted 4 March, 2021; originally announced March 2021.

    Comments: This paper has been accepted for presentation at IEEE ICASSP 2021

  23. arXiv:2102.09914 [pdf, other]

    cs.CL eess.AS

    Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input

    Authors: Brooke Stephenson, Thomas Hueber, Laurent Girin, Laurent Besacier

    Abstract: The prosody of a spoken word is determined by its surrounding context. In incremental text-to-speech synthesis, where the synthesizer produces an output before it has access to the complete input, the full context is often unknown, which can result in a loss of naturalness in the synthesized speech. In this paper, we investigate whether the use of predicted future text can attenuate this loss. We c…

    Submitted 15 June, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

    Comments: 4 pages

  24. arXiv:2101.03027 [pdf, other]

    cs.CL cs.AI eess.SP

    User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis

    Authors: Oliver Adams, Benjamin Galliot, Guillaume Wisniewski, Nicholas Lambourne, Ben Foley, Rahasya Sanders-Dwyer, Janet Wiles, Alexis Michaud, Séverine Guillaume, Laurent Besacier, Christopher Cox, Katya Aplonova, Guillaume Jacques, Nathan Hill

    Abstract: This paper reports on progress integrating the speech recognition toolkit ESPnet into Elpis, a web front-end originally designed to provide access to the Kaldi automatic speech recognition toolkit. The goal of this work is to make end-to-end speech recognition models available to language workers via a user-friendly graphical interface. Encouraging results are reported on (i) development of an ESP…

    Submitted 22 February, 2021; v1 submitted 15 December, 2020; originally announced January 2021.

  25. arXiv:2011.06198 [pdf, other]

    cs.CL

    Enabling Interactive Transcription in an Indigenous Community

    Authors: Éric Le Ferrand, Steven Bird, Laurent Besacier

    Abstract: We propose a novel transcription workflow which combines spoken term detection and human-in-the-loop, together with a pilot experiment. This work is grounded in an almost zero-resource scenario where only a few terms have so far been identified, involving two endangered languages. We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR syste…

    Submitted 11 November, 2020; originally announced November 2020.

    Comments: In proceedings of COLING 2020

  26. arXiv:2011.00747 [pdf, other]

    cs.CL cs.SD eess.AS

    Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

    Authors: Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

    Abstract: We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST). Our major contribution lies in how these decoders interact with each other: one…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: Accepted at COLING 2020 (Oral)

    Journal ref: The 28th International Conference on Computational Linguistics (COLING 2020)

  27. arXiv:2010.05967 [pdf, other]

    cs.CL cs.AI

    The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units

    Authors: Ewan Dunbar, Julien Karadayi, Mathieu Bernard, Xuan-Nga Cao, Robin Algayres, Lucas Ondel, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels. It combines the data sets and metrics from two previous benchmarks (2017 and 2019) and features two tasks which tap into two levels of speech representation. The first task is to discover low bit-rate subword representations that optimize the quality of speec…

    Submitted 12 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of Interspeech 2020

  28. arXiv:2009.02035 [pdf, other]

    eess.AS cs.CL

    What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

    Authors: Brooke Stephenson, Laurent Besacier, Laurent Girin, Thomas Hueber

    Abstract: In incremental text-to-speech synthesis (iTTS), the synthesizer produces an audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this…

    Submitted 4 September, 2020; originally announced September 2020.

    Comments: 5 pages, 4 figures
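Under the incremental setting described in the abstract above, the text visible to the synthesizer when it produces speech for token n is the first n + k tokens, capped at the end of the sentence. A minimal sketch, assuming whitespace tokenization and a hypothetical helper name `visible_prefix`; it models only the input window, not the TTS system itself.

```python
def visible_prefix(tokens, n, k):
    """Return the tokens available when synthesizing token n (1-indexed) with lookahead k."""
    # Python slicing stops at the end of the list, so the window is capped at the sentence end.
    return tokens[: n + k]


sentence = "the cat sat on the mat".split()
for n in range(1, len(sentence) + 1):
    print(n, visible_prefix(sentence, n, k=2))
```

With k = 0 the system sees only the tokens up to the current one; as k grows, the window approaches full-sentence (offline) synthesis.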

  29. arXiv:2006.08387 [pdf, other]

    cs.CL

    Catplayinginthesnow: Impact of Prior Segmentation on a Model of Visually Grounded Speech

    Authors: William N. Havard, Jean-Pierre Chevrot, Laurent Besacier

    Abstract: The language acquisition literature shows that children do not build their lexicon by segmenting the spoken input into phonemes and then building up words from them, but rather adopt a top-down approach and start by segmenting word-like units and then break them down into smaller units. This suggests that the ideal way of learning a language is by starting from full semantic units. In this paper,…

    Submitted 20 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted at CoNLL 2020

  30. ConfNet2Seq: Full Length Answer Generation from Spoken Questions

    Authors: Vaishali Pal, Manish Shrivastava, Laurent Besacier

    Abstract: Conversational and task-oriented dialogue systems aim to interact with the user using natural responses through multi-modal interfaces, such as text or speech. These desired responses are in the form of full-length natural answers generated over facts retrieved from a knowledge source. While the task of generating natural answers to questions from an answer span has been widely studied, there has…

    Submitted 11 June, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: Accepted at Text, Speech and Dialogue, 2020

    Journal ref: ConfNet2Seq. In: Text, Speech, and Dialogue: 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings, vol. 12284, pp. 524-531 (2020)

  31. arXiv:2006.00814 [pdf, other]

    cs.CL

    Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English

    Authors: Maha Elbayad, Michael Ustaszewski, Emmanuelle Esperança-Rodier, Francis Brunet Manquat, Jakob Verbeek, Laurent Besacier

    Abstract: In this work, we conduct an evaluation study comparing offline and online neural machine translation architectures. Two sequence-to-sequence models are considered: convolutional Pervasive Attention (Elbayad et al. 2018) and the attention-based Transformer (Vaswani et al. 2017). We investigate, for both architectures, the impact of online decoding constraints on the translation quality through a careful…

    Submitted 24 November, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: Accepted at COLING 2020

  32. arXiv:2005.11861 [pdf, other]

    cs.CL eess.AS

    ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

    Authors: Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention…

    Submitted 24 May, 2020; originally announced May 2020.

  33. arXiv:2005.08595 [pdf, other]

    cs.CL cs.SD eess.AS

    Efficient Wait-k Models for Simultaneous Machine Translation

    Authors: Maha Elbayad, Laurent Besacier, Jakob Verbeek

    Abstract: Simultaneous machine translation consists of starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach to this problem. They first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low-resource settings for spoken…

    Submitted 3 August, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted at INTERSPEECH 2020
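The wait-k policy described in the abstract above can be written as a READ/WRITE schedule: before emitting target token t, the decoder has read min(k + t - 1, |x|) source tokens, where |x| is the source length. The sketch below, with the hypothetical function name `wait_k_schedule`, generates only that schedule; it does not model the translation system or the training procedure studied in the paper.

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Return the READ/WRITE action sequence of a wait-k policy.

    Read k source tokens first, then alternate WRITE and READ; once the
    source is exhausted, keep writing until the target is complete.
    """
    actions = []
    read, written = 0, 0
    while written < tgt_len:
        # Before writing target token (written + 1), min(k + written, src_len) tokens must be read.
        if read < min(k + written, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions


print(wait_k_schedule(src_len=4, tgt_len=4, k=2))
# ['READ', 'READ', 'WRITE', 'READ', 'WRITE', 'READ', 'WRITE', 'WRITE']
```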

  34. arXiv:2003.13325 [pdf, other]

    cs.CL

    Investigating Language Impact in Bilingual Approaches for Computational Language Documentation

    Authors: Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

    Abstract: For endangered languages, data collection campaigns have to accommodate the challenge that many of them are from oral tradition, and producing transcriptions is costly. Therefore, it is fundamental to translate them into a widely spoken language to ensure interpretability of the recordings. In this paper we investigate how the choice of translation language affects the posterior documentation work…

    Submitted 30 March, 2020; originally announced March 2020.

    Comments: Accepted to 1st Joint SLTU and CCURL Workshop

  35. arXiv:2003.08132 [pdf, other]

    cs.CL

    Gender Representation in Open Source Speech Resources

    Authors: Mahault Garnerin, Solange Rossato, Laurent Besacier

    Abstract: With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the question of ethics, transparency and fairness of AI systems has become a central concern within the research community. We address transparency and fairness in spoken language systems by proposing a study about gender representation in speech resources available through the Open Speech and Languag…

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: Accepted to LREC 2020

  36. arXiv:2002.05955 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    A Data Efficient End-To-End Spoken Language Understanding Architecture

    Authors: Marco Dinarelli, Nikita Kapoor, Bassam Jabaian, Laurent Besacier

    Abstract: End-to-end architectures have recently been proposed for spoken language understanding (SLU) and semantic parsing. Trained on a large amount of data, these models jointly learn acoustic and linguistic-sequential features. While such architectures give very good results for domain, intent and slot detection, their application to a more complex semantic chunking and tagging task is less straightforward.…

    Submitted 14 February, 2020; originally announced February 2020.

    Comments: Accepted to ICASSP 2020

  37. arXiv:2002.00768 [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Modeling ASR Ambiguity for Dialogue State Tracking Using Word Confusion Networks

    Authors: Vaishali Pal, Fabien Guillot, Manish Shrivastava, Jean-Michel Renders, Laurent Besacier

    Abstract: Spoken dialogue systems typically use a list of top-N ASR hypotheses for inferring the semantic meaning and tracking the state of the dialogue. However, ASR graphs, such as confusion networks (confnets), provide a compact representation of a richer hypothesis space than a top-N ASR list. In this paper, we study the benefits of using confusion networks with a state-of-the-art neural dialogue state t…

    Submitted 1 August, 2020; v1 submitted 3 February, 2020; originally announced February 2020.

    Comments: Accepted at Interspeech 2020

  38. arXiv:1912.05372 [pdf, ps, other]

    cs.CL cs.LG

    FlauBERT: Unsupervised Language Model Pre-training for French

    Authors: Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab

    Abstract: Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled text now available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely…

    Submitted 12 March, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: Accepted to LREC 2020

  39. arXiv:1911.04997 [pdf, other]

    cs.CL

    Character-based NMT with Transformer

    Authors: Rohit Gupta, Laurent Besacier, Marc Dymetman, Matthias Gallé

    Abstract: Character-based translation has several appealing advantages, but its performance is in general worse than a carefully tuned BPE baseline. In this paper we study the impact of character-based input and output with the Transformer architecture. In particular, our experiments on EN-DE show that character-based Transformer models are more robust than their BPE counterpart, both when translating noisy…

    Submitted 12 November, 2019; originally announced November 2019.

  40. arXiv:1911.02898 [pdf, other]

    cs.CL

    The LIG system for the English-Czech Text Translation Task of IWSLT 2019

    Authors: Loïc Vial, Benjamin Lecouteux, Didier Schwab, Hang Le, Laurent Besacier

    Abstract: In this paper, we present our submission for the English to Czech Text Translation Task of IWSLT 2019. Our system aims to study how pre-trained language models, used as input embeddings, can improve a specialized machine translation system trained on little data. To this end, we implemented a Transformer-based encoder-decoder neural system which is able to use the output of a pre-trained language model…

    Submitted 7 November, 2019; originally announced November 2019.

    Comments: IWSLT 2019

  41. arXiv:1910.14539  [pdf, other]

    cs.CL

    Naver Labs Europe's Systems for the Document-Level Generation and Translation Task at WNGT 2019

    Authors: Fahimeh Saleh, Alexandre Bérard, Ioan Calapodescu, Laurent Besacier

    Abstract: Recently, neural models led to significant improvements in both machine translation (MT) and natural language generation tasks (NLG). However, generation of long descriptive summaries conditioned on structured data remains an open challenge. Likewise, MT that goes beyond sentence-level context is still an open issue (e.g., document-level MT or MT with metadata). To address these challenges, we pro…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: WNGT 2019 - System Description Paper

  42. arXiv:1910.13689  [pdf, other]

    cs.CL cs.SD eess.AS

    ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

    Authors: Ha Nguyen, Natalia Tomashenko, Marcely Zanon Boito, Antoine Caubriere, Fethi Bougares, Mickael Rouvier, Laurent Besacier, Yannick Esteve

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English-to-Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encod…

    Submitted 30 October, 2019; originally announced October 2019.

    Comments: IWSLT 2019 - First two authors contributed equally to this work

  43. arXiv:1910.08418  [pdf, other]

    cs.CL

    Controlling Utterance Length in NMT-based Word Segmentation with Attention

    Authors: Pierre Godard, Laurent Besacier, Francois Yvon

    Abstract: One of the basic tasks of computational language documentation (CLD) is to identify word boundaries in an unsegmented phonemic stream. While several unsupervised monolingual word segmentation algorithms exist in the literature, they are challenged in real-world CLD settings by the small amount of available data. A possible remedy is to take advantage of glosses or translation in a foreign, well-re…

    Submitted 18 October, 2019; originally announced October 2019.

    Comments: Accepted to IWSLT 2019 (Hong Kong)

  44. arXiv:1910.05154  [pdf, other]

    cs.CL

    How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages

    Authors: Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

    Abstract: For language documentation initiatives, transcription is an expensive resource: one minute of audio is estimated to take one hour and a half on average of a linguist's work (Austin and Sallabank, 2013). Recently, collecting aligned translations in well-resourced languages became a popular solution for ensuring posterior interpretability of the recordings (Adda et al. 2016). In this paper we invest…

    Submitted 11 October, 2019; originally announced October 2019.

    Comments: 4 pages, workshop LIFT 2019

  45. arXiv:1909.08491  [pdf, other]

    cs.CL cs.LG

    Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech

    Authors: William N. Havard, Jean-Pierre Chevrot, Laurent Besacier

    Abstract: In this paper, we study how word-like units are represented and activated in a recurrent neural model of visually grounded speech. The model used in our experiments is trained to project an image and its spoken description in a common representation space. We show that a recurrent model trained on spoken sentences implicitly segments its input into word-like units and reliably maps them to their c…

    Submitted 18 September, 2019; originally announced September 2019.

    Comments: Accepted at CoNLL 2019

  46. arXiv:1908.08717  [pdf, other]

    cs.CL cs.SD eess.AS

    Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance

    Authors: Mahault Garnerin, Solange Rossato, Laurent Besacier

    Abstract: This paper analyzes the gender representation in four major corpora of French broadcast. These corpora being widely used within the speech processing community, they are a primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radi…

    Submitted 23 August, 2019; originally announced August 2019.

    Comments: Accepted to ACM Workshop AI4TV

  47. arXiv:1907.12895  [pdf, other]

    cs.CL

    MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible

    Authors: Marcely Zanon Boito, William N. Havard, Mahault Garnerin, Éric Le Ferrand, Laurent Besacier

    Abstract: The CMU Wilderness Multilingual Speech Dataset (Black, 2019) is a newly published multilingual speech dataset based on recorded readings of the New Testament. It provides data to build Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models for potentially 700 languages. However, the fact that the source content (the Bible) is the same for all the languages is not exploited to date. Ther…

    Submitted 26 February, 2020; v1 submitted 30 July, 2019; originally announced July 2019.

    Comments: Accepted to LREC 2020

  48. arXiv:1907.00184  [pdf, other]

    cs.CL

    Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings

    Authors: Marcely Zanon Boito, Aline Villavicencio, Laurent Besacier

    Abstract: Since Bahdanau et al. [1] first introduced attention for neural machine translation, most sequence-to-sequence models made use of attention mechanisms [2, 3, 4]. While they produce soft-alignment matrices that could be interpreted as alignment between target and source languages, we lack metrics to quantify their quality, being unclear which approach produces the best alignments. This paper presen…

    Submitted 11 September, 2019; v1 submitted 29 June, 2019; originally announced July 2019.

    Comments: Interspeech 2019

  49. arXiv:1904.11469  [pdf, other]

    cs.CL cs.SD eess.AS

    The Zero Resource Speech Challenge 2019: TTS without T

    Authors: Ewan Dunbar, Robin Algayres, Julien Karadayi, Mathieu Bernard, Juan Benjumea, Xuan-Nga Cao, Lucie Miskic, Charlotte Dugrain, Lucas Ondel, Alan W. Black, Laurent Besacier, Sakriani Sakti, Emmanuel Dupoux

    Abstract: We present the Zero Resource Speech Challenge 2019, which proposes to build a speech synthesizer without any text or phonetic labels: hence, TTS without T (text-to-speech without text). We provide raw audio for a target voice in an unknown language (the Voice dataset), but no alignment, text or labels. Participants must discover subword units in an unsupervised way (using the Unit Discovery datase…

    Submitted 7 July, 2019; v1 submitted 25 April, 2019; originally announced April 2019.

    Comments: Interspeech 2019

  50. arXiv:1902.03052  [pdf, other]

    cs.CL

    Models of Visually Grounded Speech Signal Pay Attention To Nouns: a Bilingual Experiment on English and Japanese

    Authors: William N. Havard, Jean-Pierre Chevrot, Laurent Besacier

    Abstract: We investigate the behaviour of attention in neural models of visually grounded speech trained on two languages: English and Japanese. Experimental results show that attention focuses on nouns and this behaviour holds true for two very typologically different languages. We also draw parallels between artificial neural attention and human attention and show that neural attention focuses on word end…

    Submitted 8 February, 2019; originally announced February 2019.

    Comments: 5 pages, 3 figures, accepted at ICASSP 2019