
Showing 1–16 of 16 results for author: Elbayad, M

  1. arXiv:2403.00986  [pdf, other]

    cs.CL cs.AI cs.LG

    Merging Text Transformer Models from Different Initializations

    Authors: Neha Verma, Maha Elbayad

    Abstract: Recent work on one-shot permutation-based model merging has shown impressive low- or zero-barrier mode connectivity between models from completely different initializations. However, this line of work has not yet extended to the Transformer architecture, despite its dominant popularity in the language domain. Therefore, in this work, we investigate the extent to which separate Transformer minima l…

    Submitted 7 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  2. arXiv:2402.05755  [pdf, other]

    cs.CL cs.SD eess.AS

    SpiRit-LM: Interleaved Spoken and Written Language Model

    Authors: Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-jussà, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoit Sagot, Emmanuel Dupoux

    Abstract: We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech. Our model is based on a pretrained text language model that we extend to the speech modality by continuously training it on text and speech units. Speech and text sequences are concatenated as a single set of tokens, and trained with a word-level interleaving method using a small automatically-curated…

    Submitted 8 February, 2024; originally announced February 2024.

  3. arXiv:2312.05187  [pdf, other]

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4…

    Submitted 8 December, 2023; originally announced December 2023.

  4. arXiv:2311.06532  [pdf, other]

    cs.CL

    Added Toxicity Mitigation at Inference Time for Multimodal and Massively Multilingual Translation

    Authors: Marta R. Costa-jussà, David Dale, Maha Elbayad, Bokai Yu

    Abstract: Added toxicity in the context of translation refers to the fact of producing a translation output with more toxicity than there exists in the input. In this paper, we present MinTox which is a novel pipeline to identify added toxicity and mitigate this issue which works at inference time. MinTox uses a toxicity detection classifier which is multimodal (speech and text) and works in languages at sc…

    Submitted 11 November, 2023; originally announced November 2023.

    ACM Class: I.2.7

  5. arXiv:2308.11596  [pdf, other]

    cs.CL

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, et al. (43 additional authors not shown)

    Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s…

    Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    ACM Class: I.2.7

  6. arXiv:2305.02176  [pdf, other]

    cs.CL

    Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity

    Authors: Haoran Xu, Maha Elbayad, Kenton Murray, Jean Maillard, Vedanuj Goswami

    Abstract: Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently parameter-inefficient as the improvement in performance diminishes with an increasing number of experts. We hypothesize t…

    Submitted 22 October, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted at Findings of EMNLP 2023

  7. arXiv:2302.03528  [pdf, other]

    cs.CL

    Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages

    Authors: Simeng Sun, Maha Elbayad, Anna Sun, James Cross

    Abstract: With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages. However, adding new languages requires updating the vocabulary, which complicates the reuse of embeddings. The question of how to reuse existing models while also making a…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: Accepted to EACL 2023 (Main)

  8. arXiv:2212.07571  [pdf, other]

    cs.CL cs.AI

    Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation

    Authors: Maha Elbayad, Anna Sun, Shruti Bhosale

    Abstract: Sparsely gated Mixture of Experts (MoE) models have been shown to be a compute-efficient method to scale model capacity for multilingual machine translation. However, for low-resource tasks, MoE models severely over-fit. We show effective regularization strategies, namely dropout techniques for MoE layers in EOM and FOM, Conditional MoE Routing and Curriculum Learning methods that prevent over-fit…

    Submitted 14 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: text overlap with arXiv:2207.04672

  9. arXiv:2212.07530  [pdf, other]

    cs.CL cs.AI cs.LG

    Causes and Cures for Interference in Multilingual Translation

    Authors: Uri Shaham, Maha Elbayad, Vedanuj Goswami, Omer Levy, Shruti Bhosale

    Abstract: Multilingual machine translation models can benefit from synergy between different language pairs, but also suffer from interference. While there is a growing number of sophisticated methods that aim to eliminate interference, our understanding of interference as a phenomenon is still limited. This work identifies the main factors that contribute to interference in multilingual machine translation…

    Submitted 19 May, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  10. arXiv:2207.04672  [pdf]

    cs.CL cs.AI

    No Language Left Behind: Scaling Human-Centered Machine Translation

    Authors: NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loïc Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, et al. (14 additional authors not shown)

    Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality res…

    Submitted 25 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 190 pages

    MSC Class: 68T50 ACM Class: I.2.7

  11. arXiv:2006.00814  [pdf, other]

    cs.CL

    Online Versus Offline NMT Quality: An In-depth Analysis on English-German and German-English

    Authors: Maha Elbayad, Michael Ustaszewski, Emmanuelle Esperança-Rodier, Francis Brunet Manquat, Jakob Verbeek, Laurent Besacier

    Abstract: We conduct in this work an evaluation study comparing offline and online neural machine translation architectures. Two sequence-to-sequence models: convolutional Pervasive Attention (Elbayad et al. 2018) and attention-based Transformer (Vaswani et al. 2017) are considered. We investigate, for both architectures, the impact of online decoding constraints on the translation quality through a careful…

    Submitted 24 November, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

    Comments: Accepted at COLING 2020

  12. arXiv:2005.11861  [pdf, other]

    cs.CL eess.AS

    ON-TRAC Consortium for End-to-End and Simultaneous Speech Translation Challenge Tasks at IWSLT 2020

    Authors: Maha Elbayad, Ha Nguyen, Fethi Bougares, Natalia Tomashenko, Antoine Caubrière, Benjamin Lecouteux, Yannick Estève, Laurent Besacier

    Abstract: This paper describes the ON-TRAC Consortium translation systems developed for two challenge tracks featured in the Evaluation Campaign of IWSLT 2020, offline speech translation and simultaneous speech translation. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). Attention…

    Submitted 24 May, 2020; originally announced May 2020.

  13. arXiv:2005.08595  [pdf, other]

    cs.CL cs.SD eess.AS

    Efficient Wait-k Models for Simultaneous Machine Translation

    Authors: Maha Elbayad, Laurent Besacier, Jakob Verbeek

    Abstract: Simultaneous machine translation consists in starting output generation before the entire input sequence is available. Wait-k decoders offer a simple but efficient approach for this problem. They first read k source tokens, after which they alternate between producing a target token and reading another source token. We investigate the behavior of wait-k decoding in low resource settings for spoken…

    Submitted 3 August, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Accepted at INTERSPEECH 2020
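
The wait-k policy described in the abstract above (read k source tokens, then alternate between writing a target token and reading a source token) can be sketched as a read/write schedule. This is a minimal illustration, not the authors' implementation: the function name `wait_k_schedule` and the `("READ" | "WRITE", index)` action format are assumptions made for the example, as is the choice to keep writing once the source is exhausted.

```python
from typing import List, Tuple


def wait_k_schedule(k: int, source_len: int, target_len: int) -> List[Tuple[str, int]]:
    """Return the READ/WRITE action sequence of a wait-k decoder.

    Hypothetical helper for illustration only. Each action is a pair
    (kind, index), where index is the position of the token read or written.
    """
    actions: List[Tuple[str, int]] = []
    read = 0      # number of source tokens read so far
    written = 0   # number of target tokens produced so far

    # Initial phase: read k source tokens (or the whole source if it is shorter).
    while read < min(k, source_len):
        actions.append(("READ", read))
        read += 1

    # Alternating phase: write one target token, then read one source token.
    # Assumption: once the source is exhausted, the decoder only writes.
    while written < target_len:
        actions.append(("WRITE", written))
        written += 1
        if read < source_len and written < target_len:
            actions.append(("READ", read))
            read += 1

    return actions


if __name__ == "__main__":
    for action in wait_k_schedule(k=2, source_len=4, target_len=4):
        print(action)
```

With a larger k the schedule approaches offline translation (the whole source is read before the first write); with k = 1 the decoder writes after every single source token.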

  14. arXiv:1910.10073  [pdf, other]

    cs.CL cs.LG

    Depth-Adaptive Transformer

    Authors: Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli

    Abstract: State of the art sequence-to-sequence models for large scale tasks perform a fixed number of computations for each input sequence regardless of whether it is easy or hard to process. In this paper, we train Transformer models which can make output predictions at different stages of the network and we investigate different ways to predict how much computation is required for a particular sequence.…

    Submitted 14 February, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Published as a conference paper at ICLR 2020

  15. arXiv:1808.03867  [pdf, other]

    cs.CL

    Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

    Authors: Maha Elbayad, Laurent Besacier, Jakob Verbeek

    Abstract: Current state-of-the-art machine translation systems are based on encoder-decoder architectures, that first encode the input sequence, and then generate an output sequence based on the input encoding. Both are interfaced with an attention mechanism that recombines a fixed encoding of the source tokens based on the decoder state. We propose an alternative approach which instead relies on a single 2…

    Submitted 1 November, 2018; v1 submitted 11 August, 2018; originally announced August 2018.

    Comments: Accepted at CoNLL 2018

  16. arXiv:1805.05062  [pdf, other]

    cs.CL cs.CV

    Token-level and sequence-level loss smoothing for RNN language models

    Authors: Maha Elbayad, Laurent Besacier, Jakob Verbeek

    Abstract: Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. It treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from "exposure bias": during training tokens are predicted given ground-truth sequences, while at test time prediction is co…

    Submitted 14 May, 2018; originally announced May 2018.

    Comments: Accepted by ACL 2018