Showing 1–9 of 9 results for author: Calapodescu, I

  1. An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks

    Authors: Varsha Suresh, Salah Aït-Mokhtar, Caroline Brun, Ioan Calapodescu

    Abstract: Self-supervised learning models have revolutionized the field of speech processing. However, the process of fine-tuning these models on downstream tasks requires substantial computational resources, particularly when dealing with multiple speech-processing tasks. In this paper, we explore the potential of adapter-based fine-tuning in developing a unified model capable of effectively handling multi…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: ICASSP 2024

  2. arXiv:2406.06371  [pdf, other]

    cs.CL cs.SD eess.AS

    mHuBERT-147: A Compact Multilingual HuBERT Model

    Authors: Marcely Zanon Boito, Vivek Iyer, Nikolaos Lagos, Laurent Besacier, Ioan Calapodescu

    Abstract: We present mHuBERT-147, the first general-purpose massively multilingual HuBERT speech representation model trained on 90K hours of clean, open-license data. To scale up the multi-iteration HuBERT approach, we use faiss-based clustering, achieving 5.2x faster label assignment than the original method. We also apply a new multilingual batching up-sampling strategy, leveraging both language and data…

    Submitted 27 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Extended version of the Interspeech 2024 paper of same name

  3. arXiv:2306.07763  [pdf, other]

    cs.CL

    NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track

    Authors: Edward Gow-Smith, Alexandre Berard, Marcely Zanon Boito, Ioan Calapodescu

    Abstract: This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track. Our work attempts to maximize translation quality in low-resource settings using multilingual parameter-efficient solutions that leverage strong pre-trained models. Our primary submission for Tamasheq outperforms the previous state of the art by 7.5 BLEU…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: IWSLT 2023: Tamasheq-French and Quechua-Spanish challenge winner

  4. arXiv:2210.11835  [pdf, other]

    cs.CL cs.SD eess.AS

    A Textless Metric for Speech-to-Speech Comparison

    Authors: Laurent Besacier, Swen Ribeiro, Olivier Galibert, Ioan Calapodescu

    Abstract: In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric utilizes state-of-the-art speech2unit encoders like HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely correspo…

    Submitted 20 July, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: link to supplementary material: https://github.com/besacier/textless-metric

  5. arXiv:2204.09259  [pdf, other]

    cs.CL cs.AI

    DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation

    Authors: Cheonbok Park, Hantae Kim, Ioan Calapodescu, Hyunchang Cho, Vassilina Nikoulina

    Abstract: Domain Adaptation (DA) of Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model which is adapted to the new domain on a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the amount of parallel samples it would require. It is however a desirable functionality that could help MT practitioners to mak…

    Submitted 20 April, 2022; originally announced April 2022.

    Comments: to be published in ACL2021

  6. arXiv:1910.14589  [pdf, other]

    cs.CL

    Machine Translation of Restaurant Reviews: New Corpus for Domain Adaptation and Robustness

    Authors: Alexandre Bérard, Ioan Calapodescu, Marc Dymetman, Claude Roux, Jean-Luc Meunier, Vassilina Nikoulina

    Abstract: We share a French-English parallel corpus of Foursquare restaurant reviews (https://europe.naverlabs.com/research/natural-language-processing/machine-translation-of-restaurant-reviews), and define a new task to encourage research on Neural Machine Translation robustness and domain adaptation, in a real-world scenario where better-quality MT would be greatly beneficial. We discuss the challenges of…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: WNGT 2019 Paper

  7. arXiv:1910.14539  [pdf, other]

    cs.CL

    Naver Labs Europe's Systems for the Document-Level Generation and Translation Task at WNGT 2019

    Authors: Fahimeh Saleh, Alexandre Bérard, Ioan Calapodescu, Laurent Besacier

    Abstract: Recently, neural models led to significant improvements in both machine translation (MT) and natural language generation tasks (NLG). However, generation of long descriptive summaries conditioned on structured data remains an open challenge. Likewise, MT that goes beyond sentence-level context is still an open issue (e.g., document-level MT or MT with metadata). To address these challenges, we pro…

    Submitted 31 October, 2019; originally announced October 2019.

    Comments: WNGT 2019 - System Description Paper

  8. arXiv:1907.06488  [pdf, other]

    cs.CL

    Naver Labs Europe's Systems for the WMT19 Machine Translation Robustness Task

    Authors: Alexandre Bérard, Ioan Calapodescu, Claude Roux

    Abstract: This paper describes the systems that we submitted to the WMT19 Machine Translation robustness task. This task aims to improve MT's robustness to noise found on social media, like informal language, spelling mistakes and other orthographic variations. The organizers provide parallel data extracted from a social media website in two language pairs: French-English and Japanese-English (in both trans…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: WMT 2019 - Shared Task Paper

  9. arXiv:1812.09836  [pdf, ps, other]

    cs.CL cs.LG

    Moment Matching Training for Neural Machine Translation: A Preliminary Study

    Authors: Cong Duy Vu Hoang, Ioan Calapodescu, Marc Dymetman

    Abstract: In previous works, neural sequence models have been shown to improve significantly if external prior knowledge can be provided, for instance by allowing the model to access the embeddings of explicit features during both training and inference. In this work, we propose a different point of view on how to incorporate prior knowledge in a principled way, using a moment matching framework. In this ap…

    Submitted 27 December, 2018; v1 submitted 24 December, 2018; originally announced December 2018.

    Comments: A preliminary study