Skip to main content

Showing 1–7 of 7 results for author: Heffernan, K

Searching in archive cs. Search in all archives.
.
  1. arXiv:2312.05187  [pdf, other

    cs.CL cs.SD eess.AS

    Seamless: Multilingual Expressive and Streaming Speech Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek , et al. (40 additional authors not shown)

    Abstract: Large-scale automatic speech translation systems today lack key features that help machine-mediated communication feel seamless when compared to human-to-human dialogue. In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion. First, we contribute an improved version of the massively multilingual and multimodal SeamlessM4… ▽ More

    Submitted 8 December, 2023; originally announced December 2023.

  2. arXiv:2308.11596  [pdf, other

    cs.CL

    SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

    Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim , et al. (43 additional authors not shown)

    Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded s… ▽ More

    Submitted 24 October, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    ACM Class: I.2.7

  3. arXiv:2306.12907  [pdf, other

    cs.CL

    xSIM++: An Improved Proxy to Bitext Mining Performance for Low-Resource Languages

    Authors: Mingda Chen, Kevin Heffernan, Onur Çelebi, Alex Mourachko, Holger Schwenk

    Abstract: We introduce a new proxy score for evaluating bitext mining based on similarity in a multilingual embedding space: xSIM++. In comparison to xSIM, this improved proxy leverages rule-based approaches to extend English sentences in any evaluation set with synthetic, hard-to-distinguish examples which more closely mirror the scenarios we encounter during large-scale mining. We validate this proxy by r… ▽ More

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: The first two authors contributed equally; ACL 2023 short; Code and data are available at https://github.com/facebookresearch/LASER

  4. arXiv:2210.05033  [pdf, other

    cs.CL

    Multilingual Representation Distillation with Contrastive Learning

    Authors: Weiting Tan, Kevin Heffernan, Holger Schwenk, Philipp Koehn

    Abstract: Multilingual sentence representations from large models encode semantic information from two or more languages and can be used for different cross-lingual information retrieval and matching tasks. In this paper, we integrate contrastive learning into multilingual representation distillation and use it for quality estimation of parallel sentences (i.e., find semantically similar sentences that can… ▽ More

    Submitted 30 April, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: EACL 2023

  5. arXiv:2207.04672  [pdf

    cs.CL cs.AI

    No Language Left Behind: Scaling Human-Centered Machine Translation

    Authors: NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran , et al. (14 additional authors not shown)

    Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality res… ▽ More

    Submitted 25 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 190 pages

    MSC Class: 68T50 ACM Class: I.2.7

  6. arXiv:2205.12654  [pdf, other

    cs.CL

    Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages

    Authors: Kevin Heffernan, Onur Çelebi, Holger Schwenk

    Abstract: Scaling multilingual representation learning beyond the hundred most frequent languages is challenging, in particular to cover the long tail of low-resource languages. A promising approach has been to train one-for-all multilingual models capable of cross-lingual transfer, but these models often suffer from insufficient capacity and interference between unrelated languages. Instead, we move away f… ▽ More

    Submitted 25 May, 2022; originally announced May 2022.

    Comments: 12 pages

  7. arXiv:2106.12978  [pdf, other

    cs.LG cs.CL

    Unsupervised Topic Segmentation of Meetings with BERT Embeddings

    Authors: Alessandro Solbiati, Kevin Heffernan, Georgios Damaskinos, Shivani Poddar, Shubham Modi, Jacques Cali

    Abstract: Topic segmentation of meetings is the task of dividing multi-person meeting transcripts into topic blocks. Supervised approaches to the problem have proven intractable due to the difficulties in collecting and accurately annotating large datasets. In this paper we show how previous unsupervised topic segmentation methods can be improved using pre-trained neural architectures. We introduce an unsup… ▽ More

    Submitted 24 June, 2021; originally announced June 2021.