
Showing 1–16 of 16 results for author: Gallego, G I

Searching in archive cs.
  1. arXiv:2402.10422

    cs.CL

    Pushing the Limits of Zero-shot End-to-End Speech Translation

    Authors: Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

    Abstract: Data scarcity and the modality gap between the speech and text modalities are two major obstacles of end-to-end Speech Translation (ST) systems, thus hindering their performance. Prior work has attempted to mitigate these challenges by leveraging external MT data and optimizing distance metrics that bring closer the speech-text representations. However, achieving competitive results typically requ…

    Submitted 5 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: ACL 2024 (Findings)

  2. arXiv:2309.11585

    cs.CL

    SpeechAlign: a Framework for Speech Translation Alignment Evaluation

    Authors: Belen Alastruey, Aleix Sant, Gerard I. Gállego, David Dale, Marta R. Costa-jussà

    Abstract: Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. In our commitment to advance these fields, we present SpeechAlign, a framework designed to evaluate the underexplored field of source-target alignment in speech models. The SpeechAlign framework has two core components. First, to tackle the absence of suitable evaluation datasets, we introduce the Speech Gold…

    Submitted 25 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: LREC-COLING 2024

  3. arXiv:2306.01327

    cs.CL cs.SD eess.AS

    Speech Translation with Foundation Models and Optimal Transport: UPC at IWSLT23

    Authors: Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

    Abstract: This paper describes the submission of the UPC Machine Translation group to the IWSLT 2023 Offline Speech Translation task. Our Speech Translation systems utilize foundation models for speech (wav2vec 2.0) and text (mBART50). We incorporate a Siamese pretraining step of the speech and text encoders with CTC and Optimal Transport, to adapt the speech representations to the space of the text model,…

    Submitted 2 June, 2023; originally announced June 2023.

    Comments: IWSLT 2023

  4. arXiv:2305.12535

    cs.CL cs.AI cs.LG

    Explaining How Transformers Use Context to Build Predictions

    Authors: Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà

    Abstract: Language Generation Models produce words based on the previous context. Although existing methods offer input attributions as explanations for a model's prediction, it is still unclear how prior words affect the model's decision throughout the layers. In this work, we leverage recent advances in explainability of the Transformer and present a procedure to analyze models for language generation. Us…

    Submitted 21 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  5. arXiv:2304.06371

    cs.CL cs.CV

    Sign Language Translation from Instructional Videos

    Authors: Laia Tarrés, Gerard I. Gállego, Amanda Duarte, Jordi Torres, Xavier Giró-i-Nieto

    Abstract: The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead o…

    Submitted 14 April, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Paper accepted at WiCV @CVPR23

  6. arXiv:2212.01140

    cs.CL cs.CV

    Tackling Low-Resourced Sign Language Translation: UPC at WMT-SLT 22

    Authors: Laia Tarrés, Gerard I. Gállego, Xavier Giró-i-Nieto, Jordi Torres

    Abstract: This paper describes the system developed at the Universitat Politècnica de Catalunya for the Workshop on Machine Translation 2022 Sign Language Translation Task, in particular, for the sign-to-text direction. We use a Transformer model implemented with the Fairseq modeling toolkit. We have experimented with the vocabulary size, data augmentation techniques and pretraining the model with the PHOEN…

    Submitted 2 December, 2022; originally announced December 2022.

  7. arXiv:2210.16264

    cs.CL cs.SD eess.AS

    Efficient Speech Translation with Dynamic Latent Perceivers

    Authors: Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

    Abstract: Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and due to the quadratic complexity of the Transformer, a down-sampling step is essential for its adoption in Speech Translation. Instead, in this research, we propose to ease the complex…

    Submitted 14 March, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: ICASSP 2023

  8. arXiv:2209.02402

    cs.CV cs.AI

    Topic Detection in Continuous Sign Language Videos

    Authors: Alvaro Budria, Laia Tarres, Gerard I. Gallego, Francesc Moreno-Noguer, Jordi Torres, Xavier Giro-i-Nieto

    Abstract: Significant progress has been made recently on challenging tasks in automatic sign language understanding, such as sign language recognition, translation and production. However, these works have focused on datasets with relatively few samples, short recordings and limited vocabulary and signing space. In this work, we introduce the novel task of sign language topic detection. We base our experime…

    Submitted 1 September, 2022; originally announced September 2022.

    Comments: Presented as an extended abstract in the "AVA: Accessibility, Vision, and Autonomy Meet" CVPR 2022 Workshop

    Journal ref: "AVA: Accessibility, Vision, and Autonomy Meet" CVPR 2022 Workshop

  9. arXiv:2205.11631

    cs.CL

    Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

    Authors: Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà

    Abstract: In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has focused mainly on source sentence tokens' attributions. Therefore, we lack a full understanding of the influences of every input token (source sentence and target…

    Submitted 4 November, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  10. arXiv:2205.07100

    cs.CL cs.AI cs.MM cs.SD eess.AS

    Multiformer: A Head-Configurable Transformer-Based Model for Direct Speech Translation

    Authors: Gerard Sant, Gerard I. Gállego, Belen Alastruey, Marta R. Costa-jussà

    Abstract: Transformer-based models have been achieving state-of-the-art results in several fields of Natural Language Processing. However, their direct application to speech tasks is not trivial. The nature of these sequences carries problems such as long sequence lengths and redundancy between adjacent tokens. Therefore, we believe that the regular self-attention mechanism might not be well suited for it. Diffe…

    Submitted 14 May, 2022; originally announced May 2022.

    Comments: NAACL-SRW 2022

  11. arXiv:2204.09028

    cs.CL cs.SD eess.AS

    On the Locality of Attention in Direct Speech Translation

    Authors: Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

    Abstract: Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the complexity of the self-attention mechanism scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, such as those in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contribution…

    Submitted 19 April, 2022; originally announced April 2022.

    Comments: ACL-SRW 2022. Equal contribution between Belen Alastruey and Javier Ferrando

  12. arXiv:2203.04212

    cs.CL

    Measuring the Mixing of Contextual Information in the Transformer

    Authors: Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

    Abstract: The Transformer architecture aggregates input information through the self-attention mechanism, but there is no clear understanding of how this information is mixed across the entire model. Additionally, recent works have demonstrated that attention weights alone are not enough to describe the flow of information. In this paper, we consider the whole attention block -- multi-head attention, residu…

    Submitted 21 October, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: EMNLP 2022

  13. arXiv:2202.04774

    cs.SD cs.CL eess.AS

    SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

    Authors: Ioannis Tsiamas, Gerard I. Gállego, José A. R. Fonollosa, Marta R. Costa-jussà

    Abstract: Speech translation models are unable to directly process long audio recordings, like TED talks, which have to be split into shorter segments. Speech translation datasets provide manual segmentations of the audios, which are not available in real-world scenarios, and existing segmentation methods usually significantly reduce translation quality at inference time. To bridge the gap between the manual segmenta…

    Submitted 6 July, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: Accepted to Interspeech 2022. For an additional 2-page Appendix refer to v1

  14. arXiv:2107.03069

    cs.CL cs.SD eess.AS

    Efficient Transformer for Direct Speech Translation

    Authors: Belen Alastruey, Gerard I. Gállego, Marta R. Costa-jussà

    Abstract: The advent of Transformer-based models has surpassed the barriers of text. When working with speech, we must face a problem: the sequence length of an audio input is not suitable for the Transformer. To bypass this problem, a common approach is to add strided convolutional layers to reduce the sequence length before using the Transformer. In this paper, we propose a new approach for direct Speech…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  15. arXiv:2105.04512

    cs.CL

    End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

    Authors: Gerard I. Gállego, Ioannis Tsiamas, Carlos Escolano, José A. R. Fonollosa, Marta R. Costa-jussà

    Abstract: This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end and use a custom or given segmentation. Our submission is an end-to-end speech translation s…

    Submitted 28 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Submitted to IWSLT 2021; changed the title and added submission results

  16. arXiv:2010.14465

    cs.CL

    Evaluating Gender Bias in Speech Translation

    Authors: Marta R. Costa-jussà, Christine Basta, Gerard I. Gállego

    Abstract: The scientific community is increasingly aware of the necessity to embrace pluralism and consistently represent major and minor social groups. Currently, there are no standard evaluation techniques for different types of biases. Accordingly, there is an urgent need to provide evaluation sets and protocols to measure existing biases in our automatic systems. Evaluating the biases should be an essen…

    Submitted 14 May, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: Preprint

    ACM Class: I.2.7

    Journal ref: Proceedings of the LREC 2022