Showing 1–17 of 17 results for author: Escolano, C

  1. arXiv:2406.09140  [pdf, other]

    cs.CL

    Investigating the translation capabilities of Large Language Models trained on parallel data only

    Authors: Javier García Gilabert, Carlos Escolano, Aleix Sant Savall, Francesca De Luca Fornaciari, Audrey Mash, Xixian Liao, Maite Melero

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methods predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: We release our code at: https://github.com/projecte-aina/Plume

  2. arXiv:2309.17134  [pdf, other]

    cs.CL

    Promoting Generalized Cross-lingual Question Answering in Few-resource Scenarios via Self-knowledge Distillation

    Authors: Casimiro Pio Carrino, Carlos Escolano, José A. R. Fonollosa

    Abstract: Despite substantial progress in multilingual extractive Question Answering (QA), models with high and uniformly distributed performance across languages remain challenging, especially for languages with limited resources. We study cross-lingual transfer mainly focusing on the Generalized Cross-Lingual Transfer (G-XLT) task, where the question language differs from the context language - a challeng…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: Submitted to the Journal of Artificial Intelligence Research (JAIR)

  3. arXiv:2305.11761  [pdf, other]

    cs.CL

    ReSeTOX: Re-learning attention weights for toxicity mitigation in machine translation

    Authors: Javier García Gilabert, Carlos Escolano, Marta R. Costa-Jussà

    Abstract: Our proposed method, ReSeTOX (REdo SEarch if TOXic), addresses the issue of Neural Machine Translation (NMT) generating translation outputs that contain toxic words not present in the input. The objective is to mitigate the introduction of toxic language without the need for re-training. In the case of identified added toxicity during the inference process, ReSeTOX dynamically adjusts the key-valu…

    Submitted 19 May, 2023; originally announced May 2023.

  4. arXiv:2210.03070  [pdf, other]

    cs.CL

    Toxicity in Multilingual Machine Translation at Scale

    Authors: Marta R. Costa-jussà, Eric Smith, Christophe Ropers, Daniel Licht, Jean Maillard, Javier Ferrando, Carlos Escolano

    Abstract: Machine Translation systems can produce different types of errors, some of which are characterized as critical or catastrophic due to the specific negative impact that they can have on users. In this paper we focus on one type of critical error: added toxicity. We evaluate and analyze added toxicity when translating a large evaluation dataset (HOLISTICBIAS, over 472k sentences, covering 13 demogra…

    Submitted 5 April, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

    ACM Class: I.2.7

  5. arXiv:2205.11631  [pdf, other]

    cs.CL

    Towards Opening the Black Box of Neural Machine Translation: Source and Target Interpretations of the Transformer

    Authors: Javier Ferrando, Gerard I. Gállego, Belen Alastruey, Carlos Escolano, Marta R. Costa-jussà

    Abstract: In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and the target prefix (what has been previously translated at a decoding step). However, previous work on interpretability in NMT has mainly focused solely on source sentence tokens' attributions. Therefore, we lack a full understanding of the influences of every input token (source sentence and target…

    Submitted 4 November, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022

  6. arXiv:2202.06041  [pdf, other]

    cs.CL cs.IR

    A multi-task semi-supervised framework for Text2Graph & Graph2Text

    Authors: Oriol Domingo, Marta R. Costa-jussà, Carlos Escolano

    Abstract: The Artificial Intelligence industry regularly develops applications that mostly rely on Knowledge Bases, a data repository about specific, or general, domains, usually represented in a graph shape. Similar to other databases, they face two main challenges: information ingestion and information retrieval. We approach these challenges by jointly learning graph extraction from text and text generati…

    Submitted 18 February, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

    Comments: 5 pages, 2 figures, 3 tables and 8 equations

  7. arXiv:2105.04512  [pdf, other]

    cs.CL

    End-to-End Speech Translation with Pre-trained Models and Adapters: UPC at IWSLT 2021

    Authors: Gerard I. Gállego, Ioannis Tsiamas, Carlos Escolano, José A. R. Fonollosa, Marta R. Costa-jussà

    Abstract: This paper describes the submission to the IWSLT 2021 offline speech translation task by the UPC Machine Translation group. The task consists of building a system capable of translating English audio recordings extracted from TED talks into German text. Submitted systems can be either cascade or end-to-end and use a custom or given segmentation. Our submission is an end-to-end speech translation s…

    Submitted 28 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: Submitted to IWSLT 2021; changed the title and added submission results

  8. arXiv:2012.13176  [pdf, other]

    cs.CL

    Gender Bias in Multilingual Neural Machine Translation: The Architecture Matters

    Authors: Marta R. Costa-jussà, Carlos Escolano, Christine Basta, Javier Ferrando, Roser Batlle, Ksenia Kharitonova

    Abstract: Multilingual Neural Machine Translation architectures mainly differ in the amount of sharing modules and parameters among languages. In this paper, and from an algorithmic perspective, we explore if the chosen architecture, when trained with the same data, influences the gender bias accuracy. Experiments in four language pairs show that Language-Specific encoders-decoders exhibit less bias than th…

    Submitted 24 December, 2020; originally announced December 2020.

    Comments: 12 pages, 5 figures, 3 tables

    ACM Class: I.2.7

  9. arXiv:2011.01097  [pdf, other]

    cs.CL

    Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders

    Authors: Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Carlos Segura

    Abstract: Current end-to-end approaches to Spoken Language Translation (SLT) rely on limited training resources, especially for multilingual settings. On the other hand, Multilingual Neural Machine Translation (MultiNMT) approaches rely on higher-quality and more massive data sets. Our proposed method extends a MultiNMT architecture based on language-specific encoders-decoders to the task of Multilingual SL…

    Submitted 15 September, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

    ACM Class: I.2.7

    Journal ref: IEEE Workshop on Automatic Speech Recognition and Understanding 2021

  10. arXiv:2006.01594  [pdf, other]

    cs.CL

    Training Multilingual Machine Translation by Alternately Freezing Language-Specific Encoders-Decoders

    Authors: Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe

    Abstract: We propose a modular architecture of language-specific encoder-decoders that constitutes a multilingual machine translation system that can be incrementally extended to new languages without the need for retraining the existing system when adding new languages. Differently from previous works, we simultaneously train $N$ languages in all translation directions by alternately freezing encoder or de…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: arXiv admin note: text overlap with arXiv:2004.06575

    ACM Class: I.2.7

  11. arXiv:2004.08053  [pdf, other]

    cs.CL

    Enriching the Transformer with Linguistic Factors for Low-Resource Machine Translation

    Authors: Jordi Armengol-Estapé, Marta R. Costa-jussà, Carlos Escolano

    Abstract: Introducing factors, that is to say, word features such as linguistic information referring to the source tokens, is known to improve the results of neural machine translation systems in certain settings, typically in recurrent architectures. This study proposes enhancing the current state-of-the-art neural machine translation architecture, the Transformer, so that it allows to introduce external…

    Submitted 24 December, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    ACM Class: I.2.7

  12. arXiv:2004.06575  [pdf, ps, other]

    cs.CL

    Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders

    Authors: Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe

    Abstract: State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages. In this paper, we propose an alternative approach that is based on language-specific encoder-decoders, and can thus be more easily extended to new languages by learning their corresponding modules. So as to encourage a common interlingua represe…

    Submitted 14 April, 2020; originally announced April 2020.

    ACM Class: I.2.7

  13. arXiv:1907.00810  [pdf, other]

    cs.CL

    Multilingual, Multi-scale and Multi-layer Visualization of Intermediate Representations

    Authors: Carlos Escolano, Marta R. Costa-jussà, Elora Lacroux, Pere-Pau Vázquez

    Abstract: The main alternatives nowadays to deal with sequences are Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN) architectures and the Transformer. In this context, RNN's, CNN's and Transformer have most commonly been used as an encoder-decoder architecture with multiple layers in each module. Far beyond this, these architectures are the basis for the contextual word embeddings which…

    Submitted 1 July, 2019; originally announced July 2019.

  14. arXiv:1907.00735  [pdf, other]

    cs.CL

    From Bilingual to Multilingual Neural Machine Translation by Incremental Training

    Authors: Carlos Escolano, Marta R. Costa-Jussà, José A. R. Fonollosa

    Abstract: Multilingual Neural Machine Translation approaches are based on the use of task-specific models and the addition of one more language can only be done by retraining the whole system. In this work, we propose a new training schedule that allows the system to scale to more languages without modification of the previous components based on joint training and language-independent encoder/decoder modul…

    Submitted 11 July, 2019; v1 submitted 28 June, 2019; originally announced July 2019.

    Comments: Accepted paper at ACL 2019 Student Research Workshop. arXiv admin note: substantial text overlap with arXiv:1905.06831

  15. arXiv:1905.06831  [pdf, other]

    cs.CL

    Towards Interlingua Neural Machine Translation

    Authors: Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa

    Abstract: Common intermediate language representation in neural machine translation can be used to extend bilingual to multilingual systems by incremental training. In this paper, we propose a new architecture based on introducing an interlingual loss as an additional training objective. By adding and forcing this interlingual loss, we are able to train multiple encoders and decoders for each language, shar…

    Submitted 8 December, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1810.06351

  16. arXiv:1810.06351  [pdf, other]

    cs.CL

    (Self-Attentive) Autoencoder-based Universal Language Representation for Machine Translation

    Authors: Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa

    Abstract: Universal language representation is the holy grail in machine translation (MT). Thanks to the new neural MT approach, it seems that there are good perspectives towards this goal. In this paper, we propose a new architecture based on combining variational autoencoders with encoder-decoders and introducing an interlingual loss as an additional training objective. By adding and forcing this interlin…

    Submitted 15 October, 2018; originally announced October 2018.

    Comments: 7 pages, 4 figures

  17. arXiv:1610.02209  [pdf, ps, other]

    cs.CL stat.ML

    Morphology Generation for Statistical Machine Translation using Deep Learning Techniques

    Authors: Marta R. Costa-jussà, Carlos Escolano

    Abstract: Morphology in unbalanced languages remains a big challenge in the context of machine translation. In this paper, we propose to de-couple machine translation from morphology generation in order to better deal with the problem. We investigate the morphology simplification with a reasonable trade-off between expected gain and generation complexity. For the Chinese-Spanish task, optimum morphological…

    Submitted 6 February, 2017; v1 submitted 7 October, 2016; originally announced October 2016.