Showing 1–2 of 2 results for author: Núñez, J C R

  1. arXiv:2110.12552  [pdf, ps, other]

    cs.CL

    Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models

    Authors: José Carlos Rosales Núñez, Guillaume Wisniewski, Djamé Seddah

    Abstract: This work explores the capacities of character-based Neural Machine Translation to translate noisy User-Generated Content (UGC), with a strong focus on exploring the limits of such approaches to handle productive UGC phenomena, which, almost by definition, cannot be seen at training time. Within a strict zero-shot scenario, we first study the detrimental impact on translation performance of various…

    Submitted 24 October, 2021; originally announced October 2021.

  2. arXiv:2110.12551  [pdf, other]

    cs.CL

    Understanding the Impact of UGC Specificities on Translation Quality

    Authors: José Carlos Rosales Núñez, Djamé Seddah, Guillaume Wisniewski

    Abstract: This work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT. Our analyses show that measuring the average-case performance using a standard metric on a UGC test set falls far short of giving a reliable image of the UGC translation quality. That is why we introduce a new data set for the evalua…

    Submitted 24 October, 2021; originally announced October 2021.