Showing 1–7 of 7 results for author: Blain, F

Searching in archive cs.
  1. arXiv:2403.18018

    cs.CL cs.LG

    DORE: A Dataset For Portuguese Definition Generation

    Authors: Anna Beatriz Dimas Furtado, Tharindu Ranasinghe, Frédéric Blain, Ruslan Mitkov

    Abstract: Definition modelling (DM) is the task of automatically generating a dictionary definition for a specific word. Computational systems that are capable of DM can have numerous applications benefiting a wide range of audiences. As DM is considered a supervised natural language generation problem, these systems require large annotated datasets to train the machine learning (ML) models. Several DM data…

    Submitted 28 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024 (The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation)

  2. arXiv:2304.08891

    cs.CL

    Tailoring Domain Adaptation for Machine Translation Quality Estimation

    Authors: Javad Pourmostafa Roshan Sharami, Dimitar Shterionov, Frédéric Blain, Eva Vanmassenhove, Mirella De Sisto, Chris Emmery, Pieter Spronck

    Abstract: While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able t…

    Submitted 9 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Accepted to EAMT 2023 (main)

  3. arXiv:2109.10859

    cs.CL cs.AI

    Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation

    Authors: Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia

    Abstract: Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time. Thus, in order to be…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: Accepted to WMT 2021 Conference co-located with EMNLP 2021. 14 pages with a 4 page appendix

  4. arXiv:2107.00411

    cs.CL

    Knowledge Distillation for Quality Estimation

    Authors: Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia

    Abstract: Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent success in QE stems from the use of multilingual pre-trained representations, where very large models lead to impressive results. However, the inference time, d…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: ACL Findings 2021

  5. arXiv:2104.05688

    cs.CL cs.HC

    Backtranslation Feedback Improves User Confidence in MT, Not Quality

    Authors: Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya

    Abstract: Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility. We demonstrate this by showing three ways in which user confidence in the outbound translation, as well as its overall final quality, can be affected: backward translation, qua…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: 9 pages (excluding references); to appear at NAACL-HLT 2021

  6. arXiv:2010.04480

    cs.CL

    MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

    Authors: Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins

    Abstract: We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human labels for up to 10,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, as well…

    Submitted 11 October, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

  7. arXiv:2005.10608

    cs.CL

    Unsupervised Quality Estimation for Neural Machine Translation

    Authors: Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia

    Abstract: Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time. Existing approaches require large amounts of expert annotated data, computation and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additi…

    Submitted 20 July, 2020; v1 submitted 21 May, 2020; originally announced May 2020.

    Comments: Accepted for publication in TACL. Authors' final version