Showing 1–13 of 13 results for author: Grundkiewicz, R

  1. arXiv:2406.11580  [pdf, other]

    cs.CL

    Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation

    Authors: Tom Kocmi, Vilém Zouhar, Eleftherios Avramidis, Roman Grundkiewicz, Marzena Karpinska, Maja Popović, Mrinmaya Sachan, Mariya Shmatova

    Abstract: High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive as they are time-consuming and can only be done by experts, whose availability may be limited especially for low-resource languages. On the other hand, just assigning overall scores, like Direct Assessment (DA)…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2308.07489  [pdf, other]

    cs.CL

    SOTASTREAM: A Streaming Approach to Machine Translation Training

    Authors: Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt

    Abstract: Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer. This preparation step is increasingly at odds with modern research and development practices because this process produces a static, unchangeable version of the training data, making common training-time needs difficult (e.g., subword…

    Submitted 14 August, 2023; originally announced August 2023.

  3. arXiv:2107.10821  [pdf, other]

    cs.CL

    To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation

    Authors: Tom Kocmi, Christian Federmann, Roman Grundkiewicz, Marcin Junczys-Dowmunt, Hitokazu Matsushita, Arul Menezes

    Abstract: Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system's quality over another. The community choice of automatic metric guides research directions and industrial developments by deciding which models are deemed better. Evaluating metrics correlations with sets of human judgements has been limited by the size of these sets. In this…

    Submitted 13 September, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted to WMT 2021 research papers

  4. arXiv:2104.10408  [pdf, other]

    cs.CL

    On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

    Authors: Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi

    Abstract: Recent studies emphasize the need of document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments. In this work, we compare human assessment data from the last two WMT evaluation campaigns collected via two different methods for document-level evaluation. Our analysis sh…

    Submitted 21 April, 2021; originally announced April 2021.

    Comments: Presented at HumEval, EACL 2021

  5. arXiv:1907.05854  [pdf, other]

    cs.CL

    The University of Edinburgh's Submissions to the WMT19 News Translation Task

    Authors: Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio Valerio Miceli Barone, Alexandra Birch

    Abstract: The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: To appear in the Proceedings of WMT19: Shared Task Papers

  6. arXiv:1809.00188  [pdf, other]

    cs.CL

    MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz

    Abstract: This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018. Based on training data and systems from the WMT2017 shared task, we re-implement our own models from the last shared task and introduce improvements based on extensive parameter sharing. Next we experiment with our implementation of dual-source transformer models and data…

    Submitted 1 September, 2018; originally announced September 2018.

    Comments: Winning submissions for WMT2018 APE shared task

  7. arXiv:1805.12096  [pdf, other]

    cs.CL

    Marian: Cost-effective High-Quality Neural Machine Translation in C++

    Authors: Marcin Junczys-Dowmunt, Kenneth Heafield, Hieu Hoang, Roman Grundkiewicz, Anthony Aue

    Abstract: This paper describes the submissions of the "Marian" team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we create a nu…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: System submission to the Workshop for Neural Machine Translation 2018, efficiency task

  8. arXiv:1804.05945  [pdf, other]

    cs.CL

    Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

    Authors: Roman Grundkiewicz, Marcin Junczys-Dowmunt

    Abstract: We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT). The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks. This GEC system preserves the accuracy of SMT output and, at the same time, generates more fluent sentences…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: Accepted for oral presentation, research track, short papers, at NAACL 2018

  9. arXiv:1804.05940  [pdf, other]

    cs.CL

    Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, Kenneth Heafield

    Abstract: Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines. We demonstrate parallels between neural GEC and low-resource neural MT and successfully adapt several methods from low-resource MT to neural GEC. We further establish guidelines for trustable results in neural GEC and prop…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: Accepted for oral presentation in long paper research track at NAACL 2018

  10. arXiv:1804.00344  [pdf, other]

    cs.CL

    Marian: Fast Neural Machine Translation in C++

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch

    Abstract: We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

    Submitted 4 April, 2018; v1 submitted 1 April, 2018; originally announced April 2018.

    Comments: Demonstration paper

  11. arXiv:1706.04138  [pdf, other]

    cs.CL

    An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz

    Abstract: In this work, we explore multiple neural architectures adapted for the task of automatic post-editing of machine translation output. We focus on neural end-to-end models that combine both inputs $mt$ (raw MT output) and $src$ (source language input) in a single neural architecture, modeling $\{mt, src\} \rightarrow pe$ directly. Apart from that, we investigate the influence of hard-attention model…

    Submitted 30 September, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: Accepted for presentation at IJCNLP 2017

  12. arXiv:1605.06353  [pdf, other]

    cs.CL

    Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz

    Abstract: In this work, we study parameter tuning towards the M^2 metric, the standard metric for automatic grammar error correction (GEC) tasks. After implementing M^2 as a scorer in the Moses tuning framework, we investigate interactions of dense and sparse features, different optimizers, and tuning strategies for the CoNLL-2014 shared task. We notice erratic behavior when optimizing sparse feature weight…

    Submitted 5 October, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

    Comments: Accepted for publication at EMNLP 2016

  13. arXiv:1605.04800  [pdf, ps, other]

    cs.CL

    Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz

    Abstract: This paper describes the submission of the AMU (Adam Mickiewicz University) team to the Automatic Post-Editing (APE) task of WMT 2016. We explore the application of neural translation models to the APE problem and achieve good results by treating different models as components in a log-linear model, allowing for multiple inputs (the MT-output and the source) that are decoded to the same target lan…

    Submitted 23 June, 2016; v1 submitted 16 May, 2016; originally announced May 2016.

    Comments: Submission to the WMT 2016 shared task on Automatic Post-Editing