Showing 1–10 of 10 results for author: Nadejde, M

  1. arXiv:2406.08255  [pdf, other]

    cs.CL

    M3T: A New Benchmark Dataset for Multi-Modal Document-Level Machine Translation

    Authors: Benjamin Hsu, Xiaoyu Liu, Huayang Li, Yoshinari Fujinuma, Maria Nadejde, Xing Niu, Yair Kittenplon, Ron Litman, Raghavendra Pappagari

    Abstract: Document translation poses a challenge for Neural Machine Translation (NMT) systems. Most document-level NMT systems rely on meticulously curated sentence-level parallel data, assuming flawless extraction of text from documents along with their precise reading order. These systems also tend to disregard additional visual cues such as the document layout, deeming it irrelevant. However, real-world…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: NAACL 2024, dataset at https://github.com/amazon-science/m3t-multi-modal-translation-bench

  2. RAMP: Retrieval and Attribute-Marking Enhanced Prompting for Attribute-Controlled Translation

    Authors: Gabriele Sarti, Phu Mon Htut, Xing Niu, Benjamin Hsu, Anna Currey, Georgiana Dinu, Maria Nadejde

    Abstract: Attribute-controlled translation (ACT) is a subtask of machine translation that involves controlling stylistic or linguistic attributes (like formality and gender) of translation outputs. While ACT has garnered attention in recent years due to its usefulness in real-world applications, progress in the task is currently limited by dataset availability, since most prior approaches rely on supervised…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023

    Journal ref: Proceedings of ACL (2023) 1476-1490

  3. arXiv:2305.11808  [pdf, other]

    cs.CL

    Pseudo-Label Training and Model Inertia in Neural Machine Translation

    Authors: Benjamin Hsu, Anna Currey, Xing Niu, Maria Nădejde, Georgiana Dinu

    Abstract: Like many other machine learning applications, neural machine translation (NMT) benefits from over-parameterized deep neural models. However, these models have been observed to be brittle: NMT model predictions are sensitive to small input changes and can show significant variation across re-training or incremental model updates. This work studies a frequently used method in NMT, pseudo-label trai…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at ICLR 2023

  4. arXiv:2211.01355  [pdf, other]

    cs.CL

    MT-GenEval: A Counterfactual and Contextual Dataset for Evaluating Gender Accuracy in Machine Translation

    Authors: Anna Currey, Maria Nădejde, Raghavendra Pappagari, Mia Mayer, Stanislas Lauly, Xing Niu, Benjamin Hsu, Georgiana Dinu

    Abstract: As generic machine translation (MT) quality has improved, the need for targeted benchmarks that explore fine-grained aspects of quality has increased. In particular, gender accuracy in translation can have implications in terms of output fluency, translation accuracy, and ethics. In this paper, we introduce MT-GenEval, a benchmark for evaluating gender accuracy in translation from English into eig…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: Accepted at EMNLP 2022. Data and code: https://github.com/amazon-research/machine-translation-gender-eval

  5. arXiv:2210.10906  [pdf, other]

    cs.CL cs.LG

    A baseline revisited: Pushing the limits of multi-segment models for context-aware translation

    Authors: Suvodeep Majumder, Stanislas Lauly, Maria Nadejde, Marcello Federico, Georgiana Dinu

    Abstract: This paper addresses the task of contextual translation using multi-segment models. Specifically we show that increasing model capacity further pushes the limits of this approach and that deeper models are more suited to capture context dependencies. Furthermore, improvements observed with larger models can be transferred to smaller models using knowledge distillation. Our experiments show that th…

    Submitted 21 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

  6. arXiv:2207.05851  [pdf, ps, other]

    cs.CL

    Sockeye 3: Fast Neural Machine Translation with PyTorch

    Authors: Felix Hieber, Michael Denkowski, Tobias Domhan, Barbara Darques Barros, Celina Dong Ye, Xing Niu, Cuong Hoang, Ke Tran, Benjamin Hsu, Maria Nadejde, Surafel Lakew, Prashant Mathur, Anna Currey, Marcello Federico

    Abstract: Sockeye 3 is the latest version of the Sockeye toolkit for Neural Machine Translation (NMT). Now based on PyTorch, Sockeye 3 provides faster model implementations and more advanced features with a further streamlined codebase. This enables broader experimentation with faster iteration, efficient training of stronger and faster models, and the flexibility to move new ideas quickly from research to…

    Submitted 2 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

  7. arXiv:2205.04022  [pdf, other]

    cs.CL cs.AI

    CoCoA-MT: A Dataset and Benchmark for Contrastive Controlled MT with Application to Formality

    Authors: Maria Nădejde, Anna Currey, Benjamin Hsu, Xing Niu, Marcello Federico, Georgiana Dinu

    Abstract: The machine translation (MT) task is typically formulated as that of returning a single translation for an input segment. However, in many cases, multiple different translations are valid and the appropriate translation may depend on the intended target audience, characteristics of the speaker, or even the relationship between speakers. Specific problems arise when dealing with honorifics, particu…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: NAACL 2022

  8. arXiv:2006.02964  [pdf, other]

    cs.CL

    Personalizing Grammatical Error Correction: Adaptation to Proficiency Level and L1

    Authors: Maria Nadejde, Joel Tetreault

    Abstract: Grammar error correction (GEC) systems have become ubiquitous in a variety of software applications, and have started to approach human-level performance for some datasets. However, very little is known about how to efficiently personalize these systems to the user's characteristics, such as their proficiency level and first language, or to emerging domains of text. We present the first results on…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: Proceedings of the 2019 EMNLP Workshop W-NUT: The 5th Workshop on Noisy User-generated Text

    Journal ref: Proceedings of the 2019 EMNLP Workshop W-NUT: The 5th Workshop on Noisy User-generated Text, pages 27-33, Hong Kong, Nov 4, 2019

  9. arXiv:1703.04357  [pdf, other]

    cs.CL

    Nematus: a Toolkit for Neural Machine Translation

    Authors: Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, Maria Nădejde

    Abstract: We present Nematus, a toolkit for Neural Machine Translation. The toolkit prioritizes high translation accuracy, usability, and extensibility. Nematus has been used to build top-performing submissions to shared translation tasks at WMT and IWSLT, and has been used to train systems for production environments.

    Submitted 13 March, 2017; originally announced March 2017.

    Comments: EACL 2017 demo track

  10. arXiv:1702.01147  [pdf, other]

    cs.CL

    Predicting Target Language CCG Supertags Improves Neural Machine Translation

    Authors: Maria Nadejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, Alexandra Birch

    Abstract: Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask traini…

    Submitted 18 July, 2017; v1 submitted 3 February, 2017; originally announced February 2017.

    Comments: Accepted at the Second Conference on Machine Translation (WMT17). This version includes more results regarding target syntax for Romanian->English and reports fewer results regarding source syntax