Showing 1–3 of 3 results for author: Kirefu, F

  1. arXiv:2210.11309  [pdf, other]

    cs.CL

    The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)

    Authors: Faheem Kirefu, Vivek Iyer, Pinzhen Chen, Laurie Burchell

    Abstract: The University of Edinburgh participated in the WMT22 shared task on code-mixed translation. This consists of two subtasks: i) generating code-mixed Hindi/English (Hinglish) text from parallel Hindi and English sentences and ii) machine translation from Hinglish to English. As both subtasks are considered low-resource, we focused our efforts on careful data generation and curation, espe…

    Submitted 20 October, 2022; originally announced October 2022.

  2. arXiv:2001.09907  [pdf, other]

    cs.CL

    PMIndia -- A Collection of Parallel Corpora of Languages of India

    Authors: Barry Haddow, Faheem Kirefu

    Abstract: Parallel text is required for building high-quality machine translation (MT) systems, as well as for other multilingual NLP applications. For many South Asian languages, such data is in short supply. In this paper, we describe a new publicly available corpus (PMIndia) consisting of parallel sentences which pair 13 major languages of India with English. The corpus includes up to 56,000 sentences fo…

    Submitted 27 January, 2020; originally announced January 2020.

  3. arXiv:1907.05854  [pdf, other]

    cs.CL

    The University of Edinburgh's Submissions to the WMT19 News Translation Task

    Authors: Rachel Bawden, Nikolay Bogoychev, Ulrich Germann, Roman Grundkiewicz, Faheem Kirefu, Antonio Valerio Miceli Barone, Alexandra Birch

    Abstract: The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English-to-Gujarati, Gujarati-to-English, English-to-Chinese, Chinese-to-English, German-to-English, and English-to-Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English-…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: To appear in the Proceedings of WMT19: Shared Task Papers