Showing 1–3 of 3 results for author: Cromieres, F

  1. arXiv:2208.13170  [pdf, ps, other]

    cs.CL

    CJaFr-v3 : A Freely Available Filtered Japanese-French Aligned Corpus

    Authors: Raoul Blin, Fabien Cromières

    Abstract: We present a free Japanese-French parallel corpus. It includes 15M aligned segments and is obtained by compiling and filtering several existing resources. In this paper, we describe the existing resources, their quantity and quality, the filtering we applied to improve the quality of the corpus, and the content of the ready-to-use corpus. We also evaluate the usefulness of this corpus and the qual…

    Submitted 28 August, 2022; originally announced August 2022.

  2. arXiv:2005.03361  [pdf, other]

    cs.CL

    JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation

    Authors: Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song, Sadao Kurohashi

    Abstract: Neural machine translation (NMT) needs large parallel corpora for state-of-the-art translation quality. Low-resource NMT is typically addressed by transfer learning which leverages large monolingual or parallel corpora for pre-training. Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel co…

    Submitted 7 May, 2020; originally announced May 2020.

    Comments: LREC 2020

  3. arXiv:1702.06135  [pdf, other]

    cs.CL

    Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages

    Authors: Raj Dabre, Fabien Cromieres, Sadao Kurohashi

    Abstract: In this paper, we explore a simple solution to "Multi-Source Neural Machine Translation" (MSNMT) which only relies on preprocessing an N-way multilingual corpus without modifying the Neural Machine Translation (NMT) architecture or training procedure. We simply concatenate the source sentences to form a single long multi-source input sentence while keeping the target side sentence as it is and trai…

    Submitted 3 March, 2019; v1 submitted 20 February, 2017; originally announced February 2017.

    Comments: Official version of manuscript which was accepted in MT Summit 2017