Showing 1–1 of 1 results for author: Castillo, D A M

  1. arXiv:2305.17404  [pdf, other]

    cs.CL

    Parallel Corpus for Indigenous Language Translation: Spanish-Mazatec and Spanish-Mixtec

    Authors: Atnafu Lambebo Tonja, Christian Maldonado-Sifuentes, David Alejandro Mendoza Castillo, Olga Kolesnikova, Noé Castro-Sánchez, Grigori Sidorov, Alexander Gelbukh

    Abstract: In this paper, we present a parallel Spanish-Mazatec and Spanish-Mixtec corpus for machine translation (MT) tasks, where Mazatec and Mixtec are two indigenous Mexican languages. We evaluated the usability of the collected corpus using three different approaches: transformer, transfer learning, and fine-tuning pre-trained multilingual MT models. Fine-tuning the Facebook M2M100-48 model outperformed…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted to Third Workshop on NLP for Indigenous Languages of the Americas