
Showing 1–1 of 1 results for author: Weymann, L

  1. arXiv:2311.14838  [pdf, other]

    cs.CL

    OpusCleaner and OpusTrainer, open source toolkits for training Machine Translation and Large language models

    Authors: Nikolay Bogoychev, Jelmer van der Linde, Graeme Nail, Barry Haddow, Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Lukas Weymann, Tudor Nicolae Mateiu, Jindřich Helcl, Mikko Aulamo

    Abstract: Developing high-quality machine translation systems is a labour-intensive, challenging and confusing process for newcomers to the field. We present a pair of tools, OpusCleaner and OpusTrainer, that aim to simplify the process, reduce the amount of work and lower the entry barrier for newcomers. OpusCleaner is a data downloading, cleaning, and preprocessing toolkit. It is designed to allow researc…
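    The abstract describes OpusCleaner's cleaning stage only at a high level. As a rough illustration of what a cleaning pass over parallel data often involves, the sketch below applies three common filters to (source, target) sentence pairs: empty-side removal, length and length-ratio filtering, and exact-duplicate removal. The function `clean_pairs`, its thresholds, and the whitespace tokenisation are hypothetical assumptions. This is not OpusCleaner's API or its actual filter set.

```python
from typing import Iterable, Iterator, Set, Tuple

Pair = Tuple[str, str]


def clean_pairs(
    pairs: Iterable[Pair], max_ratio: float = 2.0, max_tokens: int = 150
) -> Iterator[Pair]:
    """Hypothetical cleaning pass over (source, target) sentence pairs.

    Not OpusCleaner's API; an illustrative sketch only.
    """
    seen: Set[Pair] = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Whitespace tokenisation is a simplifying assumption.
        s_len, t_len = len(src.split()), len(tgt.split())
        # Drop pairs where either side is empty.
        if s_len == 0 or t_len == 0:
            continue
        # Drop overly long segments.
        if s_len > max_tokens or t_len > max_tokens:
            continue
        # Drop pairs whose lengths differ too much (often misalignments).
        if max(s_len, t_len) / min(s_len, t_len) > max_ratio:
            continue
        # Drop exact duplicate pairs, keeping the first occurrence.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        yield src, tgt


if __name__ == "__main__":
    corpus = [
        ("Hello world", "Hallo Welt"),
        ("Hello world", "Hallo Welt"),  # duplicate
        ("", "leer"),  # empty source side
        ("Yes", "Ja das ist eine sehr lange Übersetzung"),  # ratio 7 > 2
    ]
    print(list(clean_pairs(corpus)))  # [('Hello world', 'Hallo Welt')]
```

    Writing the pass as a generator lets filters run over corpora too large to hold in memory; a real toolkit would typically chain many such filters and make their thresholds configurable per dataset.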

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: Code on Github: https://github.com/hplt-project/OpusCleaner and https://github.com/hplt-project/OpusTrainer