Showing 1–1 of 1 results for author: Hatanpää, V

  1. arXiv:2404.01856 [pdf, other]

    cs.CL

    Poro 34B and the Blessing of Multilinguality

    Authors: Risto Luukkonen, Jonathan Burdge, Elaine Zosa, Aarne Talman, Ville Komulainen, Väinö Hatanpää, Peter Sarlin, Sampo Pyysalo

    Abstract: The pretraining of state-of-the-art large language models now requires trillions of words of text, which is orders of magnitude more than available for the vast majority of languages. While including text in more than one language is an obvious way to acquire more pretraining data, multilinguality is often seen as a curse, and most model training efforts continue to focus near-exclusively on indiv…

    Submitted 24 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.