Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian

Nikolich, Aleksandr; Korolev, Konstantin; Shelmanov, Artem; Kiselev, Igor

arXiv:2405.13929 (cs)

[Submitted on 22 May 2024 (v1), last revised 19 Jun 2024 (this version, v2)]

Abstract: There has been a surge in the development of Large Language Models (LLMs). However, text generation for languages other than English often faces significant challenges, including poor generation quality and reduced computational efficiency caused by the disproportionate representation of tokens in the model's vocabulary. In this work, we address these issues and introduce Vikhr, a new state-of-the-art open-source instruction-tuned LLM designed specifically for the Russian language. Unlike previous efforts for Russian, which rely on computationally inexpensive LoRA adapters on top of English-oriented models, Vikhr features an adapted tokenizer vocabulary and undergoes continued pre-training and instruction tuning of all weights. This approach not only enhances the model's performance but also significantly improves its computational and contextual efficiency. The remarkable performance of Vikhr across various Russian-language benchmarks can also be attributed to our efforts in expanding instruction datasets and corpora for continued pre-training. Vikhr not only sets a new state of the art among open-source LLMs for Russian, but also outperforms some proprietary closed-source models on certain benchmarks. The model weights, instruction sets, and code are publicly available.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.13929 [cs.CL]
	(or arXiv:2405.13929v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.13929

Submission history

From: Aleksandr Nikolich
[v1] Wed, 22 May 2024 18:58:58 UTC (7,238 KB)
[v2] Wed, 19 Jun 2024 17:32:23 UTC (7,238 KB)
