-
TEncDM: Understanding the Properties of Diffusion Model in the Space of Language Model Encodings
Authors:
Alexander Shabalin,
Viacheslav Meshchaninov,
Tingir Badmaev,
Dmitry Molchanov,
Grigory Bartosh,
Sergey Markov,
Dmitry Vetrov
Abstract:
Drawing inspiration from the success of diffusion models in various domains, numerous research papers proposed methods for adapting them to text data. Despite these efforts, none of them has managed to achieve the quality of the large language models. In this paper, we conduct a comprehensive analysis of key components of the text diffusion models and introduce a novel approach named Text Encoding Diffusion Model (TEncDM). Instead of the commonly used token embedding space, we train our model in the space of the language model encodings. Additionally, we propose to use a Transformer-based decoder that utilizes contextual information for text reconstruction. We also analyse self-conditioning and find that it increases the magnitude of the model outputs, allowing the reduction of the number of denoising steps at the inference stage. Evaluation of TEncDM on two downstream text generation tasks, QQP and XSum, demonstrates its superiority over existing non-autoregressive models.
Submitted 29 February, 2024;
originally announced February 2024.
-
MERA: A Comprehensive LLM Evaluation in Russian
Authors:
Alena Fenogenova,
Artem Chervyakov,
Nikita Martynov,
Anastasia Kozlova,
Maria Tikhonova,
Albina Akhmetgareeva,
Anton Emelyanov,
Denis Shevelev,
Pavel Lebedev,
Leonid Sinev,
Ulyana Isaeva,
Katerina Kolomeytseva,
Daniil Moskovskiy,
Elizaveta Goncharova,
Nikita Savushkin,
Polina Mikhailova,
Denis Dimitrov,
Alexander Panchenko,
Sergei Markov
Abstract:
Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these issues, we introduce an open Multimodal Evaluation of Russian-language Architectures (MERA), a new instruction benchmark for evaluating foundation models oriented towards the Russian language. The benchmark encompasses 21 evaluation tasks for generative models in 11 skill domains and is designed as a black-box test to ensure the exclusion of data leakage. The paper introduces a methodology to evaluate FMs and LMs in zero- and few-shot fixed instruction settings that can be extended to other modalities. We propose an evaluation methodology, an open-source code base for the MERA assessment, and a leaderboard with a submission system. We evaluate open LMs as baselines and find that they are still far behind the human level. We publicly release MERA to guide forthcoming research, anticipate groundbreaking model features, standardize the evaluation procedure, and address potential societal drawbacks.
Submitted 12 January, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
A Family of Pretrained Transformer Language Models for Russian
Authors:
Dmitry Zmitrovich,
Alexander Abramov,
Andrey Kalmykov,
Maria Tikhonova,
Ekaterina Taktasheva,
Danil Astafurov,
Mark Baushenko,
Artem Snegirev,
Vitalii Kadulin,
Sergey Markov,
Tatiana Shavrina,
Vladislav Mikhailov,
Alena Fenogenova
Abstract:
Transformer language models (LMs) are fundamental to NLP research methodologies and applications in various languages. However, developing such models specifically for the Russian language has received little attention. This paper introduces a collection of 13 Russian Transformer LMs, which spans encoder (ruBERT, ruRoBERTa, ruELECTRA), decoder (ruGPT-3), and encoder-decoder (ruT5, FRED-T5) architectures. We provide a report on the model architecture design and pretraining, and the results of evaluating their generalization abilities on Russian language understanding and generation datasets and benchmarks. By pretraining and releasing these specialized Transformer LMs, we aim to broaden the scope of the NLP research directions and enable the development of industrial solutions for the Russian language.
Submitted 18 April, 2024; v1 submitted 19 September, 2023;
originally announced September 2023.
-
RuCLIP -- new models and experiments: a technical report
Authors:
Alex Shonenkov,
Andrey Kuznetsov,
Denis Dimitrov,
Tatyana Shavrina,
Daniil Chesakov,
Anastasia Maltseva,
Alena Fenogenova,
Igor Pavlov,
Anton Emelyanov,
Sergey Markov,
Daria Bakshandaeva,
Vera Shybaeva,
Andrey Chertok
Abstract:
In the report we propose six new implementations of the ruCLIP model trained on our 240M pairs. The accuracy results are compared with the original CLIP model with Ru-En translation (OPUS-MT) on 16 datasets from different domains. Our best implementations outperform the CLIP + OPUS-MT solution on most of the datasets in few-shot and zero-shot tasks. In the report we briefly describe the implementations and concentrate on the conducted experiments. An inference execution time comparison is also presented in the report.
Submitted 22 February, 2022;
originally announced February 2022.