Showing 1–10 of 10 results for author: Kuratov, Y

Searching in archive cs.
  1. arXiv:2407.04841  [pdf, other]

    cs.CL cs.AI cs.LG

    Associative Recurrent Memory Transformer

    Authors: Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev

    Abstract: This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We dem…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: ICML 2024 Next Generation of Sequence Modeling Architectures Workshop

    ACM Class: I.2.7

  2. arXiv:2406.10149  [pdf, other]

    cs.CL cs.AI

    BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

    Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev

    Abstract: In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess the efficiency of models in handling long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long doc…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2402.10790  [pdf, other]

    cs.CL cs.AI cs.LG

    In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

    Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev

    Abstract: This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for seque…

    Submitted 20 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 11M tokens, fix qa3 min facts per task in Table 1

  4. arXiv:2311.01326  [pdf, other]

    cs.CL cs.AI

    Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information

    Authors: Alla Chepurova, Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev

    Abstract: Real-world Knowledge Graphs (KGs) often suffer from incompleteness, which limits their potential performance. Knowledge Graph Completion (KGC) techniques aim to address this issue. However, traditional KGC methods are computationally intensive and impractical for large-scale KGs, necessitating the learning of dense node embeddings and computing pairwise distances. Generative transformer-based lang…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to Findings of the Association for Computational Linguistics: EMNLP 2023

  5. arXiv:2304.11062  [pdf, other]

    cs.CL cs.AI cs.LG

    Scaling Transformer to 1M tokens and beyond with RMT

    Authors: Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

    Abstract: A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. Our approach demonstrates the capability to store information in memory for sequences of up…

    Submitted 6 February, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

  6. arXiv:2207.06881  [pdf, other]

    cs.CL cs.LG

    Recurrent Memory Transformer

    Authors: Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev

    Abstract: Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows combining information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by quadratic computational complexity of self-…

    Submitted 8 December, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  7. arXiv:2205.02340  [pdf, other]

    cs.CL cs.LG

    Knowledge Distillation of Russian Language Models with Reduction of Vocabulary

    Authors: Alina Kolesnikova, Yuri Kuratov, Vasily Konovalov, Mikhail Burtsev

    Abstract: Today, transformer language models serve as a core component for the majority of natural language processing tasks. Industrial application of such models requires minimization of computation time and memory footprint. Knowledge distillation is one of the approaches to address this goal. Existing methods in this field are mainly focused on reducing the number of layers or dimension of embeddings/hidden rep…

    Submitted 4 May, 2022; originally announced May 2022.

  8. arXiv:2006.11527  [pdf, other]

    cs.CL cs.LG cs.NE

    Memory Transformer

    Authors: Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov

    Abstract: Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows the transformer to combine information from all elements of a sequence into context-aware representations. However, information about the context is stored mostly in the same element-wise representations. This might limit the processing of properties related…

    Submitted 16 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.

  9. arXiv:2002.02450  [pdf, other]

    cs.CL cs.LG stat.ML

    Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker

    Authors: Pavel Gulyaev, Eugenia Elistratova, Vasily Konovalov, Yuri Kuratov, Leonid Pugachev, Mikhail Burtsev

    Abstract: Dialogue State Tracking (DST) is a core component of virtual assistants such as Alexa or Siri. To accomplish various tasks, these assistants need to support an increasing number of services and APIs. The Schema-Guided State Tracking track of the 8th Dialogue System Technology Challenge highlighted the DST problem for unseen services. The organizers introduced the Schema-Guided Dialogue (SGD) datas…

    Submitted 5 February, 2020; originally announced February 2020.

  10. arXiv:1905.07213  [pdf, other]

    cs.CL

    Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

    Authors: Yuri Kuratov, Mikhail Arkhipov

    Abstract: The paper introduces methods of adaptation of multilingual masked language models for a specific language. Pre-trained bidirectional language models show state-of-the-art performance on a wide range of tasks including reading comprehension, natural language inference, and sentiment analysis. At the moment there are two alternative approaches to train such models: monolingual and multilingual. Whil…

    Submitted 17 May, 2019; originally announced May 2019.
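Several abstracts above (entries 1, 5 and 6) contrast the quadratic cost of full self-attention with segment-level recurrence, where the input is split into segments and only a small memory is carried between them, so compute grows linearly with input length. The sketch below is a toy cost model under stated assumptions, not the authors' implementation: it counts only pairwise attention scores (ignoring layers, heads and hidden size), and the segment and memory sizes are hypothetical parameters chosen for illustration.

```python
# Toy cost model (an assumption for illustration, not taken from the papers):
# count pairwise attention scores only, ignoring layers, heads and hidden size.


def full_attention_pairs(n: int) -> int:
    """Scores for one self-attention pass over all n tokens: grows as n^2."""
    return n * n


def segment_recurrent_pairs(n: int, segment: int, memory: int) -> int:
    """Scores when the input is split into segments of `segment` tokens and each
    segment attends over itself plus `memory` memory tokens carried over from
    the previous segment. The per-segment cost is fixed, so the total grows
    linearly with the number of segments."""
    num_segments = -(-n // segment)  # ceiling division: last segment may be partial
    per_segment = (segment + memory) ** 2
    return num_segments * per_segment


if __name__ == "__main__":
    # Hypothetical sizes: 512-token segments with 10 memory tokens.
    for n in (4_096, 65_536, 1_000_000):
        full = full_attention_pairs(n)
        rec = segment_recurrent_pairs(n, segment=512, memory=10)
        print(f"n={n:>9,}  full={full:.2e}  segment-recurrent={rec:.2e}")
```

In this toy model a 1,000,000-token input needs about 5.3 × 10^8 scores with segment recurrence versus 10^12 for full attention, and doubling the input length doubles the recurrent cost instead of quadrupling it. Real costs also depend on model depth, width and how memory is read and written, which this count deliberately leaves out.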