Showing 1–5 of 5 results for author: Cherniuk, D

  1. arXiv:2404.09737  [pdf, other]

    cs.LG cs.CL

    Quantization of Large Language Models with an Overdetermined Basis

    Authors: Daniil Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan Oseledets, Ekaterina Muravleva, Aleksandr Mikhalev, Boris Kashin

    Abstract: In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors. The first factor maintains a small infinity norm, while the second exhibits a similarly constrained norm when multiplied by an orthogonal matrix. Surprisingly, the entries of factors after decompos…

    Submitted 15 April, 2024; originally announced April 2024.

  2. arXiv:2402.01376  [pdf]

    cs.CL cs.AI cs.LG

    LoTR: Low Tensor Rank Weight Adaptation

    Authors: Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets

    Abstract: In this paper we generalize and extend an idea of low-rank adaptation (LoRA) of large language models (LLMs) based on Transformer architecture. Widely used LoRA-like methods of fine-tuning LLMs are based on matrix factorization of gradient update. We introduce LoTR, a novel approach for parameter-efficient fine-tuning of LLMs which represents a gradient update to parameters in a form of tensor dec…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Submitted; missing author and sections were added;

  3. arXiv:2312.03415  [pdf, other]

    cs.LG

    Run LoRA Run: Faster and Lighter LoRA Implementations

    Authors: Daria Cherniuk, Aleksandr Mikhalev, Ivan Oseledets

    Abstract: LoRA is a technique that reduces the number of trainable parameters in a neural network by introducing low-rank adapters to linear layers. This technique is used both for fine-tuning and full training of large language models. This paper presents the RunLoRA framework for efficient implementations of LoRA that significantly improves the speed of neural network training and fine-tuning using low-ra…

    Submitted 14 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  4. arXiv:2308.04595  [pdf, other]

    cs.LG

    Quantization Aware Factorization for Deep Neural Network Compression

    Authors: Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak

    Abstract: Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a d…

    Submitted 8 August, 2023; originally announced August 2023.

  5. arXiv:2202.10435  [pdf, ps, other]

    cs.LG cs.AI

    Survey on Large Scale Neural Network Training

    Authors: Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont

    Abstract: Modern Deep Neural Networks (DNNs) require significant memory to store weight, activations, and other intermediate tensors during training. Hence, many models do not fit one GPU device or can be trained using only a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNNs training. We analyze techniques that save memory and make good us…

    Submitted 21 February, 2022; originally announced February 2022.