Showing 1–7 of 7 results for author: Daulbaev, T

  1. arXiv:2402.01376  [pdf]

    cs.CL cs.AI cs.LG

    LoTR: Low Tensor Rank Weight Adaptation

    Authors: Daniel Bershatsky, Daria Cherniuk, Talgat Daulbaev, Aleksandr Mikhalev, Ivan Oseledets

    Abstract: In this paper we generalize and extend an idea of low-rank adaptation (LoRA) of large language models (LLMs) based on Transformer architecture. Widely used LoRA-like methods of fine-tuning LLMs are based on matrix factorization of gradient update. We introduce LoTR, a novel approach for parameter-efficient fine-tuning of LLMs which represents a gradient update to parameters in a form of tensor dec…

    Submitted 5 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Submitted; missing author and sections were added

  2. arXiv:2103.08561  [pdf, other]

    cs.LG

    Meta-Solver for Neural Ordinary Differential Equations

    Authors: Julia Gusak, Alexandr Katrutsa, Talgat Daulbaev, Andrzej Cichocki, Ivan Oseledets

    Abstract: A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization method and its properties, which may not be optimal for the selected application and yield the overfitting to the given solver. In our paper, we inve…

    Submitted 15 March, 2021; originally announced March 2021.

  3. arXiv:2004.09222  [pdf, other]

    cs.LG stat.ML

    Towards Understanding Normalization in Neural ODEs

    Authors: Julia Gusak, Larisa Markeeva, Talgat Daulbaev, Alexandr Katrutsa, Andrzej Cichocki, Ivan Oseledets

    Abstract: Normalization is an important and vastly investigated technique in deep learning. However, its role for Ordinary Differential Equation based networks (neural ODEs) is still poorly understood. This paper investigates how different normalization techniques affect the performance of neural ODEs. Particularly, we show that it is possible to achieve 93% accuracy in the CIFAR-10 classification task, and…

    Submitted 27 April, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

  4. arXiv:2003.05271  [pdf, other]

    cs.NE math.NA stat.ML

    Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

    Authors: Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets

    Abstract: We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as "adjoint method") to train neural ODEs on classification, density estimation, and inference approximation tasks. We also propose a theoretical justification of our approach using logarithmic norm formalism. As a…

    Submitted 30 October, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

  5. arXiv:1910.13025  [pdf, ps, other]

    cs.LG math.NA stat.ML

    Active Subspace of Neural Networks: Structural Analysis and Universal Attacks

    Authors: Chunfeng Cui, Kaiqi Zhang, Talgat Daulbaev, Julia Gusak, Ivan Oseledets, Zheng Zhang

    Abstract: Active subspace is a model reduction method widely used in the uncertainty quantification community. In this paper, we propose analyzing the internal structure and vulnerability of deep neural networks using active subspace. Firstly, we employ the active subspace to measure the number of "active neurons" at each intermediate layer and reduce the number of neurons from several thousands to several…

    Submitted 29 April, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

  6. arXiv:1910.06995  [pdf, other]

    cs.LG stat.ML

    Reduced-Order Modeling of Deep Neural Networks

    Authors: Julia Gusak, Talgat Daulbaev, Evgeny Ponomarev, Andrzej Cichocki, Ivan Oseledets

    Abstract: We introduce a new method for speeding up the inference of deep neural networks. It is somewhat inspired by the reduced-order modeling techniques for dynamical systems. The cornerstone of the proposed method is the maximum volume algorithm. We demonstrate efficiency on neural networks pre-trained on different datasets. We show that in many practical cases it is possible to replace convolutional lay…

    Submitted 25 November, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

  7. arXiv:1711.03825  [pdf, ps, other]

    math.NA

    Deep Multigrid: learning prolongation and restriction matrices

    Authors: Alexandr Katrutsa, Talgat Daulbaev, Ivan Oseledets

    Abstract: This paper proposes a method to optimize restriction and prolongation operators in the two-grid method. The proposed method is straightforwardly extended to the geometric multigrid method (GMM). GMM is used in solving discretized partial differential equations (PDEs) and is based on the restriction and prolongation operators. The operators are crucial for fast convergence of GMM, but they are unknown…

    Submitted 10 November, 2017; originally announced November 2017.