Showing 1–13 of 13 results for author: Gusak, J

Searching in archive cs.
  1. arXiv:2308.04595

    cs.LG

    Quantization Aware Factorization for Deep Neural Network Compression

    Authors: Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak

    Abstract: Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually necessary when pre-trained models are deployed. A conventional post-training quantization approach applied to networks with decomposed weights yields a d…

    Submitted 8 August, 2023; originally announced August 2023.

  2. arXiv:2307.01236

    cs.LG cs.PL

    Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch

    Authors: Xunyi Zhao, Théotime Le Hellard, Lionel Eyraud, Julia Gusak, Olivier Beaumont

    Abstract: We propose Rockmate to control the memory requirements when training PyTorch DNN models. Rockmate is an automatic tool that starts from the model code and generates an equivalent model, using a predefined amount of memory for activations, at the cost of a few re-computations. Rockmate automatically detects the structure of computational and data dependencies and rewrites the initial model as a seq…

    Submitted 3 July, 2023; originally announced July 2023.

  3. arXiv:2306.02697

    cs.AI

    Efficient GPT Model Pre-training using Tensor Train Matrix Representation

    Authors: Viktoriia Chekalina, Georgii Novikov, Julia Gusak, Ivan Oseledets, Alexander Panchenko

    Abstract: Large-scale transformer models have shown remarkable performance in language modelling tasks. However, such models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch. To reduce the number of the parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM)…

    Submitted 5 June, 2023; originally announced June 2023.

  4. arXiv:2202.10435

    cs.LG cs.AI

    Survey on Large Scale Neural Network Training

    Authors: Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont

    Abstract: Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models do not fit one GPU device or can be trained using only a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNN training. We analyze techniques that save memory and make good us…

    Submitted 21 February, 2022; originally announced February 2022.

  5. arXiv:2202.00441

    cs.LG cs.AI

    Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction

    Authors: Georgii Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Dimitrov, Ivan Oseledets

    Abstract: Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantization of…

    Submitted 2 February, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: Submitted

  6. arXiv:2201.13195

    cs.LG cs.AI stat.ML

    Memory-Efficient Backpropagation through Large Linear Layers

    Authors: Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets

    Abstract: In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the gradients of linear layers are computed by matrix multiplications, we consider methods for randomized matrix multiplications and demonstrate that they require less…

    Submitted 2 February, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: Submitted

  7. arXiv:2103.08561

    cs.LG

    Meta-Solver for Neural Ordinary Differential Equations

    Authors: Julia Gusak, Alexandr Katrutsa, Talgat Daulbaev, Andrzej Cichocki, Ivan Oseledets

    Abstract: A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization method and its properties, which may not be optimal for the selected application and may lead to overfitting to the given solver. In our paper, we inve…

    Submitted 15 March, 2021; originally announced March 2021.

  8. arXiv:2008.05441

    cs.CV

    Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network

    Authors: Anh-Huy Phan, Konstantin Sobolev, Konstantin Sozykin, Dmitry Ermilov, Julia Gusak, Petr Tichavsky, Valeriy Glukhov, Ivan Oseledets, Andrzej Cichocki

    Abstract: Most state-of-the-art deep neural networks are overparameterized and exhibit a high computational cost. A straightforward approach to this problem is to replace convolutional kernels with their low-rank tensor approximations, whereas the Canonical Polyadic tensor Decomposition is one of the most suited models. However, fitting the convolutional tensors by numerical optimization algorithms often enco…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: This paper is accepted to ECCV2020

  9. arXiv:2004.09222

    cs.LG stat.ML

    Towards Understanding Normalization in Neural ODEs

    Authors: Julia Gusak, Larisa Markeeva, Talgat Daulbaev, Alexandr Katrutsa, Andrzej Cichocki, Ivan Oseledets

    Abstract: Normalization is an important and widely investigated technique in deep learning. However, its role for Ordinary Differential Equation based networks (neural ODEs) is still poorly understood. This paper investigates how different normalization techniques affect the performance of neural ODEs. In particular, we show that it is possible to achieve 93% accuracy in the CIFAR-10 classification task, and…

    Submitted 27 April, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

  10. arXiv:2003.05271

    cs.NE math.NA stat.ML

    Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

    Authors: Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets

    Abstract: We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as "adjoint method") to train neural ODEs on classification, density estimation, and inference approximation tasks. We also propose a theoretical justification of our approach using logarithmic norm formalism. As a…

    Submitted 30 October, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

  11. arXiv:1910.13025

    cs.LG math.NA stat.ML

    Active Subspace of Neural Networks: Structural Analysis and Universal Attacks

    Authors: Chunfeng Cui, Kaiqi Zhang, Talgat Daulbaev, Julia Gusak, Ivan Oseledets, Zheng Zhang

    Abstract: Active subspace is a model reduction method widely used in the uncertainty quantification community. In this paper, we propose analyzing the internal structure and vulnerability of deep neural networks using active subspace. Firstly, we employ the active subspace to measure the number of "active neurons" at each intermediate layer and reduce the number of neurons from several thousands to several…

    Submitted 29 April, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

  12. arXiv:1910.06995

    cs.LG stat.ML

    Reduced-Order Modeling of Deep Neural Networks

    Authors: Julia Gusak, Talgat Daulbaev, Evgeny Ponomarev, Andrzej Cichocki, Ivan Oseledets

    Abstract: We introduce a new method for speeding up the inference of deep neural networks. It is somewhat inspired by the reduced-order modeling techniques for dynamical systems. The cornerstone of the proposed method is the maximum volume algorithm. We demonstrate efficiency on neural networks pre-trained on different datasets. We show that in many practical cases it is possible to replace convolutional lay…

    Submitted 25 November, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

  13. arXiv:1903.09973

    cs.LG cs.CV stat.ML

    MUSCO: Multi-Stage Compression of neural networks

    Authors: Julia Gusak, Maksym Kholiavchenko, Evgeny Ponomarev, Larisa Markeeva, Ivan Oseledets, Andrzej Cichocki

    Abstract: The low-rank tensor approximation is very promising for the compression of deep neural networks. We propose a new simple and efficient iterative approach, which alternates low-rank factorization with a smart rank selection and fine-tuning. We demonstrate the efficiency of our method compared to non-iterative ones. Our approach improves the compression rate while maintaining the accuracy for a var…

    Submitted 15 November, 2019; v1 submitted 24 March, 2019; originally announced March 2019.