Showing 1–7 of 7 results for author: Merkulov, D

  1. arXiv:2404.09737  [pdf, other]

    cs.LG cs.CL

    Quantization of Large Language Models with an Overdetermined Basis

    Authors: Daniil Merkulov, Daria Cherniuk, Alexander Rudikov, Ivan Oseledets, Ekaterina Muravleva, Aleksandr Mikhalev, Boris Kashin

    Abstract: In this paper, we introduce an algorithm for data quantization based on the principles of Kashin representation. This approach hinges on decomposing any given vector, matrix, or tensor into two factors. The first factor maintains a small infinity norm, while the second exhibits a similarly constrained norm when multiplied by an orthogonal matrix. Surprisingly, the entries of factors after decompos…

    Submitted 15 April, 2024; originally announced April 2024.

  2. arXiv:2209.14937  [pdf, other]

    math.OC cs.LG math.NA

    NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer

    Authors: Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Olga Tsymboi, Ivan Oseledets

    Abstract: Classical machine learning models such as deep neural networks are usually trained by using Stochastic Gradient Descent-based (SGD) algorithms. The classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differentia…

    Submitted 30 September, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

    Comments: We study Nesterov acceleration for the Stochastic Differential Equation

  3. arXiv:2201.13195  [pdf, other]

    cs.LG cs.AI stat.ML

    Memory-Efficient Backpropagation through Large Linear Layers

    Authors: Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets

    Abstract: In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the gradients of linear layers are computed by matrix multiplications, we consider methods for randomized matrix multiplications and demonstrate that they require less…

    Submitted 2 February, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: Submitted

  4. arXiv:2110.00874  [pdf, other]

    cs.LG math.OC

    Fast Line Search for Multi-Task Learning

    Authors: Andrey Filatov, Daniil Merkulov

    Abstract: Multi-task learning is a powerful method for solving several tasks jointly by learning a robust representation. Optimization of a multi-task learning model is more complex than optimization of a single-task model because of task conflict. Based on theoretical results, convergence to the optimal point is guaranteed when the step size is chosen through line search. But, usually, line search for the step size is not the be…

    Submitted 2 October, 2021; originally announced October 2021.

  5. arXiv:2007.06937  [pdf, other]

    math.OC cs.LG

    Follow the bisector: a simple method for multi-objective optimization

    Authors: Alexandr Katrutsa, Daniil Merkulov, Nurislam Tursynbek, Ivan Oseledets

    Abstract: This study presents a novel Equiangular Direction Method (EDM) to solve a multi-objective optimization problem. We consider optimization problems, where multiple differentiable losses have to be minimized. The presented method computes descent direction in every iteration to guarantee equal relative decrease of objective functions. This descent direction is based on the normalized gradients of the…

    Submitted 14 July, 2020; originally announced July 2020.

  6. arXiv:2004.08981  [pdf, other]

    stat.ML cs.LG math.OC

    Stochastic gradient algorithms from ODE splitting perspective

    Authors: Daniil Merkulov, Ivan Oseledets

    Abstract: We present a different view on stochastic optimization, which goes back to the splitting schemes for approximate solutions of ODE. In this work, we provide a connection between stochastic gradient descent approach and first-order splitting scheme for ODE. We consider the special case of splitting, which is inspired by machine learning applications and derive a new upper bound on the global splitti…

    Submitted 19 April, 2020; originally announced April 2020.

  7. Empirical study of extreme overfitting points of neural networks

    Authors: Daniil Merkulov, Ivan Oseledets

    Abstract: In this paper we propose a method of obtaining points of extreme overfitting: parameters of modern neural networks at which they demonstrate close to 100% training accuracy, simultaneously with almost zero accuracy on the test sample. Despite the widespread opinion that the overwhelming majority of critical points of the loss function of a neural network have equally good generalizing ability,…

    Submitted 3 July, 2019; v1 submitted 14 June, 2019; originally announced June 2019.

    Journal ref: J. Commun. Technol. Electron. 64, 1527-1534 (2019)