Search | arXiv e-print repository

EEvA: Fast Expert-Based Algorithms for Buffer Page Replacement

Authors: Alexander Demin, Yuriy Dorn, Aleksandr Katrutsa, Daniil Kazantsev, Ilgam Latypov, Yulia Maximlyuk, Denis Ponomaryov

Abstract: Optimal page replacement is an important problem in efficient buffer management. The range of replacement strategies known in the literature varies from simple but efficient FIFO-based algorithms to more accurate but potentially costly methods tailored to specific data access patterns. The principal issue in adopting a pattern-specific replacement logic in a DB buffer manager is to guarantee non-d… ▽ More Optimal page replacement is an important problem in efficient buffer management. The range of replacement strategies known in the literature varies from simple but efficient FIFO-based algorithms to more accurate but potentially costly methods tailored to specific data access patterns. The principal issue in adopting a pattern-specific replacement logic in a DB buffer manager is to guarantee non-degradation in general high-load regimes. In this paper, we propose a new family of page replacement algorithms for DB buffer manager which demonstrate a superior performance wrt competitors on custom data access patterns and imply a low computational overhead on TPC-C. We provide theoretical foundations and an extensive experimental study on the proposed algorithms which covers synthetic benchmarks and an implementation in an open-source DB kernel evaluated on TPC-C. △ Less

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2402.07062 [pdf, other]

Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise

Authors: Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Andrey Pudovikov

Abstract: In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive the regret bounds corresponding to the convergence rates of the optimization methods. We propose a new algorithm Clipped-SGD-UCB and show, both theoretically and empirically, that in the case of symmetric noise… ▽ More In this study, we propose a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. We derive the regret bounds corresponding to the convergence rates of the optimization methods. We propose a new algorithm Clipped-SGD-UCB and show, both theoretically and empirically, that in the case of symmetric noise in the reward, we can achieve an $O(\log T\sqrt{KT\log T})$ regret bound instead of $O\left (T^{\frac{1}{1+α}} K^{\fracα{1+α}} \right)$ for the case when the reward distribution satisfies $\mathbb{E}_{X \in D}[|X|^{1+α}] \leq σ^{1+α}$ ($α\in (0, 1])$, i.e. perform better than it is assumed by the general lower bound for bandits with heavy-tails. Moreover, the same bound holds even when the reward distribution does not have the expectation, that is, when $α<0$. △ Less

Submitted 10 February, 2024; originally announced February 2024.

arXiv:2310.01595 [pdf, other]

Memory-efficient particle filter recurrent neural network for object localization

Authors: Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa

Abstract: This study proposes a novel memory-efficient recurrent neural network (RNN) architecture specified to solve the object localization problem. This problem is to recover the object states along with its movement in a noisy environment. We take the idea of the classical particle filter and combine it with GRU RNN architecture. The key feature of the resulting memory-efficient particle filter RNN mode… ▽ More This study proposes a novel memory-efficient recurrent neural network (RNN) architecture specified to solve the object localization problem. This problem is to recover the object states along with its movement in a noisy environment. We take the idea of the classical particle filter and combine it with GRU RNN architecture. The key feature of the resulting memory-efficient particle filter RNN model (mePFRNN) is that it requires the same number of parameters to process environments of different sizes. Thus, the proposed mePFRNN architecture consumes less memory to store parameters compared to the previously proposed PFRNN model. To demonstrate the performance of our model, we test it on symmetric and noisy environments that are incredibly challenging for filtering algorithms. In our experiments, the mePFRNN model provides more precise localization than the considered competitors and requires fewer trained parameters. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2303.07897 [pdf, other]

Multiparticle Kalman filter for object localization in symmetric environments

Authors: Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa

Abstract: This study considers the object localization problem and proposes a novel multiparticle Kalman filter to solve it in complex and symmetric environments. Two well-known classes of filtering algorithms to solve the localization problem are Kalman filter-based methods and particle filter-based methods. We consider these classes, demonstrate their complementary properties, and propose a novel filterin… ▽ More This study considers the object localization problem and proposes a novel multiparticle Kalman filter to solve it in complex and symmetric environments. Two well-known classes of filtering algorithms to solve the localization problem are Kalman filter-based methods and particle filter-based methods. We consider these classes, demonstrate their complementary properties, and propose a novel filtering algorithm that takes the best from two classes. We evaluate the multiparticle Kalman filter in symmetric and noisy environments. Such environments are especially challenging for both classes of classical methods. We compare the proposed approach with the particle filter since only this method is feasible if the initial state is unknown. In the considered challenging environments, our method outperforms the particle filter in terms of both localization error and runtime. △ Less

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2303.04744 [pdf, other]

Federated Privacy-preserving Collaborative Filtering for On-Device Next App Prediction

Authors: Albert Sayapin, Gleb Balitskiy, Daniel Bershatsky, Aleksandr Katrutsa, Evgeny Frolov, Alexey Frolov, Ivan Oseledets, Vitaliy Kharin

Abstract: In this study, we propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage. Although this problem can be represented as a classical collaborative filtering problem, it requires proper modification since the data are sequential, the user feedback is distributed among devices and the transmission of users' data to aggregate common patterns must be… ▽ More In this study, we propose a novel SeqMF model to solve the problem of predicting the next app launch during mobile device usage. Although this problem can be represented as a classical collaborative filtering problem, it requires proper modification since the data are sequential, the user feedback is distributed among devices and the transmission of users' data to aggregate common patterns must be protected against leakage. According to such requirements, we modify the structure of the classical matrix factorization model and update the training procedure to sequential learning. Since the data about user experience are distributed among devices, the federated learning setup is used to train the proposed sequential matrix factorization model. One more ingredient of the proposed approach is a new privacy mechanism that guarantees the protection of the sent data from the users to the remote server. To demonstrate the efficiency of the proposed model we use publicly available mobile user behavior data. We compare our model with sequential rules and models based on the frequency of app launches. The comparison is conducted in static and dynamic environments. The static environment evaluates how our model processes sequential data compared to competitors. Therefore, the standard train-validation-test evaluation procedure is used. The dynamic environment emulates the real-world scenario, where users generate new data by running apps on devices, and evaluates our model in this case. Our experiments show that the proposed model provides comparable quality with other methods in the static environment. However, more importantly, our method achieves a better privacy-utility trade-off than competitors in the dynamic environment, which provides more accurate simulations of real-world usage. △ Less

Submitted 5 February, 2023; originally announced March 2023.

arXiv:2209.14937 [pdf, other]

NAG-GS: Semi-Implicit, Accelerated and Robust Stochastic Optimizer

Authors: Valentin Leplat, Daniil Merkulov, Aleksandr Katrutsa, Daniel Bershatsky, Olga Tsymboi, Ivan Oseledets

Abstract: Classical machine learning models such as deep neural networks are usually trained by using Stochastic Gradient Descent-based (SGD) algorithms. The classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differentia… ▽ More Classical machine learning models such as deep neural networks are usually trained by using Stochastic Gradient Descent-based (SGD) algorithms. The classical SGD can be interpreted as a discretization of the stochastic gradient flow. In this paper we propose a novel, robust and accelerated stochastic optimizer that relies on two key elements: (1) an accelerated Nesterov-like Stochastic Differential Equation (SDE) and (2) its semi-implicit Gauss-Seidel type discretization. The convergence and stability of the obtained method, referred to as NAG-GS, are first studied extensively in the case of the minimization of a quadratic function. This analysis allows us to come up with an optimal learning rate in terms of the convergence rate while ensuring the stability of NAG-GS. This is achieved by the careful analysis of the spectral radius of the iteration matrix and the covariance matrix at stationarity with respect to all hyperparameters of our method. Further, we show that NAG- GS is competitive with state-of-the-art methods such as momentum SGD with weight decay and AdamW for the training of machine learning models such as the logistic regression model, the residual networks models on standard computer vision datasets, Transformers in the frame of the GLUE benchmark and the recent Vision Transformers. △ Less

Submitted 30 September, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: We study Nesterov acceleration for the Stochastic Differential Equation

arXiv:2202.11432 [pdf, other]

doi 10.1016/j.jcp.2023.111913

Extension of Dynamic Mode Decomposition for dynamic systems with incomplete information based on t-model of optimal prediction

Authors: Aleksandr Katrutsa, Sergey Utyuzhnikov, Ivan Oseledets

Abstract: The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data. This is entirely a data-driven approach that extracts all necessary information from data snapshots which are commonly supposed to be sampled from measurement. The application of this approach becomes problematic if the available data is incomplete because some dimensions of smaller scale either missi… ▽ More The Dynamic Mode Decomposition has proved to be a very efficient technique to study dynamic data. This is entirely a data-driven approach that extracts all necessary information from data snapshots which are commonly supposed to be sampled from measurement. The application of this approach becomes problematic if the available data is incomplete because some dimensions of smaller scale either missing or unmeasured. Such setting occurs very often in modeling complex dynamical systems such as power grids, in particular with reduced-order modeling. To take into account the effect of unresolved variables the optimal prediction approach based on the Mori-Zwanzig formalism can be applied to obtain the most expected prediction under existing uncertainties. This effectively leads to the development of a time-predictive model accounting for the impact of missing data. In the present paper we provide a detailed derivation of the considered method from the Liouville equation and finalize it with the optimization problem that defines the optimal transition operator corresponding to the observed data. In contrast to the existing approach, we consider a first-order approximation of the Mori-Zwanzig decomposition, state the corresponding optimization problem and solve it with the gradient-based optimization method. The gradient of the obtained objective function is computed precisely through the automatic differentiation technique. The numerical experiments illustrate that the considered approach gives practically the same dynamics as the exact Mori-Zwanzig decomposition, but is less computationally intensive. △ Less

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2202.10435 [pdf, ps, other]

Survey on Large Scale Neural Network Training

Authors: Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont

Abstract: Modern Deep Neural Networks (DNNs) require significant memory to store weight, activations, and other intermediate tensors during training. Hence, many models do not fit one GPU device or can be trained using only a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNNs training. We analyze techniques that save memory and make good us… ▽ More Modern Deep Neural Networks (DNNs) require significant memory to store weight, activations, and other intermediate tensors during training. Hence, many models do not fit one GPU device or can be trained using only a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNNs training. We analyze techniques that save memory and make good use of computation and communication resources on architectures with a single or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with approaches proposed in the literature, we discuss available implementations. △ Less

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2201.13195 [pdf, other]

Memory-Efficient Backpropagation through Large Linear Layers

Authors: Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets

Abstract: In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the gradients of linear layers are computed by matrix multiplications, we consider methods for randomized matrix multiplications and demonstrate that they require less… ▽ More In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the gradients of linear layers are computed by matrix multiplications, we consider methods for randomized matrix multiplications and demonstrate that they require less memory with a moderate decrease of the test accuracy. Also, we investigate the variance of the gradient estimate induced by the randomized matrix multiplication. We compare this variance with the variance coming from gradient estimation based on the batch of samples. We demonstrate the benefits of the proposed method on the fine-tuning of the pre-trained RoBERTa model on GLUE tasks. △ Less

Submitted 2 February, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: Submitted

arXiv:2103.08561 [pdf, other]

Meta-Solver for Neural Ordinary Differential Equations

Authors: Julia Gusak, Alexandr Katrutsa, Talgat Daulbaev, Andrzej Cichocki, Ivan Oseledets

Abstract: A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization method and its properties, which may not be optimal for the selected application and yield the overfitting to the given solver. In our paper, we inve… ▽ More A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization method and its properties, which may not be optimal for the selected application and yield the overfitting to the given solver. In our paper, we investigate how the variability in solvers' space can improve neural ODEs performance. We consider a family of Runge-Kutta methods that are parameterized by no more than two scalar variables. Based on the solvers' properties, we propose an approach to decrease neural ODEs overfitting to the pre-defined solver, along with a criterion to evaluate such behaviour. Moreover, we show that the right choice of solver parameterization can significantly affect neural ODEs models in terms of robustness to adversarial attacks. Recently it was shown that neural ODEs demonstrate superiority over conventional CNNs in terms of robustness. Our work demonstrates that the model robustness can be further improved by optimizing solver choice for a given task. The source code to reproduce our experiments is available at https://github.com/juliagusak/neural-ode-metasolver. △ Less

Submitted 15 March, 2021; originally announced March 2021.

arXiv:2007.06937 [pdf, other]

Follow the bisector: a simple method for multi-objective optimization

Authors: Alexandr Katrutsa, Daniil Merkulov, Nurislam Tursynbek, Ivan Oseledets

Abstract: This study presents a novel Equiangular Direction Method (EDM) to solve a multi-objective optimization problem. We consider optimization problems, where multiple differentiable losses have to be minimized. The presented method computes descent direction in every iteration to guarantee equal relative decrease of objective functions. This descent direction is based on the normalized gradients of the… ▽ More This study presents a novel Equiangular Direction Method (EDM) to solve a multi-objective optimization problem. We consider optimization problems, where multiple differentiable losses have to be minimized. The presented method computes descent direction in every iteration to guarantee equal relative decrease of objective functions. This descent direction is based on the normalized gradients of the individual losses. Therefore, it is appropriate to solve multi-objective optimization problems with multi-scale losses. We test the proposed method on the imbalanced classification problem and multi-task learning problem, where standard datasets are used. EDM is compared with other methods to solve these problems. △ Less

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2004.09222 [pdf, other]

Towards Understanding Normalization in Neural ODEs

Authors: Julia Gusak, Larisa Markeeva, Talgat Daulbaev, Alexandr Katrutsa, Andrzej Cichocki, Ivan Oseledets

Abstract: Normalization is an important and vastly investigated technique in deep learning. However, its role for Ordinary Differential Equation based networks (neural ODEs) is still poorly understood. This paper investigates how different normalization techniques affect the performance of neural ODEs. Particularly, we show that it is possible to achieve 93% accuracy in the CIFAR-10 classification task, and… ▽ More Normalization is an important and vastly investigated technique in deep learning. However, its role for Ordinary Differential Equation based networks (neural ODEs) is still poorly understood. This paper investigates how different normalization techniques affect the performance of neural ODEs. Particularly, we show that it is possible to achieve 93% accuracy in the CIFAR-10 classification task, and to the best of our knowledge, this is the highest reported accuracy among neural ODEs tested on this problem. △ Less

Submitted 27 April, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

arXiv:2003.05271 [pdf, other]

Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

Authors: Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets

Abstract: We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as "adjoint method") to train neural ODEs on classification, density estimation, and inference approximation tasks. We also propose a theoretical justification of our approach using logarithmic norm formalism. As a… ▽ More We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as "adjoint method") to train neural ODEs on classification, density estimation, and inference approximation tasks. We also propose a theoretical justification of our approach using logarithmic norm formalism. As a result, our method allows faster model training than the reverse dynamic method that was confirmed and validated by extensive numerical experiments for several standard benchmarks. △ Less

Submitted 30 October, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

arXiv:1502.07167 [pdf, ps, other]

Linear complexity SimRank computation based on the iterative diagonal estimation

Authors: I. V. Oseledets, G. V. Ovchinnikov, A. M. Katrutsa

Abstract: This paper presents a deterministic linear time complexity IDE-SimRank method to approximately compute SimRank with proved error bound. SimRank is a well-known similarity measure between graph vertices which relies on graph topology only and is built on intuition that "two objects are similar if they are related to similar objects". The fixed point equation for direct SimRank computation is the di… ▽ More This paper presents a deterministic linear time complexity IDE-SimRank method to approximately compute SimRank with proved error bound. SimRank is a well-known similarity measure between graph vertices which relies on graph topology only and is built on intuition that "two objects are similar if they are related to similar objects". The fixed point equation for direct SimRank computation is the discrete Lyapunov equation with specific diagonal matrix in the right hand side. The proposed method is based on estimation of this diagonal matrix with GMRES and use this estimation to compute singe-source and single pairs queries. These computations are executed with the part of series converging to the discrete Lyapunov equation solution. △ Less

Submitted 25 February, 2015; originally announced February 2015.

Showing 1–14 of 14 results for author: Katrutsa, A