
doi 10.1016/j.chaos.2024.115048

Gradient-free algorithm for saddle point problems under overparametrization

Authors: Ekaterina Statkevich, Sofiya Bondar, Darina Dvinskikh, Alexander Gasnikov, Aleksandr Lobanov

Abstract: This paper focuses on solving a stochastic saddle point problem (SPP) in an overparameterized regime for the case when gradient computation is impractical. As an intermediate step, we generalize the Same-sample Stochastic Extra-gradient algorithm (Gorbunov et al., 2022) to a biased oracle and derive new convergence rates. As the main result of the paper, we introduce an algorithm that uses a gradient approximation instead of a gradient oracle. We also conduct an analysis to find the maximum admissible level of adversarial noise and the optimal number of iterations at which our algorithm is guaranteed to achieve the desired accuracy.

Submitted 4 June, 2024; originally announced June 2024.

Journal ref: Chaos, Solitons & Fractals, Volume 185, August 2024, 115048
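
A minimal sketch, not the algorithm from the paper, of the idea in the abstract above: an extragradient loop in which the gradient oracle is replaced by a two-point randomized approximation built only from function values. The toy objective f, the step size, the smoothing radius tau, and the choice to reuse one random direction in both the extrapolation and the update step (a loose analogue of the same-sample idea) are all illustrative assumptions. Because the gradient is zero at the saddle point, the estimator's noise vanishes there, loosely mimicking the interpolation property of the overparameterized regime.

```python
import math
import random


def f(x, y):
    # Hypothetical toy objective: strongly convex in x, strongly concave in y,
    # with its unique saddle point at (0, 0).
    return 0.5 * x * x + x * y - 0.5 * y * y


def zo_operator(x, y, theta, tau=1e-3):
    """Estimate the saddle-point operator F = (df/dx, -df/dy) from two function
    values along the unit direction e = (cos theta, sin theta)."""
    ex, ey = math.cos(theta), math.sin(theta)
    d = 2  # problem dimension
    slope = (f(x + tau * ex, y + tau * ey) - f(x - tau * ex, y - tau * ey)) / (2 * tau)
    # For this quadratic f the central difference equals the directional derivative,
    # so (gx, gy) = d * (grad f . e) * e, whose mean over uniform e is grad f.
    gx, gy = d * slope * ex, d * slope * ey
    return gx, -gy  # descend in x, ascend in y


def zo_extragradient(x, y, step=0.05, iters=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(iters):
        theta = rng.uniform(0.0, 2.0 * math.pi)  # one random direction per iteration
        gx, gy = zo_operator(x, y, theta)
        xh, yh = x - step * gx, y - step * gy    # extrapolation step
        gx, gy = zo_operator(xh, yh, theta)      # same direction reused
        x, y = x - step * gx, y - step * gy      # update step from the original point
    return x, y


x_final, y_final = zo_extragradient(1.0, 1.0)
print(math.hypot(x_final, y_final))  # close to 0
```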

arXiv:2311.16743 [pdf, ps, other]

About some works of Boris Polyak on convergence of gradient methods and their development

Authors: Seydamet Ablaev, Aleksandr Beznosikov, Alexander Gasnikov, Darina Dvinskikh, Aleksandr Lobanov, Sergei Puchinin, Fedor Stonyakin

Abstract: The paper presents a review of the state of the art in subgradient and accelerated methods of convex optimization, including settings with disturbances and with access to various information about the objective function (function value, gradient, stochastic gradient, higher derivatives). For nonconvex problems, the Polyak-Łojasiewicz condition is considered and a review of the main results is given. The behavior of numerical methods in the presence of sharp minima is considered. The purpose of this survey is to show the influence of the works of B.T. Polyak (1935-2023) on gradient optimization methods and related areas on the modern development of numerical optimization methods.

Submitted 28 November, 2023; originally announced November 2023.

Comments: in Russian

arXiv:2311.06953 [pdf, other]

Bregman Proximal Method for Efficient Communications under Similarity

Authors: Aleksandr Beznosikov, Darina Dvinskikh, Andrei Semenov, Alexander Gasnikov

Abstract: We propose a novel distributed method for monotone variational inequalities and convex-concave saddle point problems arising in various machine learning applications such as game theory and adversarial training. By exploiting similarity, our algorithm overcomes the communication bottleneck, which is a major issue in distributed optimization. The proposed algorithm enjoys the optimal communication complexity of $δ/ε$, where $ε$ measures the non-optimality gap function and $δ$ is a parameter of similarity. All existing distributed algorithms achieving this bound essentially rely on the Euclidean setup. In contrast, our algorithm is built upon Bregman proximal maps and is compatible with an arbitrary Bregman divergence. Thanks to this, it has more flexibility to fit the problem geometry than algorithms with the Euclidean setup. Thereby, the proposed method bridges the gap between the Euclidean and non-Euclidean settings. By using the restart technique, we extend our algorithm to variational inequalities with a $μ$-strongly monotone operator, resulting in the optimal communication complexity of $δ/μ$ (up to a logarithmic factor). Our theoretical results are confirmed by numerical experiments on a stochastic matrix game.

Submitted 21 June, 2024; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: 16 pages

arXiv:2310.18763 [pdf, other]

Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance

Authors: Nikita Kornilov, Ohad Shamir, Aleksandr Lobanov, Darina Dvinskikh, Alexander Gasnikov, Innokentiy Shibaev, Eduard Gorbunov, Samuel Horváth

Abstract: In this paper, we consider non-smooth stochastic convex optimization with two function evaluations per round under infinite noise variance. In the classical setting when noise has finite variance, an optimal algorithm, built upon the batched accelerated gradient method, was proposed in (Gasnikov et al., 2022). This optimality is defined in terms of iteration and oracle complexity, as well as the maximal admissible level of adversarial noise. However, the assumption of finite variance is burdensome and might not hold in many practical scenarios. To address this, we demonstrate how to adapt a refined clipped version of the accelerated gradient (Stochastic Similar Triangles) method from (Sadiev et al., 2023) to a two-point zeroth-order oracle. This adaptation entails extending the batching technique to accommodate infinite variance, a non-trivial task that stands as a distinct contribution of this paper.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2304.02442 [pdf, ps, other]

Gradient-Free Methods for Non-Smooth Convex Stochastic Optimization with Heavy-Tailed Noise on Convex Compact

Authors: Nikita Kornilov, Alexander Gasnikov, Pavel Dvurechensky, Darina Dvinskikh

Abstract: We present two easy-to-implement gradient-free/zeroth-order methods to optimize a stochastic non-smooth function accessible only via a black box. The methods are built upon efficient first-order methods in the heavy-tailed case, i.e., when the gradient noise has infinite variance but a bounded $(1+κ)$-th moment for some $κ\in(0,1]$. The first algorithm is based on stochastic mirror descent with a particular class of uniformly convex mirror maps, which is robust to heavy-tailed noise. The second algorithm is based on stochastic mirror descent and the gradient clipping technique. Additionally, for objective functions satisfying the $r$-growth condition, faster algorithms are proposed based on these methods and the restart technique.

Submitted 24 August, 2023; v1 submitted 5 April, 2023; originally announced April 2023.
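
A minimal sketch of the second ingredient described above, assuming the Euclidean setup, in which stochastic mirror descent reduces to projected stochastic gradient descent: a two-point zeroth-order gradient estimate is clipped before each step, and the iterate is projected back onto a ball (the convex compact set). The objective, the symmetric Pareto-type noise (zero mean, infinite variance, bounded $(1+κ)$-th moment only for $κ < 1/2$), and all parameter values are hypothetical choices for illustration; this is not the paper's method or its step-size schedule.

```python
import math
import random


def sample_sphere(d, rng):
    # Uniform direction on the unit l2-sphere via a normalized Gaussian vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]


def clip(g, lam):
    # Gradient clipping: rescale g so that its l2-norm is at most lam.
    n = math.sqrt(sum(t * t for t in g))
    return list(g) if n <= lam else [lam * t / n for t in g]


def project_ball(x, radius):
    # Euclidean projection onto the l2-ball of the given radius (the feasible set here).
    n = math.sqrt(sum(t * t for t in x))
    return list(x) if n <= radius else [radius * t / n for t in x]


def heavy_tailed_noise(d, rng, alpha=1.5):
    # Symmetric Pareto-type noise: zero mean, infinite variance for alpha <= 2.
    return [rng.choice((-1.0, 1.0)) * (rng.paretovariate(alpha) - 1.0) for _ in range(d)]


def noisy_f(x, xi):
    # Hypothetical stochastic objective f(x, xi) = ||x||_1 + <xi, x>; its mean ||x||_1
    # is non-smooth with minimum 0 at x = 0.
    return sum(abs(t) for t in x) + sum(a * b for a, b in zip(xi, x))


def clipped_zo_sgd(x0, step=0.01, lam=10.0, tau=1e-3, radius=10.0, iters=5000, seed=0):
    rng = random.Random(seed)
    x, d = list(x0), len(x0)
    tail_sum, tail_count = [0.0] * d, 0
    for k in range(iters):
        e = sample_sphere(d, rng)
        xi = heavy_tailed_noise(d, rng)  # one noise realization shared by both queries
        fp = noisy_f([a + tau * b for a, b in zip(x, e)], xi)
        fm = noisy_f([a - tau * b for a, b in zip(x, e)], xi)
        g = clip([d * (fp - fm) / (2 * tau) * t for t in e], lam)
        x = project_ball([a - step * b for a, b in zip(x, g)], radius)
        if k >= iters // 2:  # average the second half of the trajectory
            tail_sum = [s + t for s, t in zip(tail_sum, x)]
            tail_count += 1
    return [s / tail_count for s in tail_sum]


x_avg = clipped_zo_sgd([3.0] * 5)
print(sum(abs(t) for t in x_avg))  # expected to be far below the initial value 15
```

Clipping bounds every step by step * lam regardless of how large a noise sample is, which is why the iterates stay stable even though the noise has no finite variance.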

arXiv:2211.13566 [pdf, ps, other]

Randomized gradient-free methods in convex optimization

Authors: Alexander Gasnikov, Darina Dvinskikh, Pavel Dvurechensky, Eduard Gorbunov, Aleksander Beznosikov, Aleksandr Lobanov

Abstract: This review presents modern gradient-free methods to solve convex optimization problems. By gradient-free methods, we mean those that use only (noisy) realizations of the objective value. We are motivated by various applications where gradient information is prohibitively expensive or even unavailable. We mainly focus on three criteria: oracle complexity, iteration complexity, and the maximum permissible noise level.

Submitted 12 February, 2024; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: Survey paper; 9 pages

arXiv:2211.10783 [pdf, other]

Gradient-Free Federated Learning Methods with $l_1$ and $l_2$-Randomization for Non-Smooth Convex Stochastic Optimization Problems

Authors: Aleksandr Lobanov, Belal Alashqar, Darina Dvinskikh, Alexander Gasnikov

Abstract: This paper studies non-smooth convex stochastic optimization problems. Using a smoothing technique that replaces the function value at a given point by the function value averaged over a ball (in the $l_1$-norm or $l_2$-norm) of small radius centered at this point, the original problem is reduced to a smooth problem (whose gradient Lipschitz constant is inversely proportional to the radius of the ball). An important property of this smoothing is that an unbiased estimate of the gradient of the smoothed function can be computed from realizations of the original function alone. The resulting smooth stochastic optimization problem is proposed to be solved in a distributed federated learning architecture (the problem is solved in parallel: nodes make local steps, e.g. stochastic gradient descent, then communicate all-to-all, and the process repeats). The goal of this paper is to build, on the basis of current advances in gradient-free non-smooth optimization and in the field of federated learning, gradient-free methods for solving non-smooth stochastic optimization problems in a federated learning architecture.

Submitted 21 May, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

Comments: In Russian. Revised version for publication in the journal Computational Mathematics and Mathematical Physics
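
The two smoothing schemes described above can be sketched as follows, under stated assumptions: for $l_2$-randomization the direction e is uniform on the unit $l_2$-sphere and the estimate is d/(2 tau) * (f(x + tau e) - f(x - tau e)) * e; for $l_1$-randomization e is uniform on the unit $l_1$-sphere and the final factor e is replaced by sign(e), following the $l_1$-randomized estimator known from the zeroth-order literature. The function and parameter names are hypothetical, the paper's exact estimators may differ, and the federated part (local steps followed by all-to-all averaging) is not shown.

```python
import math
import random


def sample_l2_sphere(d, rng):
    # Uniform point on the unit l2-sphere (normalized Gaussian vector).
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]


def sample_l1_sphere(d, rng):
    # Uniform point on the unit l1-sphere: normalized exponentials give a uniform
    # point on the simplex, and independent random signs pick the orthant.
    w = [rng.expovariate(1.0) for _ in range(d)]
    s = sum(w)
    return [rng.choice((-1.0, 1.0)) * t / s for t in w]


def grad_estimate(f, x, tau, rng, norm="l2"):
    """Two-point gradient estimate of the ball-smoothed f, using only values of f."""
    d = len(x)
    e = sample_l2_sphere(d, rng) if norm == "l2" else sample_l1_sphere(d, rng)
    xp = [a + tau * b for a, b in zip(x, e)]
    xm = [a - tau * b for a, b in zip(x, e)]
    slope = d * (f(xp) - f(xm)) / (2.0 * tau)
    if norm == "l2":
        return [slope * t for t in e]
    return [slope * math.copysign(1.0, t) for t in e]  # l1 version uses sign(e)


rng = random.Random(0)
# One noisy sample; averaged over many draws it approximates the smoothed
# gradient, which for this quadratic equals the true gradient [2, -4, 1].
print(grad_estimate(lambda x: sum(t * t for t in x), [1.0, -2.0, 0.5], 1e-3, rng, "l1"))
```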

arXiv:2210.11368 [pdf, ps, other]

Numerical Methods for Large-Scale Optimal Transport

Authors: Nazarii Tupitsa, Pavel Dvurechensky, Darina Dvinskikh, Alexander Gasnikov

Abstract: The optimal transport (OT) problem is a classical optimization problem that can be formulated as a linear program. Machine learning applications pose new computational challenges for its solution. In particular, the OT problem defines a distance between real-world objects such as images, videos, and texts, modeled as probability distributions. In this case, the large dimension of the corresponding optimization problem rules out classical methods such as the network simplex or interior-point methods. This challenge was overcome by introducing entropic regularization and using the efficient Sinkhorn algorithm to solve the regularized problem. A flexible alternative is the accelerated primal-dual gradient method, which can use any strongly convex regularization. We discuss these algorithms and other related problems, such as approximating the Wasserstein barycenter, together with efficient algorithms for its solution, including decentralized distributed algorithms.

Submitted 24 October, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: An encyclopedia article
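
A minimal sketch of Sinkhorn's algorithm for the entropy-regularized problem mentioned above: with the Gibbs kernel K = exp(-C/eps), it alternately rescales the rows and columns of the plan diag(u) K diag(v) so that its marginals match a and b. Plain Python lists keep the example self-contained; the regularization eps and the fixed iteration count are illustrative assumptions, and a practical implementation would add a stopping criterion and log-domain computations for small eps.

```python
import math


def sinkhorn(a, b, cost, eps, iters=1000):
    """Entropy-regularized OT plan between histograms a and b via Sinkhorn scaling."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows so the row sums equal a, then columns so the column sums equal b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


a, b = [0.5, 0.5], [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost, eps=0.5)
print([sum(row) for row in plan])        # row sums, approximately a
print([sum(col) for col in zip(*plan)])  # column sums, approximately b
```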

arXiv:2202.06114 [pdf, other]

Gradient-Free Optimization for Non-Smooth Saddle Point Problems under Adversarial Noise

Authors: Darina Dvinskikh, Vladislav Tominin, Yaroslav Tominin, Alexander Gasnikov

Abstract: We consider non-smooth saddle point optimization problems. To solve these problems, we propose a zeroth-order method under bounded or Lipschitz continuous noise, possibly adversarial. In contrast to state-of-the-art algorithms, our algorithm is optimal in terms of both criteria: oracle call complexity and the maximum value of admissible noise. The proposed method is simple and easy to implement, as it is built on a zeroth-order version of stochastic mirror descent. The convergence analysis is given both in expectation and in probability. We also pay special attention to the $r$-growth condition on the duality gap $(r\geq 1)$, for which we provide a modification of our algorithm using the restart technique. We also comment on infinite noise variance and upper bounds in the case of Lipschitz noise. The results obtained in this paper are significant not only for saddle point problems but also for convex optimization.

Submitted 25 March, 2023; v1 submitted 12 February, 2022; originally announced February 2022.

arXiv:2202.01805 [pdf, ps, other]

On the relations of stochastic convex optimization problems with empirical risk minimization problems on $p$-norm balls

Authors: Darina Dvinskikh, Vitali Pirau, Alexander Gasnikov

Abstract: In this paper, we consider convex stochastic optimization problems arising in machine learning applications (e.g., risk minimization) and mathematical statistics (e.g., maximum likelihood estimation). There are two main approaches to solving such problems, namely the Stochastic Approximation approach (online approach) and the Sample Average Approximation approach, also known as the Monte Carlo approach (offline approach). In the offline approach, the problem is replaced by its empirical counterpart (the empirical risk minimization problem). The natural question is how to define the problem sample size, i.e., how many realizations should be sampled so that a sufficiently accurate solution of the empirical problem is a solution of the original problem with the desired precision. This is one of the main issues in modern machine learning and optimization. In the last decade, many significant advances have been made in these areas for convex stochastic optimization problems on Euclidean balls (or the whole space). In this work, we build on these advances and study the case of arbitrary balls in the $\ell_p$-norms. We also explore how the parameter $p$ affects the estimates of the required number of terms in the empirical risk.

Submitted 2 March, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

Comments: 14 pages, in Russian
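
A toy illustration of the two approaches named above, assuming the one-dimensional quadratic loss f(x, xi) = 0.5 (x - xi)^2, whose population minimizer is E[xi]. SA makes one online SGD pass with step 1/(k+1), using each sample once; SAA minimizes the empirical risk exactly. For this particular loss and step size the two estimates coincide, since the SGD recursion computes the running sample mean; in general, and in particular under $\ell_p$-ball constraints, the two procedures differ.

```python
import random


def sa_estimate(samples):
    # Stochastic Approximation: one online SGD pass on f(x, xi) = 0.5 * (x - xi)**2
    # with step 1/(k+1); each sample is used once and then discarded.
    x = 0.0
    for k, xi in enumerate(samples):
        x -= (x - xi) / (k + 1)
    return x


def saa_estimate(samples):
    # Sample Average Approximation: minimize the empirical risk
    # (1/N) * sum 0.5 * (x - xi)**2, whose exact minimizer is the sample mean.
    return sum(samples) / len(samples)


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10000)]  # population minimizer E[xi] = 2
print(sa_estimate(data), saa_estimate(data))        # two nearly identical values close to 2
```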

arXiv:2107.07190 [pdf, other]

doi 10.1016/j.ejco.2022.100041

Decentralized Personalized Federated Learning: Lower Bounds and Optimal Algorithm for All Personalization Modes

Authors: Abdurakhmon Sadiev, Ekaterina Borodich, Aleksandr Beznosikov, Darina Dvinskikh, Saveliy Chezhegov, Rachael Tappenden, Martin Takáč, Alexander Gasnikov

Abstract: This paper considers the problem of decentralized, personalized federated learning. For centralized personalized federated learning, a penalty that measures the deviation of the local models from their average is often added to the objective function. However, in a decentralized setting this penalty is expensive in terms of communication costs, so here a different penalty, one that is built to respect the structure of the underlying computational network, is used instead. We present lower bounds on the communication and local computation costs for this problem formulation, and we also present provably optimal methods for decentralized personalized federated learning. Numerical experiments are presented to demonstrate the practical performance of our methods.

Submitted 23 August, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

Comments: New in v3: more detailed proofs, more experiments. 40 pages, 6 algorithms, 10 figures, 2 tables, 5 theorems
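
A minimal sketch of one penalty that respects the network structure, assumed here for illustration: (lambda/2) times the sum over edges (i, j) of (x_i - x_j)^2, i.e. a graph-Laplacian penalty, combined with hypothetical scalar local losses 0.5 (x_i - a_i)^2. The penalty gradient at node i only involves the neighbours of i, so each gradient step costs one round of neighbour-to-neighbour communication. The exact penalty and the optimal methods of the paper are not reproduced here.

```python
def personalized_gd(targets, edges, lam=1.0, step=0.2, iters=500):
    """Gradient descent on sum_i 0.5*(x_i - a_i)**2 + (lam/2) * sum_{(i,j) in E} (x_i - x_j)**2."""
    n = len(targets)
    nbrs = [[] for _ in range(n)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    x = [0.0] * n
    for _ in range(iters):
        # Local loss gradient plus penalty gradient lam * sum_{j in N(i)} (x_i - x_j),
        # which node i can compute from its neighbours' values only.
        grad = [(x[i] - targets[i]) + lam * sum(x[i] - x[j] for j in nbrs[i]) for i in range(n)]
        x = [x[i] - step * grad[i] for i in range(n)]
    return x


print(personalized_gd([1.0, 2.0, 6.0], [(0, 1), (1, 2)]))  # approximately [1.875, 2.75, 4.375]
```

With lam = 0 each node simply keeps its own target; as lam grows, the solution of this toy problem approaches the network-wide average of the targets.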

arXiv:2105.01587 [pdf, other]

Decentralized Algorithms for Wasserstein Barycenters

Authors: Darina Dvinskikh

Abstract: In this thesis, we consider the Wasserstein barycenter problem for discrete probability measures from both the computational and the statistical side. The statistical focus is estimating the sample size of measures necessary to calculate an approximation of the Fréchet mean (barycenter) of a probability distribution with a given precision. For empirical risk minimization approaches, the question of regularization is also studied, together with a new regularization that yields better complexity bounds than quadratic regularization. The computational focus is developing algorithms for calculating Wasserstein barycenters: both primal and dual algorithms that can be executed in a decentralized manner. The motivation for dual approaches is the closed-form expressions for the dual formulation of entropy-regularized Wasserstein distances and their derivatives, whereas the primal formulation has a closed-form expression only in some cases, e.g., for Gaussian measures. Moreover, the dual oracle returning the gradient of the dual representation of the entropy-regularized Wasserstein distance is cheaper to compute than the primal oracle returning the gradient of the entropy-regularized Wasserstein distance. The number of dual oracle calls in this case is also smaller, namely the square root of the number of primal oracle calls. This explains the successful application of first-order dual approaches to the Wasserstein barycenter problem.

Submitted 25 October, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

Comments: 103 pages, Master's thesis. arXiv admin note: text overlap with arXiv:2001.07697

arXiv:2102.07758 [pdf, other]

doi 10.1080/10556788.2023.2280062

Decentralized Distributed Optimization for Saddle Point Problems

Authors: Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh, Dmitry Kovalev, Pavel Dvurechensky, Alexander Gasnikov

Abstract: We consider distributed convex-concave saddle point problems over arbitrary connected undirected networks and propose a decentralized distributed algorithm for their solution. The local functions distributed across the nodes are assumed to have global and local groups of variables. For the proposed algorithm we prove non-asymptotic convergence rate estimates with explicit dependence on the network characteristics. To supplement the convergence rate analysis, we propose lower bounds for strongly-convex-strongly-concave and convex-concave saddle-point problems over arbitrary connected undirected networks. We illustrate the considered problem setting by a particular application to distributed calculation of non-regularized Wasserstein barycenters.

Submitted 9 April, 2024; v1 submitted 15 February, 2021; originally announced February 2021.

arXiv:2011.13259 [pdf, ps, other]

doi 10.1007/978-3-031-00832-0_8

Recent theoretical advances in decentralized distributed convex optimization

Authors: Eduard Gorbunov, Alexander Rogozin, Aleksandr Beznosikov, Darina Dvinskikh, Alexander Gasnikov

Abstract: In the last few years, the theory of decentralized distributed convex optimization has made significant progress. Lower bounds on communication rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we provide our recent results that have not yet been published and can be found in detail only in arXiv preprints.

Submitted 29 November, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

Comments: 46 pages; a survey paper
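
A minimal sketch of the communication primitive underlying decentralized methods: one gossip round replaces each node's value by a weighted average of its own and its neighbours' values, x <- W x. The Metropolis-Hastings weights used here are one standard choice of symmetric, doubly stochastic mixing matrix and are an assumption, not taken from the survey; on a connected graph, repeated rounds drive every node to the network average.

```python
def metropolis_weights(n, edges):
    # Symmetric, doubly stochastic mixing matrix built from node degrees.
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        W[i][j] = W[j][i] = w
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])  # self-weight makes each row sum to 1
    return W


def gossip(values, W, rounds):
    # Each round is one communication step: node i mixes only values of its neighbours.
    x = list(values)
    for _ in range(rounds):
        x = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x


W = metropolis_weights(4, [(0, 1), (1, 2), (2, 3)])  # path graph 0-1-2-3
print(gossip([4.0, 0.0, 0.0, 0.0], W, rounds=200))   # every entry close to the average 1.0
```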

arXiv:2010.09585 [pdf, ps, other]

Parallel and Distributed algorithms for ML problems

Authors: Darina Dvinskikh, Alexander Gasnikov, Alexander Rogozin, Alexander Beznosikov

Abstract: In this paper, we survey modern parallel and distributed approaches to solving sum-type convex minimization problems that arise in ML applications.

Submitted 25 April, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: in Russian

arXiv:2010.04677 [pdf, other]

Improved Complexity Bounds in Wasserstein Barycenter Problem

Authors: Darina Dvinskikh, Daniil Tiapkin

Abstract: In this paper, we focus on computational aspects of the Wasserstein barycenter problem. We propose two algorithms to compute Wasserstein barycenters of $m$ discrete measures of size $n$ with accuracy $\varepsilon$. The first algorithm, based on mirror prox with a specific norm, matches the complexity of the celebrated accelerated iterative Bregman projections (IBP), namely $\widetilde O(mn^2\sqrt n/\varepsilon)$, but without the limitations of the (accelerated) IBP, which is numerically unstable for small regularization parameters. The second algorithm, based on area-convexity and dual extrapolation, improves the previously best-known convergence rates for the Wasserstein barycenter problem, enjoying $\widetilde O(mn^2/\varepsilon)$ complexity.

Submitted 24 February, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

Comments: 23 pages
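
A minimal sketch of the iterative Bregman projections (IBP) baseline named in the abstract, assuming a shared fixed support and the entropy-regularized formulation. Each coupling diag(u_k) K diag(v_k) is first scaled to match its own measure, and then to a common second marginal q, taken as the weighted geometric mean of the vectors K^T u_k. This is not the mirror-prox or area-convexity algorithm proposed in the paper, and the regularization value is an illustrative assumption.

```python
import math


def ibp_barycenter(measures, weights, cost, reg, iters=2000):
    """Entropic Wasserstein barycenter of histograms on a shared support, via
    iterative Bregman projections. Coupling k is diag(u_k) K diag(v_k)."""
    n = len(cost)
    K = [[math.exp(-cost[i][j] / reg) for j in range(n)] for i in range(n)]
    v = [[1.0] * n for _ in measures]
    q = [1.0 / n] * n
    for _ in range(iters):
        # Scale each coupling so that its first marginal equals its own measure p_k.
        u = [[p[i] / sum(K[i][j] * vk[j] for j in range(n)) for i in range(n)]
             for p, vk in zip(measures, v)]
        # Second marginal of coupling k is v_k * (K^T u_k); q is the weighted
        # geometric mean of the vectors K^T u_k.
        ktu = [[sum(K[i][j] * uk[i] for i in range(n)) for j in range(n)] for uk in u]
        q = [math.exp(sum(w * math.log(kt[j]) for w, kt in zip(weights, ktu)))
             for j in range(n)]
        # Rescale each coupling so that its second marginal equals q.
        v = [[q[j] / kt[j] for j in range(n)] for kt in ktu]
    return q


n = 5
cost = [[(i - j) ** 2 / 16.0 for j in range(n)] for i in range(n)]
q = ibp_barycenter([[1.0, 0, 0, 0, 0], [0, 0, 0, 0, 1.0]], [0.5, 0.5], cost, reg=0.5)
print(q)  # a symmetric histogram peaked at the middle point
```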

arXiv:2004.08691 [pdf, other]

doi 10.1134/S096554252101005X

Accelerated meta-algorithm for convex optimization

Authors: Alexander Gasnikov, Darina Dvinskikh, Pavel Dvurechensky, Dmitry Kamzolov, Vladislav Matykhin, Dmitry Pasechnyuk, Nazarii Tupitsa, Alexei Chernov

Abstract: We propose an accelerated meta-algorithm, which allows one to obtain accelerated methods for convex unconstrained minimization in different settings. As an application of the general scheme, we propose nearly optimal methods for minimizing smooth functions with Lipschitz derivatives of an arbitrary order, as well as for smooth minimax optimization problems. The proposed meta-algorithm is more general than the ones in the literature and allows one to obtain better convergence rates and practical performance in several settings.

Submitted 4 November, 2020; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: 25 pages, in Russian

arXiv:2004.04490

Accelerated and nonaccelerated stochastic gradient descent with inexact model

Authors: Darina Dvinskikh, Alexander Tyurin, Alexander Gasnikov, Sergey Omelchenko

Abstract: In this paper, we propose a new way to obtain optimal convergence rates for smooth stochastic (strongly) convex optimization tasks. Our approach is based on results for optimization tasks where gradients have nonrandom noise. In contrast to previously known results, we extend our idea to the concept of an inexact model.

Submitted 15 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: Withdrawn as this should not have been a new article. Please instead see arXiv:2001.03443

arXiv:2002.02706 [pdf, other]

Oracle Complexity Separation in Convex Optimization

Authors: Anastasiya Ivanova, Evgeniya Vorontsova, Dmitry Pasechnyuk, Alexander Gasnikov, Pavel Dvurechensky, Darina Dvinskikh, Alexander Tyurin

Abstract: Many convex optimization problems have a structured objective function written as a sum of functions with different types of oracles (full gradient, coordinate derivative, stochastic gradient) and different evaluation complexities of these oracles. In the strongly convex case, these functions also have different condition numbers, which eventually define the iteration complexity of first-order methods and the number of oracle calls required to achieve a given accuracy. Motivated by the desire to call the more expensive oracle fewer times, in this paper we consider minimization of a sum of two functions and propose a generic algorithmic framework to separate oracle complexities for each component in the sum. As a specific example, for the $μ$-strongly convex problem $\min_{x\in \mathbb{R}^n} h(x) + g(x)$ with $L_h$-smooth function $h$ and $L_g$-smooth function $g$, a special case of our algorithm requires, up to a logarithmic factor, $O(\sqrt{L_h/μ})$ first-order oracle calls for $h$ and $O(\sqrt{L_g/μ})$ first-order oracle calls for $g$. Our general framework also covers the setting of strongly convex objectives, the setting when $g$ is given by a coordinate derivative oracle, and the setting when $g$ has a finite-sum structure and is available through a stochastic gradient oracle. In the latter two cases, we obtain accelerated random coordinate descent and accelerated variance reduction methods, respectively, with oracle complexity separation.

Submitted 11 March, 2022; v1 submitted 7 February, 2020; originally announced February 2020.

arXiv:2001.09013 [pdf, other]

Inexact Relative Smoothness and Strong Convexity for Optimization and Variational Inequalities by Inexact Model

Authors: Fedor Stonyakin, Alexander Tyurin, Alexander Gasnikov, Pavel Dvurechensky, Artem Agafonov, Darina Dvinskikh, Mohammad Alkousa, Dmitry Pasechnyuk, Sergei Artamonov, Victorya Piskunova

Abstract: In this paper, we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems, and variational inequalities. This framework allows one to obtain many known methods as special cases, including the accelerated gradient method, composite optimization methods, level-set methods, and Bregman proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e., the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows constructing new methods, which we illustrate by constructing a universal conditional gradient method and a universal method for variational inequalities with a composite structure. This method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem's smoothness. As a particular case of our general framework, we introduce relative smoothness for operators and propose an algorithm for variational inequalities (VIs) with such operators. We also generalize our framework to relatively strongly convex objectives and strongly monotone variational inequalities. This paper is an extended and updated version of [arXiv:1902.00990]. In particular, we add an extension of relative strong convexity for optimization and variational inequalities.

Submitted 19 December, 2021; v1 submitted 23 January, 2020; originally announced January 2020.

Comments: arXiv admin note: text overlap with arXiv:1902.00990. To appear in Optimization Methods and Software, https://doi.org/10.1080/10556788.2021.1924714

arXiv:2001.07697 [pdf, other]

Stochastic Approximation versus Sample Average Approximation for population Wasserstein barycenters

Authors: Darina Dvinskikh

Abstract: In the machine learning and optimization community, there are two main approaches to the convex risk minimization problem, namely, Stochastic Approximation (SA) and Sample Average Approximation (SAA). In terms of oracle complexity (the required number of stochastic gradient evaluations), both approaches are considered equivalent on average (up to a logarithmic factor). The total complexity depends on the specific problem; however, starting from the work of Nemirovski et al. (2009), it has been generally accepted that SA is better than SAA. Nevertheless, for large-scale problems SA may run out of memory, since storing all data on one machine and organizing online access to it can be impossible without communication with other machines. Unlike SA, SAA allows parallel/distributed computation. We show that for the Wasserstein barycenter problem this superiority can be inverted. We provide a detailed comparison by stating the complexity bounds for the SA and SAA implementations computing barycenters defined with respect to optimal transport distances and entropy-regularized optimal transport distances. As a byproduct, we also construct confidence intervals in the $\ell_2$-norm for the barycenter defined with respect to entropy-regularized optimal transport distances. The preliminary results are derived for a general convex optimization problem given by an expectation, so that they have applications beyond the Wasserstein barycenter problem.
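To make the SA/SAA distinction concrete, here is a minimal Python sketch on a toy quadratic risk $f(x) = \mathbb{E}[(x - \xi)^2/2]$ with Gaussian $\xi$. The problem is chosen only for illustration: it is not the Wasserstein barycenter problem studied in the paper, and the function names are hypothetical. SA takes one stochastic-gradient step per sample, while SAA first collects the whole sample and then minimizes the empirical average exactly.

```python
import random


def sa_estimate(samples, x0=0.0):
    """Stochastic Approximation (hypothetical helper): one stochastic-gradient
    step per sample on the toy risk f(x) = E[(x - xi)^2 / 2]."""
    x = x0
    for k, xi in enumerate(samples, start=1):
        stoch_grad = x - xi   # gradient of (x - xi)^2 / 2 at the current point
        x -= stoch_grad / k   # diminishing step size 1/k
    return x


def saa_estimate(samples):
    """Sample Average Approximation (hypothetical helper): minimize the empirical
    risk (1/n) * sum_i (x - xi_i)^2 / 2 exactly; its minimizer is the sample mean."""
    return sum(samples) / len(samples)


rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(10_000)]  # toy data, true minimizer is 3.0
sa, saa = sa_estimate(samples), saa_estimate(samples)
print(sa, saa)
```

With step size $1/k$, SA reproduces the running sample mean, so on this toy both approaches return the same point. The differences the abstract discusses (memory, parallelism, and the barycenter-specific bounds) do not show up at this scale.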

Submitted 25 October, 2021; v1 submitted 21 January, 2020; originally announced January 2020.

Comments: 33 pages

arXiv:2001.03443 [pdf, ps, other]

Accelerated and nonaccelerated stochastic gradient descent with model conception

Authors: Darina Dvinskikh, Alexander Tyurin, Alexander Gasnikov, Sergey Omelchenko

Abstract: In this paper, we describe a new way to obtain convergence rates for optimal methods in smooth (strongly) convex optimization problems. Our approach is based on results for problems where gradients are corrupted by small nonrandom noise. Unlike previous results, we obtain convergence rates with model conception.

Submitted 13 July, 2020; v1 submitted 10 January, 2020; originally announced January 2020.

Comments: in Russian

arXiv:1912.11632 [pdf, ps, other]

Accelerated gradient sliding and variance reduction

Authors: Darina Dvinskikh, Sergey Omelchenko, Alexander Tyurin, Alexander Gasnikov

Abstract: We consider an optimization problem whose objective is the sum of a strongly convex sum-type term (first term) and a smooth convex composite term that is not proximal-friendly (second term). We show that the complexity of this problem can be split into the optimal number of incremental oracle calls for the first (sum-type) term and the optimal number of oracle calls for the second (composite) term. Here, by "optimal number" we mean the estimate that corresponds to the well-known lower bound in the absence of the other term.

Submitted 11 March, 2020; v1 submitted 25 December, 2019; originally announced December 2019.

Comments: in Russian

arXiv:1911.08380 [pdf, other]

Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization

Authors: Darina Dvinskikh, Aleksandr Ogaltsov, Alexander Gasnikov, Pavel Dvurechensky, Alexander Tyurin, Vladimir Spokoiny

Abstract: In this paper, we propose several adaptive gradient methods for stochastic optimization. Unlike AdaGrad-type methods, our algorithms are based on an Armijo-type line search, and they simultaneously adapt to the unknown Lipschitz constant of the gradient and to the variance of the stochastic gradient approximation. We consider accelerated and non-accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments, we demonstrate the superiority of our methods over existing adaptive methods, e.g. AdaGrad and Adam.
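A minimal Python sketch of the Armijo-type idea, assuming exact (deterministic) gradients: at each iteration the method halves its estimate $L$ of the gradient's Lipschitz constant, then doubles it until a sufficient-decrease test holds. The adaptation to the stochastic-gradient variance described in the abstract is not modeled here, and `adaptive_gd` is a hypothetical name.

```python
def adaptive_gd(f, grad, x0, L0=1.0, iters=100):
    """Gradient descent that adapts its estimate L of the gradient's Lipschitz
    constant by backtracking (sketch; deterministic gradients only)."""
    x, L = list(x0), L0
    for _ in range(iters):
        g, fx = grad(x), f(x)
        L /= 2.0  # optimistically try a longer step first
        while True:
            y = [xi - gi / L for xi, gi in zip(x, g)]
            d = [yi - xi for yi, xi in zip(y, x)]
            # Armijo-type test: f(y) <= f(x) + <g, y - x> + (L / 2) * ||y - x||^2
            bound = fx + sum(gi * di for gi, di in zip(g, d)) + 0.5 * L * sum(di * di for di in d)
            if f(y) <= bound:
                break
            L *= 2.0  # step too long: increase the smoothness estimate
        x = y
    return x, L


def f(x):
    # Toy quadratic; the gradient's Lipschitz constant is 10 but is never passed in.
    return 0.5 * (10.0 * x[0] ** 2 + x[1] ** 2)


def grad(x):
    return [10.0 * x[0], x[1]]


x, L = adaptive_gd(f, grad, [1.0, 1.0], iters=500)
print(x, L)
```

Because every accepted step satisfies $f(y) \le f(x) - \|\nabla f(x)\|^2/(2L)$, the objective decreases monotonically even though $L$ is never supplied.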

Submitted 12 June, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

Comments: 18 pages

arXiv:1911.07363 [pdf, ps, other]

Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization

Authors: Eduard Gorbunov, Darina Dvinskikh, Alexander Gasnikov

Abstract: We consider stochastic convex optimization problems with affine constraints and develop several methods that use either a primal or a dual approach to solve them. In the primal case, we use a special penalization technique to make the initial problem more convenient for optimization methods. We propose algorithms based on the Similar Triangles Method (STM) with an inexact proximal step for convex smooth and strongly convex smooth objective functions, and methods based on the Gradient Sliding algorithm for the same problems in the non-smooth case. We prove convergence guarantees in the smooth convex case with a deterministic first-order oracle. We propose and analyze three novel methods to handle stochastic convex optimization problems with affine constraints: SPDSTM, R-RRMA-AC-SA$^2$, and SSTM_sc. All methods use a stochastic dual oracle. SPDSTM is the stochastic primal-dual modification of STM, and it is applied to the dual problem when the primal functional is strongly convex and Lipschitz continuous on some ball. We extend the result from Dvinskikh & Gasnikov (2019) for this method to the case when only a biased stochastic oracle is available. R-RRMA-AC-SA$^2$ is an accelerated stochastic method based on restarts of RRMA-AC-SA$^2$ from Foster et al. (2019), and SSTM_sc is simply stochastic STM for strongly convex problems. Both methods are applied to the dual problem when the primal functional is strongly convex, smooth, and Lipschitz continuous on some ball, and both use a stochastic dual first-order oracle.
We develop convergence analysis for these methods for unbiased and biased oracles, respectively. Finally, we apply all the aforementioned results and approaches to the decentralized distributed optimization problem and discuss the optimality of the obtained results in terms of communication rounds and the number of oracle calls per node.

Submitted 11 November, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

Comments: The content of this version is the same as in the version from February 16, 2020. The changes are only in the restructuring of the paper

arXiv:1906.03620 [pdf, ps, other]

Accelerated methods for composite non-bilinear saddle point problem

Authors: Mohammad Alkousa, Darina Dvinskikh, Fedor Stonyakin, Alexander Gasnikov, Dmitry Kovalev

Abstract: Based on G. Lan's accelerated gradient sliding and the general relation between the smoothness and strong convexity parameters of a function under the Legendre transformation, we show that under rather general conditions the best known bounds for bilinear convex-concave smooth composite saddle point problems remain valid for non-bilinear convex-concave smooth composite saddle point problems. Moreover, we describe situations in which the bounds differ and explain the nature of the difference.
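For reference, here is the standard conjugacy fact behind this relation, stated for closed convex functions with the Euclidean norm, together with a one-line quadratic check. This is background, not the paper's derivation.

```latex
f^*(y) = \sup_{x}\,\{\langle y, x\rangle - f(x)\}, \qquad
f \text{ convex and } L\text{-smooth} \iff f^* \text{ is } \tfrac{1}{L}\text{-strongly convex}.

\text{Example: } f(x) = \tfrac{L}{2}x^2 \;\Rightarrow\; x^\star = \tfrac{y}{L},\quad
f^*(y) = \tfrac{y^2}{L} - \tfrac{y^2}{2L} = \tfrac{y^2}{2L},\quad (f^*)''(y) = \tfrac{1}{L}.
```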

Submitted 1 January, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

Comments: 28 pages, in Russian

arXiv:1904.09015 [pdf, ps, other]

Decentralized and Parallel Primal and Dual Accelerated Methods for Stochastic Convex Programming Problems

Authors: Darina Dvinskikh, Alexander Gasnikov

Abstract: We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. For both primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, optimality in terms of the number of oracle calls per node holds only up to a logarithmic factor and the notion of smoothness. Using mini-batching, we show that the proposed methods with a stochastic oracle can be additionally parallelized at each node. The considered algorithms can be applied to many data science problems and inverse problems.

Submitted 10 February, 2021; v1 submitted 18 April, 2019; originally announced April 2019.

Comments: 36 pages

arXiv:1903.09844 [pdf, ps, other]

On Primal-Dual Approach for Distributed Stochastic Convex Optimization over Networks

Authors: Darina Dvinskikh, Eduard Gorbunov, Alexander Gasnikov, Pavel Dvurechensky, Cesar A. Uribe

Abstract: We introduce a primal-dual stochastic gradient oracle method for distributed convex optimization problems over networks. We show that the proposed method is optimal in terms of communication steps. Additionally, we propose a new method for analyzing the rate of convergence in terms of the duality gap and the probability of large deviations. This analysis is based on a new technique that allows bounding the distance between the iteration sequence and the optimal point. With a proper choice of batch size, we can guarantee that this distance equals, up to a constant, the distance between the starting point and the solution.

Submitted 26 November, 2019; v1 submitted 23 March, 2019; originally announced March 2019.

arXiv:1902.09001 [pdf, other]

Gradient Methods for Problems with Inexact Model of the Objective

Authors: Fedor Stonyakin, Darina Dvinskikh, Pavel Dvurechensky, Alexey Kroshnin, Olesya Kuznetsova, Artem Agafonov, Alexander Gasnikov, Alexander Tyurin, César A. Uribe, Dmitry Pasechnyuk, Sergei Artamonov

Abstract: We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which includes as particular cases the $(δ,L)$ inexact oracle and the relative smoothness condition. We analyze a gradient method that uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework, we consider three particular problems. The first one is clustering by the electoral model introduced in [Nesterov, 2018]. The second one is approximating the optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third one is approximating the optimal transport barycenter, for which we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms with numerical experiments.
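A minimal Python sketch of a KL proximal-point scheme for discrete optimal transport with Sinkhorn as the inner solver, which is one standard way to realize a "proximal Sinkhorn" idea. It assumes a fixed proximal parameter `gamma` and a fixed number of inner Sinkhorn iterations; it does not reproduce the paper's inexactness criteria or step-size rules, and the function names are hypothetical. Each outer step solves $\min_P \langle C, P\rangle + \gamma\,\mathrm{KL}(P \,\|\, P_{\text{prev}})$ over couplings of $a$ and $b$, whose solution is a Sinkhorn scaling of $P_{\text{prev}} \odot e^{-C/\gamma}$.

```python
import math


def sinkhorn(K, a, b, iters=500):
    """Rescale a positive kernel K so that diag(u) K diag(v) has row sums a and column sums b."""
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def proximal_sinkhorn(C, a, b, gamma=1.0, outer=20, inner=500):
    """KL proximal-point iterations for discrete OT (sketch): each outer step solves
    min_P <C, P> + gamma * KL(P || P_prev) over couplings of (a, b) by Sinkhorn."""
    n, m = len(a), len(b)
    P = [[a[i] * b[j] for j in range(m)] for i in range(n)]  # independent coupling as a start
    for _ in range(outer):
        # Proximal kernel: previous plan reweighted by the Gibbs factor of the cost.
        K = [[P[i][j] * math.exp(-C[i][j] / gamma) for j in range(m)] for i in range(n)]
        P = sinkhorn(K, a, b, inner)
    return P


a = b = [0.5, 0.5]
C = [[0.0, 1.0], [1.0, 0.0]]
P = proximal_sinkhorn(C, a, b)
cost = sum(C[i][j] * P[i][j] for i in range(2) for j in range(2))
print(cost)  # close to 0, the exact OT cost for this cost matrix
```

Unlike plain entropic Sinkhorn with a small regularizer, the kernel here keeps a moderate `gamma`, and the plan sharpens through the outer iterations instead: in the 2x2 example, the off-diagonal mass after $t$ outer steps is $0.5/(1+e^{t})$.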

Submitted 23 March, 2019; v1 submitted 24 February, 2019; originally announced February 2019.

MSC Class: 90C25; 90C30; 90C06; 90C90; 68Q25; 65K05; 65Y20; 68W40 ACM Class: G.1.6

arXiv:1902.00990 [pdf, ps, other]

Inexact Model: A Framework for Optimization and Variational Inequalities

Authors: Fedor Stonyakin, Alexander Gasnikov, Alexander Tyurin, Dmitry Pasechnyuk, Artem Agafonov, Pavel Dvurechensky, Darina Dvinskikh, Alexey Kroshnin, Victorya Piskunova

Abstract: In this paper, we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems, and variational inequalities. This framework allows obtaining many known methods as special cases, including the accelerated gradient method, composite optimization methods, level-set methods, and proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows constructing new methods, which we illustrate by constructing a universal method for variational inequalities with a composite structure. This method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem's smoothness. We also generalize our framework to strongly convex objectives and strongly monotone variational inequalities.

Submitted 5 January, 2020; v1 submitted 3 February, 2019; originally announced February 2019.

Comments: 41 pages

arXiv:1901.08686 [pdf, ps, other]

On the Complexity of Approximating Wasserstein Barycenter

Authors: Alexey Kroshnin, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, Nazarii Tupitsa, Cesar Uribe

Abstract: We study the complexity of approximating the Wasserstein barycenter of $m$ discrete measures, or histograms, of size $n$ by contrasting two alternative approaches, both using entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm, for which our novel analysis gives a complexity bound proportional to $\frac{mn^2}{\varepsilon^2}$ to approximate the original non-regularized barycenter. Using an alternative approach based on accelerated gradient descent, we obtain a complexity proportional to $\frac{mn^{2.5}}{\varepsilon}$. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to $\varepsilon$, which causes instability of both algorithms when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method that uses IBP at each iteration to make a proximal step. We also consider the scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.
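A minimal Python sketch of the classical IBP iteration for the entropic barycenter, in the Benamou et al. (2015) form: scale each plan to match its input histogram, then set the common second marginal to the weighted geometric mean of $K^\top u_k$. It assumes all histograms live on one support with cost matrix `C`. It is not the accelerated or proximal-IBP method of the paper, and the function name and test data are hypothetical.

```python
import math


def ibp_barycenter(hists, weights, C, gamma=1.0, iters=100):
    """Entropic Wasserstein barycenter of histograms on a shared support via
    Iterative Bregman Projections (classical form, not proximal or accelerated)."""
    n = len(C)
    K = [[math.exp(-C[i][j] / gamma) for j in range(n)] for i in range(n)]
    vs = [[1.0] * n for _ in hists]  # right scalings, one per input histogram
    q = [1.0 / n] * n
    for _ in range(iters):
        # Projection 1: match each plan's first marginal to its histogram p_k.
        us = [[p[i] / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
              for p, v in zip(hists, vs)]
        ktu = [[sum(K[i][j] * u[i] for i in range(n)) for j in range(n)] for u in us]
        # Projection 2: common second marginal = weighted geometric mean of K^T u_k.
        q = [math.exp(sum(w * math.log(col[j]) for w, col in zip(weights, ktu)))
             for j in range(n)]
        vs = [[q[j] / col[j] for j in range(n)] for col in ktu]
    return q


n = 5
C = [[float((i - j) ** 2) for j in range(n)] for i in range(n)]  # squared distance on a 1-D grid
left, right = [1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 1.0]
q = ibp_barycenter([left, right], [0.5, 0.5], C, gamma=1.0)
print([round(x, 4) for x in q])  # mass centred on the middle bin
```

The kernel is $e^{-C/\gamma}$, so shrinking `gamma` toward a small target accuracy drives its entries toward floating-point underflow, one concrete symptom of the instability mentioned in the abstract.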

Submitted 20 February, 2020; v1 submitted 24 January, 2019; originally announced January 2019.

Comments: Corrected misprints. Added a reference to accelerated Iterative Bregman Projections introduced in arXiv:1906.03622

MSC Class: 90C25; 90C30; 90C06; 90C90

Journal ref: ICML 2019, in PMLR 97:3530-3540. http://proceedings.mlr.press/v97/kroshnin19a.html

arXiv:1806.03915 [pdf, other]

Decentralize and Randomize: Faster Algorithm for Wasserstein Barycenters

Authors: Pavel Dvurechensky, Darina Dvinskikh, Alexander Gasnikov, César A. Uribe, Angelia Nedić

Abstract: We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures stored in a distributed manner over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in the network by drawing samples from its local measure and exchanging information with its neighbors. Motivated by this problem, we develop and analyze a novel accelerated primal-dual stochastic gradient method for general stochastic convex optimization problems with linear equality constraints. We then apply this method to the decentralized distributed optimization setting to obtain a new algorithm for the distributed semi-discrete regularized Wasserstein barycenter problem. Moreover, we give an explicit non-asymptotic complexity bound for the proposed algorithm.

Submitted 19 February, 2020; v1 submitted 11 June, 2018; originally announced June 2018.

MSC Class: 90C25; 90C30; 90C06; 90C90; 68Q25; 65K05; 65Y20; 68W40 ACM Class: G.1.6

arXiv:1803.02933 [pdf, other]

Distributed Computation of Wasserstein Barycenters over Networks

Authors: César A. Uribe, Darina Dvinskikh, Pavel Dvurechensky, Alexander Gasnikov, Angelia Nedić

Abstract: We propose a new class-optimal algorithm for the distributed computation of Wasserstein barycenters over networks. Assuming that each node in a graph holds a probability distribution, we prove that every node can reach the barycenter of all distributions held in the network by using local interactions compliant with the topology of the graph. We provide an estimate for the minimum number of communication rounds required for the proposed method to achieve arbitrary relative precision, both in the optimality of the solution and in the consensus among all agents, for undirected fixed networks.
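The "local interactions compliant with the topology of the graph" in this abstract amount to communication rounds in which each node combines information only with its neighbours. As a hedged illustration of that primitive alone (plain consensus averaging, not the paper's barycenter algorithm), here is a Python sketch with Metropolis-Hastings mixing weights; the helper names and the path-graph example are hypothetical.

```python
def metropolis_weights(n, edges):
    """Symmetric, doubly stochastic mixing matrix built from node degrees only
    (assumes a simple undirected graph given as a list of (i, j) pairs)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        W[i][j] = W[j][i] = 1.0 / (1 + max(deg[i], deg[j]))
    for i in range(n):
        W[i][i] = 1.0 - sum(W[i])  # diagonal is still 0 here, so this sums the off-diagonals
    return W


def gossip_average(values, W, rounds=200):
    """Each round, every node replaces its value by a weighted average of its own
    and its neighbours' values (nonzero weights only on graph edges)."""
    x = list(values)
    for _ in range(rounds):
        x = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return x


W = metropolis_weights(4, [(0, 1), (1, 2), (2, 3)])  # path graph 0-1-2-3
x = gossip_average([1.0, 2.0, 3.0, 10.0], W)
print(x)  # every entry close to the network average 4.0
```

The weights are symmetric and doubly stochastic, so the network average is preserved in every round and, on a connected graph, every node converges to it; the number of rounds needed depends on the spectral gap of `W`.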

Submitted 20 September, 2018; v1 submitted 7 March, 2018; originally announced March 2018.

Showing 1–33 of 33 results for author: Dvinskikh, D