Search | arXiv e-print repository

Strong Convexity of Sets in Riemannian Manifolds

Authors: Damien Scieur, Thomas Kerdreux, Martínez-Rubio, Alexandre d'Aspremont, Sebastian Pokutta

Abstract: Convex curvature properties are important in designing and analyzing convex optimization algorithms in the Hilbertian or Riemannian settings. In the case of the Hilbertian setting, strongly convex sets are well studied. Herein, we propose various definitions of strong convexity for uniquely geodesic sets in a Riemannian manifold. We study their relationship, propose tools to determine the geodesic… ▽ More Convex curvature properties are important in designing and analyzing convex optimization algorithms in the Hilbertian or Riemannian settings. In the case of the Hilbertian setting, strongly convex sets are well studied. Herein, we propose various definitions of strong convexity for uniquely geodesic sets in a Riemannian manifold. We study their relationship, propose tools to determine the geodesic strongly convex nature of sets, and analyze the convergence of optimization algorithms over those sets. In particular, we demonstrate that the Riemannian Frank-Wolfe algorithm enjoys a global linear convergence rate when the Riemannian scaling inequalities hold. △ Less

Submitted 6 February, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2305.19179 [pdf, ps, other]

Adaptive Quasi-Newton and Anderson Acceleration Framework with Explicit Global (Accelerated) Convergence Rates

Authors: Damien Scieur

Abstract: Despite the impressive numerical performance of the quasi-Newton and Anderson/nonlinear acceleration methods, their global convergence rates have remained elusive for over 50 years. This study addresses this long-standing issue by introducing a framework that derives novel, adaptive quasi-Newton and nonlinear/Anderson acceleration schemes. Under mild assumptions, the proposed iterative methods exh… ▽ More Despite the impressive numerical performance of the quasi-Newton and Anderson/nonlinear acceleration methods, their global convergence rates have remained elusive for over 50 years. This study addresses this long-standing issue by introducing a framework that derives novel, adaptive quasi-Newton and nonlinear/Anderson acceleration schemes. Under mild assumptions, the proposed iterative methods exhibit explicit, non-asymptotic convergence rates that blend those of the gradient descent and Cubic Regularized Newton's methods. The proposed approach also includes an accelerated version for convex functions. Notably, these rates are achieved adaptively without prior knowledge of the function's parameters. The framework presented in this study is generic, and its special cases includes algorithms such as Newton's method with random subspaces, finite-differences, or lazy Hessian. Numerical experiments demonstrated the efficiency of the proposed framework, even compared to the l-BFGS algorithm with Wolfe line-search. △ Less

Submitted 14 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

arXiv:2209.13271 [pdf, other]

The Curse of Unrolling: Rate of Differentiating Through Optimization

Authors: Damien Scieur, Quentin Bertrand, Gauthier Gidel, Fabian Pedregosa

Abstract: Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. Th… ▽ More Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled differentiation is a popular heuristic that approximates the solution using an iterative solver and differentiates it through the computational path. This work provides a non-asymptotic convergence-rate analysis of this approach on quadratic objectives for gradient descent and the Chebyshev method. We show that to ensure convergence of the Jacobian, we can either 1) choose a large learning rate leading to a fast asymptotic convergence but accept that the algorithm may have an arbitrarily long burn-in phase or 2) choose a smaller learning rate leading to an immediate but slower convergence. We refer to this phenomenon as the curse of unrolling. Finally, we discuss open problems relative to this approach, such as deriving a practical update rule for the optimal unrolling strategy and making novel connections with the field of Sobolev orthogonal polynomials. △ Less

Submitted 25 August, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

arXiv:2206.09901 [pdf, other]

Only Tails Matter: Average-Case Universality and Robustness in the Convex Regime

Authors: Leonardo Cunha, Gauthier Gidel, Fabian Pedregosa, Damien Scieur, Courtney Paquette

Abstract: The recently developed average-case analysis of optimization methods allows a more fine-grained and representative convergence analysis than usual worst-case results. In exchange, this analysis requires a more precise hypothesis over the data generating process, namely assuming knowledge of the expected spectral distribution (ESD) of the random matrix associated with the problem. This work shows t… ▽ More The recently developed average-case analysis of optimization methods allows a more fine-grained and representative convergence analysis than usual worst-case results. In exchange, this analysis requires a more precise hypothesis over the data generating process, namely assuming knowledge of the expected spectral distribution (ESD) of the random matrix associated with the problem. This work shows that the concentration of eigenvalues near the edges of the ESD determines a problem's asymptotic average complexity. This a priori information on this concentration is a more grounded assumption than complete knowledge of the ESD. This approximate concentration is effectively a middle ground between the coarseness of the worst-case scenario convergence and the restrictive previous average-case analysis. We also introduce the Generalized Chebyshev method, asymptotically optimal under a hypothesis on this concentration and globally optimal when the ESD follows a Beta distribution. We compare its performance to classical optimization algorithms, such as gradient descent or Nesterov's scheme, and we show that, in the average-case context, Nesterov's method is universally nearly optimal asymptotically. △ Less

Submitted 22 June, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: To be published in ICML 2022

arXiv:2111.06826 [pdf, other]

Convergence Rates for the MAP of an Exponential Family and Stochastic Mirror Descent -- an Open Problem

Authors: Rémi Le Priol, Frederik Kunstner, Damien Scieur, Simon Lacoste-Julien

Abstract: We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime.… ▽ More We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no general solution to this problem in the literature. In particular, current theories do not hold for a Gaussian or in the interesting few samples regime. After exhibiting various facets of the problem, we show we can interpret the MAP as running stochastic mirror descent (SMD) on the log-likelihood. However, modern convergence results do not apply for standard examples of the exponential family, highlighting holes in the convergence literature. We believe solving this very fundamental problem may bring progress to both the statistics and optimization communities. △ Less

Submitted 12 November, 2021; originally announced November 2021.

Comments: 9 pages and 3 figures + Appendix

arXiv:2106.09687 [pdf, other]

Super-Acceleration with Cyclical Step-sizes

Authors: Baptiste Goujaud, Damien Scieur, Aymeric Dieuleveut, Adrien Taylor, Fabian Pedregosa

Abstract: We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show that under some assumption on the spectral gap of Hessians in machine learning, cyclical step-sizes are provably faster than constant step-sizes. More precisely, we develop a convergence rate analysis for quadratic objectives that provides optimal parameters and shows that cyclical learning rates can improve upon… ▽ More We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show that under some assumption on the spectral gap of Hessians in machine learning, cyclical step-sizes are provably faster than constant step-sizes. More precisely, we develop a convergence rate analysis for quadratic objectives that provides optimal parameters and shows that cyclical learning rates can improve upon traditional lower complexity bounds. We further propose a systematic approach to design optimal first order methods for quadratic minimization with a given spectral structure. Finally, we provide a local convergence rate analysis beyond quadratic minimization for the proposed methods and illustrate our findings through benchmarks on least squares and logistic regression problems. △ Less

Submitted 9 May, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

Journal ref: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:3028-3065, 2022

arXiv:2101.09545 [pdf, ps, other]

doi 10.1561/2400000036

Acceleration Methods

Authors: Alexandre d'Aspremont, Damien Scieur, Adrien Taylor

Abstract: This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nest… ▽ More This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization schemes. They coincide in the quadratic case to form the Chebyshev method. We discuss momentum methods in detail, starting with the seminal work of Nesterov and structure convergence proofs using a few master templates, such as that for optimized gradient methods, which provide the key benefit of showing how momentum methods optimize convergence guarantees. We further cover proximal acceleration, at the heart of the Catalyst and Accelerated Hybrid Proximal Extragradient frameworks, using similar algorithmic patterns. Common acceleration techniques rely directly on the knowledge of some of the regularity parameters in the problem at hand. We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates while adapting to unobserved regularity parameters. △ Less

Submitted 21 December, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

Comments: Published in Foundation and Trends in Optimization (see https://www.nowpublishers.com/article/Details/OPT-036)

Journal ref: Foundations and Trends in Optimization: Vol. 5: No. 1-2, pp 1-245 (2021)

arXiv:2011.03358 [pdf, ps, other]

Generalization of Quasi-Newton Methods: Application to Robust Symmetric Multisecant Updates

Authors: Damien Scieur, Lewis Liu, Thomas Pumir, Nicolas Boumal

Abstract: Quasi-Newton techniques approximate the Newton step by estimating the Hessian using the so-called secant equations. Some of these methods compute the Hessian using several secant equations but produce non-symmetric updates. Other quasi-Newton schemes, such as BFGS, enforce symmetry but cannot satisfy more than one secant equation. We propose a new type of quasi-Newton symmetric update using severa… ▽ More Quasi-Newton techniques approximate the Newton step by estimating the Hessian using the so-called secant equations. Some of these methods compute the Hessian using several secant equations but produce non-symmetric updates. Other quasi-Newton schemes, such as BFGS, enforce symmetry but cannot satisfy more than one secant equation. We propose a new type of quasi-Newton symmetric update using several secant equations in a least-squares sense. Our approach generalizes and unifies the design of quasi-Newton updates and satisfies provable robustness guarantees. △ Less

Submitted 8 February, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: AISTATS 2021

arXiv:2011.03351 [pdf, ps, other]

Affine Invariant Analysis of Frank-Wolfe on Strongly Convex Sets

Authors: Thomas Kerdreux, Lewis Liu, Simon Lacoste-Julien, Damien Scieur

Abstract: It is known that the Frank-Wolfe (FW) algorithm, which is affine-covariant, enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine invariant bounds, in contradiction with FW's affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as the d… ▽ More It is known that the Frank-Wolfe (FW) algorithm, which is affine-covariant, enjoys accelerated convergence rates when the constraint set is strongly convex. However, these results rely on norm-dependent assumptions, usually incurring non-affine invariant bounds, in contradiction with FW's affine-covariant property. In this work, we introduce new structural assumptions on the problem (such as the directional smoothness) and derive an affine invariant, norm-independent analysis of Frank-Wolfe. Based on our analysis, we propose an affine invariant backtracking line-search. Interestingly, we show that typical backtracking line-searches using smoothness of the objective function surprisingly converge to an affine invariant step size, despite using affine-dependent norms in the step size's computation. This indicates that we do not necessarily need to know the set's structure in advance to enjoy the affine-invariant accelerated rate. △ Less

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2010.02076 [pdf, other]

Average-case Acceleration for Bilinear Games and Normal Matrices

Authors: Carles Domingo-Enrich, Fabian Pedregosa, Damien Scieur

Abstract: Advances in generative modeling and adversarial learning have given rise to renewed interest in smooth games. However, the absence of symmetry in the matrix of second derivatives poses challenges that are not present in the classical minimization framework. While a rich theory of average-case analysis has been developed for minimization problems, little is known in the context of smooth games. In… ▽ More Advances in generative modeling and adversarial learning have given rise to renewed interest in smooth games. However, the absence of symmetry in the matrix of second derivatives poses challenges that are not present in the classical minimization framework. While a rich theory of average-case analysis has been developed for minimization problems, little is known in the context of smooth games. In this work we take a first step towards closing this gap by develo** average-case optimal first-order methods for a subset of smooth games. We make the following three main contributions. First, we show that for zero-sum bilinear games the average-case optimal method is the optimal method for the minimization of the Hamiltonian. Second, we provide an explicit expression for the optimal method corresponding to normal matrices, potentially non-symmetric. Finally, we specialize it to matrices with eigenvalues located in a disk and show a provable speed-up compared to worst-case optimal algorithms. We illustrate our findings through benchmarks with a varying degree of mismatch with our assumptions. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: 24 pages, 1 figure

arXiv:2002.04756 [pdf, other]

Average-case Acceleration Through Spectral Density Estimation

Authors: Fabian Pedregosa, Damien Scieur

Abstract: We develop a framework for the average-case analysis of random quadratic problems and derive algorithms that are optimal under this analysis. This yields a new class of methods that achieve acceleration given a model of the Hessian's eigenvalue distribution. We develop explicit algorithms for the uniform, Marchenko-Pastur, and exponential distributions. These methods are momentum-based algorithms,… ▽ More We develop a framework for the average-case analysis of random quadratic problems and derive algorithms that are optimal under this analysis. This yields a new class of methods that achieve acceleration given a model of the Hessian's eigenvalue distribution. We develop explicit algorithms for the uniform, Marchenko-Pastur, and exponential distributions. These methods are momentum-based algorithms, whose hyper-parameters can be estimated without knowledge of the Hessian's smallest singular value, in contrast with classical accelerated methods like Nesterov acceleration and Polyak momentum. Through empirical benchmarks on quadratic and logistic regression problems, we identify regimes in which the the proposed methods improve over classical (worst-case) accelerated methods. △ Less

Submitted 20 February, 2023; v1 submitted 11 February, 2020; originally announced February 2020.

Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119, 2020

arXiv:2002.04664 [pdf, other]

Universal Average-Case Optimality of Polyak Momentum

Authors: Damien Scieur, Fabian Pedregosa

Abstract: Polyak momentum (PM), also known as the heavy-ball method, is a widely used optimization method that enjoys an asymptotic optimal worst-case complexity on quadratic objectives. However, its remarkable empirical success is not fully explained by this optimality, as the worst-case analysis -- contrary to the average-case -- is not representative of the expected complexity of an algorithm. In this wo… ▽ More Polyak momentum (PM), also known as the heavy-ball method, is a widely used optimization method that enjoys an asymptotic optimal worst-case complexity on quadratic objectives. However, its remarkable empirical success is not fully explained by this optimality, as the worst-case analysis -- contrary to the average-case -- is not representative of the expected complexity of an algorithm. In this work we establish a novel link between PM and the average-case analysis. Our main contribution is to prove that any optimal average-case method converges in the number of iterations to PM, under mild assumptions. This brings a new perspective on this classical method, showing that PM is asymptotically both worst-case and average-case optimal. △ Less

Submitted 21 January, 2021; v1 submitted 11 February, 2020; originally announced February 2020.

Comments: Added references in the proof of Theorem 4.1

Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119, 2020

arXiv:2001.00602 [pdf, other]

Accelerating Smooth Games by Manipulating Spectral Shapes

Authors: Waïss Azizian, Damien Scieur, Ioannis Mitliagkas, Simon Lacoste-Julien, Gauthier Gidel

Abstract: We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges… ▽ More We use matrix iteration theory to characterize acceleration in smooth games. We define the spectral shape of a family of games as the set containing all eigenvalues of the Jacobians of standard gradient dynamics in the family. Shapes restricted to the real line represent well-understood classes of problems, like minimization. Shapes spanning the complex plane capture the added numerical challenges in solving smooth games. In this framework, we describe gradient-based methods, such as extragradient, as transformations on the spectral shape. Using this perspective, we propose an optimal algorithm for bilinear games. For smooth and strongly monotone operators, we identify a continuum between convex minimization, where acceleration is possible using Polyak's momentum, and the worst case where gradient descent is optimal. Finally, going beyond first-order methods, we propose an accelerated version of consensus optimization. △ Less

Submitted 9 March, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: Appears in: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020). 34 pages

MSC Class: G.1.6; I.2.6 ACM Class: G.1.6; I.2.6

arXiv:1905.12363 [pdf, other]

Extragradient with player sampling for faster Nash equilibrium finding

Authors: Carles Domingo Enrich, Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna

Abstract: Data-driven modeling increasingly requires to find a Nash equilibrium in multi-player games, e.g. when training GANs. In this paper, we analyse a new extra-gradient method for Nash equilibrium finding, that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits a better rate of convergence than full extra-gradient for non-smoot… ▽ More Data-driven modeling increasingly requires to find a Nash equilibrium in multi-player games, e.g. when training GANs. In this paper, we analyse a new extra-gradient method for Nash equilibrium finding, that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits a better rate of convergence than full extra-gradient for non-smooth convex games with noisy gradient oracle. We propose an additional variance reduction mechanism to obtain speed-ups in smooth convex games. Our approach makes extrapolation amenable to massive multiplayer settings, and brings empirical speed-ups, in particular when using a heuristic cyclic sampling scheme. Most importantly, it allows to train faster and better GANs and mixtures of GANs. △ Less

Submitted 21 July, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1903.08764 [pdf, ps, other]

Generalized Framework for Nonlinear Acceleration

Authors: Damien Scieur

Abstract: Nonlinear acceleration algorithms improve the performance of iterative methods, such as gradient descent, using the information contained in past iterates. However, their efficiency is still not entirely understood even in the quadratic case. In this paper, we clarify the convergence analysis by giving general properties that share several classes of nonlinear acceleration: Anderson acceleration (… ▽ More Nonlinear acceleration algorithms improve the performance of iterative methods, such as gradient descent, using the information contained in past iterates. However, their efficiency is still not entirely understood even in the quadratic case. In this paper, we clarify the convergence analysis by giving general properties that share several classes of nonlinear acceleration: Anderson acceleration (and variants), quasi-Newton methods (such as Broyden Type-I or Type-II, SR1, DFP, and BFGS) and Krylov methods (Conjugate Gradient, MINRES, GMRES). In particular, we propose a generic family of algorithms that contains all the previous methods and prove its optimal rate of convergence when minimizing quadratic functions. We also propose multi-secants updates for the quasi-Newton methods listed above. We provide a Matlab code implementing the algorithm. △ Less

Submitted 20 March, 2019; originally announced March 2019.

arXiv:1810.04539 [pdf, other]

Nonlinear Acceleration of Momentum and Primal-Dual Algorithms

Authors: Raghu Bollapragada, Damien Scieur, Alexandre d'Aspremont

Abstract: We describe convergence acceleration schemes for multistep optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization method. Our analysis does not need the underlying fixed-point operator to be symmetric, hence handles e.g. algorithms with momentum terms such as Nesterov's accelerated method, or primal-dual methods. T… ▽ More We describe convergence acceleration schemes for multistep optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization method. Our analysis does not need the underlying fixed-point operator to be symmetric, hence handles e.g. algorithms with momentum terms such as Nesterov's accelerated method, or primal-dual methods. The weights are computed via a simple linear system and we analyze performance in both online and offline modes. We use Crouzeix's conjecture to show that acceleration performance is controlled by the solution of a Chebyshev problem on the numerical range of a non-symmetric operator modeling the behavior of iterates near the optimum. Numerical experiments are detailed on logistic regression problems. △ Less

Submitted 17 October, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

arXiv:1806.00370 [pdf, ps, other]

Nonlinear Acceleration of CNNs

Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Abstract: The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descend, SAGA or SVRG. Until now, its analysis is limited to convex problems, but empirical observations shows that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to… ▽ More The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descend, SAGA or SVRG. Until now, its analysis is limited to convex problems, but empirical observations shows that RNA may be extended to wider settings. In this paper, we investigate further the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of exiting frameworks, RNA improves slightly the optimization process of CNNs, after training. △ Less

Submitted 1 June, 2018; originally announced June 2018.

arXiv:1805.09639 [pdf, ps, other]

Online Regularized Nonlinear Acceleration

Authors: Damien Scieur, Edouard Oyallon, Alexandre d'Aspremont, Francis Bach

Abstract: Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is com… ▽ More Regularized nonlinear acceleration (RNA) estimates the minimum of a function by post-processing iterates from an algorithm such as the gradient method. It can be seen as a regularized version of Anderson acceleration, a classical acceleration scheme from numerical analysis. The new scheme provably improves the rate of convergence of fixed step gradient descent, and its empirical performance is comparable to that of quasi-Newton methods. However, RNA cannot accelerate faster multistep algorithms like Nesterov's method and often diverges in this context. Here, we adapt RNA to overcome these issues, so that our scheme can be used on fast algorithms such as gradient methods with momentum. We show optimal complexity bounds for quadratics and asymptotically optimal rates on general convex minimization problems. Moreover, this new scheme works online, i.e., extrapolated solution estimates can be reinjected at each iteration, significantly improving numerical performance over classical accelerated methods. △ Less

Submitted 21 June, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

arXiv:1706.07270 [pdf, other]

Nonlinear Acceleration of Stochastic Algorithms

Authors: Damien Scieur, Alexandre d'Aspremont, Francis Bach

Abstract: Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We f… ▽ More Extrapolation methods use the last few iterates of an optimization algorithm to produce a better estimate of the optimum. They were shown to achieve optimal convergence rates in a deterministic setting using simple gradient iterates. Here, we study extrapolation methods in a stochastic setting, where the iterates are produced by either a simple or an accelerated stochastic gradient algorithm. We first derive convergence bounds for arbitrary, potentially biased perturbations, then produce asymptotic bounds using the ratio between the variance of the noise and the accuracy of the current point. Finally, we apply this acceleration technique to stochastic algorithms such as SGD, SAGA, SVRG and Katyusha in different settings, and show significant performance gains. △ Less

Submitted 3 August, 2017; v1 submitted 22 June, 2017; originally announced June 2017.

arXiv:1702.06751 [pdf, ps, other]

Integration Methods and Accelerated Optimization Algorithms

Authors: Damien Scieur, Vincent Roulet, Francis Bach, Alexandre d'Aspremont

Abstract: We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation. In comparison with recent advances in this vein, the differential equation considered here is the basic gradient flow and we show that multi-step schemes allow integration of this differential equation using larger step… ▽ More We show that accelerated optimization methods can be seen as particular instances of multi-step integration schemes from numerical analysis, applied to the gradient flow equation. In comparison with recent advances in this vein, the differential equation considered here is the basic gradient flow and we show that multi-step schemes allow integration of this differential equation using larger step sizes, thus intuitively explaining acceleration results. △ Less

Submitted 22 February, 2017; originally announced February 2017.

arXiv:1606.04133 [pdf, ps, other]

Regularized Nonlinear Acceleration

Authors: Damien Scieur, Alexandre d'Aspremont, Francis Bach

Abstract: We describe a convergence acceleration technique for unconstrained optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing impro… ▽ More We describe a convergence acceleration technique for unconstrained optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems. △ Less

Submitted 14 April, 2019; v1 submitted 13 June, 2016; originally announced June 2016.

Showing 1–21 of 21 results for author: Scieur, D