
Provably Faster Gradient Descent via Long Steps

Abstract: This work establishes new convergence guarantees for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster $O(1/T\log T)$ rate for gradient descent is also motivated along with simple numerical validation.

Submitted 4 February, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

Comments: 20 pages

arXiv:2305.17323

Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization

Authors: Benjamin Grimmer, Danlin Li

Abstract: We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a not previously analyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply to a wide range of stepsize selections and of non-Lipschitz ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior works address). Even in the presence of such undesirable behaviors, our theory still ensures and bounds eventual convergence.

Submitted 26 June, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 24 pages, major revision shortened the write-up and unified the analysis to be done just once in a single "super" setting

arXiv:2303.05037

Gauges and Accelerated Optimization over Smooth and/or Strongly Convex Sets

Authors: Ning Liu, Benjamin Grimmer

Abstract: We consider feasibility and constrained optimization problems defined over smooth and/or strongly convex sets. These notions mirror their popular function counterparts but are much less explored in the first-order optimization literature. We propose new scalable, projection-free, accelerated first-order methods in these settings. Our methods avoid linear optimization or projection oracles, only using cheap one-dimensional linesearches and normal vector computations. Despite this, we derive optimal accelerated convergence guarantees of $O(1/T)$ for strongly convex problems, $O(1/T^2)$ for smooth problems, and accelerated linear convergence given both. Our algorithms and analysis are based on novel characterizations of the Minkowski gauge of smooth and/or strongly convex sets, which may be of independent interest: although the gauge is neither smooth nor strongly convex, we show the gauge squared inherits any structure present in the set.

Submitted 31 March, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

Comments: 22 pages (32 pages with references and appendix)

MSC Class: 90C25; 90C52

arXiv:2010.10628

Limiting Behaviors of Nonconvex-Nonconcave Minimax Optimization via Continuous-Time Systems

Authors: Benjamin Grimmer, Haihao Lu, Pratik Worah, Vahab Mirrokni

Abstract: Unlike nonconvex optimization, where gradient descent is guaranteed to converge to a local optimizer, algorithms for nonconvex-nonconcave minimax optimization can have topologically different solution paths: sometimes converging to a solution, sometimes never converging and instead following a limit cycle, and sometimes diverging. In this paper, we study the limiting behaviors of three classic minimax algorithms: gradient descent ascent (GDA), alternating gradient descent ascent (AGDA), and the extragradient method (EGM). Numerically, we observe that all of these limiting behaviors can arise in Generative Adversarial Networks (GAN) training and are easily demonstrated for a range of GAN problems. To explain these different behaviors, we study the high-order resolution continuous-time dynamics that correspond to each algorithm, which results in the sufficient (and almost necessary) conditions for the local convergence by each method. Moreover, this ODE perspective allows us to characterize the phase transition between these different limiting behaviors caused by introducing regularization as Hopf Bifurcations.

Submitted 4 March, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

MSC Class: 65K05; 65K10; 90C26; 90C15; 90C30

arXiv:2006.08667

The Landscape of the Proximal Point Method for Nonconvex-Nonconcave Minimax Optimization

Authors: Benjamin Grimmer, Haihao Lu, Pratik Worah, Vahab Mirrokni

Abstract: Minimax optimization has become a central tool in machine learning with applications in robust optimization, reinforcement learning, GANs, etc. These applications are often nonconvex-nonconcave, but the existing theory is unable to identify and deal with the fundamental difficulties this poses. In this paper, we study the classic proximal point method (PPM) applied to nonconvex-nonconcave minimax problems. We find that a classic generalization of the Moreau envelope by Attouch and Wets provides key insights. Critically, we show this envelope not only smooths the objective but can convexify and concavify it based on the level of interaction present between the minimizing and maximizing variables. From this, we identify three distinct regions of nonconvex-nonconcave problems. When interaction is sufficiently strong, we derive global linear convergence guarantees. Conversely, when the interaction is fairly weak, we derive local linear convergence guarantees with a proper initialization. Between these two settings, we show that PPM may diverge or converge to a limit cycle.

Submitted 1 April, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: Notably updated version that connects our theory with that of Attouch and Wets from the 80s and notably expands on our first posting to apply to generic minimax problems (rather than requiring bilinear interaction)

MSC Class: 65K05; 65K10; 90C26; 90C15; 90C30

arXiv:1712.04104

Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity

Authors: Benjamin Grimmer

Abstract: We extend the classic convergence rate theory for subgradient methods to apply to non-Lipschitz functions. For the deterministic projected subgradient method, we present a global $O(1/\sqrt{T})$ convergence rate for any convex function which is locally Lipschitz around its minimizers. This approach is based on Shor's classic subgradient analysis and implies generalizations of the standard convergence rates for gradient descent on functions with Lipschitz or Hölder continuous gradients. Further, we show an $O(1/\sqrt{T})$ convergence rate for the stochastic projected subgradient method on convex functions with at most quadratic growth, which improves to $O(1/T)$ under either strong convexity or a weaker quadratic lower bound condition.

Submitted 26 February, 2018; v1 submitted 11 December, 2017; originally announced December 2017.

Comments: Update 2/26/18: Major revision improving the convergence results to no longer need an exponential upper bound on function growth in the convex case. Now local Lipschitz continuity around a minimizer suffices for a global convergence rate. Update 12/21/17: Added three more references on weakening strong convexity and minorly changed some wording. 16 pages

MSC Class: 65K05; 65K10; 90C25; 90C15; 90C30

arXiv:1707.03505

Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems

Authors: Damek Davis, Benjamin Grimmer

Abstract: In this paper, we introduce a stochastic projected subgradient method for weakly convex (i.e., uniformly prox-regular) nonsmooth, nonconvex functions---a wide class of functions which includes the additive and convex composite classes. At a high-level, the method is an inexact proximal point iteration in which the strongly convex proximal subproblems are quickly solved with a specialized stochastic projected subgradient method. The primary contribution of this paper is a simple proof that the proposed algorithm converges at the same rate as the stochastic gradient method for smooth nonconvex problems. This result appears to be the first convergence rate analysis of a stochastic (or even deterministic) subgradient method for the class of weakly convex functions.

Submitted 17 September, 2018; v1 submitted 11 July, 2017; originally announced July 2017.

Comments: Updated 9/17/2018: Major revision: added high probability bounds, improved convergence analysis in general, new experimental results. Updated 7/26/2017: Added references to introduction and a couple simple extensions as Sections 3.2 and 4. Updated 8/23/2017: Added NSF acknowledgements. Updated 10/16/2017: Added experimental results

MSC Class: 65K05; 65K10; 90C26; 90C15; 90C30

arXiv:1508.05567

Dual-Based Approximation Algorithms for Cut-Based Network Connectivity Problems

Authors: Benjamin Grimmer

Abstract: We consider a variety of NP-Complete network connectivity problems. We introduce a novel dual-based approach to approximating network design problems with cut-based linear programming relaxations. This approach gives a $3/2$-approximation to Minimum 2-Edge-Connected Spanning Subgraph that is equivalent to a previously proposed algorithm. One well-studied branch of network design models ad hoc networks where each node can either operate at high or low power. If we allow unidirectional links, we can formalize this into the problem Dual Power Assignment (DPA). Our dual-based approach gives a $3/2$-approximation to DPA, improving the previous best approximation known of $11/7\approx 1.57$. Another standard network design problem is Minimum Strongly Connected Spanning Subgraph (MSCS). We propose a new problem generalizing MSCS and DPA called Star Strong Connectivity (SSC). Then we show that our dual-based approach achieves a 1.6-approximation ratio on SSC. As a consequence of our dual-based approximations, we prove new upper bounds on the integrality gaps of these problems.

Submitted 20 July, 2017; v1 submitted 23 August, 2015; originally announced August 2015.

Comments: 7/20/2017: Changed Title to be more accurate. Improved presentation and clarity throughout the document (i.e. adding references and fixing typos)

ACM Class: G.1.6; G.2.2; F.2.2

Showing 1–8 of 8 results for author: Grimmer, B