Showing 1–18 of 18 results for author: Woodworth, B

Searching in archive math.
  1. arXiv:2302.03542  [pdf, other]

    cs.LG math.OC

    Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

    Authors: Blake Woodworth, Konstantin Mishchenko, Francis Bach

    Abstract: We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is $δ$-smooth, our algorithm guarantees convergence at a…

    Submitted 7 June, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

  2. arXiv:2206.07638  [pdf, other]

    math.OC cs.DC cs.LG

    Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays

    Authors: Konstantin Mishchenko, Francis Bach, Mathieu Even, Blake Woodworth

    Abstract: The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the…

    Submitted 20 April, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

  3. arXiv:2204.04970  [pdf, other]

    cs.LG math.OC

    Non-Convex Optimization with Certificates and Fast Rates Through Kernel Sums of Squares

    Authors: Blake Woodworth, Francis Bach, Alessandro Rudi

    Abstract: We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized. In this paper, we propose an algorithm that achieves close to optimal a priori computational guarantees, while also providing a posteriori certificates of optimality. Our general formulation builds on i…

    Submitted 11 April, 2022; originally announced April 2022.

  4. arXiv:2110.02954  [pdf, other]

    math.OC cs.LG stat.ML

    A Stochastic Newton Algorithm for Distributed Convex Optimization

    Authors: Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth

    Abstract: We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations…

    Submitted 7 October, 2021; originally announced October 2021.

  5. arXiv:2109.00534  [pdf, other]

    math.OC cs.LG

    The Minimax Complexity of Distributed Optimization

    Authors: Blake Woodworth

    Abstract: In this thesis, I study the minimax oracle complexity of distributed stochastic optimization. First, I present the "graph oracle model", an extension of the classic oracle complexity framework that can be applied to study distributed optimization algorithms. Next, I describe a general approach to proving optimization lower bounds for arbitrary randomized algorithms (as opposed to more restricted c…

    Submitted 1 September, 2021; originally announced September 2021.

  6. arXiv:2106.02720  [pdf, ps, other]

    cs.LG math.OC

    An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning

    Authors: Blake Woodworth, Nathan Srebro

    Abstract: We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates. The algorithm is optimal with respect to its dependence on both the minibatch size and minimum expected loss simultaneously. This improves over the optimal method of Lan (2012), which is insensitive to the minimum expected loss; over the optimistic accel…

    Submitted 26 October, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: 24 pages

  7. arXiv:2102.01583  [pdf, other]

    cs.LG math.OC

    The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication

    Authors: Blake Woodworth, Brian Bullins, Ohad Shamir, Nathan Srebro

    Abstract: We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective, and during each round of communication, each machine may sequentially compute $K$ stochastic gradient estimates. We present a novel lower bound wi…

    Submitted 5 August, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: 48 pages

  8. arXiv:2006.04735  [pdf, other]

    cs.LG math.OC stat.ML

    Minibatch vs Local SGD for Heterogeneous Distributed Learning

    Authors: Blake Woodworth, Kumar Kshitij Patel, Nathan Srebro

    Abstract: We analyze Local SGD (aka parallel or federated SGD) and Minibatch SGD in the heterogeneous distributed setting, where each machine has access to stochastic gradient estimates for a different, machine-specific, convex objective; the goal is to optimize w.r.t. the average objective; and machines can only communicate intermittently. We argue that, (i) Minibatch SGD (even without acceleration) domina…

    Submitted 1 March, 2022; v1 submitted 8 June, 2020; originally announced June 2020.

    Comments: 34 pages

  9. arXiv:2004.01025  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Mirrorless Mirror Descent: A Natural Derivation of Mirror Descent

    Authors: Suriya Gunasekar, Blake Woodworth, Nathan Srebro

    Abstract: We present a primal-only derivation of Mirror Descent as a "partial" discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential. We contrast this discretization to Natural Gradient Descent, which is obtained by a "full" forward Euler discretization. This view helps shed light on the relationship between the methods and allows gen…

    Submitted 1 July, 2021; v1 submitted 2 April, 2020; originally announced April 2020.

    Comments: 11 pages

  10. arXiv:2002.07839  [pdf, other]

    cs.LG math.OC stat.ML

    Is Local SGD Better than Minibatch SGD?

    Authors: Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

    Abstract: We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibat…

    Submitted 20 July, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: 29 pages

  11. arXiv:1912.02365  [pdf, other]

    math.OC cs.IT cs.LG stat.ML

    Lower Bounds for Non-Convex Stochastic Optimization

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

    Abstract: We lower bound the complexity of finding $ε$-stationary points (with gradient norm at most $ε$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $ε^{-4}$ queries to find an…

    Submitted 27 February, 2022; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Correction to hard instance dimensions in Theorem 3

  12. arXiv:1911.02212  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    The gradient complexity of linear regression

    Authors: Mark Braverman, Elad Hazan, Max Simchowitz, Blake Woodworth

    Abstract: We investigate the computational complexity of several basic linear algebra primitives, including largest eigenvector computation and linear regression, in the computational model that allows access to the data via a matrix-vector product oracle. We show that for polynomial accuracy, $Θ(d)$ calls to the oracle are necessary and sufficient even for a randomized algorithm. Our lower bound is based…

    Submitted 23 May, 2021; v1 submitted 6 November, 2019; originally announced November 2019.

  13. arXiv:1907.00762  [pdf, other]

    cs.LG math.OC stat.ML

    Open Problem: The Oracle Complexity of Convex Optimization with Limited Memory

    Authors: Blake Woodworth, Nathan Srebro

    Abstract: We note that known methods achieving the optimal oracle complexity for first-order convex optimization require quadratic memory, and ask whether this is necessary, and more broadly seek to characterize the minimax number of first-order queries required to optimize a convex Lipschitz function subject to a memory constraint.

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: 9 pages

  14. arXiv:1906.09231  [pdf, other]

    cs.LG math.ST stat.ML

    Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

    Authors: Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, Blake Woodworth

    Abstract: We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many setti…

    Submitted 9 March, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: Accepted to appear in the proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020

  15. arXiv:1902.04686  [pdf, ps, other]

    cs.LG math.OC stat.ML

    The Complexity of Making the Gradient Small in Stochastic Convex Optimization

    Authors: Dylan J. Foster, Ayush Sekhari, Ohad Shamir, Nathan Srebro, Karthik Sridharan, Blake Woodworth

    Abstract: We give nearly matching upper and lower bounds on the oracle complexity of finding $ε$-stationary points ($\|\nabla F(x)\| \leq ε$) in stochastic convex optimization. We jointly analyze the oracle complexity in both the local stochastic oracle model and the global oracle (or, statistical learning) model. This allows us to decompose the complexity of finding near-stationary points into optimizatio…

    Submitted 14 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

  16. arXiv:1805.10222  [pdf, other]

    math.OC cs.LG stat.ML

    Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

    Authors: Blake Woodworth, Jialei Wang, Adam Smith, Brendan McMahan, Nathan Srebro

    Abstract: We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight ga…

    Submitted 11 February, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

  17. arXiv:1709.03594  [pdf, ps, other]

    math.OC

    Lower Bound for Randomized First Order Convex Optimization

    Authors: Blake Woodworth, Nathan Srebro

    Abstract: We provide an explicit construction and direct proof for the lower bound on the number of first-order oracle accesses required for a randomized algorithm to minimize a convex Lipschitz function.

    Submitted 3 November, 2017; v1 submitted 11 September, 2017; originally announced September 2017.

    Comments: 8 pages

  18. arXiv:1605.08003  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Tight Complexity Bounds for Optimizing Composite Objectives

    Authors: Blake Woodworth, Nathan Srebro

    Abstract: We provide tight upper and lower bounds on the complexity of minimizing the average of $m$ convex functions using gradient and prox oracles of the component functions. We show a significant gap between the complexity of deterministic vs randomized optimization. For smooth functions, we show that accelerated gradient descent (AGD) and an accelerated variant of SVRG are optimal in the deterministic…

    Submitted 4 April, 2019; v1 submitted 25 May, 2016; originally announced May 2016.
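Entries 7, 8, and 10 compare Minibatch SGD with Local SGD in the intermittent communication setting, where $M$ machines work for $R$ rounds and each machine computes $K$ stochastic gradients per round. The sketch below is a toy illustration of those two baselines only, assuming a hypothetical one-dimensional quadratic with Gaussian gradient noise; the objective, step size, noise model, and function names are illustrative assumptions, and the code does not reproduce the analysis or experiments of any of these papers.

```python
import random


def stoch_grad(x, rng, noise):
    # Hypothetical 1-D objective F(x) = 0.5 * x**2 with minimizer x* = 0.
    # Stochastic gradient = true gradient x plus zero-mean Gaussian noise.
    return x + rng.gauss(0.0, noise)


def minibatch_sgd(x0, M, R, K, lr, noise=1.0, seed=0):
    """R rounds; each round averages M*K gradients taken at the same point."""
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        g = sum(stoch_grad(x, rng, noise) for _ in range(M * K)) / (M * K)
        x -= lr * g  # a single step per communication round
    return x


def local_sgd(x0, M, R, K, lr, noise=1.0, seed=0):
    """R rounds; each machine takes K sequential steps, then iterates are averaged."""
    rng = random.Random(seed)
    x = x0
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            y = x
            for _ in range(K):
                y -= lr * stoch_grad(y, rng, noise)  # K local steps on machine m
            local_iterates.append(y)
        x = sum(local_iterates) / M  # communication: average the M iterates
    return x


print(minibatch_sgd(1.0, M=4, R=50, K=3, lr=0.1))
print(local_sgd(1.0, M=4, R=50, K=3, lr=0.1))
```

Both functions spend the same budget of $M \cdot K \cdot R$ stochastic gradients and differ only in where those gradients are evaluated. With `noise=0.0` both runs are deterministic on this quadratic: minibatch SGD multiplies $x$ by $(1 - \text{lr})$ each round, while local SGD multiplies it by $(1 - \text{lr})^K$.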
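Entry 9 describes Mirror Descent in terms of a potential function whose Hessian defines the metric. As a hedged illustration of the standard dual-space form of the method (not the primal-only derivation in that paper), the sketch below runs mirror descent on the probability simplex with the negative-entropy potential $\psi(x) = \sum_i x_i \log x_i$, a common textbook instance also known as exponentiated gradient. The function name `entropic_mirror_descent` and the linear test loss are assumptions chosen for illustration.

```python
import math


def entropic_mirror_descent(grad, x0, lr, steps):
    """Mirror descent on the probability simplex with potential
    psi(x) = sum_i x_i * log(x_i) (negative entropy).

    The mirror map is grad psi(x) = log(x) + 1, so a gradient step in the
    dual space followed by the Bregman (KL) projection back onto the simplex
    reduces to a multiplicative update plus renormalization.
    """
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        # Dual step log(x) + 1 - lr * g, mapped back through exp(. - 1).
        w = [xi * math.exp(-lr * gi) for xi, gi in zip(x, g)]
        total = sum(w)
        x = [wi / total for wi in w]  # KL projection = normalization
    return x


# Hypothetical linear loss f(x) = <c, x>; its minimizer over the simplex
# puts all mass on the coordinate with the smallest c_i.
c = [1.0, 2.0, 3.0]
x = entropic_mirror_descent(lambda x: c, [1 / 3] * 3, lr=0.5, steps=100)
print(x)
```

Because the gradient of this linear loss is constant, the iterate after $t$ steps is proportional to $\exp(-\text{lr} \cdot t \cdot c_i)$ coordinate-wise, so the mass concentrates on the first coordinate.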