
Showing 1–12 of 12 results for author: Drusvyatskiy, D

  1. arXiv:2405.09676  [pdf, ps, other]

    math.ST math.OC stat.ML

    The radius of statistical efficiency

    Authors: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: Classical results in asymptotic statistics show that the Fisher information matrix controls the difficulty of estimating a statistical model from observed data. In this work, we introduce a companion measure of robustness of an estimation problem: the radius of statistical efficiency (RSE) is the size of the smallest perturbation to the problem data that renders the Fisher information matrix singu…

    Submitted 15 May, 2024; originally announced May 2024.

    MSC Class: 90C15; 49K40; 62F12; 90C31

  2. arXiv:2401.04553  [pdf, other]

    stat.ML cs.LG

    Linear Recursive Feature Machines provably recover low-rank matrices

    Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Dmitriy Drusvyatskiy

    Abstract: A fundamental problem in machine learning is to understand how neural networks make accurate predictions, while seemingly bypassing the curse of dimensionality. A possible explanation is that common training algorithms for neural networks implicitly perform dimensionality reduction - a process called feature learning. Recent work posited that the effects of feature learning can be elicited from a…

    Submitted 9 January, 2024; originally announced January 2024.

  3. arXiv:2306.02601  [pdf, other]

    cs.LG math.OC stat.ML

    Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

    Authors: Chaoyue Liu, Dmitriy Drusvyatskiy, Mikhail Belkin, Damek Davis, Yi-An Ma

    Abstract: Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method,…

    Submitted 5 June, 2023; originally announced June 2023.

  4. arXiv:2301.06632  [pdf, other]

    math.OC math.ST stat.ML

    Asymptotic normality and optimality in nonsmooth stochastic approximation

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

    Abstract: In their seminal work, Polyak and Juditsky showed that stochastic approximation algorithms for solving smooth equations enjoy a central limit theorem. Moreover, it has since been argued that the asymptotic covariance of the method is best possible among any estimation procedure in a local minimax sense of Hájek and Le Cam. A long-standing open question in this line of work is whether similar guara…

    Submitted 16 January, 2023; originally announced January 2023.

    Comments: The arXiv report arXiv:2108.11832 has been split into two parts. This is Part 2 of the original submission, augmented by some new results and a reworked exposition

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  5. arXiv:2207.04173  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic Approximation with Decision-Dependent Distributions: Asymptotic Normality and Optimality

    Authors: Joshua Cutler, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: We analyze a stochastic approximation algorithm for decision-dependent problems, wherein the data distribution used by the algorithm evolves along the iterate sequence. The primary examples of such problems appear in performative prediction and its multiplayer extensions. We show that under mild assumptions, the deviation between the average iterate of the algorithm and the solution is asymptotica…

    Submitted 13 March, 2024; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 49 pages, 1 figure. v2: revised asymptotic optimality results and reworked exposition. v3: minor updates

    MSC Class: 90C15; 90C25

    Journal ref: Journal of Machine Learning Research, 25(90):1-49, 2024

  6. arXiv:2204.08281  [pdf, other]

    math.OC cs.LG stat.ML

    Decision-Dependent Risk Minimization in Geometrically Decaying Dynamic Environments

    Authors: Mitas Ray, Dmitriy Drusvyatskiy, Maryam Fazel, Lillian J. Ratliff

    Abstract: This paper studies the problem of expected loss minimization given a data distribution that is dependent on the decision-maker's action and evolves dynamically in time according to a geometric decay process. Novel algorithms for both the information setting in which the decision-maker has a first order gradient oracle and the setting in which they have simply a loss function oracle are introduced.…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted at AAAI 2022

  7. arXiv:2203.03756  [pdf, other]

    cs.LG math.OC stat.ML

    Flat minima generalize for low-rank matrix recovery

    Authors: Lijun Ding, Dmitriy Drusvyatskiy, Maryam Fazel, Zaid Harchaoui

    Abstract: Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameter…

    Submitted 17 February, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 36 pages

  8. arXiv:2106.09815  [pdf, other]

    math.OC cs.LG stat.ML

    Escaping strict saddle points of the Moreau envelope in nonsmooth optimization

    Authors: Damek Davis, Mateo Díaz, Dmitriy Drusvyatskiy

    Abstract: Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict sad…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: 29 pages, 1 figure

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  9. arXiv:1912.07146  [pdf, other]

    math.OC cs.LG stat.ML

    Proximal methods avoid active strict saddles of weakly convex functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy

    Abstract: We introduce a geometrically transparent strict saddle property for nonsmooth functions. This property guarantees that simple proximal algorithms on weakly convex problems converge only to local minimizers, when randomly initialized. We argue that the strict saddle property may be a realistic assumption in applications, since it provably holds for generic semi-algebraic optimization problems.

    Submitted 16 February, 2021; v1 submitted 15 December, 2019; originally announced December 2019.

    Comments: 43 pages, 2 figures

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  10. arXiv:1907.13307  [pdf, ps, other]

    math.OC cs.LG stat.ML

    From low probability to high confidence in stochastic convex optimization

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang

    Abstract: Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation. More nuanced high probability guarantees are rare, and typically either rely on "light-tail" noise assumptions or exhibit worse sample complexity. In this work, we show that a wide class of stochastic optimization algorithms for strong…

    Submitted 16 October, 2019; v1 submitted 31 July, 2019; originally announced July 2019.

    Comments: 37 pages

    MSC Class: 65K05; 65K10; 90C15; 90C25

  11. arXiv:1907.09547  [pdf, other]

    math.OC cs.LG stat.ML

    Stochastic algorithms with geometric step decay converge linearly on sharp functions

    Authors: Damek Davis, Dmitriy Drusvyatskiy, Vasileios Charisopoulos

    Abstract: Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called \emph{geometric step decay} and proceeds by halving the step size after every few epochs. In recent work,…

    Submitted 22 July, 2019; originally announced July 2019.

    MSC Class: 65K05; 65K10; 90C15; 90C30; 90C06

  12. arXiv:1703.10993  [pdf, other]

    stat.ML math.OC

    Catalyst Acceleration for Gradient-Based Non-Convex Optimization

    Authors: Courtney Paquette, Hongzhou Lin, Dmitriy Drusvyatskiy, Julien Mairal, Zaid Harchaoui

    Abstract: We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them on weakly convex objectives, which covers a large class of non-convex functions typically appearing in machine learning and sign…

    Submitted 31 December, 2018; v1 submitted 31 March, 2017; originally announced March 2017.