Showing 1–33 of 33 results for author: Diakonikolas, J

Searching in archive math.
  1. arXiv:2403.16317  [pdf, other]

    math.OC cs.DS cs.LG

    Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective

    Authors: Jelena Diakonikolas, Cristóbal Guzmán

    Abstract: We initiate the study of nonsmooth optimization problems under bounded local subgradient variation, which postulates bounded difference between (sub)gradients in small local regions around points, in either average or maximum sense. The resulting class of objective functions encapsulates the classes of objective functions traditionally studied in optimization, which are defined based on either Lip…

    Submitted 24 March, 2024; originally announced March 2024.

  2. arXiv:2403.10763  [pdf, other]

    stat.ML cs.LG math.OC

    A Primal-Dual Algorithm for Faster Distributionally Robust Optimization

    Authors: Ronak Mehta, Jelena Diakonikolas, Zaid Harchaoui

    Abstract: We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses the $f$-DRO, Wasserstein-DRO, and spectral/$L$-risk formulations used in practice. We present Drago, a stochastic primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems. The method com…

    Submitted 15 March, 2024; originally announced March 2024.

  3. arXiv:2403.10547  [pdf, ps, other]

    math.OC cs.AI cs.DS cs.LG

    Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

    Authors: Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

    Abstract: Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong c…

    Submitted 11 March, 2024; originally announced March 2024.

  4. arXiv:2403.06873  [pdf, other]

    math.OC cs.LG

    Last Iterate Convergence of Incremental Methods and Applications in Continual Learning

    Authors: Xufeng Cai, Jelena Diakonikolas

    Abstract: Incremental gradient and incremental proximal methods are a fundamental class of optimization algorithms used for solving finite sum problems, broadly studied in the literature. Yet, without strong convexity, their convergence guarantees have primarily been established for the ergodic (average) iterate. Motivated by applications in continual learning, we obtain the first convergence guarantees for…

    Submitted 27 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  5. arXiv:2402.17756  [pdf, other]

    cs.LG cs.DS math.OC math.ST stat.ML

    Robustly Learning Single-Index Models via Alignment Sharpness

    Authors: Nikos Zarifis, Puqian Wang, Ilias Diakonikolas, Jelena Diakonikolas

    Abstract: We study the problem of learning Single-Index Models under the $L_2^2$ loss in the agnostic model. We give an efficient learning algorithm, achieving a constant factor approximation to the optimal loss, that succeeds under a range of distributions (including log-concave distributions) and a broad class of monotone and Lipschitz link functions. This is the first efficient constant factor approximat…

    Submitted 27 February, 2024; originally announced February 2024.

  6. arXiv:2311.17296  [pdf, other]

    math.OC

    Mirror Duality in Convex Optimization

    Authors: Jaeyeon Kim, Chanwoo Park, Asuman Ozdaglar, Jelena Diakonikolas, Ernest K. Ryu

    Abstract: While first-order optimization methods are usually designed to efficiently reduce the function value $f(x)$, there has been recent interest in methods efficiently reducing the magnitude of $\nabla f(x)$, and the findings show that the two types of methods exhibit a certain symmetry. In this work, we present mirror duality, a one-to-one correspondence between mirror-descent-type methods reducing fu…

    Submitted 15 May, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  7. arXiv:2310.02987  [pdf, other]

    cs.LG math.OC

    Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions

    Authors: Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas

    Abstract: Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation err…

    Submitted 26 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

  8. arXiv:2307.16754  [pdf, other]

    cs.GT cs.MA math.OC

    Block-Coordinate Methods and Restarting for Solving Extensive-Form Games

    Authors: Darshan Chakrabarti, Jelena Diakonikolas, Christian Kroer

    Abstract: Coordinate descent methods are popular in machine learning and optimization for their simple sparse updates and excellent practical performance. In the context of large-scale sequential game solving, these same properties would be attractive, but until now no such methods were known, because the strategy spaces do not satisfy the typical separable block structure exploited by such methods. We pres…

    Submitted 31 July, 2023; originally announced July 2023.

  9. arXiv:2307.08438  [pdf, ps, other]

    cs.LG cs.DS math.ST stat.ML

    Near-Optimal Bounds for Learning Gaussian Halfspaces with Random Classification Noise

    Authors: Ilias Diakonikolas, Jelena Diakonikolas, Daniel M. Kane, Puqian Wang, Nikos Zarifis

    Abstract: We study the problem of learning general (i.e., not necessarily homogeneous) halfspaces with Random Classification Noise under the Gaussian distribution. We establish nearly-matching algorithmic and Statistical Query (SQ) lower bound results revealing a surprising information-computation gap for this basic problem. Specifically, the sample complexity of this learning problem is $\widetilde{Θ}(d/ε)$,…

    Submitted 13 July, 2023; originally announced July 2023.

  10. arXiv:2306.16352  [pdf, ps, other]

    cs.LG cs.DS math.ST stat.ML

    Information-Computation Tradeoffs for Learning Margin Halfspaces with Random Classification Noise

    Authors: Ilias Diakonikolas, Jelena Diakonikolas, Daniel M. Kane, Puqian Wang, Nikos Zarifis

    Abstract: We study the problem of PAC learning $γ$-margin halfspaces with Random Classification Noise. We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms. Concretely, the sample complexity of the problem is $\widetilde{Θ}(1/(γ^2 ε))$. We start by giving a simple efficient alg…

    Submitted 28 June, 2023; originally announced June 2023.

  11. arXiv:2306.12498  [pdf, other]

    math.OC cs.LG

    Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds

    Authors: Xufeng Cai, Cheuk Yin Lin, Jelena Diakonikolas

    Abstract: Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning. Contrary to the empirical practice of sampling from the datasets without replacement and with (possible) reshuffling at each epoch, the theoretical counterpart of SGD usually relies on the assumption of sampling with replacement. It is only very recently that SGD with sampling without re…

    Submitted 7 February, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

  12. arXiv:2306.07892  [pdf, other]

    cs.LG cs.DS math.OC math.ST stat.ML

    Robustly Learning a Single Neuron via Sharpness

    Authors: Puqian Wang, Nikos Zarifis, Ilias Diakonikolas, Jelena Diakonikolas

    Abstract: We study the problem of learning a single neuron with respect to the $L_2^2$-loss in the presence of adversarial label noise. We give an efficient algorithm that, for a broad family of activations including ReLUs, approximates the optimal $L_2^2$-error within a constant factor. Our algorithm applies under much milder distributional assumptions compared to prior work. The key ingredient enabling ou…

    Submitted 13 June, 2023; originally announced June 2023.

  13. arXiv:2303.16279  [pdf, other]

    math.OC cs.LG

    Accelerated Cyclic Coordinate Dual Averaging with Extrapolation for Composite Convex Optimization

    Authors: Cheuk Yin Lin, Chaobing Song, Jelena Diakonikolas

    Abstract: Exploiting partial first-order information in a cyclic way is arguably the most natural strategy to obtain scalable first-order methods. However, despite their wide use in practice, cyclic schemes are far less understood from a theoretical perspective than their randomized counterparts. Motivated by a recent success in analyzing an extrapolated cyclic scheme for generalized variational inequalitie…

    Submitted 28 March, 2023; originally announced March 2023.

  14. arXiv:2212.05088  [pdf, other]

    math.OC cs.LG

    Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization

    Authors: Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

    Abstract: Nonconvex optimization is central in solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by a recent prog…

    Submitted 27 January, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

  15. arXiv:2203.09436  [pdf, other]

    math.OC cs.LG

    Stochastic Halpern Iteration with Variance Reduction for Stochastic Monotone Inclusions

    Authors: Xufeng Cai, Chaobing Song, Cristóbal Guzmán, Jelena Diakonikolas

    Abstract: We study stochastic monotone inclusion problems, which widely appear in machine learning applications, including robust regression and adversarial learning. We propose novel variants of stochastic Halpern iteration with recursive variance reduction. In the cocoercive -- and more generally Lipschitz-monotone -- setup, our algorithm attains $ε$ norm of the operator with $\mathcal{O}(\frac{1}{ε^3})$…

    Submitted 8 January, 2023; v1 submitted 17 March, 2022; originally announced March 2022.

  16. arXiv:2203.03808  [pdf, other]

    math.OC cs.LG stat.ML

    A Fast Scale-Invariant Algorithm for Non-negative Least Squares with Non-negative Data

    Authors: Jelena Diakonikolas, Chenghui Li, Swati Padmanabhan, Chaobing Song

    Abstract: Nonnegative (linear) least square problems are a fundamental class of problems that is well-studied in statistical learning and for which solvers have been implemented in many of the standard programming languages used within the machine learning community. The existing off-the-shelf solvers view the non-negativity constraint in these problems as an obstacle and, compared to unconstrained least sq…

    Submitted 7 March, 2022; originally announced March 2022.

  17. arXiv:2111.01842  [pdf, other]

    math.OC cs.LG

    Coordinate Linear Variance Reduction for Generalized Linear Programming

    Authors: Chaobing Song, Cheuk Yin Lin, Stephen J. Wright, Jelena Diakonikolas

    Abstract: We study a class of generalized linear programs (GLP) in a large-scale setting, which includes simple, possibly nonsmooth convex regularizer and simple convex set constraints. By reformulating (GLP) as an equivalent convex-concave min-max problem, we show that the linear structure in the problem can be used to design an efficient, scalable first-order algorithm, to which we give the name Coo…

    Submitted 6 April, 2023; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: 39 pages, NeurIPS 2022

  18. arXiv:2102.13643  [pdf, other]

    math.OC cs.LG math.NA

    Variance Reduction via Primal-Dual Accelerated Dual Averaging for Nonsmooth Convex Finite-Sums

    Authors: Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

    Abstract: We study structured nonsmooth convex finite-sum optimization that appears widely in machine learning applications, including support vector machines and least absolute deviation. For the primal-dual formulation of this problem, we propose a novel algorithm called Variance Reduction via Primal-Dual Accelerated Dual Averaging (VRPDA²). In the nonsmooth and general convex setting, VRPDA² has t…

    Submitted 7 April, 2021; v1 submitted 26 February, 2021; originally announced February 2021.

    Comments: 33 pages, 18 figures

  19. arXiv:2102.13244  [pdf, other]

    math.OC cs.LG

    Cyclic Coordinate Dual Averaging with Extrapolation

    Authors: Chaobing Song, Jelena Diakonikolas

    Abstract: Cyclic block coordinate methods are a fundamental class of optimization methods widely used in practice and implemented as part of standard software packages for statistical learning. Nevertheless, their convergence is generally not well understood and so far their good practical performance has not been explained by existing convergence analyses. In this work, we introduce a new block coordinate…

    Submitted 8 June, 2023; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: 27 pages, 2 figures. Accepted to SIAM Journal on Optimization. Version prior to final copy editing

  20. arXiv:2102.06806  [pdf, other]

    math.OC cs.LG stat.ML

    Parameter-free Locally Accelerated Conditional Gradients

    Authors: Alejandro Carderera, Jelena Diakonikolas, Cheuk Yin Lin, Sebastian Pokutta

    Abstract: Projection-free conditional gradient (CG) methods are the algorithms of choice for constrained optimization setups in which projections are often computationally prohibitive but linear optimization over the constraint set remains computationally feasible. Unlike in projection-based methods, globally accelerated convergence rates are in general unattainable for CG. However, a very recent work on Lo…

    Submitted 15 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

  21. arXiv:2101.12101  [pdf, ps, other]

    math.OC cs.DS cs.LG

    Potential Function-based Framework for Making the Gradients Small in Convex and Min-Max Optimization

    Authors: Jelena Diakonikolas, Puqian Wang

    Abstract: Making the gradients small is a fundamental optimization problem that has eluded unifying and simple convergence arguments in first-order optimization, so far primarily reserved for other convergence criteria, such as reducing the optimality gap. We introduce a novel potential function-based framework to study the convergence of standard methods for making the gradients small in smooth convex opti…

    Submitted 28 January, 2021; originally announced January 2021.

  22. arXiv:2101.11041  [pdf, ps, other]

    math.OC cs.DS cs.LG stat.ML

    Complementary Composite Minimization, Small Gradients in General Norms, and Applications

    Authors: Jelena Diakonikolas, Cristóbal Guzmán

    Abstract: Composite minimization is a powerful framework in large-scale convex optimization, based on decoupling of the objective function into terms with structurally different properties and allowing for more flexible algorithmic design. We introduce a new algorithmic framework for complementary composite minimization, where the objective function decouples into a (weakly) smooth and a uniformly convex te…

    Submitted 15 February, 2023; v1 submitted 26 January, 2021; originally announced January 2021.

  23. arXiv:2011.00364  [pdf, ps, other]

    math.OC cs.DS cs.LG stat.ML

    Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

    Authors: Jelena Diakonikolas, Constantinos Daskalakis, Michael I. Jordan

    Abstract: The use of min-max optimization in adversarial training of deep neural network classifiers and training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. Unfortunately, recent results have established that even approximate first-order stationary points of such objectives are intractable, even un…

    Submitted 27 February, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

    Comments: in Proc. AISTATS'21

  24. arXiv:2002.08872  [pdf, ps, other]

    math.OC cs.DS cs.LG

    Halpern Iteration for Near-Optimal and Parameter-Free Monotone Inclusion and Strong Solutions to Variational Inequalities

    Authors: Jelena Diakonikolas

    Abstract: We leverage the connections between nonexpansive maps, monotone Lipschitz operators, and proximal mappings to obtain near-optimal (i.e., optimal up to poly-log factors in terms of iteration complexity) and parameter-free methods for solving monotone inclusion problems. These results immediately translate into near-optimal guarantees for approximating strong solutions to variational inequality prob…

    Submitted 11 April, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: 23 pages; v1->v2: added acknowledgements and some more related work; v2 -> v3: fixed a small typo in the proof of Lemma 2.1

  25. arXiv:1907.00289  [pdf, ps, other]

    math.OC cs.DS cs.LG stat.ML

    Conjugate Gradients and Accelerated Methods Unified: The Approximate Duality Gap View

    Authors: Jelena Diakonikolas, Lorenzo Orecchia

    Abstract: This note provides a novel, simple analysis of the method of conjugate gradients for the minimization of convex quadratic functions. In contrast with standard arguments, our proof is entirely self-contained and does not rely on the existence of Chebyshev polynomials. Another advantage of our development is that it clarifies the relation between the method of conjugate gradients and general acceler…

    Submitted 9 February, 2020; v1 submitted 29 June, 2019; originally announced July 2019.

    Comments: 8 pages. v1 -> v2: corrected a reference to the paper with Nemirovski acceleration with line search. v2 -> v3: updated affiliations, corrected a few typos on p.7 and added an acknowledgement

  26. arXiv:1906.07867  [pdf, other]

    math.OC cs.LG stat.ML

    Locally Accelerated Conditional Gradients

    Authors: Jelena Diakonikolas, Alejandro Carderera, Sebastian Pokutta

    Abstract: Conditional gradients constitute a class of projection-free first-order algorithms for smooth convex optimization. As such, they are frequently used in solving smooth convex optimization problems over polytopes, for which the computational cost of orthogonal projections would be prohibitive. However, they do not enjoy the optimal convergence rates achieved by projection-based accelerated methods;…

    Submitted 11 October, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

  27. arXiv:1906.00436  [pdf, other]

    math.OC cs.LG stat.ML

    Generalized Momentum-Based Methods: A Hamiltonian Perspective

    Authors: Jelena Diakonikolas, Michael I. Jordan

    Abstract: We take a Hamiltonian-based perspective to generalize Nesterov's accelerated gradient descent and Polyak's heavy ball method to a broad class of momentum methods in the setting of (possibly) constrained minimization in Euclidean and non-Euclidean normed vector spaces. Our perspective leads to a generic and unifying nonasymptotic analysis of convergence of these methods in both the function value (…

    Submitted 15 November, 2020; v1 submitted 2 June, 2019; originally announced June 2019.

    Comments: To appear in SIAM Journal on Optimization. v1 -> v2: minor edits + added funding acknowledgements, v2 -> v3: revised presentation, upon journal revision

  28. arXiv:1811.01903  [pdf, ps, other]

    math.OC cs.DS cs.LG stat.ML

    Lower Bounds for Parallel and Randomized Convex Optimization

    Authors: Jelena Diakonikolas, Cristóbal Guzmán

    Abstract: We study the question of whether parallelization in the exploration of the feasible set can be used to speed up convex optimization, in the local oracle model of computation. We show that the answer is negative for both deterministic and randomized algorithms applied to essentially any of the interesting geometries and nonsmooth, weakly-smooth, or smooth objective functions. In particular, we show…

    Submitted 19 June, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: In Proc. COLT'19

  29. arXiv:1808.02517  [pdf, other]

    cs.DS cs.NI math.OC

    Fair Packing and Covering on a Relative Scale

    Authors: Jelena Diakonikolas, Maryam Fazel, Lorenzo Orecchia

    Abstract: Fair resource allocation is a fundamental optimization problem with applications in operations research, networking, and economic and game theory. Research in these areas has led to the general acceptance of a class of $α$-fair utility functions parameterized by $α\in [0, \infty]$. We consider $α$-fair packing -- the problem of maximizing $α$-fair utilities under positive linear constraints -- and…

    Submitted 15 November, 2020; v1 submitted 7 August, 2018; originally announced August 2018.

    Comments: To appear in SIAM Journal on Optimization

  30. arXiv:1805.12591  [pdf, other]

    math.OC cs.DS

    On Acceleration with Noise-Corrupted Gradients

    Authors: Michael B. Cohen, Jelena Diakonikolas, Lorenzo Orecchia

    Abstract: Accelerated algorithms have broad applications in large-scale optimization, due to their generality and fast convergence. However, their stability in the practical setting of noise-corrupted gradient oracles is not well-understood. This paper provides two main technical contributions: (i) a new accelerated method AGDP that generalizes Nesterov's AGD and improves on the recent method AXGD (Diakonik…

    Submitted 31 July, 2018; v1 submitted 31 May, 2018; originally announced May 2018.

    Comments: Appeared in Proc. ICML'18; v2 corrects the statement of Corollary 3.9; v3 added references to concurrent work

  31. arXiv:1805.09185  [pdf, other]

    math.OC

    Alternating Randomized Block Coordinate Descent

    Authors: Jelena Diakonikolas, Lorenzo Orecchia

    Abstract: Block-coordinate descent algorithms and alternating minimization methods are fundamental optimization algorithms and an important primitive in large-scale optimization and machine learning. While various block-coordinate-descent-type methods have been studied extensively, only alternating minimization -- which applies to the setting of only two blocks -- is known to have convergence time that scal…

    Submitted 1 July, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: Version 1 appeared in Proc. ICML'18. v1 -> v2: added remarks about how accelerated alternating minimization follows directly from the results that appeared in ICML'18; no new technical results were needed for this

  32. arXiv:1712.02485  [pdf, other]

    math.OC cs.DS

    The Approximate Duality Gap Technique: A Unified Theory of First-Order Methods

    Authors: Jelena Diakonikolas, Lorenzo Orecchia

    Abstract: We present a general technique for the analysis of first-order methods. The technique relies on the construction of a duality gap for an appropriate approximation of the objective function, where the function approximation improves as the algorithm converges. We show that in continuous time enforcement of an invariant that this approximate duality gap decreases at a certain rate exactly recovers a…

    Submitted 10 December, 2019; v1 submitted 6 December, 2017; originally announced December 2017.

    Comments: In SIAM Journal on Optimization. The most recent version corrected a few typos

  33. arXiv:1706.04680  [pdf, other]

    math.OC cs.DS math.NA

    Accelerated Extra-Gradient Descent: A Novel Accelerated First-Order Method

    Authors: Jelena Diakonikolas, Lorenzo Orecchia

    Abstract: We provide a novel accelerated first-order method that achieves the asymptotically optimal convergence rate for smooth functions in the first-order oracle model. To this day, Nesterov's Accelerated Gradient Descent (AGD) and variations thereof were the only methods achieving acceleration in this standard blackbox model. In contrast, our algorithm is significantly different from AGD, as it relies o…

    Submitted 10 February, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Appeared in Proc. ITCS'18, conference version available at: http://drops.dagstuhl.de/opus/volltexte/2018/8356/
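For reference, the $α$-fair utilities mentioned in entry 29 are commonly written in the standard (Mo–Walrand) form below. The packing problem is a generic sketch of "maximizing $α$-fair utilities under positive linear constraints"; the normalization (entrywise nonnegative matrix $A$, all-ones right-hand side) is an assumption and may differ from the paper's exact setup.

```latex
f_\alpha(x) = \sum_{j=1}^{n} u_\alpha(x_j),
\qquad
u_\alpha(t) =
\begin{cases}
  \dfrac{t^{1-\alpha}}{1-\alpha}, & \alpha \in [0,1) \cup (1,\infty),\\[6pt]
  \log t, & \alpha = 1,
\end{cases}
\qquad
\max_{x \ge 0} \; f_\alpha(x) \quad \text{s.t.} \quad A x \le \mathbf{1}, \; A \ge 0.
```

Setting $α = 0$ gives the linear (utilitarian) objective, $α = 1$ gives proportional fairness, and $α \to \infty$ approaches max-min fairness, which is how the endpoint $α = \infty$ of the range $[0, \infty]$ is usually interpreted.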
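Entries 7, 15, and 24 build on Halpern iteration. As a minimal sketch of the classical iteration only (not the variance-reduced, stochastic, or parameter-free variants developed in those papers), each step anchors back to the starting point: $x_{k+1} = λ_{k+1} x_0 + (1 - λ_{k+1}) T(x_k)$ for a nonexpansive map $T$, here with the common choice $λ_{k+1} = 1/(k+2)$. The test map, a 90-degree rotation about a hypothetical point `p`, is an illustrative assumption.

```python
import numpy as np


def halpern(T, x0, num_iters):
    """Classical Halpern iteration x_{k+1} = lam * x0 + (1 - lam) * T(x_k),
    with anchoring weight lam = 1 / (k + 2) for k = 0, 1, 2, ..."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    for k in range(num_iters):
        lam = 1.0 / (k + 2)
        x = lam * x0 + (1.0 - lam) * T(x)
    return x


# Hypothetical test map: rotation by 90 degrees about the point p.
# It is an isometry (hence nonexpansive) whose only fixed point is p.
p = np.array([1.0, -2.0])
R = np.array([[0.0, -1.0], [1.0, 0.0]])


def rotate_about_p(x):
    return p + R @ (x - p)


x0 = np.array([4.0, 3.0])
x_halpern = halpern(rotate_about_p, x0, num_iters=1000)
# Fixed-point residual ||x - T(x)||, the quantity these methods drive to zero.
residual = np.linalg.norm(x_halpern - rotate_about_p(x_halpern))
print(x_halpern, residual)
```

Because the rotation preserves distances, plain fixed-point iteration `x = T(x)` stays at a constant distance from `p` and never converges; the anchoring term is what pulls the Halpern iterates to the fixed point.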
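Entry 25 gives a new analysis of the method of conjugate gradients for convex quadratics. The method itself is standard; the sketch below is the textbook algorithm (not the paper's approximate-duality-gap argument) for minimizing $f(x) = \tfrac12 x^\top A x - b^\top x$ with $A$ symmetric positive definite, which is equivalent to solving $Ax = b$. The random test matrix is a hypothetical example.

```python
import numpy as np


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iters=None):
    """Textbook conjugate gradient method for minimizing
    f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    r = b - A @ x          # residual, equal to the negative gradient of f at x
    d = r.copy()           # first search direction: steepest descent
    rs_old = r @ r
    if max_iters is None:
        max_iters = n      # in exact arithmetic CG terminates in at most n steps
    for _ in range(max_iters):
        if np.sqrt(rs_old) <= tol:
            break
        Ad = A @ d
        alpha = rs_old / (d @ Ad)       # exact line search along d
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d   # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x


rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)   # symmetric positive definite test matrix
b = rng.standard_normal(5)
x = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))
```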
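Entries 20 and 26 concern conditional gradient (Frank-Wolfe) methods, which replace projections with a linear minimization oracle over the feasible set. Below is a minimal sketch of the vanilla method, not the locally accelerated variants from those papers, over the probability simplex, where the oracle reduces to picking the vertex with the smallest gradient coordinate. The quadratic objective and the target point `y` are hypothetical.

```python
import numpy as np


def frank_wolfe_simplex(grad, x0, num_iters):
    """Vanilla conditional gradient (Frank-Wolfe) over the probability simplex,
    using the standard open-loop step size gamma_k = 2 / (k + 2)."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, min_s <g, s> is
        # attained at the vertex e_i with the smallest gradient coordinate.
        s = np.zeros_like(x)
        s[np.argmin(g)] = 1.0
        gamma = 2.0 / (k + 2)
        # Convex combination of x and a vertex: stays feasible, no projection.
        x = x + gamma * (s - x)
    return x


# Hypothetical objective f(x) = ||x - y||^2 with y inside the simplex, so min f = 0.
y = np.array([0.1, 0.2, 0.3, 0.4])
x_fw = frank_wolfe_simplex(lambda x: 2.0 * (x - y), np.array([1.0, 0.0, 0.0, 0.0]), 2000)
gap = np.sum((x_fw - y) ** 2)   # optimality gap f(x) - f*
print(x_fw, gap)
```

With the step size $2/(k+2)$, vanilla Frank-Wolfe is known to reach an $O(1/k)$ optimality gap for smooth convex objectives; this is the baseline that entry 26's abstract contrasts with the rates of projection-based accelerated methods.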
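Entries 4 and 11 study incremental and shuffled (without-replacement) variants of SGD for finite sums. The sketch below contrasts the two sampling orders on a hypothetical least-squares problem: `shuffle=True` draws a fresh permutation each epoch (random reshuffling), while `shuffle=False` uses a fixed cyclic order, which is the incremental gradient method. The consistent linear system and the step size are illustrative assumptions chosen so both orders converge to the exact solution; this is not the analysis or step-size rule from either paper.

```python
import numpy as np


def shuffled_sgd_least_squares(A, b, step, num_epochs, shuffle=True, seed=0):
    """Without-replacement SGD on f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.

    shuffle=True:  fresh random permutation every epoch (random reshuffling).
    shuffle=False: fixed cyclic order 0, 1, ..., n-1 (incremental gradient).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(num_epochs):
        order = rng.permutation(n) if shuffle else np.arange(n)
        for i in order:
            # Step along the gradient of the i-th component 0.5 * (a_i^T x - b_i)^2.
            x = x - step * (A[i] @ x - b[i]) * A[i]
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true                       # consistent system: every component is minimized at x_true
step = 0.5 / np.max(np.sum(A ** 2, axis=1))
x_rr = shuffled_sgd_least_squares(A, b, step, num_epochs=200, shuffle=True)
x_ig = shuffled_sgd_least_squares(A, b, step, num_epochs=200, shuffle=False)
print(np.linalg.norm(x_rr - x_true), np.linalg.norm(x_ig - x_true))
```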