
Showing 1–22 of 22 results for author: Arjevani, Y

  1. arXiv:2312.16819  [pdf, other]

    cs.LG math.OC stat.ML

    Hidden Minima in Two-Layer ReLU Networks

    Authors: Yossi Arjevani

    Abstract: The optimization problem associated to fitting two-layer ReLU networks having $d$~inputs, $k$~neurons, and labels generated by a target network, is considered. Two types of infinite families of spurious minima, giving one minimum per $d$, were recently found. The loss at minima belonging to the first type converges to zero as $d$ increases. In the second type, the loss remains bounded away from ze…

    Submitted 19 February, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

  2. arXiv:2306.07886  [pdf, ps, other]

    math.OC cs.LG math.AG math.NA stat.ML

    Symmetry & Critical Points for Symmetric Tensor Decomposition Problems

    Authors: Yossi Arjevani, Gal Vinograd

    Abstract: We consider the nonconvex optimization problem associated with the decomposition of a real symmetric tensor into a sum of rank one terms. Use is made of the rich symmetry structure to construct infinite families of critical points represented by Puiseux series in the problem dimension, and so obtain precise analytic estimates on the value of the objective function and the Hessian spectrum. The res…

    Submitted 7 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

  3. arXiv:2210.06088  [pdf, ps, other]

    cs.LG math.OC

    Annihilation of Spurious Minima in Two-Layer ReLU Networks

    Authors: Yossi Arjevani, Michael Field

    Abstract: We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network. Use is made of the rich symmetry structure to develop a novel set of tools for studying the mechanism by which over-parameterization annihilates spurious minima. Sharp analytic estimates are obtained for the loss and the Hessian…

    Submitted 12 October, 2022; originally announced October 2022.

  4. arXiv:2107.10370  [pdf, other]

    cs.LG math.DS math.OC

    Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II

    Authors: Yossi Arjevani, Michael Field

    Abstract: We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network. We make use of the rich symmetry structure to develop a novel set of tools for studying families of spurious minima. In contrast to existing approaches which operate in limiting regimes, our technique directly addresses the nonco…

    Submitted 17 October, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: text overlap with arXiv:2008.01805

  5. arXiv:2107.02422  [pdf, ps, other]

    cs.LG math.DS math.OC

    Equivariant bifurcation, quadratic equivariants, and symmetry breaking for the standard representation of $S_n$

    Authors: Yossi Arjevani, Michael Field

    Abstract: Motivated by questions originating from the study of a class of shallow student-teacher neural networks, methods are developed for the analysis of spurious minima in classes of gradient equivariant dynamics related to neural nets. In the symmetric case, methods depend on the generic equivariant bifurcation theory of irreducible representations of the symmetric group on $n$ symbols, $S_n$; in parti…

    Submitted 6 July, 2021; originally announced July 2021.

  6. arXiv:2103.06234  [pdf, other]

    math.OC cs.LG

    Symmetry Breaking in Symmetric Tensor Decomposition

    Authors: Yossi Arjevani, Joan Bruna, Michael Field, Joe Kileel, Matthew Trager, Francis Williams

    Abstract: In this note, we consider the highly nonconvex optimization problem associated with computing the rank decomposition of symmetric tensors. We formulate the invariance properties of the loss function and show that critical points detected by standard gradient based methods are \emph{symmetry breaking} with respect to the target tensor. The phenomena, seen for different choices of target tensors and…

    Submitted 28 December, 2023; v1 submitted 10 March, 2021; originally announced March 2021.

  7. arXiv:2008.01805  [pdf, other]

    cs.LG math.OC stat.ML

    Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry

    Authors: Yossi Arjevani, Michael Field

    Abstract: We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are generated by a target network. We leverage the rich symmetry structure to analytically characterize the Hessian at various families of spurious minima in the natural regime where the number of inputs $d$ and the number of hidden neurons $k$ are finite. In particul…

    Submitted 15 October, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

  8. arXiv:2006.13476  [pdf, other]

    cs.LG math.OC stat.ML

    Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

    Abstract: We design an algorithm which finds an $ε$-approximate stationary point (with $\|\nabla F(x)\|\le ε$) using $O(ε^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and---surprisingly---tha…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted to the Conference on Learning Theory (COLT) 2020

  9. arXiv:2006.06733  [pdf, other]

    math.OC cs.LG

    IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method

    Authors: Yossi Arjevani, Joan Bruna, Bugra Can, Mert Gürbüzbalaban, Stefanie Jegelka, Hongzhou Lin

    Abstract: We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex. Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method, thereby providing a systematic way for deriving several well-known decentralized algorithms including EXTRA arXiv:140…

    Submitted 11 June, 2020; originally announced June 2020.

  10. arXiv:2003.10576  [pdf, ps, other]

    cs.LG math.DS math.OC stat.ML

    Symmetry & critical points for a model shallow neural network

    Authors: Yossi Arjevani, Michael Field

    Abstract: We consider the optimization problem associated with fitting two-layer ReLU networks with $k$ hidden neurons, where labels are assumed to be generated by a (teacher) neural network. We leverage the rich symmetry exhibited by such models to identify various families of critical points and express them as power series in $k^{-\frac{1}{2}}$. These expressions are then used to derive estimates for sev…

    Submitted 11 March, 2021; v1 submitted 23 March, 2020; originally announced March 2020.

  11. arXiv:2002.03273  [pdf, ps, other]

    cs.LG math.OC stat.ML

    On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions

    Authors: Yossi Arjevani, Amit Daniely, Stefanie Jegelka, Hongzhou Lin

    Abstract: Recent advances in randomized incremental methods for minimizing $L$-smooth $μ$-strongly convex finite sums have culminated in tight complexity of $\tilde{O}((n+\sqrt{n L/μ})\log(1/ε))$ and $O(n+\sqrt{nL/ε})$, where $μ>0$ and $μ=0$, respectively, and $n$ denotes the number of individual functions. Unlike incremental methods, stochastic methods for finite sums do not rely on an explicit knowledge o…

    Submitted 8 February, 2020; originally announced February 2020.

  12. arXiv:1912.11939  [pdf, other]

    cs.LG stat.ML

    On the Principle of Least Symmetry Breaking in Shallow ReLU Models

    Authors: Yossi Arjevani, Michael Field

    Abstract: We consider the optimization problem associated with fitting two-layer ReLU networks with respect to the squared loss, where labels are assumed to be generated by a target network. Focusing first on standard Gaussian inputs, we show that the structure of spurious local minima detected by stochastic gradient descent (SGD) is, in a well-defined sense, the \emph{least loss of symmetry} with respect t…

    Submitted 28 December, 2023; v1 submitted 26 December, 2019; originally announced December 2019.

  13. arXiv:1912.02365  [pdf, other]

    math.OC cs.IT cs.LG stat.ML

    Lower Bounds for Non-Convex Stochastic Optimization

    Authors: Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Nathan Srebro, Blake Woodworth

    Abstract: We lower bound the complexity of finding $ε$-stationary points (with gradient norm at most $ε$) using stochastic first-order methods. In a well-studied model where algorithms access smooth, potentially non-convex functions through queries to an unbiased stochastic gradient oracle with bounded variance, we prove that (in the worst case) any algorithm requires at least $ε^{-4}$ queries to find an…

    Submitted 27 February, 2022; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Correction to hard instance dimensions in Theorem 3

  14. arXiv:1806.10188  [pdf, ps, other]

    math.OC cs.LG stat.ML

    A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates

    Authors: Yossi Arjevani, Ohad Shamir, Nathan Srebro

    Abstract: We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from $τ$ rounds ago. First, we show that without stochastic noise, delays strongly affect the attainable optimization error: In fact, the error can be as bad as non-delayed gradient descent run on only $1/τ$ of the gradient…

    Submitted 26 June, 2018; originally announced June 2018.

  15. arXiv:1706.01686  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Limitations on Variance-Reduction and Acceleration Schemes for Finite Sum Optimization

    Authors: Yossi Arjevani

    Abstract: We study the conditions under which one is able to efficiently apply variance-reduction and acceleration schemes on finite sum optimization problems. First, we show that, perhaps surprisingly, the finite sum structure, by itself, is not sufficient for obtaining a complexity bound of $\tilde{\mathcal{O}}((n+L/μ)\ln(1/ε))$ for $L$-smooth and $μ$-strongly convex individual functions - one must also know which…

    Submitted 6 December, 2017; v1 submitted 6 June, 2017; originally announced June 2017.

  16. arXiv:1705.07260  [pdf, ps, other]

    math.OC

    Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

    Authors: Yossi Arjevani, Ohad Shamir, Ron Shiff

    Abstract: Second-order methods, which utilize gradients as well as Hessians to optimize a given function, are of major importance in mathematical optimization. In this work, we prove tight bounds on the oracle complexity of such methods for smooth convex functions, or equivalently, the worst-case number of iterations required to optimize such functions to a given accuracy. In particular, these bounds indica…

    Submitted 17 August, 2017; v1 submitted 20 May, 2017; originally announced May 2017.

    Comments: 35 pages; Added discussion of matching upper bounds, and generalization to higher-order methods

  17. arXiv:1611.04982  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Oracle Complexity of Second-Order Methods for Finite-Sum Problems

    Authors: Yossi Arjevani, Ohad Shamir

    Abstract: Finite-sum optimization problems are ubiquitous in machine learning, and are commonly solved using first-order methods which rely on gradient computations. Recently, there has been growing interest in \emph{second-order} methods, which rely on both gradients and Hessians. In principle, second-order methods can require much fewer iterations than first-order methods, and hold the promise for more ef…

    Submitted 8 March, 2017; v1 submitted 15 November, 2016; originally announced November 2016.

    Comments: 30 pages

  18. arXiv:1606.09333  [pdf, other]

    math.OC cs.LG math.NA

    Dimension-Free Iteration Complexity of Finite Sum Optimization Problems

    Authors: Yossi Arjevani, Ohad Shamir

    Abstract: Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure. However, whereas much progress has been made in developing faster algorithms for this setting, the inherent limitations of these problems are not satisfactorily addressed by existing lower bounds. Indeed, current bounds focus on first-order optimization algorithms, and only apply in the…

    Submitted 29 June, 2016; originally announced June 2016.

  19. arXiv:1605.03529  [pdf, ps, other]

    math.OC cs.LG

    On the Iteration Complexity of Oblivious First-Order Optimization Algorithms

    Authors: Yossi Arjevani, Ohad Shamir

    Abstract: We consider a broad class of first-order optimization algorithms which are \emph{oblivious}, in the sense that their step sizes are scheduled regardless of the function under consideration, except for limited side-information such as smoothness or strong convexity parameters. With the knowledge of these two parameters, we show that any such algorithm attains an iteration complexity lower bound of…

    Submitted 11 May, 2016; originally announced May 2016.

  20. arXiv:1506.01900  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Communication Complexity of Distributed Convex Learning and Optimization

    Authors: Yossi Arjevani, Ohad Shamir

    Abstract: We study the fundamental limits to communication-efficient distributed methods for convex learning and optimization, under different assumptions on the information available to individual machines, and the types of functions considered. We identify cases where existing algorithms are already worst-case optimal, as well as cases where room for further improvement is still possible. Among other thin…

    Submitted 28 October, 2015; v1 submitted 5 June, 2015; originally announced June 2015.

  21. arXiv:1503.06833  [pdf, other]

    math.OC cs.LG

    On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems

    Authors: Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir

    Abstract: We develop a novel framework to study smooth and strongly convex optimization algorithms, both deterministic and stochastic. Focusing on quadratic functions we are able to examine optimization algorithms as a recursive application of linear operators. This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereby new lower and…

    Submitted 23 March, 2015; originally announced March 2015.

  22. arXiv:1410.6387  [pdf, other]

    math.OC cs.LG

    On Lower and Upper Bounds in Smooth Strongly Convex Optimization - A Unified Approach via Linear Iterative Methods

    Authors: Yossi Arjevani

    Abstract: In this thesis we develop a novel framework to study smooth and strongly convex optimization algorithms, both deterministic and stochastic. Focusing on quadratic functions we are able to examine optimization algorithms as a recursive application of linear operators. This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials whereb…

    Submitted 23 October, 2014; originally announced October 2014.

    Comments: A related paper co-authored with Shai Shalev-Shwartz and Ohad Shamir is to be published soon