Showing 1–9 of 9 results for author: Arous, G B

Searching in archive cs.
  1. arXiv:2310.03010

    cs.LG math.PR stat.ML

    High-dimensional SGD aligns with emerging outlier eigenspaces

    Authors: Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath

    Abstract: We rigorously study the joint evolution of training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, the SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient m…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: 52 pages, 12 figures

  2. arXiv:2206.04030

    stat.ML cs.LG math.PR math.ST

    High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

    Authors: Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

    Abstract: We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ball…

    Submitted 17 August, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: 43 pages, 11 figures

  3. arXiv:2110.10210

    math.PR cs.LG stat.ML

    Long Random Matrices and Tensor Unfolding

    Authors: Gérard Ben Arous, Daniel Zhengyu Huang, Jiaoyang Huang

    Abstract: In this paper, we consider the singular values and singular vectors of low rank perturbations of large rectangular random matrices, in the regime the matrix is "long": we allow the number of rows (columns) to grow polynomially in the number of columns (rows). We prove there exists a critical signal-to-noise ratio (depending on the dimensions of the matrix), and the extreme singular values and sing…

    Submitted 19 October, 2021; originally announced October 2021.

    Comments: 29 pages, 4 figures

  4. arXiv:2006.10689

    math.PR cs.DS cs.LG math.OC math.ST

    Free Energy Wells and Overlap Gap Property in Sparse PCA

    Authors: Gérard Ben Arous, Alexander S. Wein, Ilias Zadik

    Abstract: We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical physics inspir…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 63 pages. Accepted for presentation at the Conference on Learning Theory (COLT) 2020

  5. arXiv:2003.10409

    stat.ML cs.LG math.PR math.ST

    Online stochastic gradient descent on non-convex losses from high-dimensional inference

    Authors: Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath

    Abstract: Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising in high-dimensional inference tasks. Here one produces an estimator of an unknown parameter from independent samples of data by iteratively optimizing a loss function. This loss function is random and often non-convex. We study the performance of the simplest version of SGD, namely online SGD, from a random…

    Submitted 10 May, 2021; v1 submitted 23 March, 2020; originally announced March 2020.

    Comments: final version to appear at Jour. Mach. Learn. Res.

    Journal ref: J. Mach. Learn. Res., Vol. 22, No. 106, 1-51 (2021)

  6. arXiv:1912.02143

    stat.ML cond-mat.dis-nn cs.LG math.PR

    Landscape Complexity for the Empirical Risk of Generalized Linear Models

    Authors: Antoine Maillard, Gérard Ben Arous, Giulio Biroli

    Abstract: We present a method to obtain the average and the typical value of the number of critical points of the empirical risk landscape for generalized linear estimation problems and variants. This represents a substantial extension of previous applications of the Kac-Rice method since it allows to analyze the critical points of high dimensional non-Gaussian random functions. Under a technical hypothesis…

    Submitted 18 January, 2023; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: 18 pages and 18 pages appendix. Update to match the published version (v2). Corrections of remaining small typos (v3). Simplification of a technical argument in Appendix A (v4) and clarification of a technical hypothesis (v5)

    Journal ref: Proceedings of The First Mathematical and Scientific Machine Learning Conference, PMLR 107:287-327, 2020

  7. arXiv:1803.06969

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013

  8. arXiv:1412.6615

    stat.ML cs.LG

    Explorations on high dimensional landscapes

    Authors: Levent Sagun, V. Ugur Guney, Gerard Ben Arous, Yann LeCun

    Abstract: Finding minima of a real valued non-convex function over a high dimensional space is a major challenge in science. We provide evidence that some such functions that are defined on high dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This is in contrast with the low dimensional picture in which this band is wide. Our simulations agree with…

    Submitted 6 April, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: 11 pages, 8 figures, workshop contribution at ICLR 2015

  9. arXiv:1412.0233

    cs.LG

    The Loss Surfaces of Multilayer Networks

    Authors: Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun

    Abstract: We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network t…

    Submitted 21 January, 2015; v1 submitted 30 November, 2014; originally announced December 2014.