Showing 1–11 of 11 results for author: Berthier, R

Searching in archive cs.
  1. arXiv:2406.06354  [pdf, other]

    cs.LG

    On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions

    Authors: Denys Pushkin, Raphaël Berthier, Emmanuel Abbe

    Abstract: We investigate the out-of-domain generalization of random feature (RF) models and Transformers. We first prove that in the `generalization on the unseen (GOTU)' setting, where training data is fully seen in some part of the domain but testing is made on another part, and for RF models in the small feature regime, the convergence takes place to interpolators of minimal degree as in the Boolean case…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 9 pages of main body, 24 pages in total, 7 figures. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  2. arXiv:2304.09576  [pdf, other]

    math.OC cs.LG stat.ML

    Leveraging the two timescale regime to demonstrate convergence of neural networks

    Authors: Pierre Marion, Raphaël Berthier

    Abstract: We study the training dynamics of shallow neural networks, in a two-timescale regime in which the stepsizes for the inner layer are much smaller than those for the outer layer. In this regime, we prove convergence of the gradient flow to a global optimum of the non-convex optimization problem in a simple univariate setting. The number of neurons need not be asymptotically large for our result to h…

    Submitted 25 October, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: NeurIPS 2023. 34 pages, 10 figures

  3. arXiv:2303.00055  [pdf, other]

    cs.LG math.OC stat.ML

    Learning time-scales in two-layers neural networks

    Authors: Raphaël Berthier, Andrea Montanari, Kangjie Zhou

    Abstract: Gradient-based learning in multi-layer neural networks displays a number of striking features. In particular, the decrease rate of empirical risk is non-monotone even after averaging over large batches. Long plateaus in which one observes barely any progress alternate with intervals of rapid decrease. These successive phases of learning often take place on very different time scales. Finally, mode…

    Submitted 17 April, 2024; v1 submitted 28 February, 2023; originally announced March 2023.

    Comments: 64 pages, 15 figures

    MSC Class: 34E15; 37N40; 68T07

  4. arXiv:2208.14673  [pdf, other]

    cs.LG math.OC

    Incremental Learning in Diagonal Linear Networks

    Authors: Raphaël Berthier

    Abstract: Diagonal linear networks (DLNs) are a toy simplification of artificial neural networks; they consist in a quadratic reparametrization of linear regression inducing a sparse implicit regularization. In this paper, we describe the trajectory of the gradient flow of DLNs in the limit of small initialization. We show that incremental learning is effectively performed in the limit: coordinates are succ…

    Submitted 13 November, 2023; v1 submitted 31 August, 2022; originally announced August 2022.

    Journal ref: Journal of Machine Learning Research, 2023, 24 (171), pp.1-26

  5. arXiv:2202.10742  [pdf, other]

    cs.DC cs.MA

    Acceleration of Gossip Algorithms through the Euler-Poisson-Darboux Equation

    Authors: Raphaël Berthier, Mufan Li

    Abstract: Gossip algorithms and their accelerated versions have been studied exclusively in discrete time on graphs. In this work, we take a different approach, and consider the scaling limit of gossip algorithms in both large graphs and large number of iterations. These limits lead to well-known partial differential equations (PDEs) with insightful properties. On lattices, we prove that the non-accelerated…

    Submitted 22 February, 2022; originally announced February 2022.

  6. arXiv:2109.11905  [pdf, ps, other]

    cs.IT math.PR math.ST stat.ML

    Graph-based Approximate Message Passing Iterations

    Authors: Cédric Gerbelot, Raphaël Berthier

    Abstract: Approximate-message passing (AMP) algorithms have become an important element of high-dimensional statistical inference, mostly due to their adaptability and concentration properties, the state evolution (SE) equations. This is demonstrated by the growing number of new iterations proposed for increasingly complex problems, ranging from multi-layer inference to low-rank matrix estimation with elabo…

    Submitted 19 April, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 59 pages, 24 main, 35 appendix

  7. arXiv:2106.07644  [pdf, other]

    math.OC cs.LG cs.MA math.PR stat.ML

    A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

    Authors: Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

    Abstract: We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, o…

    Submitted 27 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.06035

  8. arXiv:2102.06035  [pdf, other]

    cs.DC math.OC

    A Continuized View on Nesterov Acceleration

    Authors: Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor

    Abstract: We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process,…

    Submitted 11 February, 2021; originally announced February 2021.

  9. arXiv:2006.08212  [pdf, other]

    cs.LG cs.MA math.OC stat.ML

    Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

    Authors: Raphaël Berthier, Francis Bach, Pierre Gaillard

    Abstract: In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation $Y = \langle θ_*, Φ(U) \rangle$ between the random output $Y$ and the random feature vector $Φ(U)$, a potentially non-linear transformation of the inputs $U$. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-square r…

    Submitted 27 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

  10. arXiv:1805.08531  [pdf, other]

    cs.MA cs.DC stat.ML

    Accelerated Gossip in Networks of Given Dimension using Jacobi Polynomial Iterations

    Authors: Raphaël Berthier, Francis Bach, Pierre Gaillard

    Abstract: Consider a network of agents connected by communication links, where each agent holds a real value. The gossip problem consists in estimating the average of the values diffused in the network in a distributed manner. We develop a method solving the gossip problem that depends only on the spectral dimension of the network, that is, in the communication network set-up, the dimension of the space in…

    Submitted 11 June, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

  11. arXiv:1708.03950  [pdf, other]

    cs.IT

    State Evolution for Approximate Message Passing with Non-Separable Functions

    Authors: Raphael Berthier, Andrea Montanari, Phan-Minh Nguyen

    Abstract: Given a high-dimensional data matrix ${\boldsymbol A}\in{\mathbb R}^{m\times n}$, Approximate Message Passing (AMP) algorithms construct sequences of vectors ${\boldsymbol u}^t\in{\mathbb R}^n$, ${\boldsymbol v}^t\in{\mathbb R}^m$, indexed by $t\in\{0,1,2\dots\}$ by iteratively applying ${\boldsymbol A}$ or ${\boldsymbol A}^{\sf T}$, and suitable non-linear functions, which depend on the specific…

    Submitted 13 August, 2017; originally announced August 2017.

    Comments: 41 pages, 4 figures