Showing 1–15 of 15 results for author: Pillaud-Vivien, L

Searching in archive cs.
  1. arXiv:2407.02322  [pdf, other]

    cs.LG math.PR

    Stochastic Differential Equations models for Least-Squares Stochastic Gradient Descent

    Authors: Adrien Schertzer, Loucas Pillaud-Vivien

    Abstract: We study the dynamics of a continuous-time model of the Stochastic Gradient Descent (SGD) for the least-square problem. Indeed, pursuing the work of Li et al. (2019), we analyze Stochastic Differential Equations (SDEs) that model SGD either in the case of the training loss (finite samples) or the population one (online setting). A key qualitative feature of the dynamics is the existence of a perfe…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2403.13748  [pdf, other]

    stat.ML cs.LG stat.CO

    Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs

    Authors: Charles C. Margossian, Loucas Pillaud-Vivien, Lawrence K. Saul

    Abstract: Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the mean-field assumption), even though $p$ itself does not factorize. We show that this mismatch leads to an impossibility theorem: if $p$ does not factorize, then any fac…

    Submitted 7 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  3. arXiv:2403.05529  [pdf, other]

    cs.LG stat.ML

    Computational-Statistical Gaps in Gaussian Single-Index Models

    Authors: Alex Damian, Loucas Pillaud-Vivien, Jason D. Lee, Joan Bruna

    Abstract: Single-Index Models are high-dimensional regression problems with planted structure, whereby labels depend on an unknown one-dimensional projection of the input via a generic, non-linear, and potentially non-deterministic transformation. As such, they encompass a broad class of statistical inference tasks, and provide a rich template to study statistical and computational trade-offs in the high-di…

    Submitted 12 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 61 pages

  4. arXiv:2402.14758  [pdf, other]

    stat.ML cs.AI cs.LG stat.CO

    Batch and match: black-box variational inference with a score-based divergence

    Authors: Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

    Abstract: Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Not…

    Submitted 12 June, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 49 pages, 14 figures. To appear in the Proceedings of the 41st International Conference on Machine Learning (ICML), 2024

  5. arXiv:2310.19793  [pdf, other]

    stat.ML cs.LG math.OC

    On Learning Gaussian Multi-index Models with Gradient Flow

    Authors: Alberto Bietti, Joan Bruna, Loucas Pillaud-Vivien

    Abstract: We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link…

    Submitted 2 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

  6. arXiv:2307.15804  [pdf, other]

    cs.LG

    On Single Index Models beyond Gaussian Data

    Authors: Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig

    Abstract: Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods using shallow neural networks, showcasing their ability to perform feature learning beyond linear models. Amongst those functions, the simplest are single-index models $f(x) = φ( x \cdot θ^*)$, where the labels are generated by an arbitrary non-linear scalar link function $φ$ applied…

    Submitted 25 October, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

  7. arXiv:2302.06757  [pdf, other]

    stat.ML cs.LG math.NA math.ST

    Kernelized Diffusion maps

    Authors: Loucas Pillaud-Vivien, Francis Bach

    Abstract: Spectral clustering and diffusion maps are celebrated dimensionality reduction algorithms built on eigen-elements related to the diffusive structure of the data. The core of these procedures is the approximation of a Laplacian through a graph kernel approach, however this local average construction is known to be cursed by the high-dimension d. In this article, we build a different estimator of th…

    Submitted 13 February, 2023; originally announced February 2023.

    Comments: 19 pages, 1 Figure

  8. arXiv:2210.05337  [pdf, other]

    cs.LG stat.ML

    SGD with Large Step Sizes Learns Sparse Features

    Authors: Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

    Abstract: We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) lead the iterates to jump from one side of a valley to the other causing loss stabilization, and (ii) this stabilization induces a hidden stochastic dynamics orthogonal to the bouncing directions that b…

    Submitted 7 June, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: The camera-ready version (ICML 2023): extended experiments on deep networks (DenseNets on CIFAR-10, CIFAR-100, and Tiny ImageNet), empirically validated the SDE modelling, improved the clarity of the paper

  9. arXiv:2206.09841  [pdf, other]

    stat.ML cs.LG math.OC

    Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation

    Authors: Loucas Pillaud-Vivien, Julien Reygner, Nicolas Flammarion

    Abstract: Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the role of the label noise in the training dynamics of a quadratically parametrised model through its continuous time version. We explicitly characterise the solution chosen by the stochastic flow and prove that it implicitly…

    Submitted 20 June, 2022; originally announced June 2022.

  10. arXiv:2206.00939  [pdf, other]

    stat.ML cs.LG

    Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

    Authors: Etienne Boursier, Loucas Pillaud-Vivien, Nicolas Flammarion

    Abstract: The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for orthogonal input vectors, a precise description of the gradient flow dynamics of training one-hidden layer ReLU neural networks for the mean squared error at small initi…

    Submitted 31 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  11. arXiv:2106.09524  [pdf, other]

    cs.LG

    Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

    Authors: Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion

    Abstract: Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks. In this paper, we study the dynamics of stochastic gradient descent over diagonal linear networks through its continuous time version, namely stochastic gradient flow. We explicitly characterise the solution chosen by the stochastic flow and prove tha…

    Submitted 7 December, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

  12. arXiv:2102.03183  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Last iterate convergence of SGD for Least-Squares in the Interpolation regime

    Authors: Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

    Abstract: Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor fits perfectly inputs and outputs $\langle θ_* , φ(X) \rangle = Y$, where $φ(X)$ stands for a possibly infinite dimensional non-linear feature map. To solve this problem, we…

    Submitted 2 June, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

    Comments: 23 pages, 1 figure, 1 Appendix

  13. arXiv:2009.04324  [pdf, other]

    stat.ML cs.LG

    Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

    Authors: Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

    Abstract: As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning. This is the aim of semi-supervised learning. To benefit from the access to unlabelled data, it is natural to diffuse smoothly knowledge of labelled data to unlabelled one. This induces to the use of Laplacian regularization. Yet, current i…

    Submitted 29 November, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: 38 pages, 6 figures

    Journal ref: NeurIPS 2021

  14. arXiv:1805.10074  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes

    Authors: Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

    Abstract: We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform practically better in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. While this is true for low-dimensional easy problems, w…

    Submitted 23 November, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

    Journal ref: Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada. 2018

  15. arXiv:1712.04755  [pdf, other]

    cs.LG stat.ML

    Exponential convergence of testing error for stochastic gradient methods

    Authors: Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

    Abstract: We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods. We show that while the excess testing loss (squared loss) converges slowly to zero as the number of observations (and thus iterations) goes to infinity, the testing error (classification error) converges exponentially fast if low-noise condition…

    Submitted 20 November, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

    Journal ref: Conference on Learning Theory (COLT), Jul 2018, Stockholm, Sweden