Showing 1–18 of 18 results for author: Chizat, L

Searching in archive cs.
  1. arXiv:2405.13456  [pdf, other]

    stat.ML cs.LG

    Deep linear networks for regression are implicitly regularized towards flat minima

    Authors: Pierre Marion, Lénaïc Chizat

    Abstract: The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity to understand their optimization dynamics. In this paper, we study the sharpness of deep linear networks for overdetermined univariate regression. Minimizers can have arbitrarily large sharpness, but not an arbitrarily small one. Indeed, we show a lower bound on the sharpness of minimizers, which grows linear…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 46 pages, 4 figures

  2. arXiv:2311.18718  [pdf, other]

    cs.LG

    The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks

    Authors: Lénaïc Chizat, Praneeth Netrapalli

    Abstract: Deep learning succeeds by doing hierarchical feature learning, yet tuning hyper-parameters (HP) such as initialization scales, learning rates, etc., only gives indirect control over this behavior. In this paper, we introduce a key notion to predict and control feature learning: the angle $θ_\ell$ between the feature updates and the backward pass (at layer index $\ell$). We show that the magnitude of…

    Submitted 22 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Previous title "Steering deep feature learning with backward aligned feature updates". Novelties in v3: BFA for linear Resnets (Prop. 5.2), scaling FSC (Table 1), and content reorganized

    MSC Class: 68T07

  3. arXiv:2307.13370  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Computational Guarantees for Doubly Entropic Wasserstein Barycenters via Damped Sinkhorn Iterations

    Authors: Lénaïc Chizat, Tomas Vaškevičius

    Abstract: We study the computation of doubly regularized Wasserstein barycenters, a recently introduced family of entropic barycenters governed by inner and outer regularization strengths. Previous research has demonstrated that various regularization parameter choices unify several notions of entropy-penalized barycenters while also revealing new ones, including a special case of debiased barycenters. In t…

    Submitted 25 July, 2023; originally announced July 2023.

  4. arXiv:2305.17275  [pdf, other]

    math.OC cs.GT cs.LG

    Local Convergence of Gradient Methods for Min-Max Games: Partial Curvature Generically Suffices

    Authors: Guillaume Wang, Lénaïc Chizat

    Abstract: We study the convergence to local Nash equilibria of gradient methods for two-player zero-sum differentiable games. It is well-known that such dynamics converge locally when $S \succ 0$ and may diverge when $S=0$, where $S\succeq 0$ is the symmetric part of the Jacobian at equilibrium that accounts for the "potential" component of the game. We show that these dynamics also converge as soon as $S$…

    Submitted 7 November, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: 37 pages, 2 figures, 2 tables, to appear in NeurIPS 2023 Proceedings

  5. arXiv:2303.17805  [pdf, other]

    cs.LG math.OC

    On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks

    Authors: Sebastian Neumayer, Lénaïc Chizat, Michael Unser

    Abstract: In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy for the optimization path of gradient descent initialized from zero. In this paper, we study a modification of the regularization path for infinite-width 2-layer ReLU neural networks with nonzero initial distribution of the weights at different scales. By exploiting a link with unbalanced optimal-tr…

    Submitted 9 August, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

  6. arXiv:2303.11844  [pdf, other]

    math.OC cs.LG stat.ML

    Doubly Regularized Entropic Wasserstein Barycenters

    Authors: Lénaïc Chizat

    Abstract: We study a general formulation of regularized Wasserstein barycenters that enjoys favorable regularity, approximation, stability and (grid-free) optimization properties. This barycenter is defined as the unique probability measure that minimizes the sum of entropic optimal transport (EOT) costs with respect to a family of given probability measures, plus an entropy term. We denote it $(λ,τ)$-baryc…

    Submitted 21 March, 2023; originally announced March 2023.

    MSC Class: 49N99 (Primary) 62G05; 90C30 (Secondary)

  7. arXiv:2211.16980  [pdf, other]

    cs.LG math.OC stat.ML

    Infinite-width limit of deep linear neural networks

    Authors: Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli

    Abstract: This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We obtain that, when the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from a gradient descent on an infinitely wide deterministic linear neural network. Moreover, even if the weights remain random, we get their precise law along…

    Submitted 29 November, 2022; originally announced November 2022.

    MSC Class: 68T07; 35Q49

  8. arXiv:2211.08771  [pdf, other]

    cs.LG stat.ML

    On the symmetries in the dynamics of wide two-layer neural networks

    Authors: Karl Hajjar, Lenaic Chizat

    Abstract: We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics. We then study more speci…

    Submitted 9 February, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

  9. arXiv:2211.01280  [pdf, other]

    math.OC cs.GT cs.LG

    An Exponentially Converging Particle Method for the Mixed Nash Equilibrium of Continuous Games

    Authors: Guillaume Wang, Lénaïc Chizat

    Abstract: We consider the problem of computing mixed Nash equilibria of two-player zero-sum games with continuous sets of pure strategies and with first-order access to the payoff function. This problem arises for example in game-theory-inspired machine learning applications, such as distributionally-robust learning. In those applications, the strategy sets are high-dimensional and thus methods based on dis…

    Submitted 13 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

    Comments: 72 pages, 6 figures. Simplified notation for the Lyapunov function, added section 3.3.3 on "local star-convexity-concavity" property, made clarifying adjustments in the appendix

  10. arXiv:2205.07146  [pdf, other]

    math.OC cs.LG stat.ML

    Trajectory Inference via Mean-field Langevin in Path Space

    Authors: Lénaïc Chizat, Stephen Zhang, Matthieu Heitz, Geoffrey Schiebinger

    Abstract: Trajectory inference aims at recovering the dynamics of a population from snapshots of its temporal marginals. To solve this task, a min-entropy estimator relative to the Wiener measure in path space was introduced by Lavenant et al. arXiv:2102.09204, and shown to consistently recover the dynamics of a large class of drift-diffusion processes from the solution of an infinite dimensional convex opt…

    Submitted 12 October, 2022; v1 submitted 14 May, 2022; originally announced May 2022.

    MSC Class: 90-08 (Primary) 62M99 (Secondary)

  11. arXiv:2110.15596  [pdf, other]

    cs.LG

    Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit

    Authors: Karl Hajjar, Lénaïc Chizat, Christophe Giraud

    Abstract: To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models makes these dynamics difficult to analyze. To overcome these challenges, large-width asymptotics have recently emerged as a fruitful viewpoint and led to practical…

    Submitted 20 December, 2021; v1 submitted 29 October, 2021; originally announced October 2021.

  12. arXiv:2110.08084  [pdf, other]

    cs.LG math.OC math.ST

    Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization

    Authors: Francis Bach, Lenaïc Chizat

    Abstract: Many supervised machine learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many mathematical guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this review p…

    Submitted 15 October, 2021; originally announced October 2021.

  13. arXiv:2003.05783  [pdf, other]

    stat.ML cs.LG

    Statistical and Topological Properties of Sliced Probability Divergences

    Authors: Kimia Nadjahi, Alain Durmus, Lénaïc Chizat, Soheil Kolouri, Shahin Shahrampour, Umut Şimşekli

    Abstract: The idea of slicing divergences has been proven to be successful when comparing two probability measures in various machine learning applications including generative modeling, and consists in computing the expected value of a "base divergence" between one-dimensional random projections of the two measures. However, the topological, statistical, and computational consequences of this technique hav…

    Submitted 4 January, 2022; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: Published at NeurIPS 2020 (Spotlight)

  14. arXiv:2002.04486  [pdf, other]

    math.OC cs.LG stat.ML

    Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss

    Authors: Lenaic Chizat, Francis Bach

    Abstract: Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based methods are observed to perform well in many supervised classification tasks. Towards understanding this phenomenon, we analyze the training and generalization behavior of infinitely wide two-layer neural networks with homogeneous activations. We show that the limits of the gradient flow on exponential…

    Submitted 22 June, 2020; v1 submitted 11 February, 2020; originally announced February 2020.

    Journal ref: Conference on Learning Theory, Jul 2020, Graz, Austria

  15. arXiv:1903.07614  [pdf, other]

    cs.GR cs.CV cs.DS physics.data-an physics.geo-ph

    HexaShrink, an exact scalable framework for hexahedral meshes with attributes and discontinuities: multiresolution rendering and storage of geoscience models

    Authors: Jean-Luc Peyrot, Laurent Duval, Frédéric Payan, Lauriane Bouard, Lénaïc Chizat, Sébastien Schneider, Marc Antonini

    Abstract: With the huge progress in data acquisition realized in the past decades and acquisition systems now able to produce high resolution grids and point clouds, the digitization of physical terrains becomes increasingly precise. Such extreme quantities of generated and modeled data greatly impact computational performance on many levels of high-performance computing (HPC): storage media, memory requir…

    Submitted 4 May, 2019; v1 submitted 16 March, 2019; originally announced March 2019.

    MSC Class: 65M50

  16. arXiv:1812.07956  [pdf, other]

    math.OC cs.LG

    On Lazy Training in Differentiable Programming

    Authors: Lenaic Chizat, Edouard Oyallon, Francis Bach

    Abstract: In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying. In this work, we show that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, t…

    Submitted 7 January, 2020; v1 submitted 19 December, 2018; originally announced December 2018.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), Dec 2019, Vancouver, Canada

  17. arXiv:1805.09545  [pdf, other]

    math.OC cs.NE stat.ML

    On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport

    Authors: Lenaic Chizat, Francis Bach

    Abstract: Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a neural network with a single hidden layer. For these problems, we study a simple minimization method: the unknown measure is discretized into a mixture of particles and a continuous-time gradient descent is performed on their weigh…

    Submitted 29 October, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: Advances in Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada

  18. arXiv:1612.08731  [pdf, other]

    cs.GR

    Quantum Optimal Transport for Tensor Field Processing

    Authors: Gabriel Peyré, Lenaïc Chizat, François-Xavier Vialard, Justin Solomon

    Abstract: This article introduces a new notion of optimal transport (OT) between tensor fields, which are measures whose values are positive semidefinite (PSD) matrices. This "quantum" formulation of OT (Q-OT) corresponds to a relaxed version of the classical Kantorovich transport problem, where the fidelity between the input PSD-valued measures is captured using the geometry of the Von-Neumann quantum entr…

    Submitted 23 July, 2017; v1 submitted 20 December, 2016; originally announced December 2016.