Showing 1–13 of 13 results for author: Dandi, Y

  1. arXiv:2406.02157  [pdf, other]

    stat.ML cs.LG

    Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as characterized by the information exponents. We show that performing gr…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2405.15459  [pdf, other]

    stat.ML cs.LG

    Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

    Authors: Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan

    Abstract: Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, and discuss how they learn pertinent features in multi-index models, that is target functions with low-dimensi…

    Submitted 24 May, 2024; originally announced May 2024.

  3. arXiv:2402.04980  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of feature learning in two-layer networks after one gradient-step

    Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

    Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), w…

    Submitted 4 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  4. arXiv:2402.03220  [pdf, other]

    stat.ML cs.LG

    The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

    Authors: Yatin Dandi, Emanuele Troiani, Luca Arnaboldi, Luca Pesce, Lenka Zdeborová, Florent Krzakala

    Abstract: We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion about which functions are learnable compared to single-pass gradient descent. In particular, multi-pass GD with finite stepsize is found to overcome the…

    Submitted 30 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted at the International Conference on Machine Learning (ICML), 2024

  5. arXiv:2309.04877  [pdf, other]

    cs.LG stat.ML

    A Gentle Introduction to Gradient-Based Optimization and Variational Inequalities for Machine Learning

    Authors: Neha S. Wadia, Yatin Dandi, Michael I. Jordan

    Abstract: The rapid progress in machine learning in recent years has been based on a highly productive connection to gradient-based optimization. Further progress hinges in part on a shift in focus from pattern recognition to decision-making and multi-agent problems. In these broader settings, new mathematical challenges emerge that involve equilibria and game theory instead of optima. Gradient-based method…

    Submitted 26 February, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

    Comments: 36 pages, 7 figures; minor corrections

  6. arXiv:2305.18270  [pdf, other]

    stat.ML cs.LG

    How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

    Authors: Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

    Abstract: We investigate theoretically how the features of a two-layer neural network adapt to the structure of the target function through a few large batch gradient descent steps, leading to improvement in the approximation capacity with respect to the initialization. We compare the influence of batch size and that of multiple (but finitely many) steps. For a single gradient step, a batch of size…

    Submitted 15 December, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

  7. arXiv:2302.08933  [pdf, other]

    math.ST stat.ML

    Universality laws for Gaussian mixtures in generalized linear models

    Authors: Yatin Dandi, Ludovic Stephan, Florent Krzakala, Bruno Loureiro, Lenka Zdeborová

    Abstract: Let $(x_{i}, y_{i})_{i=1,\dots,n}$ denote independent samples from a general mixture distribution $\sum_{c\in\mathcal{C}}\rho_{c}P_{c}^{x}$, and consider the hypothesis class of generalized linear models $\hat{y} = F(\Theta^{\top}x)$. In this work, we investigate the asymptotic joint statistics of the family of generalized linear estimators $(\Theta_{1}, \dots, \Theta_{M})$ obtained either from (a) minimizing an em…

    Submitted 17 February, 2023; originally announced February 2023.

  8. arXiv:2204.06477  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Data-heterogeneity-aware Mixing for Decentralized Learning

    Authors: Yatin Dandi, Anastasia Koloskova, Martin Jaggi, Sebastian U. Stich

    Abstract: Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs. However, most existing approaches toward decentralized learning disregard the interaction between data heterogeneity and graph topology. In this paper, we characterize the dependence of convergence on the relationship between the mixing weights of the g…

    Submitted 13 April, 2022; originally announced April 2022.

  9. arXiv:2111.03972  [pdf, other]

    cs.LG stat.ML

    Understanding Layer-wise Contributions in Deep Neural Networks through Spectral Analysis

    Authors: Yatin Dandi, Arthur Jacot

    Abstract: Spectral analysis is a powerful tool, decomposing any function into simpler parts. In machine learning, Mercer's theorem generalizes this idea, providing for any kernel and input distribution a natural basis of functions of increasing frequency. More recently, several works have extended this analysis to deep neural networks through the framework of Neural Tangent Kernel. In this work, we analyze…

    Submitted 7 January, 2022; v1 submitted 6 November, 2021; originally announced November 2021.

  10. arXiv:2106.13897  [pdf, other]

    cs.LG stat.ML

    Implicit Gradient Alignment in Distributed and Federated Learning

    Authors: Yatin Dandi, Luis Barba, Martin Jaggi

    Abstract: A major obstacle to achieving global convergence in distributed and federated learning is the misalignment of gradients across clients or mini-batches, due to heterogeneity and stochasticity of the distributed data. In this work, we show that data heterogeneity can in fact be exploited to improve generalization performance through implicit regularization. One way to alleviate the effects of hetero…

    Submitted 12 December, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: AAAI Conference on Artificial Intelligence, 2022

  11. arXiv:2012.02684  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-Agnostic Learning to Meta-Learn

    Authors: Arnout Devos, Yatin Dandi

    Abstract: In this paper, we propose a learning algorithm that enables a model to quickly exploit commonalities among related tasks from an unseen task distribution, before quickly adapting to specific tasks from that same distribution. We investigate how learning with different task distributions can first improve adaptability by meta-finetuning on related tasks before improving goal task generalization wit…

    Submitted 19 July, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

    Comments: Published in Proceedings of Machine Learning Research, PMLR 148:155-175

  12. arXiv:2006.08089  [pdf, other]

    cs.LG cs.CV stat.ML

    Generalized Adversarially Learned Inference

    Authors: Yatin Dandi, Homanga Bharadhwaj, Abhishek Kumar, Piyush Rai

    Abstract: Allowing effective inference of latent vectors while training GANs can greatly increase their applicability in various downstream tasks. Recent approaches, such as ALI and BiGAN frameworks, develop methods of inference of latent variables in GANs by adversarially training an image generator along with an encoder to match two joint distributions of image and latent vector pairs. We generalize these…

    Submitted 21 December, 2020; v1 submitted 14 June, 2020; originally announced June 2020.

    Comments: AAAI 2021 (accepted for publication)

  13. arXiv:1912.07991  [pdf, other]

    cs.LG cs.CV stat.ML

    Jointly Trained Image and Video Generation using Residual Vectors

    Authors: Yatin Dandi, Aniket Das, Soumye Singhal, Vinay P. Namboodiri, Piyush Rai

    Abstract: In this work, we propose a modeling technique for jointly training image and video generation models by simultaneously learning to map latent variables with a fixed prior onto real images and interpolate over images to generate videos. The proposed approach models the variations in representations using residual vectors encoding the change at each time step over a summary vector for the entire vid…

    Submitted 17 December, 2019; originally announced December 2019.

    Comments: Accepted in 2020 Winter Conference on Applications of Computer Vision (WACV '20)