Showing 1–18 of 18 results for author: Lu, Y M

  1. arXiv:2405.11751  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotic theory of in-context learning by linear attention

    Authors: Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan

    Abstract: Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unr…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 20 pages, 5 figures, and supplementary information

  2. arXiv:2403.08160  [pdf, other]

    stat.ML cs.LG math.ST

    Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime

    Authors: Hong Hu, Yue M. Lu, Theodor Misiakiewicz

    Abstract: Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data. It was shown, e.g., through the double descent phenomenon, that the number of parameters is a poor proxy for the model complexity and generalization capabilities. This leaves open the question of understanding the impact of parametrization on the performanc…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 106 pages, 8 figures

  3. arXiv:2402.04980  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotics of feature learning in two-layer networks after one gradient-step

    Authors: Hugo Cui, Luca Pesce, Yatin Dandi, Florent Krzakala, Yue M. Lu, Lenka Zdeborová, Bruno Loureiro

    Abstract: In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we model the trained network by a spiked Random Features (sRF) model. Further building on recent progress on Gaussian universality (Dandi et al., 2023), w…

    Submitted 4 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  4. arXiv:2310.18280  [pdf, ps, other]

    math.PR stat.ML

    Universality for the global spectrum of random inner-product kernel matrices in the polynomial regime

    Authors: Sofiia Dubova, Yue M. Lu, Benjamin McKenna, Horng-Tzer Yau

    Abstract: We consider certain large random matrices, called random inner-product kernel matrices, which are essentially given by a nonlinear function $f$ applied entrywise to a sample-covariance matrix, $f(X^TX)$, where $X \in \mathbb{R}^{d \times N}$ is random and normalized in such a way that $f$ typically has order-one arguments. We work in the polynomial regime, where $N \asymp d^\ell$ for some…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 43 pages, no figures

    MSC Class: 60B20; 15B52

  5. arXiv:2205.14846  [pdf, other]

    cs.LG stat.ML

    Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

    Authors: Lechao Xiao, Hong Hu, Theodor Misiakiewicz, Yue M. Lu, Jeffrey Pennington

    Abstract: As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. Currently, theoretical understanding of the learning curves that characterize how the prediction error depends on the number of samples is restricted to either large-…

    Submitted 12 June, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: 42 pages; 5 + 6 figures

    MSC Class: 68T07

  6. arXiv:2205.06308  [pdf, other]

    math.PR stat.ML

    An Equivalence Principle for the Spectrum of Random Inner-Product Kernel Matrices with Polynomial Scalings

    Authors: Yue M. Lu, Horng-Tzer Yau

    Abstract: We investigate random matrices whose entries are obtained by applying a nonlinear kernel function to pairwise inner products between $n$ independent data vectors, drawn uniformly from the unit sphere in $\mathbb{R}^d$. This study is motivated by applications in machine learning and statistics, where these kernel random matrices and their spectral properties play significant roles. We establish the…

    Submitted 5 May, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

  7. arXiv:2101.07464  [pdf, other]

    cs.IT physics.data-an stat.CO stat.ML

    Householder Dice: A Matrix-Free Algorithm for Simulating Dynamics on Gaussian and Random Orthogonal Ensembles

    Authors: Yue M. Lu

    Abstract: This paper proposes a new algorithm, named Householder Dice (HD), for simulating dynamics on dense random matrix ensembles with translation-invariant properties. Examples include the Gaussian ensemble, the Haar-distributed random orthogonal ensemble, and their complex-valued counterparts. A "direct" approach to the simulation, where one first generates a dense $n \times n$ matrix from the ensemble…

    Submitted 21 January, 2021; v1 submitted 18 January, 2021; originally announced January 2021.

  8. arXiv:2101.01918  [pdf, ps, other]

    cs.LG stat.ML

    Phase Transitions in Transfer Learning for High-Dimensional Perceptrons

    Authors: Oussama Dhifallah, Yue M. Lu

    Abstract: Transfer learning seeks to improve the generalization performance of a target task by exploiting the knowledge learned from a related source task. Central questions include deciding what information one should transfer and when transfer can be beneficial. The latter question is related to the so-called negative transfer phenomenon, where the transferred source information actually reduces the gene…

    Submitted 6 January, 2021; originally announced January 2021.

  9. arXiv:2006.06560  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    Generalization error in high-dimensional perceptrons: Approaching Bayes error with convex optimization

    Authors: Benjamin Aubin, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a commonly studied supervised classification of a synthetic dataset whose labels are generated by feeding a one-layer neural network with random iid inputs. We study the generalization performances of standard classifiers in the high-dimensional regime where $α=n/d$ is kept finite in the limit of a high dimension $d$ and number of samples $n$. Our contribution is three-fold: First, we…

    Submitted 7 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 11 pages + 45 pages Supplementary Material / 5 figures, v2 revised and accepted at NeurIPS

    Journal ref: Advances in Neural Information Processing Systems, v33, pages 12199--12210, 2020

  10. arXiv:2002.11544  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG math.ST

    The role of regularization in classification of high-dimensional noisy Gaussian mixture

    Authors: Francesca Mignacco, Florent Krzakala, Yue M. Lu, Lenka Zdeborová

    Abstract: We consider a high-dimensional mixture of two Gaussians in the noisy regime where even an oracle knowing the centers of the clusters misclassifies a small but finite fraction of the points. We provide a rigorous analysis of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number $n$ of samples and th…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: 8 pages + appendix, 6 figures

    Journal ref: International Conference on Machine Learning, ICML 2020

  11. arXiv:1809.09573  [pdf, other]

    cs.LG cs.IT eess.SP math.OC math.ST stat.ML

    Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

    Authors: Yuejie Chi, Yue M. Lu, Yuxin Chen

    Abstract: Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional wisdom often takes a dim view of nonconvex optimization algorithms due to their susceptibility to spurious local minima, simple iterative methods such as gradient descent have been remarkably successful in practice. Th…

    Submitted 19 September, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

    Comments: Invited overview article

    Journal ref: IEEE Transactions on Signal Processing, vol. 67, no. 20, pp. 5239-5269, October 2019

  12. arXiv:1806.04609  [pdf, other]

    stat.ML cs.IT cs.LG

    Streaming PCA and Subspace Tracking: The Missing Data Case

    Authors: Laura Balzano, Yuejie Chi, Yue M. Lu

    Abstract: For many modern applications in science and engineering, data are collected in a streaming fashion carrying time-varying information, and practitioners need to process them with a limited amount of memory and computational resources in a timely manner for decision making. This often is coupled with the missing data problem, such that only a small fraction of data attributes are observed. These com…

    Submitted 12 June, 2018; originally announced June 2018.

    Comments: 27 pages, 7 figures, submitted to the Proceedings of IEEE

  13. arXiv:1805.08349  [pdf, other]

    cs.LG cond-mat.dis-nn cs.IT stat.ML

    A Solvable High-Dimensional Model of GAN

    Authors: Chuang Wang, Hong Hu, Yue M. Lu

    Abstract: We present a theoretical analysis of the training process for a single-layer GAN fed by high-dimensional input data. The training dynamics of the proposed model at both microscopic and macroscopic scales can be exactly analyzed in the high-dimensional limit. In particular, we prove that the macroscopic quantities measuring the quality of the training process converge to a deterministic process cha…

    Submitted 28 October, 2019; v1 submitted 21 May, 2018; originally announced May 2018.

    Comments: Accepted by 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  14. arXiv:1805.06834  [pdf, other]

    cs.LG cond-mat.dis-nn cs.IT stat.ML

    Subspace Estimation from Incomplete Observations: A High-Dimensional Analysis

    Authors: Chuang Wang, Yonina C. Eldar, Yue M. Lu

    Abstract: We present a high-dimensional analysis of three popular algorithms, namely, Oja's method, GROUSE and PETRELS, for subspace estimation from streaming and highly incomplete observations. We show that, with proper time scaling, the time-varying principal angles between the true subspace and its estimates given by the algorithms converge weakly to deterministic processes when the ambient dimension…

    Submitted 17 October, 2018; v1 submitted 17 May, 2018; originally announced May 2018.

    Comments: 26 pages, 6 figures

  15. arXiv:1712.04332  [pdf, other]

    cs.LG cs.IT math.PR stat.ML

    Scaling Limit: Exact and Tractable Analysis of Online Learning Algorithms with Applications to Regularized Regression and PCA

    Authors: Chuang Wang, Jonathan Mattingly, Yue M. Lu

    Abstract: We present a framework for analyzing the exact dynamics of a class of online learning algorithms in the high-dimensional scaling limit. Our results are applied to two concrete examples: online regularized linear regression and principal component analysis. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measures of the target…

    Submitted 7 December, 2017; originally announced December 2017.

  16. arXiv:1710.05384  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    The Scaling Limit of High-Dimensional Online Independent Component Analysis

    Authors: Chuang Wang, Yue M. Lu

    Abstract: We analyze the dynamics of an online algorithm for independent component analysis in the high-dimensional scaling limit. As the ambient dimension tends to infinity, and with proper time scaling, we show that the time-varying joint empirical measure of the target feature vector and the estimates provided by the algorithm will converge weakly to a deterministic measure-valued process that can be ch…

    Submitted 6 November, 2017; v1 submitted 15 October, 2017; originally announced October 2017.

    Comments: 10 pages, 3 figures, 31st Conference on Neural Information Processing Systems (NIPS 2017)

  17. arXiv:1702.06435  [pdf, other]

    cs.IT stat.ML

    Phase Transitions of Spectral Initialization for High-Dimensional Nonconvex Estimation

    Authors: Yue M. Lu, Gen Li

    Abstract: We study a spectral initialization method that serves a key role in recent work on estimating signals in nonconvex settings. Previous analysis of this method focuses on the phase retrieval problem and provides only performance bounds. In this paper, we consider arbitrary generalized linear sensing models and present a precise asymptotic characterization of the performance of the method in the high…

    Submitted 21 July, 2019; v1 submitted 21 February, 2017; originally announced February 2017.

  18. Monte Carlo non local means: Random sampling for large-scale image filtering

    Authors: Stanley H. Chan, Todd Zickler, Yue M. Lu

    Abstract: We propose a randomized version of the non-local means (NLM) algorithm for large-scale image filtering. The new algorithm, called Monte Carlo non-local means (MCNLM), speeds up the classical NLM by computing a small subset of image patch distances, which are randomly selected according to a designed sampling pattern. We make two contributions. First, we analyze the performance of the MCNLM algorit…

    Submitted 14 May, 2014; v1 submitted 27 December, 2013; originally announced December 2013.

    Comments: submitted for publication