Showing 1–14 of 14 results for author: Zavatone-Veth, J A

  1. arXiv:2405.17181  [pdf, other]

    cs.LG cs.CV

    Spectral regularization for adversarially-robust representation learning

    Authors: Sheng Yang, Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: The vulnerability of neural network classifiers to adversarial attacks is a major obstacle to their deployment in safety-critical applications. Regularization of network parameters during training can be used to improve adversarial robustness and generalization performance. Usually, the network is regularized end-to-end, with parameters at all layers affected by regularization. However, in setting…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 15 + 15 pages, 8 + 11 figures

  2. arXiv:2405.11751  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Asymptotic theory of in-context learning by linear attention

    Authors: Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan

    Abstract: Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' success, yet questions about the necessary sample complexity, pretraining task diversity, and context length for successful ICL remain unr…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 20 pages, 5 figures, and supplementary information

  3. arXiv:2405.00592  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Scaling and renormalization in high-dimensional regression

    Authors: Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generaliza…

    Submitted 26 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 68 pages, 17 figures

  4. arXiv:2306.04532  [pdf, other]

    cs.NE cond-mat.dis-nn cs.LG q-bio.NC stat.ML

    Long Sequence Hopfield Memory

    Authors: Hamza Tahir Chaudhry, Jacob A. Zavatone-Veth, Dmitry Krotov, Cengiz Pehlevan

    Abstract: Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed where recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (maxi…

    Submitted 2 November, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Camera-Ready, 41 pages

    Journal ref: Advances in Neural Information Processing Systems 36 (2023)

  5. arXiv:2303.00564  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Learning curves for deep structured Gaussian feature models

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: In recent years, significant attention in deep learning theory has been devoted to analyzing when models that interpolate their training data can still generalize well to unseen examples. Many insights have been gained from studying models with multiple layers of Gaussian random features, for which one can compute precise generalization asymptotics. However, few works have considered the effect of…

    Submitted 23 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: 14+18 pages, 2+1 figures. NeurIPS 2023 Camera Ready

    Journal ref: Advances in Neural Information Processing Systems 36 (2023)

  6. arXiv:2301.11375  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Neural networks learn to magnify areas near decision boundaries

    Authors: Jacob A. Zavatone-Veth, Sheng Yang, Julian A. Rubinfien, Cengiz Pehlevan

    Abstract: In machine learning, there is a long history of trying to build neural networks that can learn from fewer example data by baking in strong geometric priors. However, it is not always clear a priori what geometric constraints are appropriate for a given task. Here, we consider the possibility that one can uncover useful geometric inductive biases by studying how training molds the Riemannian geomet…

    Submitted 14 October, 2023; v1 submitted 26 January, 2023; originally announced January 2023.

    Comments: 93 pages, 48 figures

  7. arXiv:2209.10499  [pdf, other]

    cond-mat.dis-nn math.PR

    Replica method for eigenvalues of real Wishart product matrices

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: We show how the replica method can be used to compute the asymptotic eigenvalue spectrum of a real Wishart product matrix. For unstructured factors, this provides a compact, elementary derivation of a polynomial condition on the Stieltjes transform first proved by Müller [IEEE Trans. Inf. Theory. 48, 2086-2091 (2002)]. We then show how this computation can be extended to ensembles where the factor…

    Submitted 20 January, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 50 pages, 5 figures

  8. arXiv:2203.00573  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Contrasting random and learned features in deep Bayesian linear regression

    Authors: Jacob A. Zavatone-Veth, William L. Tong, Cengiz Pehlevan

    Abstract: Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are t…

    Submitted 16 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: 35 pages, 7 figures. v2: minor typos corrected and references added; published in PRE

    Journal ref: Physical Review E 105, 064118 (2022)

  9. arXiv:2201.04669  [pdf, ps, other]

    cond-mat.dis-nn cs.LG

    On neural network kernels and the storage capacity problem

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: In this short note, we reify the connection between work on the storage capacity problem in wide two-layer treelike neural networks and the rapidly-growing body of literature on kernel limits of wide neural networks. Concretely, we observe that the "effective order parameter" studied in the statistical mechanics literature is exactly equivalent to the infinite-width Neural Network Gaussian Process…

    Submitted 12 January, 2022; originally announced January 2022.

    Comments: 5 pages, no figures

    Journal ref: Neural Computation (2022) 34 (5): 1136-1142

  10. Parallel locomotor control strategies in mice and flies

    Authors: Ana I. Gonçalves, Jacob A. Zavatone-Veth, Megan R. Carey, Damon A. Clark

    Abstract: Our understanding of the neural basis of locomotor behavior can be informed by careful quantification of animal movement. Classical descriptions of legged locomotion have defined discrete locomotor gaits, characterized by distinct patterns of limb movement. Recent technical advances have enabled increasingly detailed characterization of limb kinematics across many species, imposing tighter constra…

    Submitted 22 December, 2021; originally announced December 2021.

    Comments: 8 pages; 4 figures

    Journal ref: Current Opinion in Neurobiology, 2022

  11. Depth induces scale-averaging in overparameterized linear Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: Inference in deep Bayesian neural networks is only fully understood in the infinite-width limit, where the posterior flexibility afforded by increased depth washes out and the posterior predictive collapses to a shallow Gaussian process. Here, we interpret finite deep linear Bayesian neural networks as data-dependent scale mixtures of Gaussian process predictors across output channels. We leverage…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: 8 pages, no figures

    Journal ref: 55th Asilomar Conference on Signals, Systems, and Computers, 2021

  12. arXiv:2106.00651  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Asymptotics of representation learning in finite Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Abdulkadir Canatar, Benjamin S. Ruben, Cengiz Pehlevan

    Abstract: Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width c…

    Submitted 8 February, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

    Comments: 13+28 pages, 4 figures; v3: extensive revision with improved exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor updates to supplement; v5: post-NeurIPS update, minor typos fixed

    Journal ref: Advances in Neural Information Processing Systems 34 (2021); JSTAT 114008 (2022)

  13. arXiv:2104.11734  [pdf, other]

    cs.LG cond-mat.dis-nn stat.ML

    Exact marginal prior distributions of finite Bayesian neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: Bayesian neural networks are theoretically well-understood only in the infinite-width limit, where Gaussian priors over network weights yield Gaussian priors over network outputs. Recent work has suggested that finite Bayesian networks may outperform their infinite counterparts, but their non-Gaussian function space priors have been characterized only through perturbative approaches. Here, we deriv…

    Submitted 18 October, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: 12+9 pages, 4 figures; v3: Accepted as NeurIPS 2021 Spotlight

    Journal ref: Advances in Neural Information Processing Systems 34 (2021)

  14. arXiv:2007.11136  [pdf, other]

    cond-mat.dis-nn cs.LG stat.ML

    Activation function dependence of the storage capacity of treelike neural networks

    Authors: Jacob A. Zavatone-Veth, Cengiz Pehlevan

    Abstract: The expressive power of artificial neural networks crucially depends on the nonlinearity of their activation functions. Though a wide variety of nonlinear activation functions have been proposed for use in artificial neural networks, a detailed understanding of their role in determining the expressive power of a network has not emerged. Here, we study how activation functions affect the storage ca…

    Submitted 4 February, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: 5+23 pages, 2+4 figures. v3: accepted for publication as a Letter in Physical Review E

    Journal ref: Phys. Rev. E 103, 020301 (2021)