Search | arXiv e-print repository

A Spectral Theory of Neural Prediction and Alignment

Authors: Abdulkadir Canatar, Jenelle Feather, Albert Wakhloo, SueYeon Chung

Abstract: The representations of neural networks are often compared to those of biological systems by performing regression between the neural network responses and those measured from biological systems. Many different state-of-the-art deep neural networks yield similar neural predictions, but it remains unclear how to differentiate among models that perform equally well at predicting neural responses. To… ▽ More The representations of neural networks are often compared to those of biological systems by performing regression between the neural network responses and those measured from biological systems. Many different state-of-the-art deep neural networks yield similar neural predictions, but it remains unclear how to differentiate among models that perform equally well at predicting neural responses. To gain insight into this, we use a recent theoretical framework that relates the generalization error from regression to the spectral properties of the model and the target. We apply this theory to the case of regression between model activations and neural responses and decompose the neural prediction error in terms of the model eigenspectra, alignment of model eigenvectors and neural responses, and the training set size. Using this decomposition, we introduce geometrical measures to interpret the neural prediction error. We test a large number of deep neural networks that predict visual cortical activity and show that there are multiple types of geometries that result in low neural prediction error as measured via regression. The work demonstrates that carefully decomposing representational metrics can provide interpretability of how models are capturing neural activity and points the way towards improved models of neural activity. △ Less

Submitted 11 December, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: First two authors contributed equally. NeurIPS 2023

arXiv:2206.06686 [pdf, other]

Bandwidth Enables Generalization in Quantum Kernel Models

Authors: Abdulkadir Canatar, Evan Peters, Cengiz Pehlevan, Stefan M. Wild, Ruslan Shaydulin

Abstract: Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical int… ▽ More Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings. For example, quantum kernel methods have been shown to provide an exponential speedup on a learning version of the discrete logarithm problem. Understanding the generalization of quantum models is essential to realizing similar speedups on problems of practical interest. Recent results demonstrate that generalization is hindered by the exponential size of the quantum feature space. Although these results suggest that quantum models cannot generalize when the number of qubits is large, in this paper we show that these results rely on overly restrictive assumptions. We consider a wider class of models by varying a hyperparameter that we call quantum kernel bandwidth. We analyze the large-qubit limit and provide explicit formulas for the generalization of a quantum model that can be solved in closed form. Specifically, we show that changing the value of the bandwidth can take a model from provably not being able to generalize to any target function to good generalization for well-aligned targets. Our analysis shows how the bandwidth controls the spectrum of the kernel integral operator and thereby the inductive bias of the model. We demonstrate empirically that our theory correctly predicts how varying the bandwidth affects generalization of quantum models on challenging datasets, including those far outside our theoretical assumptions. We discuss the implications of our results for quantum advantage in machine learning. △ Less

Submitted 18 June, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted version

arXiv:2106.02261 [pdf, other]

Out-of-Distribution Generalization in Kernel Regression

Authors: Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

Abstract: In real word applications, data generating process for training a machine learning model often differs from what the model encounters in the test stage. Understanding how and whether machine learning models generalize under such distributional shifts have been a theoretical challenge. Here, we study generalization in kernel regression when the training and test distributions are different using me… ▽ More In real word applications, data generating process for training a machine learning model often differs from what the model encounters in the test stage. Understanding how and whether machine learning models generalize under such distributional shifts have been a theoretical challenge. Here, we study generalization in kernel regression when the training and test distributions are different using methods from statistical physics. Using the replica method, we derive an analytical formula for the out-of-distribution generalization error applicable to any kernel and real datasets. We identify an overlap matrix that quantifies the mismatch between distributions for a given kernel as a key determinant of generalization performance under distribution shift. Using our analytical expressions we elucidate various generalization phenomena including possible improvement in generalization when there is a mismatch. We develop procedures for optimizing training and test distributions for a given data budget to find best and worst case generalizations under the shift. We present applications of our theory to real and synthetic datasets and for many kernels. We compare results of our theory applied to Neural Tangent Kernel with simulations of wide networks and show agreement. We analyze linear regression in further depth. △ Less

Submitted 4 February, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: Eq. (SI.1.59) corrected

Journal ref: Neural Information Processing Systems (NeurIPS), 2021

arXiv:2106.00651 [pdf, other]

doi 10.1088/1742-5468/ac98a6

Asymptotics of representation learning in finite Bayesian neural networks

Authors: Jacob A. Zavatone-Veth, Abdulkadir Canatar, Benjamin S. Ruben, Cengiz Pehlevan

Abstract: Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width c… ▽ More Recent works have suggested that finite Bayesian neural networks may sometimes outperform their infinite cousins because finite networks can flexibly adapt their internal representations. However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete. Perturbative finite-width corrections to the network prior and posterior have been studied, but the asymptotics of learned features have not been fully characterized. Here, we argue that the leading finite-width corrections to the average feature kernels for any Bayesian network with linear readout and Gaussian likelihood have a largely universal form. We illustrate this explicitly for three tractable network architectures: deep linear fully-connected and convolutional networks, and networks with a single nonlinear hidden layer. Our results begin to elucidate how task-relevant learning signals shape the hidden layer representations of wide Bayesian neural networks. △ Less

Submitted 8 February, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: 13+28 pages, 4 figures; v3: extensive revision with improved exposition and new section on CNNs, accepted to NeurIPS 2021; v4: minor updates to supplement; v5: post-NeurIPS update, minor typos fixed

Journal ref: Advances in Neural Information Processing Systems 34 (2021); JSTAT 114008 (2022)

arXiv:2006.13198 [pdf, other]

doi 10.1038/s41467-021-23103-1

Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks

Authors: Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

Abstract: Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep neural networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. In this paper, we investi… ▽ More Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep neural networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. In this paper, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also includes infinitely overparameterized neural networks trained with gradient descent. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel or data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with "simple functions", which are identified by solving a kernel eigenfunction problem on the data distribution. This notion of simplicity allows us to characterize whether a kernel is compatible with a learning task, facilitating good generalization performance from a small number of training examples. We show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks. To further understand these phenomena, we turn to the broad class of rotation invariant kernels, which is relevant to training deep neural networks in the infinite-width limit, and present a detailed mathematical analysis of them when data is drawn from a spherically symmetric distribution and the number of input dimensions is large. △ Less

Submitted 4 February, 2022; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: Accepted for publication in Nature Communications. SI Eq.71 is corrected

arXiv:2002.02561 [pdf, other]

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

Authors: Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

Abstract: We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the… ▽ More We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics. Our expressions apply to wide neural networks due to an equivalence between training them and kernel regression with the Neural Tangent Kernel (NTK). By computing the decomposition of the total generalization error due to different spectral components of the kernel, we identify a new spectral principle: as the size of the training set grows, kernel machines and neural networks fit successively higher spectral modes of the target function. When data are sampled from a uniform distribution on a high-dimensional hypersphere, dot product kernels, including NTK, exhibit learning stages where different frequency modes of the target function are learned. We verify our theory with simulations on synthetic data and MNIST dataset. △ Less

Submitted 25 February, 2021; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: ICML 2020 Update: Updated section on asymptotics generalization error for power law spectra, finding agreement with Spigler, Geiger, Wyart 2019 arXiv:1905.10843. Added a section on Discrete measures and an MNIST Experiment. Eigenvalue problem can be approximated by Kernel PCA. Typo fixed on 2/25/2021

Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1024-1034, 2020

Showing 1–6 of 6 results for author: Canatar, A