Showing 1–24 of 24 results for author: Adlam, B

  1. arXiv:2404.12481

    stat.ML cs.LG

    Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis

    Authors: Yufan Li, Subhabrata Sen, Ben Adlam

    Abstract: In the transfer learning paradigm, models learn useful representations (or features) during a data-rich pretraining stage, and then use the pretrained representation to improve model performance on data-scarce downstream tasks. In this work, we explore transfer learning with the goal of optimizing downstream performance. We introduce a simple linear model that takes as input an arbitrary pretrained…

    Submitted 18 April, 2024; originally announced April 2024.

  2. arXiv:2312.06585

    cs.LG

    Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, et al. (16 additional authors not shown)

    Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investig…

    Submitted 17 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to TMLR. Camera-ready version. First three authors contributed equally

  3. arXiv:2311.07587

    cs.CL cs.AI cs.CY cs.LG

    Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

    Authors: C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, et al. (5 additional authors not shown)

    Abstract: We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial string inserted before the question is complete. Even in the simple setting of 1-digit addition problems, it is easy to find adversarial prompts that mak…

    Submitted 15 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  4. arXiv:2309.14322

    cs.LG

    Small-scale proxies for large-scale Transformer training instabilities

    Authors: Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

    Abstract: Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study train…

    Submitted 16 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  5. arXiv:2303.05420

    stat.ML cs.CV cs.LG

    Kernel Regression with Infinite-Width Neural Networks on Millions of Examples

    Authors: Ben Adlam, Jaehoon Lee, Shreyas Padhy, Zachary Nado, Jasper Snoek

    Abstract: Neural kernels have drastically increased performance on diverse and nonstandard data modalities but require significantly more compute, which previously limited their application to smaller datasets. In this work, we address this by massively parallelizing their computation across many GPUs. We combine this with a distributed, preconditioned conjugate gradients algorithm to enable kernel regressi…

    Submitted 9 March, 2023; originally announced March 2023.

  6. arXiv:2206.10566

    stat.ML cs.LG

    Ensembling over Classifiers: a Bias-Variance Perspective

    Authors: Neha Gupta, Jamie Smith, Ben Adlam, Zelda Mariet

    Abstract: Ensembles are a straightforward, remarkably effective method for improving the accuracy, calibration, and robustness of models on classification tasks; yet, the reasons that underlie their success remain an active area of research. We build upon the extension to the bias-variance decomposition by Pfau (2013) in order to gain crucial insights into the behavior of ensembles of classifiers. Introducin…

    Submitted 21 June, 2022; originally announced June 2022.

  7. arXiv:2206.07252

    stat.ML cs.LG math.OC math.PR math.ST

    Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

    Authors: Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

    Abstract: Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quad…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2205.07069

  8. arXiv:2205.07069

    math.ST math.OC math.PR stat.ML

    Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties

    Authors: Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

    Abstract: We develop a stochastic differential equation, called homogenized SGD, for analyzing the dynamics of stochastic gradient descent (SGD) on a high-dimensional random least squares problem with $\ell^2$-regularization. We show that homogenized SGD is the high-dimensional equivalence of SGD -- for any quadratic statistic (e.g., population risk with quadratic loss), the statistic under the iterates of…

    Submitted 14 May, 2022; originally announced May 2022.

  9. arXiv:2202.04167

    stat.ML cs.LG math.PR

    Understanding the bias-variance tradeoff of Bregman divergences

    Authors: Ben Adlam, Neha Gupta, Zelda Mariet, Jamie Smith

    Abstract: This paper builds upon the work of Pfau (2013), which generalized the bias-variance tradeoff to any Bregman divergence loss function. Pfau (2013) showed that for Bregman divergences, the bias and variances are defined with respect to a central label, defined as the mean of the label variable, and a central prediction, of a more complex form. We show that, similarly to the label, the central predic…

    Submitted 9 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

  10. arXiv:2111.08234

    stat.ML cs.LG

    Covariate Shift in High-Dimensional Random Feature Regression

    Authors: Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington

    Abstract: A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same. Despite the prevalence of covariate shift in real-world applications, a theoretical understanding in the context of modern machine learnin…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: 107 pages, 10 figures

  11. arXiv:2011.03395

    cs.LG stat.ML

    Underspecification Presents Challenges for Credibility in Modern Machine Learning

    Authors: Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, et al. (15 additional authors not shown)

    Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predict…

    Submitted 24 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Updates: Updated statistical analysis in Section 6; Additional citations

  12. arXiv:2011.03321

    stat.ML cs.LG

    Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition

    Authors: Ben Adlam, Jeffrey Pennington

    Abstract: Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, with simpler models exhibiting high bias and more complex models exhibiting high variance of the predictive function. However, such a simple trade-off does not adequately describe deep learning models that simultaneously attain low bias and va…

    Submitted 4 November, 2020; originally announced November 2020.

    Comments: Published as a conference paper in the Proceedings of the Thirty-fourth Conference on Neural Information Processing Systems; 54 pages; 5 figures. arXiv admin note: text overlap with arXiv:2008.06786

  13. arXiv:2010.07355

    stat.ML cs.LG

    Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit

    Authors: Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

    Abstract: Modern deep learning models have achieved great success in predictive accuracy for many data modalities. However, their application to many real-world tasks is restricted by poor uncertainty estimates, such as overconfidence on out-of-distribution (OOD) data and ungraceful failing under distributional shift. Previous benchmarks have found that ensembles of neural networks (NNs) are typically the b…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 23 pages, 11 figures

  14. arXiv:2008.06786

    stat.ML cs.LG

    The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

    Authors: Ben Adlam, Jeffrey Pennington

    Abstract: Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An emerging paradigm for describing this unexpected behavior is in terms of a double descent curve, in which increasing a model's capacity causes i…

    Submitted 15 August, 2020; originally announced August 2020.

    Comments: Published as a conference paper in the Proceedings of the 37th International Conference on Machine Learning; 31 pages; 4 figures

  15. arXiv:2008.00029

    stat.ML cs.LG

    Cold Posteriors and Aleatoric Uncertainty

    Authors: Ben Adlam, Jasper Snoek, Samuel L. Smith

    Abstract: Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect). To help interpret this phenomenon, we argue that commonly used priors in Bayesian neural networks can significantly overestimate the aleatoric uncertainty in the labels on many classification datasets. This prob…

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: 5 pages, 3 figures

    Journal ref: ICML workshop on Uncertainty and Robustness in Deep Learning (2020)

  16. arXiv:2007.15801

    cs.LG stat.ML

    Finite Versus Infinite Neural Networks: an Empirical Study

    Authors: Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

    Abstract: We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. Our experimental results include: kernel methods outperform fully-connected finite-width networks, but underperform convolutional finite width networks; neu…

    Submitted 8 September, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: 17+11 pages; v2 references added, minor improvements

  17. arXiv:2006.14599

    cs.LG cs.NE stat.ML

    The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

    Authors: Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

    Abstract: Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes. In this work, we show that these common perceptions can be completely false in the early phase of learning. In particular, we formally prove that, for a class of well-behaved input distribut…

    Submitted 25 June, 2020; originally announced June 2020.

  18. arXiv:1912.00827

    stat.ML cs.LG

    A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning

    Authors: Ben Adlam, Jake Levinson, Jeffrey Pennington

    Abstract: One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures that utilize enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has inspired significant research on the properties of large networks, relatively little work has been devoted to the fact that these networks a…

    Submitted 12 November, 2021; v1 submitted 2 December, 2019; originally announced December 2019.

  19. arXiv:1910.14137

    stat.ML cs.LG

    Investigating Under and Overfitting in Wasserstein Generative Adversarial Networks

    Authors: Ben Adlam, Charles Weill, Amol Kapoor

    Abstract: We investigate under and overfitting in Generative Adversarial Networks (GANs), using discriminators unseen by the generator to measure generalization. We find that the model capacity of the discriminator has a significant effect on the generator's model quality, and that the generator's poor performance coincides with the discriminator underfitting. Contrary to our expectations, we find that gene…

    Submitted 30 October, 2019; originally announced October 2019.

  20. arXiv:1910.08965

    cs.LG stat.ML

    Learning GANs and Ensembles Using Discrepancy

    Authors: Ben Adlam, Corinna Cortes, Mehryar Mohri, Ningshan Zhang

    Abstract: Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions. The choice of that divergence is therefore critical. We argue that the divergence must take into account the hypothesis set and the loss function used in a subsequent learning task, where the data generated by a GAN serves for training. Taking that structural information into account is…

    Submitted 5 November, 2019; v1 submitted 20 October, 2019; originally announced October 2019.

  21. arXiv:1905.00080

    cs.LG stat.ML

    AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

    Authors: Charles Weill, Javier Gonzalvo, Vitaly Kuznetsov, Scott Yang, Scott Yak, Hanna Mazzawi, Eugen Hotaj, Ghassen Jerfel, Vladimir Macko, Ben Adlam, Mehryar Mohri, Corinna Cortes

    Abstract: AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible de…

    Submitted 30 April, 2019; originally announced May 2019.

  22. Stationary frequencies and mixing times for neutral drift processes with spatial structure

    Authors: Alex McAvoy, Ben Adlam, Benjamin Allen, Martin A. Nowak

    Abstract: We study a general setting of neutral evolution in which the population is of finite, constant size and can have spatial structure. Mutation leads to different genetic types ("traits"), which can be discrete or continuous. Under minimal assumptions, we show that the marginal trait distributions of the evolutionary process, which specify the probability that any given individual has a certain trait…

    Submitted 20 September, 2018; originally announced September 2018.

    Comments: 18 pages

    Journal ref: Proceedings of the Royal Society A vol. 474, 20180238 (2018)

  23. arXiv:1509.03368

    math.PR

    Spectral Statistics of Sparse Random Graphs with a General Degree Distribution

    Authors: Ben Adlam, Ziliang Che

    Abstract: We consider the adjacency matrices of sparse random graphs from the Chung-Lu model, where edges are added independently between the $N$ vertices with varying probabilities $p_{ij}$. The rank of the matrix $(p_{ij})$ is some fixed positive integer. We prove that the distribution of eigenvalues is given by the solution of a functional self-consistent equation. We prove a local law down to the optima…

    Submitted 10 September, 2015; originally announced September 2015.

    Comments: 28 pages

    MSC Class: 15B52; 60B20

  24. arXiv:1407.2580

    q-bio.PE

    Universality of fixation probabilities in randomly structured populations

    Authors: Ben Adlam, Martin A. Nowak

    Abstract: The stage of evolution is the population of reproducing individuals. The structure of the population is known to affect the dynamics and outcome of evolutionary processes, but analytical results for generic random structures have been lacking. The most general result so far, the isothermal theorem, assumes the propensity for change in each position is exactly the same, but realistic biological stru…

    Submitted 9 July, 2014; originally announced July 2014.