-
Survival of the Fittest Representation: A Case Study with Modular Addition
Authors:
Xiaoman Delores Ding,
Zifan Carl Guo,
Eric J. Michaud,
Ziming Liu,
Max Tegmark
Abstract:
When a neural network can learn multiple distinct algorithms to solve a task, how does it "choose" between them during training? To approach this question, we take inspiration from ecology: when multiple species coexist, they eventually reach an equilibrium where some survive while others die out. Analogously, we suggest that a neural network at initialization contains many solutions (representations and algorithms), which compete with each other under pressure from resource constraints, with the "fittest" ultimately prevailing. To investigate this Survival of the Fittest hypothesis, we conduct a case study on neural networks performing modular addition, and find that these networks' multiple circular representations at different Fourier frequencies undergo such competitive dynamics, with only a few circles surviving at the end. We find that the frequencies with high initial signals and gradients, the "fittest," are more likely to survive. By increasing the embedding dimension, we also observe more surviving frequencies. Inspired by the Lotka-Volterra equations describing the dynamics between species, we find that the dynamics of the circles can be nicely characterized by a set of linear differential equations. Our results with modular addition show that it is possible to decompose complicated representations into simpler components, along with their basic interactions, to offer insight on the training dynamics of representations.
Submitted 27 May, 2024;
originally announced May 2024.
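Illustrative sketch (not from the paper): the "circular representation at a Fourier frequency" mentioned in the abstract can be pictured as placing each residue on the unit circle, so that adding two residues becomes adding their angles. The code below hand-codes one such circle instead of training a network. The helper names `embed` and `add_mod_p`, the modulus 113, and the frequency 5 are assumptions chosen for illustration, and the decoding step assumes p is prime and k is not a multiple of p.

```python
import cmath
import math


def embed(a: int, p: int, k: int) -> complex:
    """Place residue a on the unit circle at frequency k (hypothetical helper)."""
    return cmath.exp(2j * math.pi * k * a / p)


def add_mod_p(a: int, b: int, p: int, k: int = 1) -> int:
    """Add a and b modulo p by multiplying their circular embeddings."""
    # Multiplying unit complex numbers adds their angles: 2*pi*k*(a+b)/p.
    z = embed(a, p, k) * embed(b, p, k)
    # Decode: pick the residue c whose embedding is best aligned with z.
    return max(range(p), key=lambda c: (z * embed(c, p, k).conjugate()).real)


if __name__ == "__main__":
    p, k = 113, 5  # assumed values, not taken from the abstract
    print(add_mod_p(100, 50, p, k), (100 + 50) % p)
```

Any single frequency k that is coprime to p suffices for this hand-coded version. The abstract's observation is that trained networks start with many such circles and keep only a few.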
-
Algorithmic progress in language models
Authors:
Anson Ho,
Tamay Besiroglu,
Ege Erdil,
David Owen,
Robi Rahman,
Zifan Carl Guo,
David Atkinson,
Neil Thompson,
Jaime Sevilla
Abstract:
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
Submitted 9 March, 2024;
originally announced March 2024.
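Illustrative arithmetic (not from the paper): a halving time converts to a compute-reduction factor over a given period as 2 raised to (elapsed months / halving time). The function name `compute_reduction_factor` is hypothetical, and the 24-month figure used for the hardware comparison is an assumed, commonly quoted Moore's Law doubling time, not a number from the abstract.

```python
def compute_reduction_factor(months_elapsed: float, halving_time_months: float) -> float:
    """Factor by which required compute shrinks after months_elapsed months."""
    return 2.0 ** (months_elapsed / halving_time_months)


if __name__ == "__main__":
    horizon = 24  # months; arbitrary illustration window
    # Point estimate and the two ends of the reported 95% interval.
    for halving_time in (5, 8, 14):
        factor = compute_reduction_factor(horizon, halving_time)
        print(f"halving every {halving_time} months -> {factor:.1f}x over {horizon} months")
    # Assumed hardware doubling time of 24 months, for comparison only.
    print(f"hardware (assumed) -> {compute_reduction_factor(horizon, 24):.1f}x")
```

With the 8-month point estimate, two years correspond to three halvings, while the assumed hardware trend gives a single doubling over the same window.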
-
Opening the AI black box: program synthesis via mechanistic interpretability
Authors:
Eric J. Michaud,
Isaac Liao,
Vedang Lad,
Ziming Liu,
Anish Mudide,
Chloe Loughridge,
Zifan Carl Guo,
Tara Rezaei Kheirkhah,
Mateja Vukelić,
Max Tegmark
Abstract:
We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy.
Submitted 7 February, 2024;
originally announced February 2024.
-
Universal Neurons in GPT2 Language Models
Authors:
Wes Gurnee,
Theo Horsley,
Zifan Carl Guo,
Tara Rezaei Kheirkhah,
Qinyi Sun,
Will Hathaway,
Neel Nanda,
Dimitris Bertsimas
Abstract:
A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models? In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neurons are likely to be interpretable. In particular, we compute pairwise correlations of neuron activations over 100 million tokens for every neuron pair across five different seeds and find that 1-5% of neurons are universal, that is, pairs of neurons which consistently activate on the same inputs. We then study these universal neurons in detail, finding that they usually have clear interpretations and taxonomize them into a small number of neuron families. We conclude by studying patterns in neuron weights to establish several universal functional roles of neurons in simple circuits: deactivating attention heads, changing the entropy of the next token distribution, and predicting the next token to (not) be within a particular set.
Submitted 22 January, 2024;
originally announced January 2024.
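Illustrative sketch (not the authors' code): the pairwise-correlation idea in the abstract can be pictured as computing, for each neuron in one model, its highest Pearson correlation with any neuron in a second model on the same tokens. The function names and the toy activations below are assumptions made for illustration. The abstract does not state how a correlation is turned into a "universal" label, so no threshold is applied here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    dy = [yi - my for yi in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den > 0 else 0.0


def best_match_correlations(acts_a: List[Sequence[float]],
                            acts_b: List[Sequence[float]]) -> List[float]:
    """For each neuron in model A, the maximum correlation with any neuron in model B.

    acts_a[i] and acts_b[j] hold activations over the same token sequence.
    """
    return [max(pearson(a, b) for b in acts_b) for a in acts_a]


if __name__ == "__main__":
    # Toy activations over four tokens (made-up numbers).
    model_a = [[1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 1.0, 0.0]]
    model_b = [[2.0, 4.0, 6.0, 8.0], [0.0, 1.0, 0.0, 1.0]]
    print(best_match_correlations(model_a, model_b))
```

In this toy case the first neuron of model A has a perfectly correlated counterpart in model B, while the second has none. At the scale described in the abstract, a vectorized implementation would be needed instead of nested Python loops.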
-
Data-Driven Batch Localization and SLAM Using Koopman Linearization
Authors:
Zi Cong Guo,
Frederike Dümbgen,
James R. Forbes,
Timothy D. Barfoot
Abstract:
We present a framework for model-free batch localization and SLAM. We use lifting functions to map a control-affine system into a high-dimensional space, where both the process model and the measurement model are rendered bilinear. During training, we solve a least-squares problem using ground-truth data to compute the high-dimensional model matrices associated with the lifted system purely from data. At inference time, we solve for the unknown robot trajectory and landmarks through an optimization problem, where constraints are introduced to keep the solution on the manifold of the lifting functions. The problem is efficiently solved using a sequential quadratic program (SQP), where the complexity of an SQP iteration scales linearly with the number of timesteps. Our algorithms, called Reduced Constrained Koopman Linearization Localization (RCKL-Loc) and Reduced Constrained Koopman Linearization SLAM (RCKL-SLAM), are validated experimentally in simulation and on two datasets: one with an indoor mobile robot equipped with a laser rangefinder that measures range to cylindrical landmarks, and one on a golf cart equipped with RFID range sensors. We compare RCKL-Loc and RCKL-SLAM with classic model-based nonlinear batch estimation. While RCKL-Loc and RCKL-SLAM perform similarly to their model-based counterparts, they outperform the model-based approaches when the prior model is imperfect, showing the potential benefit of the proposed data-driven technique.
Submitted 8 September, 2023;
originally announced September 2023.
-
Koopman Linearization for Data-Driven Batch State Estimation of Control-Affine Systems
Authors:
Zi Cong Guo,
Vassili Korotkine,
James R. Forbes,
Timothy D. Barfoot
Abstract:
We present the Koopman State Estimator (KoopSE), a framework for model-free batch state estimation of control-affine systems that makes no linearization assumptions, requires no problem-specific feature selections, and has an inference computational cost that is independent of the number of training points. We lift the original nonlinear system into a higher-dimensional Reproducing Kernel Hilbert Space (RKHS), where the system becomes bilinear. The time-invariant model matrices can be learned by solving a least-squares problem on training trajectories. At test time, the system is algebraically manipulated into a linear time-varying system, where standard batch linear state estimation techniques can be used to efficiently compute state means and covariances. Random Fourier Features (RFF) are used to combine the computational efficiency of Koopman-based methods and the generality of kernel-embedding methods. KoopSE is validated experimentally on a localization task involving a mobile robot equipped with ultra-wideband receivers and wheel odometry. KoopSE estimates are more accurate and consistent than the standard model-based extended Rauch-Tung-Striebel (RTS) smoother, despite KoopSE having no prior knowledge of the system's motion or measurement models.
Submitted 3 December, 2021; v1 submitted 14 September, 2021;
originally announced September 2021.