Skip to main content

Showing 1–6 of 6 results for author: Guo, Z C

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.17420  [pdf, other

    cs.LG

    Survival of the Fittest Representation: A Case Study with Modular Addition

    Authors: Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud, Ziming Liu, Max Tegmark

    Abstract: When a neural network can learn multiple distinct algorithms to solve a task, how does it "choose" between them during training? To approach this question, we take inspiration from ecology: when multiple species coexist, they eventually reach an equilibrium where some survive while others die out. Analogously, we suggest that a neural network at initialization contains many solutions (representati… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  2. arXiv:2403.05812  [pdf, other

    cs.CL cs.AI

    Algorithmic progress in language models

    Authors: Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla

    Abstract: We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months,… ▽ More

    Submitted 9 March, 2024; originally announced March 2024.

  3. arXiv:2402.05110  [pdf, other

    cs.LG

    Opening the AI black box: program synthesis via mechanistic interpretability

    Authors: Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić, Max Tegmark

    Abstract: We present MIPS, a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by G… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 24 pages

  4. arXiv:2401.12181  [pdf, other

    cs.LG cs.AI cs.CL

    Universal Neurons in GPT2 Language Models

    Authors: Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas

    Abstract: A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models? In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neuron… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

  5. arXiv:2309.04375  [pdf, other

    cs.RO

    Data-Driven Batch Localization and SLAM Using Koopman Linearization

    Authors: Zi Cong Guo, Frederike Dümbgen, James R. Forbes, Timothy D. Barfoot

    Abstract: We present a framework for model-free batch localization and SLAM. We use lifting functions to map a control-affine system into a high-dimensional space, where both the process model and the measurement model are rendered bilinear. During training, we solve a least-squares problem using groundtruth data to compute the high-dimensional model matrices associated with the lifted system purely from da… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: Submitted to IEEE T-RO. 18 pages, 9 figures, 1 table

  6. arXiv:2109.07000  [pdf, other

    cs.RO

    Koopman Linearization for Data-Driven Batch State Estimation of Control-Affine Systems

    Authors: Zi Cong Guo, Vassili Korotkine, James R. Forbes, Timothy D. Barfoot

    Abstract: We present the Koopman State Estimator (KoopSE), a framework for model-free batch state estimation of control-affine systems that makes no linearization assumptions, requires no problem-specific feature selections, and has an inference computational cost that is independent of the number of training points. We lift the original nonlinear system into a higher-dimensional Reproducing Kernel Hilbert… ▽ More

    Submitted 3 December, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted for publication in RA-L 2022. 9 pages, 5 figures, 1 table. Note: version submitted to RA-L did not include the Appendix section present in this arXiv version