Showing 1–9 of 9 results for author: McGrath, T

Searching in archive cs.

  1. arXiv:2310.16410

    cs.AI cs.HC cs.LG stat.ML

    Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero

    Authors: Lisa Schut, Nenad Tomasev, Tom McGrath, Demis Hassabis, Ulrich Paquet, Been Kim

    Abstract: Artificial Intelligence (AI) systems have made remarkable progress, attaining super-human performance across various domains. This presents us with an opportunity to further human knowledge and improve human expert performance by leveraging the hidden knowledge encoded within these highly performant AI systems. Yet, this knowledge is often hard to extract, and may be hard to understand or learn fr…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 61 pages, 29 figures

  2. arXiv:2310.04625

    cs.LG cs.AI cs.CL

    Copy Suppression: Comprehensively Understanding an Attention Head

    Authors: Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda

    Abstract: We present a single attention head in GPT-2 Small that has one main role across the entire training distribution. If components in earlier layers predict a certain token, and this token appears earlier in the context, the head suppresses it: we call this copy suppression. Attention Head 10.7 (L10H7) suppresses naive copying behavior which improves overall model calibration. This explains why multi…

    Submitted 6 October, 2023; originally announced October 2023.

  3. arXiv:2307.15771

    cs.LG cs.AI cs.CL

    The Hydra Effect: Emergent Self-repair in Language Model Computations

    Authors: Thomas McGrath, Matthew Rahtz, Janos Kramar, Vladimir Mikulik, Shane Legg

    Abstract: We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbalancing function of late MLP layers that act to downregulate the maximum-likelihood token. Our ablati…

    Submitted 28 July, 2023; originally announced July 2023.

  4. arXiv:2301.05062

    cs.LG cs.AI stat.ML

    Tracr: Compiled Transformers as a Laboratory for Interpretability

    Authors: David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Thomas McGrath, Vladimir Mikulik

    Abstract: We show how to "compile" human-readable programs into standard decoder-only transformer models. Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study "superposition" in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground-truth for evalu…

    Submitted 3 November, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Presented at NeurIPS 2023 (Spotlight)

  5. Acquisition of Chess Knowledge in AlphaZero

    Authors: Thomas McGrath, Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, Vladimir Kramnik

    Abstract: What is learned by sophisticated neural network agents such as AlphaZero? This question is of both scientific and practical interest. If the representations of strong neural networks bear no resemblance to human concepts, our ability to understand faithful explanations of their decisions will be restricted, ultimately limiting what we can achieve with neural network interpretability. In this work…

    Submitted 18 August, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: 69 pages, 44 figures

  6. arXiv:2103.03938

    cs.AI cs.LG

    Causal Analysis of Agent Behavior for AI Safety

    Authors: Grégoire Delétang, Jordi Grau-Moya, Miljan Martic, Tim Genewein, Tom McGrath, Vladimir Mikulik, Markus Kunesch, Shane Legg, Pedro A. Ortega

    Abstract: As machine learning systems become more powerful they also become increasingly unpredictable and opaque. Yet, finding human-understandable explanations of how they work is essential for their safe deployment. This technical report illustrates a methodology for investigating the causal mechanisms that drive the behaviour of artificial agents. Six use cases are covered, each addressing a typical que…

    Submitted 5 March, 2021; originally announced March 2021.

    Comments: 16 pages, 16 figures, 6 tables

  7. arXiv:2010.12237

    cs.AI cs.LG

    Algorithms for Causal Reasoning in Probability Trees

    Authors: Tim Genewein, Tom McGrath, Grégoire Delétang, Vladimir Mikulik, Miljan Martic, Shane Legg, Pedro A. Ortega

    Abstract: Probability trees are one of the simplest models of causal generative processes. They possess clean semantics and -- unlike causal Bayesian networks -- they can represent context-specific causal dependencies, which are necessary for e.g. causal induction. Yet, they have received little attention from the AI and ML community. Here we present concrete algorithms for causal reasoning in discrete prob…

    Submitted 11 November, 2020; v1 submitted 23 October, 2020; originally announced October 2020.

    Comments: (2nd version with correction to algorithm) 11 pages, 8 figures, 5 algorithms. A companion Colaboratory tutorial is available at https://github.com/deepmind/deepmind-research/tree/master/causal_reasoning

  8. arXiv:2010.11223

    cs.AI cs.LG cs.NE

    Meta-trained agents implement Bayes-optimal agents

    Authors: Vladimir Mikulik, Grégoire Delétang, Tom McGrath, Tim Genewein, Miljan Martic, Shane Legg, Pedro A. Ortega

    Abstract: Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Published at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  9. arXiv:1905.03030

    cs.LG cs.AI stat.ML

    Meta-learning of Sequential Strategies

    Authors: Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg

    Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal pred…

    Submitted 18 July, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Comments: DeepMind Technical Report (15 pages, 6 figures). Version V1.1