Showing 1–2 of 2 results for author: Bricken, T

Search v0.5.6 released 2020-02-24

arXiv:2303.11934 [pdf, other]

cs.NE cond-mat.dis-nn cs.AI cs.LG q-bio.NC

Sparse Distributed Memory is a Continual Learner

Authors: Trenton Bricken, Xander Davies, Deepak Singh, Dmitry Krotov, Gabriel Kreiman

Abstract: Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer model, we create a modified Multi-Layered Perceptron (MLP) that is a strong continual learner. We find that every component of our MLP variant translated from biology is necessary for continual learning. Our solution is also free from any memory replay or task information, and introduces novel methods to train sparse networks that may be broadly applicable.

Submitted 20 March, 2023; originally announced March 2023.

Comments: 9 Pages. ICLR Acceptance

Journal ref: ICLR 2023

arXiv:2111.05498 [pdf, other]

cs.LG cs.AI

Attention Approximates Sparse Distributed Memory

Authors: Trenton Bricken, Cengiz Pehlevan

Abstract: While Attention has come to be an important mechanism in deep learning, there remains limited intuition for why it works so well. Here, we show that Transformer Attention can be closely related under certain data conditions to Kanerva's Sparse Distributed Memory (SDM), a biologically plausible associative memory model. We confirm that these conditions are satisfied in pre-trained GPT2 Transformer models. We discuss the implications of the Attention-SDM map and provide new computational and biological interpretations of Attention.

Submitted 17 January, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
