BP(λ): Online Learning via Synthetic Gradients

Authors: Joseph Pemberton, Rui Ponte Costa

Abstract: Training recurrent neural networks typically relies on backpropagation through time (BPTT). BPTT depends on forward and backward passes to be completed, rendering the network locked to these computations before loss gradients are available. Recently, Jaderberg et al. proposed synthetic gradients to alleviate the need for full BPTT. In their implementation synthetic gradients are learned through a mixture of backpropagated gradients and bootstrapped synthetic gradients, analogous to the temporal difference (TD) algorithm in Reinforcement Learning (RL). However, as in TD learning, heavy use of bootstrapping can result in bias which leads to poor synthetic gradient estimates. Inspired by the accumulate $\mathrm{TD}(λ)$ in RL, we propose a fully online method for learning synthetic gradients which avoids the use of BPTT altogether: accumulate $\mathrm{BP}(λ)$. As in accumulate $\mathrm{TD}(λ)$, we show analytically that accumulate $\mathrm{BP}(λ)$ can control the level of bias by using a mixture of temporal difference errors and recursively defined eligibility traces. We next demonstrate empirically that our model outperforms the original implementation for learning synthetic gradients in a variety of tasks, and is particularly suited for capturing longer timescales. Finally, building on recent work we reflect on accumulate $\mathrm{BP}(λ)$ as a principle for learning in biological circuits. In summary, inspired by RL principles we introduce an algorithm capable of bias-free online learning via synthetic gradients.

Submitted 13 January, 2024; originally announced January 2024.

Comments: 24 pages, 7 figures

MSC Class: 68T07

arXiv:2206.11769 [pdf, other]

Single-phase deep learning in cortico-cortical networks

Authors: Will Greedy, Heng Wei Zhu, Joseph Pemberton, Jack Mellor, Rui Ponte Costa

Abstract: The error-backpropagation (backprop) algorithm remains the most common solution to the credit assignment problem in artificial neural networks. In neuroscience, it is unclear whether the brain could adopt a similar strategy to correctly modify its synapses. Recent models have attempted to bridge this gap while being consistent with a range of experimental observations. However, these models are either unable to effectively backpropagate error signals across multiple layers or require a multi-phase learning process, neither of which is reminiscent of learning in the brain. Here, we introduce a new model, Bursting Cortico-Cortical Networks (BurstCCN), which solves these issues by integrating known properties of cortical networks, namely bursting activity, short-term plasticity (STP) and dendrite-targeting interneurons. BurstCCN relies on burst multiplexing via connection-type-specific STP to propagate backprop-like error signals within deep cortical networks. These error signals are encoded at distal dendrites and induce burst-dependent plasticity as a result of excitatory-inhibitory top-down inputs. First, we demonstrate that our model can effectively backpropagate errors through multiple layers using a single-phase learning process. Next, we show both empirically and analytically that learning in our model approximates backprop-derived gradients. Finally, we demonstrate that our model is capable of learning complex image classification tasks (MNIST and CIFAR-10). Overall, our results suggest that cortical features across sub-cellular, cellular, microcircuit and systems levels jointly underlie single-phase efficient deep learning in the brain.

Submitted 24 October, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

Comments: Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022). 22 pages, 9 figures, 5 tables

arXiv:2204.02283 [pdf, other]

Lost in Latent Space: Disentangled Models and the Challenge of Combinatorial Generalisation

Authors: Milton L. Montero, Jeffrey S. Bowers, Rui Ponte Costa, Casimir J. H. Ludwig, Gaurav Malhotra

Abstract: Recent research has shown that generative models with highly disentangled representations fail to generalise to unseen combinations of generative factor values. These findings contradict earlier research which showed improved performance in out-of-training distribution settings when compared to entangled representations. Additionally, it is not clear if the reported failures are due to (a) encoders failing to map novel combinations to the proper regions of the latent space or (b) novel combinations being mapped correctly but the decoder/downstream process being unable to render the correct output for the unseen combinations. We investigate these alternatives by testing several models on a range of datasets and training settings. We find that (i) when models fail, their encoders also fail to map unseen combinations to correct regions of the latent space and (ii) when models succeed, it is either because the test conditions do not exclude enough examples, or because excluded generative factors determine independent parts of the output image. Based on these results, we argue that to generalise properly, models not only need to capture factors of variation, but also understand how to invert the generative process that was used to generate the data.

Submitted 14 June, 2024; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: 10 pages and 7 figures in main text (not including references). 27 pages and 31 figures in appendix. Updated to match the camera-ready version

ACM Class: I.2.6; I.2.10; I.4.5; I.4.10; I.5.1; I.5.3

Journal ref: Adv.Neur.Info.Proc.Sys. 35 (2022) 10136-1049

arXiv:2110.11501 [pdf, other]

Cortico-cerebellar networks as decoupling neural interfaces

Authors: Joseph Pemberton, Ellen Boven, Richard Apps, Rui Ponte Costa

Abstract: The brain solves the credit assignment problem remarkably well. For credit to be assigned across neural networks they must, in principle, wait for specific neural computations to finish. How the brain deals with this inherent locking problem has remained unclear. Deep learning methods suffer from similar locking constraints both on the forward and feedback phase. Recently, decoupled neural interfaces (DNIs) were introduced as a solution to the forward and feedback locking problems in deep networks. Here we propose that a specialised brain region, the cerebellum, helps the cerebral cortex solve similar locking problems akin to DNIs. To demonstrate the potential of this framework we introduce a systems-level model in which a recurrent cortical network receives online temporal feedback predictions from a cerebellar module. We test this cortico-cerebellar recurrent neural network (ccRNN) model on a number of sensorimotor (line and digit drawing) and cognitive tasks (pattern recognition and caption generation) that have been shown to be cerebellar-dependent. In all tasks, we observe that ccRNNs facilitate learning while reducing ataxia-like behaviours, consistent with classical experimental observations. Moreover, our model also explains recent behavioural and neuronal observations while making several testable predictions across multiple levels. Overall, our work offers a novel perspective on the cerebellum as a brain-wide decoupling machine for efficient credit assignment and opens a new avenue between deep learning and neuroscience.

Submitted 28 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: To appear in Advances in Neural Information Processing Systems 35 (NeurIPS 2021); 15 pages and 5 figures in the main manuscript; 8 pages and 8 figures in the supplementary material

arXiv:2109.10034 [pdf, other]

doi 10.1016/j.tins.2021.07.007

Learning offline: memory replay in biological and artificial reinforcement learning

Authors: Emma L. Roscow, Raymond Chua, Rui Ponte Costa, Matt W. Jones, Nathan Lepora

Abstract: Learning to act in an environment to maximise rewards is among the brain's key functions. This process has often been conceptualised within the framework of reinforcement learning, which has also gained prominence in machine learning and artificial intelligence (AI) as a way to optimise decision-making. A common aspect of both biological and machine reinforcement learning is the reactivation of previously experienced episodes, referred to as replay. Replay is important for memory consolidation in biological neural networks, and is key to stabilising learning in deep neural networks. Here, we review recent developments concerning the functional roles of replay in the fields of neuroscience and AI. Complementary progress suggests how replay might support learning processes, including generalisation and continual learning, affording opportunities to transfer knowledge across the two fields to advance the understanding of biological and artificial learning and memory.

Submitted 21 September, 2021; originally announced September 2021.

Comments: In press at Trends in Neurosciences

arXiv:2105.05382 [pdf]

Current State and Future Directions for Learning in Biological Recurrent Neural Networks: A Perspective Piece

Authors: Luke Y. Prince, Roy Henha Eyono, Ellen Boven, Arna Ghosh, Joe Pemberton, Franz Scherr, Claudia Clopath, Rui Ponte Costa, Wolfgang Maass, Blake A. Richards, Cristina Savin, Katharina Anna Wilmes

Abstract: We provide a brief review of the common assumptions about biological learning with findings from experimental neuroscience and contrast them with the efficiency of gradient-based learning in recurrent neural networks. The key issues discussed in this review include: synaptic plasticity, neural circuits, theory-experiment divide, and objective functions. We conclude with recommendations for both theoretical and experimental neuroscientists when designing new studies that could help bring clarity to these issues.

Submitted 5 January, 2022; v1 submitted 11 May, 2021; originally announced May 2021.

arXiv:1810.11393 [pdf, other]

Dendritic cortical microcircuits approximate the backpropagation algorithm

Authors: João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Abstract: Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main learning mechanism behind these advances - error backpropagation - appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. In contrast to previous work our model does not require separate phases and synaptic learning is driven by local dendritic prediction errors continuously in time. Such errors originate at apical dendrites and occur due to a mismatch between predictive input from lateral interneurons and activity from actual top-down feedback. Through the use of simple dendritic compartments and different cell-types our model can represent both error and normal activity within a pyramidal neuron. We demonstrate the learning capabilities of the model in regression and classification tasks, and show analytically that it approximates the error backpropagation algorithm. Moreover, our framework is consistent with recent observations of learning between brain areas and the architecture of cortical microcircuits. Overall, we introduce a novel view of learning on dendritic cortical circuits and on how the brain may solve the long-standing synaptic credit assignment problem.

Submitted 26 October, 2018; originally announced October 2018.

Comments: To appear in Advances in Neural Information Processing Systems 31 (NIPS 2018). 12 pages, 3 figures, 9 pages of supplementary material (2 supplementary figures)

arXiv:1801.00062 [pdf, other]

Dendritic error backpropagation in deep cortical microcircuits

Authors: João Sacramento, Rui Ponte Costa, Yoshua Bengio, Walter Senn

Abstract: Animal behaviour depends on learning to associate sensory stimuli with the desired motor command. Understanding how the brain orchestrates the necessary synaptic modifications across different brain areas has remained a longstanding puzzle. Here, we introduce a multi-area neuronal network model in which synaptic plasticity continuously adapts the network towards a global desired output. In this model synaptic learning is driven by a local dendritic prediction error that arises from a failure to predict the top-down input given the bottom-up activities. Such errors occur at apical dendrites of pyramidal neurons where both long-range excitatory feedback and local inhibitory predictions are integrated. When local inhibition fails to match excitatory feedback an error occurs which triggers plasticity at bottom-up synapses at basal dendrites of the same pyramidal neurons. We demonstrate the learning capabilities of the model in a number of tasks and show that it approximates the classical error backpropagation algorithm. Finally, complementing this cortical circuit with a disinhibitory mechanism enables attention-like stimulus denoising and generation. Our framework makes several experimental predictions on the function of dendritic integration and cortical microcircuits, is consistent with recent observations of cross-area learning, and suggests a biological implementation of deep learning.

Submitted 29 December, 2017; originally announced January 2018.

Comments: 27 pages, 5 figures, 10 pages supplementary information

arXiv:1711.02448 [pdf, other]

Cortical microcircuits as gated-recurrent neural networks

Authors: Rui Ponte Costa, Yannis M. Assael, Brendan Shillingford, Nando de Freitas, Tim P. Vogels

Abstract: Cortical circuits exhibit intricate recurrent architectures that are remarkably similar across different brain areas. Such stereotyped structure suggests the existence of common computational principles. However, such principles have remained largely elusive. Inspired by gated-memory networks, namely long short-term memory networks (LSTMs), we introduce a recurrent neural network in which information is gated through inhibitory cells that are subtractive (subLSTM). We propose a natural mapping of subLSTMs onto known canonical excitatory-inhibitory cortical microcircuits. Our empirical evaluation across sequential image classification and language modelling tasks shows that subLSTM units can achieve similar performance to LSTM units. These results suggest that cortical circuits can be optimised to solve complex contextual problems and propose a novel view on their computational function. Overall our work provides a step towards unifying recurrent networks as used in machine learning with their biological counterparts.

Submitted 3 January, 2018; v1 submitted 7 November, 2017; originally announced November 2017.

Comments: To appear in Advances in Neural Information Processing Systems 30 (NIPS 2017). 13 pages, 2 figures (and 1 supp. figure)

Showing 1–9 of 9 results for author: Costa, R P