-
Energy-based learning algorithms for analog computing: a comparative study
Authors:
Benjamin Scellier,
Maxence Ernoult,
Jack Kendall,
Suhas Kumar
Abstract:
Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog (post-digital) hardware. Existing algorithms include contrastive learning (CL), equilibrium propagation (EP) and coupled learning (CpL), all consisting in contrasting two states, and differing in the type of perturbation used to obtain the second state from the first one. However, these…
▽ More
Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog (post-digital) hardware. Existing algorithms include contrastive learning (CL), equilibrium propagation (EP) and coupled learning (CpL), all consisting in contrasting two states, and differing in the type of perturbation used to obtain the second state from the first one. However, these algorithms have never been explicitly compared on equal footing with same models and datasets, making it difficult to assess their scalability and decide which one to select in practice. In this work, we carry out a comparison of seven learning algorithms, namely CL and different variants of EP and CpL depending on the signs of the perturbations. Specifically, using these learning algorithms, we train deep convolutional Hopfield networks (DCHNs) on five vision tasks (MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100). We find that, while all algorithms yield comparable performance on MNIST, important differences in performance arise as the difficulty of the task increases. Our key findings reveal that negative perturbations are better than positive ones, and highlight the centered variant of EP (which uses two perturbations of opposite sign) as the best-performing algorithm. We also endorse these findings with theoretical arguments. Additionally, we establish new SOTA results with DCHNs on all five datasets, both in performance and speed. In particular, our DCHN simulations are 13.5 times faster with respect to Laborieux et al. (2021), which we achieve thanks to the use of a novel energy minimisation algorithm based on asynchronous updates, combined with reduced precision (16 bits).
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Towards Scaling Difference Target Propagation by Learning Backprop Targets
Authors:
Maxence Ernoult,
Fabrice Normandin,
Abhinav Moudgil,
Sean Spinney,
Eugene Belilovsky,
Irina Rish,
Blake Richards,
Yoshua Bengio
Abstract:
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on c…
▽ More
The development of biologically-plausible learning algorithms is important for understanding learning in the brain, but most of them fail to scale-up to real-world tasks, limiting their potential as explanations for learning by real brains. As such, it is important to explore learning algorithms that come with strong theoretical guarantees and can match the performance of backpropagation (BP) on complex tasks. One such algorithm is Difference Target Propagation (DTP), a biologically-plausible learning algorithm whose close relation with Gauss-Newton (GN) optimization has been recently established. However, the conditions under which this connection rigorously holds preclude layer-wise training of the feedback pathway synaptic weights (which is more biologically plausible). Moreover, good alignment between DTP weight updates and loss gradients is only loosely guaranteed and under very specific conditions for the architecture being trained. In this paper, we propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored without sacrificing any theoretical guarantees. Our theory is corroborated by experimental results and we report the best performance ever achieved by DTP on CIFAR-10 and ImageNet 32$\times$32
△ Less
Submitted 31 January, 2022;
originally announced January 2022.
-
Training Dynamical Binary Neural Networks with Equilibrium Propagation
Authors:
Jérémie Laydevant,
Maxence Ernoult,
Damien Querlioz,
Julie Grollier
Abstract:
Equilibrium Propagation (EP) is an algorithm intrinsically adapted to the training of physical networks, thanks to the local updates of weights given by the internal dynamics of the system. However, the construction of such a hardware requires to make the algorithm compatible with existing neuromorphic CMOS technologies, which generally exploit digital communication between neurons and offer a lim…
▽ More
Equilibrium Propagation (EP) is an algorithm intrinsically adapted to the training of physical networks, thanks to the local updates of weights given by the internal dynamics of the system. However, the construction of such a hardware requires to make the algorithm compatible with existing neuromorphic CMOS technologies, which generally exploit digital communication between neurons and offer a limited amount of local memory. In this work, we demonstrate that EP can train dynamical networks with binary activations and weights. We first train systems with binary weights and full-precision activations, achieving an accuracy equivalent to that of full-precision models trained by standard EP on MNIST, and losing only 1.9% accuracy on CIFAR-10 with equal architecture. We then extend our method to the training of models with binary activations and weights on MNIST, achieving an accuracy within 1% of the full-precision reference for fully connected architectures and reaching the full-precision accuracy for convolutional architectures. Our extension of EP to binary networks opens new solutions for on-chip learning and provides a compact framework for training BNNs end-to-end with the same circuitry as for inference.
△ Less
Submitted 19 April, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Synaptic metaplasticity in binarized neural networks
Authors:
Axel Laborieux,
Maxence Ernoult,
Tifenn Hirtzlin,
Damien Querlioz
Abstract:
Unlike the brain, artificial neural networks, including state-of-the-art deep neural networks for computer vision, are subject to "catastrophic forgetting": they rapidly forget the previous task when trained on a new one. Neuroscience suggests that biological synapses avoid this issue through the process of synaptic consolidation and metaplasticity: the plasticity itself changes upon repeated syna…
▽ More
Unlike the brain, artificial neural networks, including state-of-the-art deep neural networks for computer vision, are subject to "catastrophic forgetting": they rapidly forget the previous task when trained on a new one. Neuroscience suggests that biological synapses avoid this issue through the process of synaptic consolidation and metaplasticity: the plasticity itself changes upon repeated synaptic events. In this work, we show that this concept of metaplasticity can be transferred to a particular type of deep neural networks, binarized neural networks, to reduce catastrophic forgetting.
△ Less
Submitted 19 January, 2021;
originally announced January 2021.
-
Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias
Authors:
Axel Laborieux,
Maxence Ernoult,
Benjamin Scellier,
Yoshua Bengio,
Julie Grollier,
Damien Querlioz
Abstract:
Equilibrium Propagation (EP) is a biologically-inspired counterpart of Backpropagation Through Time (BPTT) which, owing to its strong theoretical guarantees and the locality in space of its learning rule, fosters the design of energy-efficient hardware dedicated to learning. In practice, however, EP does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient…
▽ More
Equilibrium Propagation (EP) is a biologically-inspired counterpart of Backpropagation Through Time (BPTT) which, owing to its strong theoretical guarantees and the locality in space of its learning rule, fosters the design of energy-efficient hardware dedicated to learning. In practice, however, EP does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP, including architectures with distinct forward and backward connections. These results highlight EP as a scalable approach to compute error gradients in deep neural networks, thereby motivating its hardware implementation.
△ Less
Submitted 14 January, 2021;
originally announced January 2021.
-
EqSpike: Spike-driven Equilibrium Propagation for Neuromorphic Implementations
Authors:
Erwann Martin,
Maxence Ernoult,
Jérémie Laydevant,
Shuai Li,
Damien Querlioz,
Teodora Petrisor,
Julie Grollier
Abstract:
Finding spike-based learning algorithms that can be implemented within the local constraints of neuromorphic systems, while achieving high accuracy, remains a formidable challenge. Equilibrium Propagation is a promising alternative to backpropagation as it only involves local computations, but hardware-oriented studies have so far focused on rate-based networks. In this work, we develop a spiking…
▽ More
Finding spike-based learning algorithms that can be implemented within the local constraints of neuromorphic systems, while achieving high accuracy, remains a formidable challenge. Equilibrium Propagation is a promising alternative to backpropagation as it only involves local computations, but hardware-oriented studies have so far focused on rate-based networks. In this work, we develop a spiking neural network algorithm called EqSpike, compatible with neuromorphic systems, which learns by Equilibrium Propagation. Through simulations, we obtain a test recognition accuracy of 97.6% on MNIST, similar to rate-based Equilibrium Propagation, and comparing favourably to alternative learning techniques for spiking neural networks. We show that EqSpike implemented in silicon neuromorphic technology could reduce the energy consumption of inference and training respectively by three orders and two orders of magnitude compared to GPUs. Finally, we also show that during learning, EqSpike weight updates exhibit a form of Spike Timing Dependent Plasticity, highlighting a possible connection with biology.
△ Less
Submitted 17 February, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias
Authors:
Axel Laborieux,
Maxence Ernoult,
Benjamin Scellier,
Yoshua Bengio,
Julie Grollier,
Damien Querlioz
Abstract:
Equilibrium Propagation (EP) is a biologically-inspired algorithm for convergent RNNs with a local learning rule that comes with strong theoretical guarantees. The parameter updates of the neural network during the credit assignment phase have been shown mathematically to approach the gradients provided by Backpropagation Through Time (BPTT) when the network is infinitesimally nudged toward its ta…
▽ More
Equilibrium Propagation (EP) is a biologically-inspired algorithm for convergent RNNs with a local learning rule that comes with strong theoretical guarantees. The parameter updates of the neural network during the credit assignment phase have been shown mathematically to approach the gradients provided by Backpropagation Through Time (BPTT) when the network is infinitesimally nudged toward its target. In practice, however, training a network with the gradient estimates provided by EP does not scale to visual tasks harder than MNIST. In this work, we show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon and that cancelling it allows training deep ConvNets by EP. We show that this bias can be greatly reduced by using symmetric nudging (a positive nudging and a negative one). We also generalize previous EP equations to the case of cross-entropy loss (by opposition to squared error). As a result of these advances, we are able to achieve a test error of 11.7% on CIFAR-10 by EP, which approaches the one achieved by BPTT and provides a major improvement with respect to the standard EP approach with same-sign nudging that gives 86% test error. We also apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error. These results highlight EP as a compelling biologically-plausible approach to compute error gradients in deep neural networks.
△ Less
Submitted 6 June, 2020;
originally announced June 2020.
-
Continual Weight Updates and Convolutional Architectures for Equilibrium Propagation
Authors:
Maxence Ernoult,
Julie Grollier,
Damien Querlioz,
Yoshua Bengio,
Benjamin Scellier
Abstract:
Equilibrium Propagation (EP) is a biologically inspired alternative algorithm to backpropagation (BP) for training neural networks. It applies to RNNs fed by a static input x that settle to a steady state, such as Hopfield networks. EP is similar to BP in that in the second phase of training, an error signal propagates backwards in the layers of the network, but contrary to BP, the learning rule o…
▽ More
Equilibrium Propagation (EP) is a biologically inspired alternative algorithm to backpropagation (BP) for training neural networks. It applies to RNNs fed by a static input x that settle to a steady state, such as Hopfield networks. EP is similar to BP in that in the second phase of training, an error signal propagates backwards in the layers of the network, but contrary to BP, the learning rule of EP is spatially local. Nonetheless, EP suffers from two major limitations. On the one hand, due to its formulation in terms of real-time dynamics, EP entails long simulation times, which limits its applicability to practical tasks. On the other hand, the biological plausibility of EP is limited by the fact that its learning rule is not local in time: the synapse update is performed after the dynamics of the second phase have converged and requires information of the first phase that is no longer available physically. Our work addresses these two issues and aims at widening the spectrum of EP from standard machine learning models to more bio-realistic neural networks. First, we propose a discrete-time formulation of EP which enables to simplify equations, speed up training and extend EP to CNNs. Our CNN model achieves the best performance ever reported on MNIST with EP. Using the same discrete-time formulation, we introduce Continual Equilibrium Propagation (C-EP): the weights of the network are adjusted continually in the second phase of training using local information in space and time. We show that in the limit of slow changes of synaptic strengths and small nudging, C-EP is equivalent to BPTT (Theorem 1). We numerically demonstrate Theorem 1 and C-EP training on MNIST and generalize it to the bio-realistic situation of a neural network with asymmetric connections between neurons.
△ Less
Submitted 29 April, 2020;
originally announced May 2020.
-
Equilibrium Propagation with Continual Weight Updates
Authors:
Maxence Ernoult,
Julie Grollier,
Damien Querlioz,
Yoshua Bengio,
Benjamin Scellier
Abstract:
Equilibrium Propagation (EP) is a learning algorithm that bridges Machine Learning and Neuroscience, by computing gradients closely matching those of Backpropagation Through Time (BPTT), but with a learning rule local in space. Given an input $x$ and associated target $y$, EP proceeds in two phases: in the first phase neurons evolve freely towards a first steady state; in the second phase output n…
▽ More
Equilibrium Propagation (EP) is a learning algorithm that bridges Machine Learning and Neuroscience, by computing gradients closely matching those of Backpropagation Through Time (BPTT), but with a learning rule local in space. Given an input $x$ and associated target $y$, EP proceeds in two phases: in the first phase neurons evolve freely towards a first steady state; in the second phase output neurons are nudged towards $y$ until they reach a second steady state. However, in existing implementations of EP, the learning rule is not local in time: the weight update is performed after the dynamics of the second phase have converged and requires information of the first phase that is no longer available physically. In this work, we propose a version of EP named Continual Equilibrium Propagation (C-EP) where neuron and synapse dynamics occur simultaneously throughout the second phase, so that the weight update becomes local in time. Such a learning rule local both in space and time opens the possibility of an extremely energy efficient hardware implementation of EP. We prove theoretically that, provided the learning rates are sufficiently small, at each time step of the second phase the dynamics of neurons and synapses follow the gradients of the loss given by BPTT (Theorem 1). We demonstrate training with C-EP on MNIST and generalize C-EP to neural networks where neurons are connected by asymmetric connections. We show through experiments that the more the network updates follows the gradients of BPTT, the best it performs in terms of training. These results bring EP a step closer to biology by better complying with hardware constraints while maintaining its intimate link with backpropagation.
△ Less
Submitted 29 April, 2020;
originally announced May 2020.
-
Synaptic Metaplasticity in Binarized Neural Networks
Authors:
Axel Laborieux,
Maxence Ernoult,
Tifenn Hirtzlin,
Damien Querlioz
Abstract:
While deep neural networks have surpassed human performance in multiple situations, they are prone to catastrophic forgetting: upon training a new task, they rapidly forget previously learned ones. Neuroscience studies, based on idealized tasks, suggest that in the brain, synapses overcome this issue by adjusting their plasticity depending on their past history. However, such "metaplastic" behavio…
▽ More
While deep neural networks have surpassed human performance in multiple situations, they are prone to catastrophic forgetting: upon training a new task, they rapidly forget previously learned ones. Neuroscience studies, based on idealized tasks, suggest that in the brain, synapses overcome this issue by adjusting their plasticity depending on their past history. However, such "metaplastic" behaviours do not transfer directly to mitigate catastrophic forgetting in deep neural networks. In this work, we interpret the hidden weights used by binarized neural networks, a low-precision version of deep neural networks, as metaplastic variables, and modify their training technique to alleviate forgetting. Building on this idea, we propose and demonstrate experimentally, in situations of multitask and stream learning, a training technique that reduces catastrophic forgetting without needing previously presented data, nor formal boundaries between datasets and with performance approaching more mainstream techniques with task boundaries. We support our approach with a theoretical analysis on a tractable task. This work bridges computational neuroscience and deep learning, and presents significant assets for future embedded and neuromorphic systems, especially when using novel nanodevices featuring physics analogous to metaplasticity.
△ Less
Submitted 23 March, 2021; v1 submitted 7 March, 2020;
originally announced March 2020.
-
Binding events through the mutual synchronization of spintronic nano-neurons
Authors:
Miguel Romera,
Philippe Talatchian,
Sumito Tsunegi,
Kay Yakushiji,
Akio Fukushima,
Hitoshi Kubota,
Shinji Yuasa,
Vincent Cros,
Paolo Bortolotti,
Maxence Ernoult,
Damien Querlioz,
Julie Grollier
Abstract:
The brain naturally binds events from different sources in unique concepts. It is hypothesized that this process occurs through the transient mutual synchronization of neurons located in different regions of the brain when the stimulus is presented. This mechanism of binding through synchronization can be directly implemented in neural networks composed of coupled oscillators. To do so, the oscill…
▽ More
The brain naturally binds events from different sources in unique concepts. It is hypothesized that this process occurs through the transient mutual synchronization of neurons located in different regions of the brain when the stimulus is presented. This mechanism of binding through synchronization can be directly implemented in neural networks composed of coupled oscillators. To do so, the oscillators must be able to mutually synchronize for the range of inputs corresponding to a single class, and otherwise remain desynchronized. Here we show that the outstanding ability of spintronic nano-oscillators to mutually synchronize and the possibility to precisely control the occurrence of mutual synchronization by tuning the oscillator frequencies over wide ranges allows pattern recognition. We demonstrate experimentally on a simple task that three spintronic nano-oscillators can bind consecutive events and thus recognize and distinguish temporal sequences. This work is a step forward in the construction of neural networks that exploit the non-linear dynamic properties of their components to perform brain-inspired computations.
△ Less
Submitted 22 January, 2020;
originally announced January 2020.
-
Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input
Authors:
Maxence Ernoult,
Julie Grollier,
Damien Querlioz,
Yoshua Bengio,
Benjamin Scellier
Abstract:
Equilibrium Propagation (EP) is a biologically inspired learning algorithm for convergent recurrent neural networks, i.e. RNNs that are fed by a static input x and settle to a steady state. Training convergent RNNs consists in adjusting the weights until the steady state of output neurons coincides with a target y. Convergent RNNs can also be trained with the more conventional Backpropagation Thro…
▽ More
Equilibrium Propagation (EP) is a biologically inspired learning algorithm for convergent recurrent neural networks, i.e. RNNs that are fed by a static input x and settle to a steady state. Training convergent RNNs consists in adjusting the weights until the steady state of output neurons coincides with a target y. Convergent RNNs can also be trained with the more conventional Backpropagation Through Time (BPTT) algorithm. In its original formulation EP was described in the case of real-time neuronal dynamics, which is computationally costly. In this work, we introduce a discrete-time version of EP with simplified equations and with reduced simulation time, bringing EP closer to practical machine learning tasks. We first prove theoretically, as well as numerically that the neural and weight updates of EP, computed by forward-time dynamics, are step-by-step equal to the ones obtained by BPTT, with gradients computed backward in time. The equality is strict when the transition function of the dynamics derives from a primitive function and the steady state is maintained long enough. We then show for more standard discrete-time neural network dynamics that the same property is approximately respected and we subsequently demonstrate training with EP with equivalent performance to BPTT. In particular, we define the first convolutional architecture trained with EP achieving ~ 1% test error on MNIST, which is the lowest error reported with EP. These results can guide the development of deep neural networks trained with EP.
△ Less
Submitted 31 May, 2019;
originally announced May 2019.
-
Microwave neural processing and broadcasting with spintronic nano-oscillators
Authors:
P. Talatchian,
M. Romera,
S. Tsunegi,
F. Abreu Araujo,
V. Cros,
P. Bortolotti,
J. Trastoy,
K. Yakushiji,
A. Fukushima,
H. Kubota,
S. Yuasa,
M. Ernoult,
D. Vodenicarevic,
T. Hirtzlin,
N. Locatelli,
D. Querlioz,
J. Grollier
Abstract:
Can we build small neuromorphic chips capable of training deep networks with billions of parameters? This challenge requires hardware neurons and synapses with nanometric dimensions, which can be individually tuned, and densely connected. While nanosynaptic devices have been pursued actively in recent years, much less has been done on nanoscale artificial neurons. In this paper, we show that spint…
▽ More
Can we build small neuromorphic chips capable of training deep networks with billions of parameters? This challenge requires hardware neurons and synapses with nanometric dimensions, which can be individually tuned, and densely connected. While nanosynaptic devices have been pursued actively in recent years, much less has been done on nanoscale artificial neurons. In this paper, we show that spintronic nano-oscillators are promising to implement analog hardware neurons that can be densely interconnected through electromagnetic signals. We show how spintronic oscillators maps the requirements of artificial neurons. We then show experimentally how an ensemble of four coupled oscillators can learn to classify all twelve American vowels, realizing the most complicated tasks performed by nanoscale neurons.
△ Less
Submitted 25 April, 2019;
originally announced April 2019.
-
Vowel recognition with four coupled spin-torque nano-oscillators
Authors:
Miguel Romera,
Philippe Talatchian,
Sumito Tsunegi,
Flavio Abreu Araujo,
Vincent Cros,
Paolo Bortolotti,
Juan Trastoy,
Kay Yakushiji,
Akio Fukushima,
Hitoshi Kubota,
Shinji Yuasa,
Maxence Ernoult,
Damir Vodenicarevic,
Tifenn Hirtzlin,
Nicolas Locatelli,
Damien Querlioz,
Julie Grollier
Abstract:
Substantial evidence indicates that the brain uses principles of non-linear dynamics in neural processes, providing inspiration for computing with nanoelectronic devices. However, training neural networks composed of dynamical nanodevices requires finely controlling and tuning their coupled oscillations. In this work, we show that the outstanding tunability of spintronic nano-oscillators can solve…
▽ More
Substantial evidence indicates that the brain uses principles of non-linear dynamics in neural processes, providing inspiration for computing with nanoelectronic devices. However, training neural networks composed of dynamical nanodevices requires finely controlling and tuning their coupled oscillations. In this work, we show that the outstanding tunability of spintronic nano-oscillators can solve this challenge. We successfully train a hardware network of four spin-torque nano-oscillators to recognize spoken vowels by tuning their frequencies according to an automatic real-time learning rule. We show that the high experimental recognition rates stem from the high frequency tunability of the oscillators and their mutual coupling. Our results demonstrate that non-trivial pattern classification tasks can be achieved with small hardware neural networks by endowing them with non-linear dynamical features: here, oscillations and synchronization. This demonstration is a milestone for spintronics-based neuromorphic computing.
△ Less
Submitted 18 October, 2018; v1 submitted 8 November, 2017;
originally announced November 2017.
-
One single static measurement predicts wave localization in complex structures
Authors:
Gautier Lefebvre,
Alexane Gondel,
Marc Dubois,
Michael Atlan,
Florian Feppon,
Aimé Labbé,
Camille Gillot,
Alix Garelli,
Maxence Ernoult,
Svitlana Mayboroda,
Marcel Filoche,
Patrick Sebbah
Abstract:
A recent theoretical breakthrough has brought a new tool, called \emph{localization landscape}, to predict the localization regions of vibration modes in complex or disordered systems. Here, we report on the first experiment which measures the localization landscape and demonstrates its predictive power. Holographic measurement of the static deformation under uniform load of a thin plate with comp…
▽ More
A recent theoretical breakthrough has brought a new tool, called \emph{localization landscape}, to predict the localization regions of vibration modes in complex or disordered systems. Here, we report on the first experiment which measures the localization landscape and demonstrates its predictive power. Holographic measurement of the static deformation under uniform load of a thin plate with complex geometry provides direct access to the landscape function. When put in vibration, this system shows modes precisely confined within the sub-regions delineated by the landscape function. Also the maxima of this function match the measured eigenfrequencies, while the minima of the valley network gives the frequencies at which modes become extended. This approach fully characterizes the low frequency spectrum of a complex structure from a single static measurement. It paves the way to the control and engineering of eigenmodes in any vibratory system, especially where a structural or microscopic description is not accessible.
△ Less
Submitted 11 April, 2016;
originally announced April 2016.