Search | arXiv e-print repository

Energy-based learning algorithms for analog computing: a comparative study

Authors: Benjamin Scellier, Maxence Ernoult, Jack Kendall, Suhas Kumar

Abstract: Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog (post-digital) hardware. Existing algorithms include contrastive learning (CL), equilibrium propagation (EP) and coupled learning (CpL), all consisting in contrasting two states, and differing in the type of perturbation used to obtain the second state from the first one. However, these… ▽ More Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog (post-digital) hardware. Existing algorithms include contrastive learning (CL), equilibrium propagation (EP) and coupled learning (CpL), all consisting in contrasting two states, and differing in the type of perturbation used to obtain the second state from the first one. However, these algorithms have never been explicitly compared on equal footing with same models and datasets, making it difficult to assess their scalability and decide which one to select in practice. In this work, we carry out a comparison of seven learning algorithms, namely CL and different variants of EP and CpL depending on the signs of the perturbations. Specifically, using these learning algorithms, we train deep convolutional Hopfield networks (DCHNs) on five vision tasks (MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100). We find that, while all algorithms yield comparable performance on MNIST, important differences in performance arise as the difficulty of the task increases. Our key findings reveal that negative perturbations are better than positive ones, and highlight the centered variant of EP (which uses two perturbations of opposite sign) as the best-performing algorithm. We also endorse these findings with theoretical arguments. Additionally, we establish new SOTA results with DCHNs on all five datasets, both in performance and speed. In particular, our DCHN simulations are 13.5 times faster with respect to Laborieux et al. (2021), which we achieve thanks to the use of a novel energy minimisation algorithm based on asynchronous updates, combined with reduced precision (16 bits). △ Less

Submitted 22 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023

arXiv:2304.08120 [pdf, other]

doi 10.1093/gji/ggad460

DAS-N2N: Machine learning Distributed Acoustic Sensing (DAS) signal denoising without clean data

Authors: Sacha Lapins, Antony Butcher, J. -Michael Kendall, Thomas S. Hudson, Anna L. Stork, Maximilian J. Werner, Jemma Gunning, Alex M. Brisbourne

Abstract: This article presents a weakly supervised machine learning method, which we call DAS-N2N, for suppressing strong random noise in distributed acoustic sensing (DAS) recordings. DAS-N2N requires no manually produced labels (i.e., pre-determined examples of clean event signals or sections of noise) for training and aims to map random noise processes to a chosen summary statistic, such as the distribu… ▽ More This article presents a weakly supervised machine learning method, which we call DAS-N2N, for suppressing strong random noise in distributed acoustic sensing (DAS) recordings. DAS-N2N requires no manually produced labels (i.e., pre-determined examples of clean event signals or sections of noise) for training and aims to map random noise processes to a chosen summary statistic, such as the distribution mean, median or mode, whilst retaining the true underlying signal. This is achieved by splicing (joining together) two fibres hosted within a single optical cable, recording two noisy copies of the same underlying signal corrupted by different independent realizations of random observational noise. A deep learning model can then be trained using only these two noisy copies of the data to produce a near fully-denoised copy. Once the model is trained, only noisy data from a single fibre is required. Using a dataset from a DAS array deployed on the surface of the Rutford Ice Stream in Antarctica, we demonstrate that DAS-N2N greatly suppresses incoherent noise and enhances the signal-to-noise ratios (SNR) of natural microseismic icequake events. We further show that this approach is inherently more efficient and effective than standard stop/pass band and white noise (e.g., Wiener) filtering routines, as well as a comparable self-supervised learning method based on masking individual DAS channels. Our preferred model for this task is lightweight, processing 30 seconds of data recorded at a sampling frequency of 1000 Hz over 985 channels (approx. 1 km of fiber) in $<$1 s. Due to the high noise levels in DAS recordings, efficient data-driven denoising methods, such as DAS-N2N, will prove essential to time-critical DAS earthquake detection, particularly in the case of microseismic monitoring. △ Less

Submitted 24 November, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: Submitted for publication to Geophysical Journal International. For the purpose of open access, the author(s) has applied a Creative Commons Attribution (CC BY) licence to the Author Accepted Manuscript version arising from this submission

arXiv:2103.05636 [pdf, ps, other]

A Gradient Estimator for Time-Varying Electrical Networks with Non-Linear Dissipation

Authors: Jack Kendall

Abstract: We propose a method for extending the technique of equilibrium propagation for estimating gradients in fixed-point neural networks to the more general setting of directed, time-varying neural networks by modeling them as electrical circuits. We use electrical circuit theory to construct a Lagrangian capable of describing deep, directed neural networks modeled using nonlinear capacitors and inducto… ▽ More We propose a method for extending the technique of equilibrium propagation for estimating gradients in fixed-point neural networks to the more general setting of directed, time-varying neural networks by modeling them as electrical circuits. We use electrical circuit theory to construct a Lagrangian capable of describing deep, directed neural networks modeled using nonlinear capacitors and inductors, linear resistors and sources, and a special class of nonlinear dissipative elements called fractional memristors. We then derive an estimator for the gradient of the physical parameters of the network, such as synapse conductances, with respect to an arbitrary loss function. This estimator is entirely local, in that it only depends on information locally available to each synapse. We conclude by suggesting methods for extending these results to networks of biologically plausible neurons, e.g. Hodgkin-Huxley neurons. △ Less

Submitted 8 March, 2021; originally announced March 2021.

Comments: 12 pages, 0 figures

arXiv:2006.01981 [pdf, other]

Training End-to-End Analog Neural Networks with Equilibrium Propagation

Authors: Jack Kendall, Ross Pantone, Kalpana Manickavasagam, Yoshua Bengio, Benjamin Scellier

Abstract: We introduce a principled method to train end-to-end analog neural networks by stochastic gradient descent. In these analog neural networks, the weights to be adjusted are implemented by the conductances of programmable resistive devices such as memristors [Chua, 1971], and the nonlinear transfer functions (or `activation functions') are implemented by nonlinear components such as diodes. We show… ▽ More We introduce a principled method to train end-to-end analog neural networks by stochastic gradient descent. In these analog neural networks, the weights to be adjusted are implemented by the conductances of programmable resistive devices such as memristors [Chua, 1971], and the nonlinear transfer functions (or `activation functions') are implemented by nonlinear components such as diodes. We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models: they possess an energy function as a consequence of Kirchhoff's laws governing electrical circuits. This property enables us to train them using the Equilibrium Propagation framework [Scellier and Bengio, 2017]. Our update rule for each conductance, which is local and relies solely on the voltage drop across the corresponding resistor, is shown to compute the gradient of the loss function. Our numerical simulations, which use the SPICE-based Spectre simulation framework to simulate the dynamics of electrical circuits, demonstrate training on the MNIST classification task, performing comparably or better than equivalent-size software-based neural networks. Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning. △ Less

Submitted 9 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

arXiv:2003.02642 [pdf]

Deep Learning in Memristive Nanowire Networks

Authors: Jack D. Kendall, Ross D. Pantone, Juan C. Nino

Abstract: Analog crossbar architectures for accelerating neural network training and inference have made tremendous progress over the past several years. These architectures are ideal for dense layers with fewer than roughly a thousand neurons. However, for large sparse layers, crossbar architectures are highly inefficient. A new hardware architecture, dubbed the MN3 (Memristive Nanowire Neural Network), wa… ▽ More Analog crossbar architectures for accelerating neural network training and inference have made tremendous progress over the past several years. These architectures are ideal for dense layers with fewer than roughly a thousand neurons. However, for large sparse layers, crossbar architectures are highly inefficient. A new hardware architecture, dubbed the MN3 (Memristive Nanowire Neural Network), was recently described as an efficient architecture for simulating very wide, sparse neural network layers, on the order of millions of neurons per layer. The MN3 utilizes a high-density memristive nanowire mesh to efficiently connect large numbers of silicon neurons with modifiable weights. Here, in order to explore the MN3's ability to function as a deep neural network, we describe one algorithm for training deep MN3 models and benchmark simulations of the architecture on two deep learning tasks. We utilize a simple piecewise linear memristor model, since we seek to demonstrate that training is, in principle, possible for randomized nanowire architectures. In future work, we intend on utilizing more realistic memristor models, and we will adapt the presented algorithm appropriately. We show that the MN3 is capable of performing composition, gradient propagation, and weight updates, which together allow it to function as a deep neural network. We show that a simulated multilayer perceptron (MLP), built from MN3 networks, can obtain a 1.61% error rate on the popular MNIST dataset, comparable to equivalently sized software-based network. This work represents, to the authors' knowledge, the first randomized nanowire architecture capable of reproducing the backpropagation algorithm. △ Less

Submitted 3 March, 2020; originally announced March 2020.

arXiv:1804.08122 [pdf, other]

Matching Fingerphotos to Slap Fingerprint Images

Authors: Debayan Deb, Tarang Chugh, Joshua Engelsma, Kai Cao, Neeta Nain, Jake Kendall, Anil K. Jain

Abstract: We address the problem of comparing fingerphotos, fingerprint images from a commodity smartphone camera, with the corresponding legacy slap contact-based fingerprint images. Development of robust versions of these technologies would enable the use of the billions of standard Android phones as biometric readers through a simple software download, dramatically lowering the cost and complexity of dep… ▽ More We address the problem of comparing fingerphotos, fingerprint images from a commodity smartphone camera, with the corresponding legacy slap contact-based fingerprint images. Development of robust versions of these technologies would enable the use of the billions of standard Android phones as biometric readers through a simple software download, dramatically lowering the cost and complexity of deployment relative to using a separate fingerprint reader. Two fingerphoto apps running on Android phones and an optical slap reader were utilized for fingerprint collection of 309 subjects who primarily work as construction workers, farmers, and domestic helpers. Experimental results show that a True Accept Rate (TAR) of 95.79 at a False Accept Rate (FAR) of 0.1% can be achieved in matching fingerphotos to slaps (two thumbs and two index fingers) using a COTS fingerprint matcher. By comparison, a baseline TAR of 98.55% at 0.1% FAR is achieved when matching fingerprint images from two different contact-based optical readers. We also report the usability of the two smartphone apps, in terms of failure to acquire rate and fingerprint acquisition time. Our results show that fingerphotos are promising to authenticate individuals (against a national ID database) for banking, welfare distribution, and healthcare applications in develo** countries. △ Less

Submitted 22 April, 2018; originally announced April 2018.

Comments: 9 pages, 16 figures, 5 tables, conference

Showing 1–6 of 6 results for author: Kendall, J