Search | arXiv e-print repository

Hidden Traveling Waves bind Working Memory Variables in Recurrent Neural Networks

Authors: Arjun Karuvally, Terrence J. Sejnowski, Hava T. Siegelmann

Abstract: Traveling waves are a fundamental phenomenon in the brain, playing a crucial role in short-term information storage. In this study, we leverage the concept of traveling wave dynamics within a neural lattice to formulate a theoretical model of neural working memory, study its properties, and examine its real-world implications in AI. The proposed model diverges from traditional approaches, which assume information storage in static, register-like locations updated by interference. Instead, the model stores data as waves that are updated by the wave's boundary conditions. We rigorously examine the model's capabilities in representing and learning state histories, which are vital for learning history-dependent dynamical systems. The findings reveal that the model reliably stores external information and enhances the learning process by addressing the diminishing gradient problem. To understand the model's real-world applicability, we explore two cases: a linear boundary condition (LBC) and a non-linear, self-attention-driven boundary condition (SBC). The model with the linear boundary condition results in a shift matrix plus a low-rank matrix, the structure currently used in the H3 state space RNN. Further, our experiments with LBC reveal that this matrix is effectively learned by Recurrent Neural Networks (RNNs) through backpropagation when modeling history-dependent dynamical systems. Conversely, the SBC parallels the autoregressive loop of an attention-only transformer with the context vector representing the wave substrate.
Collectively, our findings suggest the broader relevance of traveling waves in AI and their potential in advancing neural network architectures.

Submitted 7 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.
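The linear boundary condition can be read as a delay line: a shift matrix carries each stored value one lattice site per step, and a low-rank term writes the boundary site. The following is a minimal sketch under our own assumptions (a 1-D lattice of `n` sites, external input injected at site 0, and a rank-one boundary term `u v^T`, all hypothetical choices for illustration); it is not the paper's model or H3's parameterization.

```python
import numpy as np

def lbc_step(h, x, u, v):
    """One step of a hypothetical linear-boundary-condition wave update.

    h : (n,) lattice state; x : scalar external input written at site 0.
    The recurrence matrix is S + u v^T, i.e. a shift matrix plus a rank-one term.
    """
    n = h.shape[0]
    S = np.eye(n, k=-1)          # subdiagonal ones: value at site i-1 moves to site i
    A = S + np.outer(u, v)       # shift plus low-rank boundary term
    b = np.zeros(n)
    b[0] = 1.0                   # external input enters at the boundary site
    return A @ h + b * x

n = 5
u = np.zeros(n)
u[0] = 1.0                       # the boundary term only writes to site 0
v = np.zeros(n)                  # v = 0: the boundary ignores the stored wave
h = np.zeros(n)
history = []
for x in [3.0, 0.0, 0.0, 0.0]:   # a single pulse followed by silence
    h = lbc_step(h, x, u, v)
    history.append(h.copy())
```

With `v = 0` a single input simply travels down the lattice, one site per step; a nonzero `v` makes the newly written boundary value depend on the wave's current content.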

arXiv:2310.02430 [pdf, other]

Episodic Memory Theory for the Mechanistic Interpretation of Recurrent Neural Networks

Authors: Arjun Karuvally, Peter Delmastro, Hava T. Siegelmann

Abstract: Understanding the intricate operations of Recurrent Neural Networks (RNNs) mechanistically is pivotal for advancing their capabilities and applications. In this pursuit, we propose the Episodic Memory Theory (EMT), illustrating that RNNs can be conceptualized as discrete-time analogs of the recently proposed General Sequential Episodic Memory Model. To substantiate EMT, we introduce a novel set of algorithmic tasks tailored to probe the variable binding behavior in RNNs. Utilizing the EMT, we formulate a mathematically rigorous circuit that facilitates variable binding in these tasks. Our empirical investigations reveal that trained RNNs consistently converge to the variable binding circuit, thus indicating universality in the dynamics of RNNs. Building on these findings, we devise an algorithm to define a privileged basis, which reveals hidden neurons instrumental in the temporal storage and composition of variables, a mechanism vital for the successful generalization in these tasks. We show that the privileged basis enhances the interpretability of the learned parameters and hidden states of RNNs. Our work represents a step toward demystifying the internal mechanisms of RNNs and, for computational neuroscience, serves to bridge the gap between artificial neural networks and neural memory models.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2306.07125 [pdf, other]

On the Dynamics of Learning Time-Aware Behavior with Recurrent Neural Networks

Authors: Peter DelMastro, Rushiv Arora, Edward Rietman, Hava T. Siegelmann

Abstract: Recurrent Neural Networks (RNNs) have shown great success in modeling time-dependent patterns, but there is limited research on their learned representations of latent temporal features and the emergence of these representations during training. To address this gap, we use timed automata (TA) to introduce a family of supervised learning tasks modeling behavior dependent on hidden temporal variables whose complexity is directly controllable. Building upon past studies from the perspective of dynamical systems, we train RNNs to emulate temporal flipflops, a new collection of TA that emphasizes the need for time-awareness over long-term memory. We find that these RNNs learn in phases: they quickly perfect any time-independent behavior, but they initially struggle to discover the hidden time-dependent features. In the case of periodic "time-of-day" aware automata, we show that the RNNs learn to switch between periodic orbits that encode time modulo the period of the transition rules. We subsequently apply fixed point stability analysis to monitor changes in the RNN dynamics during training, and we observe that the learning phases are separated by a bifurcation from which the periodic behavior emerges. In this way, we demonstrate how dynamical systems theory can provide insights into not only the learned representations of these models, but also the dynamics of the learning process itself. We argue that this style of analysis may provide insights into the training pathologies of recurrent architectures in contexts outside of time-awareness.

Submitted 12 June, 2023; originally announced June 2023.

Comments: Main paper: 11 pages, 8 figures. Supplemental Material: 6 pages, 5 figures, 1 table

arXiv:2305.18701 [pdf, other]

Temporally Layered Architecture for Efficient Continuous Control

Authors: Devdhar Patel, Terrence Sejnowski, Hava Siegelmann

Abstract: We present a temporally layered architecture (TLA) for temporally adaptive control with minimal energy expenditure. The TLA layers a fast and a slow policy together to achieve temporal abstraction that allows each layer to focus on a different time scale. Our design draws on the energy-saving mechanism of the human brain, which executes actions at different timescales depending on the environment's demands. We demonstrate that beyond energy saving, TLA provides many additional advantages, including persistent exploration, fewer required decisions, reduced jerk, and increased action repetition. We evaluate our method on a suite of continuous control tasks and demonstrate the significant advantages of TLA over existing methods when measured over multiple important metrics. We also introduce a multi-objective score to qualitatively assess continuous control policies and demonstrate a significantly better score for TLA. Our training algorithm uses minimal communication between the slow and fast layers to train both policies simultaneously, making it viable for future applications in distributed control.

Submitted 8 August, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: 10 Pages, 2 Figures, 3 Tables. arXiv admin note: text overlap with arXiv:2301.00723

arXiv:2301.06648 [pdf, other]

doi 10.1016/j.neucom.2023.126388

Neuromorphic High-Frequency 3D Dancing Pose Estimation in Dynamic Environment

Authors: Zhongyang Zhang, Kaidong Chai, Haowen Yu, Ramzi Majaj, Francesca Walsh, Edward Wang, Upal Mahbub, Hava Siegelmann, Donghyun Kim, Tauhidur Rahman

Abstract: As a beloved sport worldwide, dancing is increasingly being integrated into traditional and virtual reality-based gaming platforms. This opens up new opportunities in the technology-mediated dancing space. These platforms primarily rely on passive and continuous human pose estimation as an input capture mechanism. Existing solutions are mainly based on RGB or RGB-Depth cameras for dance games. The former suffers in low-lighting conditions due to motion blur and low sensitivity, while the latter is too power-hungry, has a low frame rate, and has limited working distance. With ultra-low latency, energy efficiency, and wide dynamic range characteristics, the event camera is a promising solution to overcome these shortcomings. We propose YeLan, an event camera-based 3-dimensional high-frequency human pose estimation (HPE) system that survives low-lighting conditions and dynamic backgrounds. We collected the world's first event camera dance dataset and developed a fully customizable motion-to-event physics-aware simulator. YeLan outperforms the baseline models in these challenging conditions and demonstrates robustness against different types of clothing, background motion, viewing angle, occlusion, and lighting fluctuations.

Submitted 27 January, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

Report number: ISSN 0925-2312

Journal ref: Neurocomputing, Volume 547, 2023, 126388

arXiv:2301.04126 [pdf, other]

Temporal Weights

Authors: Adam Kohan, Ed Rietman, Hava Siegelmann

Abstract: In artificial neural networks, weights are a static representation of synapses. However, synapses are not static; they have their own interacting dynamics over time. To instill weights with interacting dynamics, we use a model describing synchronization that is capable of capturing core mechanisms of a range of neural and general biological phenomena over time. Neural ODEs, with continuous dynamics and a dependency on time, are an ideal fit for these Temporal Weights (TW). The resulting recurrent neural networks efficiently model temporal dynamics by computing on the ordering of sequences, and the length and scale of time. By adding temporal weights to a model, we demonstrate better performance, smaller models, and data efficiency on sparse, irregularly sampled time series datasets.

Submitted 13 December, 2022; originally announced January 2023.

arXiv:2301.00723 [pdf, other]

Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

Authors: Devdhar Patel, Joshua Russell, Francesca Walsh, Tauhidur Rahman, Terrence Sejnowski, Hava Siegelmann

Abstract: We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design is biologically inspired and draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two different algorithms for training TLA: (a) Closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller and closed-loop control where the fast controller decides whether to "act-or-not" at each timestep; and (b) Partially open-loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open-loop control where the slow controller picks a temporally extended action or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.

Submitted 5 February, 2023; v1 submitted 25 December, 2022; originally announced January 2023.

Comments: 10 pages, 4 figures
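One way to picture the closed-loop "act-or-not" mode is a rollout in which a slow controller issues actions on a coarse clock and a gate lets the fast controller act in between, otherwise repeating the current action. The sketch below is illustrative only: the fixed `slow_period`, the `gate` callable, and the policy signatures are hypothetical and not taken from the paper, which trains these components rather than hand-coding them.

```python
from typing import Callable, List, Sequence, Tuple

def run_tla(
    observations: Sequence[float],
    slow_policy: Callable[[float], float],
    fast_policy: Callable[[float], float],
    gate: Callable[[float, float], bool],
    slow_period: int = 4,
) -> Tuple[List[float], int]:
    """Roll out a hypothetical two-timescale controller.

    Returns the executed actions and the number of decisions actually made.
    """
    actions: List[float] = []
    decisions = 0
    action = 0.0
    for t, obs in enumerate(observations):
        if t % slow_period == 0:
            action = slow_policy(obs)   # slow layer acts on a coarse timescale
            decisions += 1
        elif gate(obs, action):
            action = fast_policy(obs)   # fast layer chooses to act this step
            decisions += 1
        # otherwise the fast layer chooses "not act" and the action is repeated
        actions.append(action)
    return actions, decisions

# Toy policies and gate, purely for illustration
actions, decisions = run_tla(
    observations=list(range(8)),
    slow_policy=lambda o: 10.0 * o,
    fast_policy=lambda o: -float(o),
    gate=lambda o, a: o == 5,
)
```

Comparing `decisions` with the number of timesteps gives a rough proxy for the "fewer required decisions" and "increased action repetition" properties listed for TLA above.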

arXiv:2212.12866 [pdf, other]

QuickNets: Saving Training and Preventing Overconfidence in Early-Exit Neural Architectures

Authors: Devdhar Patel, Hava Siegelmann

Abstract: Deep neural networks have long training and processing times. Early exits added to neural networks allow the network to make early predictions using intermediate activations in time-sensitive applications. However, early exits increase the training time of the neural networks. We introduce QuickNets: a novel cascaded training algorithm for faster training of neural networks. QuickNets are trained in a layer-wise manner such that each successive layer is only trained on samples that could not be correctly classified by the previous layers. We demonstrate that QuickNets can dynamically distribute learning and have reduced training and inference costs compared to standard backpropagation. Additionally, we introduce commitment layers that significantly improve the early exits by identifying over-confident predictions, and demonstrate their success.

Submitted 25 December, 2022; originally announced December 2022.

Comments: 9 pages, 4 figures
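The cascaded training rule, in which each stage is fit only on the samples that every earlier stage misclassified, can be sketched with any model exposing `fit`/`predict`. Here a toy nearest-centroid classifier (`CentroidStage`, a hypothetical stand-in) replaces a network layer, and for brevity each stage sees the raw inputs rather than the previous layer's activations; both are simplifying assumptions, not the paper's setup.

```python
import numpy as np

class CentroidStage:
    """Toy stand-in for a network layer: nearest-class-centroid classifier."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared distance from every sample to every class centroid
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def cascade_train(stages, X, y):
    """Fit stages in order; each stage sees only samples still misclassified."""
    remaining = np.arange(len(y))          # indices not yet classified correctly
    seen_counts = []
    for stage in stages:
        if remaining.size == 0:
            break                          # nothing left for later stages to learn
        Xr, yr = X[remaining], y[remaining]
        stage.fit(Xr, yr)
        seen_counts.append(int(remaining.size))
        remaining = remaining[stage.predict(Xr) != yr]
    return seen_counts, remaining

X = np.array([[0.0], [1.0], [10.0], [11.0], [2.0]])
y = np.array([0, 0, 1, 1, 1])
seen, leftover = cascade_train([CentroidStage(), CentroidStage()], X, y)
```

In this toy run the first stage trains on all five samples and misclassifies only the outlier at 2.0, so the second stage trains on that single sample; the shrinking `seen` counts are where the training-cost saving comes from.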

arXiv:2212.05563 [pdf, other]

Energy-based General Sequential Episodic Memory Networks at the Adiabatic Limit

Authors: Arjun Karuvally, Terry J. Sejnowski, Hava T. Siegelmann

Abstract: The General Associative Memory Model (GAMM) has a constant state-dependent energy surface that leads the output dynamics to fixed points, retrieving single memories from a collection of memories that can be asynchronously preloaded. We introduce a new class of General Sequential Episodic Memory Models (GSEMM) that, in the adiabatic limit, exhibit a temporally changing energy surface, leading to a series of meta-stable states that are sequential episodic memories. The dynamic energy surface is enabled by newly introduced asymmetric synapses with signal propagation delays in the network's hidden layer. We study the theoretical and empirical properties of two memory models from the GSEMM class, differing in their activation functions. LISEM has non-linearities in the feature layer, whereas DSEM has non-linearity in the hidden layer. In principle, DSEM has a storage capacity that grows exponentially with the number of neurons in the network. We introduce a learning rule for the synapses based on the energy minimization principle and show it can learn single memories and their sequential relationships online. This rule is similar to the Hebbian learning algorithm and Spike-Timing Dependent Plasticity (STDP), which describe conditions under which synapses between neurons change strength. Thus, GSEMM combines the static and dynamic properties of episodic memory under a single theoretical framework and bridges neuroscience, machine learning, and artificial intelligence.

Submitted 11 December, 2022; originally announced December 2022.
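GSEMM itself is an energy-based model with delayed asymmetric synapses in a hidden layer. As a much simpler, classical illustration of why asymmetric Hebbian synapses yield sequential rather than static recall, the sketch below uses the older asymmetric Hopfield-style rule W = (1/N) Σ_μ ξ^{μ+1} (ξ^μ)^T; this is a swapped-in textbook technique, not GSEMM's learning rule. The orthogonal Hadamard-row patterns are a hypothetical choice that makes recall exact.

```python
import numpy as np

def sylvester_hadamard(order):
    """Return a 2**order x 2**order Hadamard matrix with +/-1 entries."""
    H = np.array([[1]])
    for _ in range(order):
        H = np.block([[H, H], [H, -H]])
    return H

def asymmetric_hebbian(patterns):
    """W = (1/N) * sum_mu xi^{mu+1} (xi^mu)^T, with the sequence wrapping around."""
    P, N = patterns.shape
    W = np.zeros((N, N))
    for mu in range(P):
        # Asymmetric term: pattern mu points to pattern mu+1, not back to itself
        W += np.outer(patterns[(mu + 1) % P], patterns[mu])
    return W / N

def recall(W, x0, steps):
    """Synchronous sign updates; each step should advance one pattern."""
    states = [x0]
    for _ in range(steps):
        states.append(np.sign(W @ states[-1]))
    return states

patterns = sylvester_hadamard(3)[1:4]   # three mutually orthogonal +/-1 patterns, N = 8
W = asymmetric_hebbian(patterns)
states = recall(W, patterns[0], steps=3)
```

Because the stored patterns are orthogonal, `W @ patterns[k]` equals `patterns[k + 1]` exactly, so starting from the first pattern the network steps through the stored sequence and wraps back to the start.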

arXiv:2204.01723 [pdf, other]

Signal Propagation: A Framework for Learning and Inference In a Forward Pass

Authors: Adam Kohan, Edward A. Rietman, Hava T. Siegelmann

Abstract: We propose a new learning framework, signal propagation (sigprop), for propagating a learning signal and updating neural network parameters via a forward pass, as an alternative to backpropagation. In sigprop, there is only the forward path for inference and learning. Consequently, learning requires no structural or computational constraints beyond the inference model itself, such as the feedback connectivity, weight transport, or backward pass that exist under backpropagation-based approaches. That is, sigprop enables global supervised learning with only a forward path. This is ideal for parallel training of layers or modules. In biology, this explains how neurons without feedback connections can still receive a global learning signal. In hardware, this provides an approach for global supervised learning without backward connectivity. By construction, sigprop is more compatible with models of learning in the brain and in hardware than backpropagation, including alternative approaches relaxing learning constraints. We also demonstrate that sigprop is more efficient in time and memory than they are. To further explain the behavior of sigprop, we provide evidence that sigprop provides useful learning signals relative to backpropagation. To further support relevance to biological and hardware learning, we use sigprop to train continuous time neural networks with Hebbian updates, and train spiking neural networks with only the voltage or with biologically and hardware compatible surrogate functions.

Submitted 17 November, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

arXiv:2202.07132 [pdf]

Memory via Temporal Delays in weightless Spiking Neural Network

Authors: Hananel Hazan, Simon Caby, Christopher Earl, Hava Siegelmann, Michael Levin

Abstract: A common view in the neuroscience community is that memory is encoded in the connection strength between neurons. This perception led artificial neural network models to focus on connection weights as the key variables to modulate learning. In this paper, we present a prototype for weightless spiking neural networks that can perform a simple classification task. The memory in this network is stored in the timing between neurons, rather than the strength of the connection, and is trained using a Hebbian Spike-Timing-Dependent Plasticity (STDP) rule, which modulates the delays of the connections.

Submitted 14 February, 2022; originally announced February 2022.
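The abstract does not give the delay-learning rule, so the following is only a hypothetical illustration of the idea that timing, not weight, is what gets trained: each pairing nudges the axonal delay so that the delayed presynaptic spike (arriving at `t_pre + delay`) lands closer to the postsynaptic spike. The learning rate, clipping bounds, and millisecond units are all assumptions.

```python
def update_delay(delay: float, t_pre: float, t_post: float,
                 lr: float = 0.5, d_min: float = 0.0, d_max: float = 20.0) -> float:
    """Hypothetical delay-plasticity step (not the paper's rule).

    If the delayed presynaptic spike arrives before the postsynaptic spike,
    the delay grows; if it arrives after, the delay shrinks. The result is
    clipped to [d_min, d_max].
    """
    arrival_error = t_post - (t_pre + delay)
    return min(d_max, max(d_min, delay + lr * arrival_error))

# Repeated pairings of a pre spike at t = 2 ms with a post spike at t = 7 ms
delay = 0.0
for _ in range(20):
    delay = update_delay(delay, t_pre=2.0, t_post=7.0)
```

Under this rule the delay converges geometrically toward `t_post - t_pre` (5 ms here), so the synapse ends up storing the learned spike-timing relationship in its delay while having no weight at all.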

arXiv:2104.04132 [pdf, other]

Replay in Deep Learning: Current Approaches and Missing Biological Elements

Authors: Tyler L. Hayes, Giri P. Krishnan, Maxim Bazhenov, Hava T. Siegelmann, Terrence J. Sejnowski, Christopher Kanan

Abstract: Replay is the reactivation of one or more neural patterns, which are similar to the activation patterns experienced during past waking experiences. Replay was first observed in biological neural networks during sleep, and it is now thought to play a critical role in memory formation, retrieval, and consolidation. Replay-like mechanisms have been incorporated into deep artificial neural networks that learn over time to avoid catastrophic forgetting of previous knowledge. Replay algorithms have been successfully used in a wide range of deep learning methods within supervised, unsupervised, and reinforcement learning paradigms. In this paper, we provide the first comprehensive comparison between replay in the mammalian brain and replay in artificial neural networks. We identify multiple aspects of biological replay that are missing in deep learning systems and hypothesize how they could be utilized to improve artificial neural networks.

Submitted 28 May, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: Accepted for publication in the MIT Press journal of Neural Computation

arXiv:2005.02434 [pdf]

Nanotechnology-inspired Information Processing Systems of the Future

Authors: Randy Bryant, Mark Hill, Tom Kazior, Daniel Lee, Jie Liu, Klara Nahrstedt, Vijay Narayanan, Jan Rabaey, Hava Siegelmann, Naresh Shanbhag, Naveen Verma, H. -S. Philip Wong

Abstract: Nanoscale semiconductor technology has been a key enabler of the computing revolution. It has done so via advances in new materials and manufacturing processes that resulted in the size of the basic building block of computing systems - the logic switch and memory devices - being reduced into the nanoscale regime. Nanotechnology has provided increased computing functionality per unit volume, energy, and cost. In order for computing systems to continue to deliver substantial benefits for the foreseeable future to society at large, it is critical that the very notion of computing be examined in the light of nanoscale realities. In particular, one needs to ask what it means to compute when the very building block - the logic switch - no longer exhibits the level of determinism required by the von Neumann architecture. There needs to be a sustained and heavy investment in a nation-wide Vertically Integrated Semiconductor Ecosystem (VISE). VISE is a program in which research and development is conducted seamlessly across the entire compute stack - from applications, systems and algorithms, architectures, circuits and nanodevices, and materials. A nation-wide VISE provides clear strategic advantages in ensuring the US's global superiority in semiconductors. First, a VISE provides the highest quality seed-corn for nurturing transformative ideas that are critically needed today in order for nanotechnology-inspired computing to flourish.
It does so by dramatically opening up new areas of semiconductor research that are inspired and driven by new application needs. Second, a VISE creates a very high barrier to entry from foreign competitors because it is extremely hard to establish, and even harder to duplicate.

Submitted 5 May, 2020; originally announced May 2020.

Comments: A Computing Community Consortium (CCC) workshop report, 18 pages

Report number: ccc2016report_3

arXiv:1909.02549 [pdf, other]

Minibatch Processing in Spiking Neural Networks

Authors: Daniel J. Saunders, Cooper Sigrist, Kenneth Chaney, Robert Kozma, Hava T. Siegelmann

Abstract: Spiking neural networks (SNNs) are a promising candidate for biologically-inspired and energy efficient computation. However, their simulation is notoriously time consuming, and may be seen as a bottleneck in developing competitive training methods with potential deployment on neuromorphic hardware platforms. To address this issue, we provide an implementation of mini-batch processing applied to clock-based SNN simulation, leading to drastically increased data throughput. To our knowledge, this is the first general-purpose implementation of mini-batch processing in a spiking neural network simulator, which works with arbitrary neuron and synapse models. We demonstrate nearly constant-time scaling with batch size on a simulation setup (up to GPU memory limits), and showcase the effectiveness of large batch sizes in two SNN application domains, resulting in ≈880× and ≈24× reductions in wall-clock time, respectively. Different parameter reduction techniques are shown to produce different learning outcomes in a simulation of networks trained with spike-timing-dependent plasticity. Machine learning practitioners and biological modelers alike may benefit from the drastically reduced simulation time and increased iteration speed this method enables. Code to reproduce the benchmarks and experimental findings in this paper can be found at https://github.com/djsaunde/snn-minibatch.

Submitted 5 September, 2019; originally announced September 2019.
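The core of mini-batch processing in a clock-based simulator is adding a batch axis to every state variable, so that one matrix product per timestep advances all samples at once. The sketch below applies that idea to a leaky integrate-and-fire (LIF) layer in NumPy; the neuron model, parameter values, and the function name `simulate_lif_batch` are our own assumptions and not the API of the paper's code.

```python
import numpy as np

def simulate_lif_batch(inputs, w, tau=20.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """Clock-based LIF simulation for a whole minibatch at once.

    inputs : (T, B, n_in) input spikes or currents, one row per batch sample
    w      : (n_in, n_out) synaptic weights shared across the batch
    Returns a (T, B, n_out) boolean array of output spikes.
    """
    T, B, _ = inputs.shape
    n_out = w.shape[1]
    v = np.full((B, n_out), v_reset)             # one membrane state per sample
    decay = np.exp(-dt / tau)
    spikes = np.zeros((T, B, n_out), dtype=bool)
    for t in range(T):
        # Leak toward rest, then integrate input: one matmul covers every sample
        v = v_reset + (v - v_reset) * decay + inputs[t] @ w
        fired = v >= v_thresh
        v = np.where(fired, v_reset, v)          # reset neurons that spiked
        spikes[t] = fired
    return spikes

# Batch of two samples: the first receives input every step, the second none.
inputs = np.zeros((5, 2, 1))
inputs[:, 0, 0] = 1.0
w = np.array([[1.5]])
out = simulate_lif_batch(inputs, w)
```

Samples never interact, so simulating a batch gives the same spikes as simulating each sample alone; the speedup comes from replacing `B` separate loops with one vectorized update per timestep.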

arXiv:1906.11826 [pdf, other]

Lattice Map Spiking Neural Networks (LM-SNNs) for Clustering and Classifying Image Data

Authors: Hananel Hazan, Daniel J. Saunders, Darpan T. Sanghavi, Hava Siegelmann, Robert Kozma

Abstract: Spiking neural networks (SNNs) with a lattice architecture are introduced in this work, combining several desirable properties of SNNs and self-organized maps (SOMs). Networks are trained with biologically motivated, unsupervised learning rules to obtain a self-organized grid of filters via cooperative and competitive excitatory-inhibitory interactions. Several inhibition strategies are developed and tested, such as (i) incrementally increasing inhibition level over the course of network training, and (ii) switching the inhibition level from low to high (two-level) after an initial training segment. During the labeling phase, the spiking activity generated by data with known labels is used to assign neurons to categories of data, which are then used to evaluate the network's classification ability on a held-out set of test data. Several biologically plausible evaluation rules are proposed and compared, including a population-level confidence rating, and an n-gram inspired method. The effectiveness of the proposed self-organized learning mechanism is tested using the MNIST benchmark dataset, as well as using images produced by playing the Atari Breakout game.

Submitted 4 June, 2019; originally announced June 2019.

Comments: Original Manuscript Submitted: October 30, 2018. Revised: May 28, 2019. Special Issue: "Cognition and Neurocomputation" of Annals of Mathematics and Artificial Intelligence. arXiv admin note: text overlap with arXiv:1807.09374
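The two inhibition strategies named in the abstract can be written as schedules mapping a training step to an inhibition strength. This is a sketch under stated assumptions: the abstract says only "incrementally increasing", so the linear ramp in strategy (i) is our choice, and the strength values and step counts in the example are hypothetical.

```python
def inhibition_incremental(step: int, total_steps: int, c_min: float, c_max: float) -> float:
    """Strategy (i): increase inhibition over training (linear ramp assumed)."""
    frac = min(step / total_steps, 1.0)
    return c_min + frac * (c_max - c_min)

def inhibition_two_level(step: int, switch_step: int, c_low: float, c_high: float) -> float:
    """Strategy (ii): low inhibition for an initial segment, then switch to high."""
    return c_low if step < switch_step else c_high

# Hypothetical values: inhibition strength from 1.0 to 17.0 over 100 training steps
schedule_i = [inhibition_incremental(s, 100, 1.0, 17.0) for s in (0, 50, 100)]
schedule_ii = [inhibition_two_level(s, 50, 1.0, 17.0) for s in (0, 49, 50)]
```

Either schedule starts with weak competition, letting neighboring lattice filters learn cooperatively, and ends with strong competition that pushes filters to specialize.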

arXiv:1905.11515 [pdf, other]

Abstraction Mechanisms Predict Generalization in Deep Neural Networks

Authors: Alex Gain, Hava Siegelmann

Abstract: A longstanding problem for Deep Neural Networks (DNNs) is understanding their puzzling ability to generalize well. We approach this problem through the unconventional angle of cognitive abstraction mechanisms, drawing inspiration from recent neuroscience work, allowing us to define the Cognitive Neural Activation metric (CNA) for DNNs, which is the correlation between the information complexity (entropy) of a given input and the concentration of higher activation values in deeper layers of the network. The CNA is highly predictive of generalization ability, outperforming norm-and-margin-based generalization metrics on an extensive evaluation of over 100 dataset-and-network-architecture combinations, especially in cases where additive noise is present and/or training labels are corrupted. These strong empirical results show the usefulness of CNA as a generalization metric, and encourage further research on the connection between information complexity and representations in the deeper layers of networks in order to better understand the generalization capabilities of DNNs.

Submitted 16 April, 2020; v1 submitted 27 May, 2019; originally announced May 2019.

arXiv:1904.06269 [pdf, other]

Locally Connected Spiking Neural Networks for Unsupervised Feature Learning

Authors: Daniel J. Saunders, Devdhar Patel, Hananel Hazan, Hava T. Siegelmann, Robert Kozma

Abstract: In recent years, Spiking Neural Networks (SNNs) have demonstrated great successes in completing various Machine Learning tasks. We introduce a method for learning image features by locally connected layers in SNNs using the spike-timing-dependent plasticity (STDP) rule. In our approach, sub-networks compete via inhibitory interactions to learn features from different locations of the input space. These Locally-Connected SNNs (LC-SNNs) manifest key topological features of the spatial interaction of biological neurons. We explore a biologically inspired n-gram classification approach that allows parallel processing over various patches of the image space. We report the classification accuracy of simple two-layer LC-SNNs on two image datasets, which match state-of-the-art performance and are the first results to date. LC-SNNs have the advantage of fast convergence to a dataset representation, and they require fewer learnable parameters than other SNN approaches with unsupervised learning. Robustness tests demonstrate that LC-SNNs exhibit graceful degradation of performance despite the random deletion of large numbers of synapses and neurons.

Submitted 12 April, 2019; originally announced April 2019.

Comments: 22 pages, 7 figures, and 4 tables
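The LC-SNN entry above learns features with the spike-timing-dependent plasticity (STDP) rule. As a minimal sketch, assuming the standard pair-based formulation with exponential timing windows (the amplitudes `a_plus`, `a_minus`, the time constants `tau_plus`, `tau_minus`, and the weight bounds are illustrative assumptions, and the paper may use a different STDP variant such as a trace-based one), the weight change for a synapse can be computed as:

```python
import math


def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for delta_t = t_post - t_pre (in ms).

    Pre-before-post (delta_t > 0) potentiates the synapse; post-before-pre
    (delta_t < 0) depresses it. All parameter values are illustrative assumptions.
    """
    if delta_t > 0:
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0, **kwargs):
    """Sum the updates over all pre/post spike pairs and clip the weight to [w_min, w_max]."""
    dw = sum(
        stdp_dw(t_post - t_pre, **kwargs)
        for t_pre in pre_spikes
        for t_post in post_spikes
    )
    return min(w_max, max(w_min, w + dw))


# A presynaptic spike 5 ms before a postsynaptic spike strengthens the synapse;
# the reverse order weakens it.
print(apply_stdp(0.5, pre_spikes=[10.0], post_spikes=[15.0]))
print(apply_stdp(0.5, pre_spikes=[15.0], post_spikes=[10.0]))
```

Summing over all spike pairs is the simplest ("all-to-all") pairing scheme; nearest-neighbor pairing is a common alternative.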

arXiv:1903.11012

Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games

Authors: Devdhar Patel, Hananel Hazan, Daniel J. Saunders, Hava Siegelmann, Robert Kozma

Abstract: Deep Reinforcement Learning (RL) demonstrates excellent performance on tasks that can be solved by a trained policy. It plays a dominant role among cutting-edge machine learning approaches using multi-layer neural networks (NNs). At the same time, Deep RL suffers from high sensitivity to noisy, incomplete, and misleading input data. Following biological intuition, we employ Spiking Neural Networks (SNNs) to address some deficiencies of deep RL solutions. Previous studies in the image classification domain demonstrated that standard NNs (with ReLU nonlinearity) trained using supervised learning can be converted to SNNs with negligible deterioration in performance. In this paper, we extend those conversion results to the domain of Q-Learning NNs trained using RL. We provide a proof of principle of the conversion of a standard NN to an SNN. In addition, we show that the SNN has improved robustness to occlusion in the input image. Finally, we introduce results on converting a full-scale Deep Q-Network to an SNN, paving the way for future research on robust Deep RL applications.

Submitted 19 August, 2019; v1 submitted 26 March, 2019; originally announced March 2019.
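The NN-to-SNN conversion results cited in this entry rest on the observation that an integrate-and-fire (IF) neuron driven by a constant input fires at a rate that approximates a ReLU activation. A minimal sketch of that correspondence, assuming a non-leaky IF neuron with reset by subtraction and a unit threshold (a common conversion choice, not necessarily the exact neuron model used in the paper):

```python
def if_firing_rate(x, steps=1000, threshold=1.0):
    """Firing rate (spikes per step) of a non-leaky IF neuron driven by constant input x.

    With reset by subtraction, the rate is approximately clip(x / threshold, 0, 1):
    negative inputs never reach threshold, and inputs above threshold saturate
    at one spike per time step.
    """
    v = 0.0  # membrane potential
    spikes = 0
    for _ in range(steps):
        v += x  # integrate the input current
        if v >= threshold:  # fire at most once per step
            v -= threshold  # reset by subtraction keeps the residual charge
            spikes += 1
    return spikes / steps


def relu(x):
    return max(0.0, x)


for x in (-0.5, 0.0, 0.25, 0.5, 0.9):
    print(f"x={x:+.2f}  relu={relu(x):.3f}  IF rate={if_firing_rate(x):.3f}")
```

Because the rate saturates at one spike per step, conversion pipelines commonly rescale weights so that the original activations stay below the firing threshold.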

arXiv:1808.08173

STDP Learning of Image Patches with Convolutional Spiking Neural Networks

Authors: Daniel J. Saunders, Hava T. Siegelmann, Robert Kozma, Miklós Ruszinkó

Abstract: Spiking neural networks are motivated by principles of neural systems and may possess unexplored advantages in the context of machine learning. A class of convolutional spiking neural networks is introduced, trained to detect image features with an unsupervised, competitive learning mechanism. Image features can be shared within subpopulations of neurons, or each may evolve independently to capture different features in different regions of input space. We analyze the time and memory requirements of learning with and operating such networks. The MNIST dataset is used as an experimental testbed, and the performance and convergence speed of these networks are compared with those of a baseline spiking neural network.

Submitted 24 August, 2018; originally announced August 2018.

Comments: 7 pages, 9 figures, and 5 tables

arXiv:1808.03357

Error Forward-Propagation: Reusing Feedforward Connections to Propagate Errors in Deep Learning

Authors: Adam A. Kohan, Edward A. Rietman, Hava T. Siegelmann

Abstract: We introduce Error Forward-Propagation, a biologically plausible mechanism to propagate error feedback forward through the network. Architectural constraints on connectivity are virtually eliminated for error feedback in the brain; systematic backward connectivity is not used or needed to deliver error feedback. Feedback, as a means of assigning credit to neurons earlier in the forward pathway for their contribution to the final output, is thought to be used for learning in the brain. How the brain solves the credit assignment problem is unclear. In machine learning, error backpropagation is a highly successful mechanism for credit assignment in deep multilayered networks. Backpropagation requires symmetric reciprocal connectivity for every neuron. From a biological perspective, there is no evidence of such an architectural constraint, which makes backpropagation implausible for learning in the brain. This architectural constraint is reduced with the use of random feedback weights. Models using random feedback weights require backward connectivity patterns for every neuron, but avoid symmetric weights and reciprocal connections. In this paper, we practically remove this architectural constraint, requiring only a backward loop connection for effective error feedback. We propose reusing the forward connections to deliver the error feedback by feeding the outputs into the input-receiving layer.
This mechanism, Error Forward-Propagation, is a plausible basis for how error feedback occurs deep in the brain, independent of, yet in support of, the functionality underlying intricate network architectures. We show experimentally that recurrent neural networks with two and three hidden layers can be trained using Error Forward-Propagation on the MNIST and Fashion MNIST datasets, achieving 1.90% and 11% generalization errors, respectively.

Submitted 9 August, 2018; originally announced August 2018.

arXiv:1807.09374

doi 10.1109/IJCNN.2018.8489673

Unsupervised Learning with Self-Organizing Spiking Neural Networks

Authors: Hananel Hazan, Daniel J. Saunders, Darpan T. Sanghavi, Hava T. Siegelmann, Robert Kozma

Abstract: We present a system comprising a hybridization of self-organizing map (SOM) properties with spiking neural networks (SNNs) that retain many of the features of SOMs. Networks are trained in an unsupervised manner to learn a self-organized lattice of filters via excitatory-inhibitory interactions among populations of neurons. We develop and test various inhibition strategies, such as inhibition that grows with inter-neuron distance and two distinct levels of inhibition. The quality of the unsupervised learning algorithm is evaluated using examples with known labels. Several biologically-inspired classification tools are proposed and compared, including population-level confidence rating and n-grams using a spike motif algorithm. Using the optimal choice of parameters, our approach produces improvements over state-of-the-art spiking neural networks.

Submitted 24 July, 2018; originally announced July 2018.

Journal ref: Proceeding WCCI 2018
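One of the inhibition strategies in this entry is inhibition that grows with inter-neuron distance. A minimal sketch of how such a lateral inhibition matrix could be built for an n × n lattice of excitatory neurons, assuming inhibition proportional to the square root of the Euclidean lattice distance and capped at a maximum strength (the functional form and the hypothetical constants `c_inhib` and `max_inhib` are illustrative assumptions, not the paper's values):

```python
import numpy as np


def distance_inhibition(n, c_inhib=0.5, max_inhib=10.0):
    """Return an (n*n, n*n) matrix of non-positive lateral weights for an n x n lattice.

    Neuron j inhibits neuron i with strength min(c_inhib * sqrt(dist_ij), max_inhib),
    so distant neurons inhibit each other more strongly than neighbors do.
    Self-connections are zero.
    """
    rows, cols = np.divmod(np.arange(n * n), n)  # grid position of each neuron
    coords = np.stack([rows, cols], axis=1).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean lattice distances
    strength = np.minimum(c_inhib * np.sqrt(dist), max_inhib)
    np.fill_diagonal(strength, 0.0)  # no self-inhibition
    return -strength  # negative weights encode inhibition


w = distance_inhibition(3)
print(w.shape)
print(w[0, 1], w[0, 8])  # neighbor vs. opposite corner of the 3 x 3 lattice
```

Growing inhibition with distance lets nearby neurons fire together while suppressing distant ones, which is one way to encourage the topographic ordering characteristic of a SOM.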

arXiv:1806.01423

doi 10.3389/fninf.2018.00089

BindsNET: A machine learning-oriented spiking neural networks library in Python

Authors: Hananel Hazan, Daniel J. Saunders, Hassaan Khan, Darpan T. Sanghavi, Hava T. Siegelmann, Robert Kozma

Abstract: The development of spiking neural network simulation software is a critical component enabling the modeling of neural systems and the development of biologically inspired algorithms. Existing software frameworks support a wide range of neural functionality, software abstraction levels, and hardware devices, yet are typically not suitable for rapid prototyping or application to problems in the domain of machine learning. In this paper, we describe a new Python package for the simulation of spiking neural networks, specifically geared towards machine learning and reinforcement learning. Our software, called BindsNET, enables rapid building and simulation of spiking networks and features user-friendly, concise syntax. BindsNET is built on top of the PyTorch deep neural networks library, enabling fast CPU and GPU computation for large spiking networks. The BindsNET framework can be adjusted to meet the needs of other existing computing and hardware environments, e.g., TensorFlow. We also provide an interface into the OpenAI gym library, allowing for training and evaluation of spiking networks on reinforcement learning problems. We argue that this package facilitates the use of spiking networks for large-scale machine learning experimentation, and show some simple examples of how we envision BindsNET can be used in practice. BindsNET code is available at https://github.com/Hananel-Hazan/bindsnet

Submitted 10 December, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

Journal ref: Frontiers in Neuroinformatics. 12 December 2018

arXiv:cs/0304042

On probabilistic analog automata

Authors: A. Ben-Hur, A. Roitershtein, H. Siegelmann

Abstract: We consider probabilistic automata on a general state space and study their computational power. The model is based on the concept of language recognition by probabilistic automata due to Rabin and models of analog computation in a noisy environment suggested by Maass and Orponen, and Maass and Sontag. Our main result is a generalization of Rabin's reduction theorem that implies that under very mild conditions, the computational power of the automaton is limited to regular languages.

Submitted 30 April, 2003; v1 submitted 28 April, 2003; originally announced April 2003.

ACM Class: F.1.1; F.1.2
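This entry builds on Rabin's notion of language recognition by probabilistic automata. For a finite state space, that notion can be sketched as follows: each input symbol a has a row-stochastic transition matrix M_a, the acceptance probability of a word a_1 … a_n is π M_a1 ⋯ M_an f for an initial distribution π and an accepting-state indicator vector f, and the word is accepted when this probability exceeds a cut-point λ. Rabin showed that with an isolated cut-point such finite automata recognize only regular languages; the paper generalizes this to a general (possibly continuous) state space. The two-state automaton below is a made-up illustration, not an example from the paper.

```python
import numpy as np


def acceptance_probability(word, matrices, initial, accepting):
    """Probability of ending in an accepting state after reading `word`."""
    dist = np.asarray(initial, dtype=float)
    for symbol in word:
        dist = dist @ matrices[symbol]  # propagate the state distribution one step
    return float(dist @ np.asarray(accepting, dtype=float))


def accepts(word, matrices, initial, accepting, cut_point):
    """Rabin-style recognition: accept iff the acceptance probability exceeds the cut-point."""
    return acceptance_probability(word, matrices, initial, accepting) > cut_point


# Hypothetical two-state automaton over the alphabet {a, b}; rows sum to 1.
M = {
    "a": np.array([[0.5, 0.5], [0.0, 1.0]]),
    "b": np.array([[1.0, 0.0], [0.5, 0.5]]),
}
pi = [1.0, 0.0]  # start in state 0
f = [0.0, 1.0]   # state 1 is accepting

for w in ["", "a", "aa", "ab"]:
    print(repr(w), acceptance_probability(w, M, pi, f), accepts(w, M, pi, f, cut_point=0.6))
```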

arXiv:cs/0110056

Probabilistic analysis of a differential equation for linear programming

Authors: Asa Ben-Hur, Joshua Feinberg, Shmuel Fishman, Hava T. Siegelmann

Abstract: In this paper we address the complexity of solving linear programming problems with a set of differential equations that converge to a fixed point that represents the optimal solution. Assuming a probabilistic model, where the inputs are i.i.d. Gaussian variables, we compute the distribution of the convergence rate to the attracting fixed point. Using the framework of Random Matrix Theory, we derive a simple expression for this distribution in the asymptotic limit of large problem size. In this limit, we find that the distribution of the convergence rate is a scaling function, namely it is a function of one variable that is a combination of three parameters: the number of variables, the number of constraints and the convergence rate, rather than a function of these parameters separately. We also estimate numerically the distribution of computation times, namely the time required to reach a vicinity of the attracting fixed point, and find that it is also a scaling function. Using the problem size dependence of the distribution functions, we derive high-probability bounds on the convergence rates and on the computation times.

Submitted 7 April, 2003; v1 submitted 29 October, 2001; originally announced October 2001.

Comments: 1+37 pages, latex, 5 eps figures. Version accepted for publication in the Journal of Complexity. Changes made: Presentation reorganized for clarity, expanded discussion of measure of complexity in the non-asymptotic regime (added a new section)

ACM Class: F.1.3, F.2

Showing 1–24 of 24 results for author: Siegelmann, H