-
LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural Networks
Authors:
Nelly Elsayed,
Zag ElSayed,
Anthony S. Maida
Abstract:
Long short-term memory (LSTM) is one of the robust recurrent neural network architectures for learning sequential data. However, it requires considerable computational power to train and to implement in both software and hardware. This paper proposes a novel LiteLSTM architecture that reduces the LSTM computation components via the weight-sharing concept, lowering the overall computation cost while maintaining performance. The proposed LiteLSTM can be significant for processing large data where processing time is crucial and hardware resources are limited, such as the security of IoT devices and medical data processing. The proposed model was evaluated and tested empirically on three different datasets from the computer vision, cybersecurity, and speech emotion recognition domains. The proposed LiteLSTM has accuracy comparable to other state-of-the-art recurrent architectures while using a smaller computation budget.
Submitted 11 January, 2023;
originally announced January 2023.
-
Deep Residual Axial Networks
Authors:
Nazmul Shahadat,
Anthony S. Maida
Abstract:
While convolutional neural networks (CNNs) demonstrate outstanding performance on computer vision tasks, their computational costs remain high. Several techniques are used to reduce these costs, such as reducing channel count and using separable and depthwise separable convolutions. This paper reduces computational costs by introducing a novel architecture, axial CNNs, which replaces spatial 2D convolution operations with two consecutive depthwise separable 1D operations. The axial CNNs are predicated on the assumption that the dataset supports approximately separable convolution operations with little or no loss of training accuracy. Deep axial separable CNNs still suffer from gradient problems when training deep networks. We modify the construction of axial separable CNNs with residual connections to improve the performance of deep axial architectures and introduce our final novel architecture, namely residual axial networks (RANs). Extensive benchmark evaluation shows that RANs achieve at least 1% higher performance with about 77%, 86%, 75%, and 34% fewer parameters and about 75%, 80%, 67%, and 26% fewer FLOPs than ResNets, wide ResNets, MobileNets, and SqueezeNexts on CIFAR benchmarks, SVHN, and Tiny ImageNet image classification datasets. Moreover, our proposed RANs improve the performance of deep recursive residual networks with 94% fewer parameters on the image super-resolution dataset.
Submitted 17 March, 2023; v1 submitted 11 January, 2023;
originally announced January 2023.
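The parameter saving behind the axial factorization described in the abstract above can be illustrated with a rough count. The sketch below is only the arithmetic of replacing one k x k kernel with a k x 1 kernel followed by a 1 x k kernel. It assumes plain (non-depthwise) 1D convolutions without bias, and the helper names are hypothetical, so it is not the paper's exact layer.

```python
def conv2d_params(c_in, c_out, k):
    """Weights in a standard k x k 2D convolution (bias omitted)."""
    return k * k * c_in * c_out


def axial_params(c_in, c_out, k):
    """Weights when the k x k kernel is replaced by a k x 1 convolution
    (c_in -> c_out) followed by a 1 x k convolution (c_out -> c_out).
    Assumption: plain 1D convolutions, bias omitted."""
    return k * c_in * c_out + k * c_out * c_out


# Illustrative layer sizes (assumed, not taken from the paper).
standard = conv2d_params(64, 64, 3)
axial = axial_params(64, 64, 3)
saving = 1 - axial / standard  # fraction of weights removed

print(standard, axial, round(saving, 3))
```

For a 3 x 3 kernel with equal input and output channels, the factorized form needs two thirds of the weights; the saving grows with the kernel size.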
-
Deep Axial Hypercomplex Networks
Authors:
Nazmul Shahadat,
Anthony S. Maida
Abstract:
Over the past decade, deep hypercomplex-inspired networks have enhanced feature extraction for image classification by enabling weight sharing across input channels. Recent work improves representational capability with hypercomplex-inspired networks, but at high computational cost. This paper reduces this cost by factorizing a quaternion 2D convolutional module into two consecutive vectormap 1D convolutional modules. Also, we use fully connected layers based on 5D parameterized hypercomplex multiplication. Incorporating both yields our proposed hypercomplex network, a novel architecture that can be assembled to construct deep axial-hypercomplex networks (DANs) for image classification. We conduct experiments on CIFAR benchmarks, SVHN, and Tiny ImageNet datasets and achieve better performance with fewer trainable parameters and FLOPs. Our proposed model achieves almost 2% higher performance for the CIFAR and SVHN datasets, and more than 3% for the Tiny ImageNet dataset, and uses six times fewer parameters than real-valued ResNets. Also, it shows state-of-the-art performance on CIFAR benchmarks in hypercomplex space.
Submitted 11 January, 2023;
originally announced January 2023.
-
Enhancing ResNet Image Classification Performance by using Parameterized Hypercomplex Multiplication
Authors:
Nazmul Shahadat,
Anthony S. Maida
Abstract:
Recently, many deep networks have introduced hypercomplex and related calculations into their architectures. In regard to convolutional networks for classification, these enhancements have been applied to the convolution operations in the frontend to enhance accuracy and/or reduce the parameter requirements while maintaining accuracy. Although these enhancements have been applied to the convolutional frontend, it has not been studied whether adding hypercomplex calculations improves performance when applied to the densely connected backend. This paper studies ResNet architectures and incorporates parameterized hypercomplex multiplication (PHM) into the backend of residual, quaternion, and vectormap convolutional neural networks to assess the effect. We show that PHM does improve classification accuracy performance on several image datasets, including small, low-resolution CIFAR 10/100 and large high-resolution ImageNet and ASL, and can achieve state-of-the-art accuracy for hypercomplex networks.
Submitted 11 January, 2023;
originally announced January 2023.
-
Vision-Based American Sign Language Classification Approach via Deep Learning
Authors:
Nelly Elsayed,
Zag ElSayed,
Anthony S. Maida
Abstract:
Hearing impairment is the partial or total loss of hearing, which causes significant problems for communication with other people in society. American Sign Language (ASL) is one of the sign languages most commonly used by hearing-impaired communities to communicate with each other. In this paper, we propose a simple deep learning model that aims to classify American Sign Language letters as a step toward removing communication barriers related to disabilities.
Submitted 8 April, 2022;
originally announced April 2022.
-
LiteLSTM Architecture for Deep Recurrent Neural Networks
Authors:
Nelly Elsayed,
Zag ElSayed,
Anthony S. Maida
Abstract:
Long short-term memory (LSTM) is a robust recurrent neural network architecture for learning spatiotemporal sequential data. However, it requires significant computational power to train and to implement in both software and hardware. This paper proposes a novel LiteLSTM architecture that reduces the computation components of the LSTM using the weight-sharing concept, lowering the overall architecture cost while maintaining performance. The proposed LiteLSTM can be significant for learning from big data where processing time is crucial, such as the security of IoT devices and medical data. Moreover, it helps to reduce the CO2 footprint. The proposed model was evaluated and tested empirically on two different datasets from the computer vision and cybersecurity domains.
Submitted 24 October, 2022; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Improving Axial-Attention Network Classification via Cross-Channel Weight Sharing
Authors:
Nazmul Shahadat,
Anthony S. Maida
Abstract:
In recent years, hypercomplex-inspired neural networks (HCNNs) have been used to improve deep learning architectures due to their ability to enable channel-based weight sharing, treat colors as a single entity, and improve representational coherence within the layers. The work described herein studies the effect of replacing existing layers in an Axial Attention network with their representationally coherent variants to assess the effect on image classification. We experiment with the stem of the network, the bottleneck layers, and the fully connected backend, by replacing them with representationally coherent variants. These various modifications lead to novel architectures which all yield improved accuracy performance on the ImageNet300k classification dataset. Our baseline networks for comparison were the original real-valued ResNet, the original quaternion-valued ResNet, and the Axial Attention ResNet. Since improvement was observed regardless of which part of the network was modified, there is a promise that this technique may be generally useful in improving classification accuracy for a large class of networks.
Submitted 12 January, 2023; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Stacked LSTM Based Deep Recurrent Neural Network with Kalman Smoothing for Blood Glucose Prediction
Authors:
Md Fazle Rabby,
Yazhou Tu,
Md Imran Hossen,
Insup Le,
Anthony S Maida,
Xiali Hei
Abstract:
Blood glucose (BG) management is crucial for type-1 diabetes patients, resulting in the necessity of reliable artificial pancreas or insulin infusion systems. In recent years, deep learning techniques have been utilized for a more accurate BG level prediction system. However, continuous glucose monitoring (CGM) readings are susceptible to sensor errors. As a result, inaccurate CGM readings would affect BG prediction and make it unreliable, even if an optimal machine learning model is used. In this work, we propose a novel approach to predicting blood glucose level with a stacked long short-term memory (LSTM) based deep recurrent neural network (RNN) model that accounts for sensor faults. We use the Kalman smoothing technique to correct inaccurate CGM readings caused by sensor error. For the OhioT1DM dataset, containing eight weeks' data from six different patients, we achieve an average RMSE of 6.45 and 17.24 mg/dl for 30-minute and 60-minute prediction horizons (PH), respectively. To the best of our knowledge, this is the leading average prediction accuracy for the OhioT1DM dataset. Different physiological information, e.g., Kalman smoothed CGM data, carbohydrates from the meal, bolus insulin, and cumulative step counts in a fixed time interval, are crafted to represent meaningful features used as input to the model. The goal of our approach is to lower the difference between the predicted CGM values and the fingerstick blood glucose readings, which serve as the ground truth. Our results indicate that the proposed approach is feasible for more reliable BG forecasting that might improve the performance of the artificial pancreas and insulin infusion system for T1D management.
Submitted 17 January, 2021;
originally announced January 2021.
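The abstract above does not state which state-space model is used for Kalman smoothing of the CGM readings. As a minimal sketch of the general technique, the code below assumes a scalar random-walk model, x[t] = x[t-1] + w and y[t] = x[t] + v, and runs a forward Kalman filter followed by a Rauch-Tung-Striebel backward pass. The function name and the variances `q` and `r` are hypothetical choices for illustration, not values from the paper.

```python
def rts_smooth(y, q=1.0, r=4.0):
    """Smooth a 1D series with a scalar Kalman filter and an RTS backward pass.

    Assumed model (not from the paper): x[t] = x[t-1] + w, y[t] = x[t] + v,
    with process variance q and measurement variance r.
    """
    n = len(y)
    if n == 0:
        return []
    x_pred, p_pred = [0.0] * n, [0.0] * n  # one-step-ahead predictions
    x_filt, p_filt = [0.0] * n, [0.0] * n  # filtered estimates

    # Forward pass: start from the first reading with variance r.
    x_pred[0], p_pred[0] = float(y[0]), r
    x_filt[0], p_filt[0] = float(y[0]), r
    for t in range(1, n):
        x_pred[t] = x_filt[t - 1]
        p_pred[t] = p_filt[t - 1] + q
        gain = p_pred[t] / (p_pred[t] + r)
        x_filt[t] = x_pred[t] + gain * (y[t] - x_pred[t])
        p_filt[t] = (1.0 - gain) * p_pred[t]

    # Backward pass: revise each estimate using the later readings.
    x_smooth = x_filt[:]
    for t in range(n - 2, -1, -1):
        g = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + g * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth


# Made-up readings with one isolated spike standing in for a sensor error.
readings = [100.0, 100.0, 100.0, 140.0, 100.0, 100.0, 100.0]
smoothed = rts_smooth(readings)
print([round(v, 1) for v in smoothed])
```

A larger `r` relative to `q` trusts the sensor less and flattens spikes more strongly; the right balance depends on the sensor and is an assumption here.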
-
Generalizing Complex/Hyper-complex Convolutions to Vector Map Convolutions
Authors:
Chase J Gaudet,
Anthony S Maida
Abstract:
We show that the core reasons that complex and hypercomplex valued neural networks offer improvements over their real-valued counterparts are the weight sharing mechanism and the treatment of multidimensional data as a single entity. Their algebra linearly combines the dimensions, making each dimension related to the others. However, both are constrained to a set number of dimensions, two for complex and four for quaternions. Here we introduce novel vector map convolutions which capture both of these properties provided by complex/hypercomplex convolutions, while dropping the unnatural dimensionality constraints they impose. This is achieved by introducing a system that mimics the unique linear combination of input dimensions, such as the Hamilton product for quaternions. We perform three experiments to show that these novel vector map convolutions seem to capture all the benefits of complex and hyper-complex networks, such as their ability to capture internal latent relations, while avoiding the dimensionality restriction.
Submitted 8 September, 2020;
originally announced September 2020.
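The Hamilton product named in the abstract above is the rule by which quaternion layers mix their four components: every output component is a signed sum over all four input components, which is where the cross-dimension weight sharing comes from. A minimal sketch on plain tuples follows; the function name is hypothetical, and this shows only the algebra, not the paper's convolution layer.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) tuples,
    meaning a + b*i + c*j + d*k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)

print(hamilton_product(i, j))  # i * j = k
print(hamilton_product(j, i))  # j * i = -k, so the product is not commutative
```

The same four values of `p` appear in all four output components, so a quaternion weight is reused across every input dimension; vector map convolutions, as described in the abstract, aim to keep this mixing without fixing the dimension at four.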
-
Inception-inspired LSTM for Next-frame Video Prediction
Authors:
Matin Hosseini,
Anthony S. Maida,
Majid Hosseini,
Gottumukkala Raju
Abstract:
The problem of video frame prediction has received much interest due to its relevance to many computer vision applications such as autonomous vehicles or robotics. Supervised methods for video frame prediction rely on labeled data, which may not always be available. In this paper, we provide a novel unsupervised deep-learning method called Inception-based LSTM for video frame prediction. The general idea of inception networks is to implement wider networks instead of deeper networks. This network design was shown to improve the performance of image classification. The proposed method is evaluated on both Inception-v1 and Inception-v2 structures. The proposed Inception LSTM methods are compared with convolutional LSTM when applied using PredNet predictive coding framework for both the KITTI and KTH data sets. We observed that the Inception based LSTM outperforms the convolutional LSTM. Also, Inception LSTM has better prediction performance compared to Inception v2 LSTM. However, Inception v2 LSTM has a lower computational cost compared to Inception LSTM.
Submitted 24 April, 2020; v1 submitted 27 August, 2019;
originally announced September 2019.
-
Deep Gated Recurrent and Convolutional Network Hybrid Model for Univariate Time Series Classification
Authors:
Nelly Elsayed,
Anthony S. Maida,
Magdy Bayoumi
Abstract:
Hybrid LSTM-fully convolutional networks (LSTM-FCN) for time series classification have produced state-of-the-art classification results on univariate time series. We show that replacing the LSTM with a gated recurrent unit (GRU) to create a GRU-fully convolutional network hybrid model (GRU-FCN) can offer even better performance on many time series datasets. The proposed GRU-FCN model outperforms state-of-the-art classification performance in many univariate and multivariate time series datasets. In addition, since the GRU uses a simpler architecture than the LSTM, it has fewer training parameters, less training time, and a simpler hardware implementation, compared to the LSTM-based models.
Submitted 19 February, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
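The claim in the abstract above that a GRU has fewer training parameters than an LSTM follows from the gate count: a standard LSTM layer has four weight blocks (input, forget, and output gates plus the candidate cell), while a standard GRU has three (update and reset gates plus the candidate state). The sketch below counts weights under the assumption of one bias vector per block; some library implementations keep two bias vectors per block, so their totals differ slightly. The function names and layer sizes are hypothetical.

```python
def gated_block_params(input_size, hidden_size):
    """Weights in one gate block: input weights, recurrent weights, one bias."""
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size, hidden_size):
    """Standard LSTM layer: four gate blocks."""
    return 4 * gated_block_params(input_size, hidden_size)


def gru_params(input_size, hidden_size):
    """Standard GRU layer: three gate blocks."""
    return 3 * gated_block_params(input_size, hidden_size)


# Illustrative sizes (assumed): univariate input, 128 hidden units.
n_lstm = lstm_params(1, 128)
n_gru = gru_params(1, 128)
print(n_lstm, n_gru, n_gru / n_lstm)
```

Under this counting the GRU layer always has exactly three quarters of the LSTM layer's parameters, regardless of the layer sizes.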
-
Reduced-Gate Convolutional LSTM Using Predictive Coding for Spatiotemporal Prediction
Authors:
Nelly Elsayed,
Anthony S. Maida,
Magdy Bayoumi
Abstract:
Spatiotemporal sequence prediction is an important problem in deep learning. We study next-frame(s) video prediction using a deep-learning-based predictive coding framework that uses convolutional, long short-term memory (convLSTM) modules. We introduce a novel reduced-gate convolutional LSTM (rgcLSTM) architecture that requires a significantly lower parameter budget than a comparable convLSTM. By using a single multi-function gate, our reduced-gate model achieves equal or better next-frame(s) prediction accuracy than the original convolutional LSTM while using a smaller parameter budget, thereby reducing training time and memory requirements. We tested our reduced-gate modules within a predictive coding architecture on the moving MNIST and KITTI datasets. We found that our reduced-gate model reduces the total number of training parameters by approximately 40 percent and the elapsed training time by 25 percent in comparison with the standard convolutional LSTM model. The performance accuracy of the new model was also improved. This makes our model more attractive for hardware implementation, especially on small devices. We also explored a space of twenty different gated architectures to get insight into how our rgcLSTM fits into that space.
Submitted 22 December, 2019; v1 submitted 16 October, 2018;
originally announced October 2018.
-
Deep Learning in Spiking Neural Networks
Authors:
Amirhossein Tavanaei,
Masoud Ghodrati,
Saeed Reza Kheradpisheh,
Timothee Masquelier,
Anthony S. Maida
Abstract:
In recent years, deep learning has been a revolution in the field of machine learning, for computer vision in particular. In this approach, a deep (multilayer) artificial neural network (ANN) is trained in a supervised manner using backpropagation. Huge amounts of labeled examples are required, but the resulting classification accuracy is truly impressive, sometimes outperforming humans. Neurons in an ANN are characterized by a single, static, continuous-valued activation. Yet biological neurons use discrete spikes to compute and transmit information, and the spike times, in addition to the spike rates, matter. Spiking neural networks (SNNs) are thus more biologically realistic than ANNs, and arguably the only viable option if one wants to understand how the brain computes. SNNs are also more hardware friendly and energy-efficient than ANNs, and are thus appealing for technology, especially for portable devices. However, training deep SNNs remains a challenge. Spiking neurons' transfer function is usually non-differentiable, which prevents using backpropagation. Here we review recent supervised and unsupervised methods to train deep SNNs, and compare them in terms of accuracy, but also computational cost and hardware friendliness. The emerging picture is that SNNs still lag behind ANNs in terms of accuracy, but the gap is decreasing, and can even vanish on some tasks, while the SNNs typically require much fewer operations.
Submitted 20 January, 2019; v1 submitted 22 April, 2018;
originally announced April 2018.
-
BP-STDP: Approximating Backpropagation using Spike Timing Dependent Plasticity
Authors:
Amirhossein Tavanaei,
Anthony S. Maida
Abstract:
The problem of training spiking neural networks (SNNs) is a necessary precondition to understanding computations within the brain, a field still in its infancy. Previous work has shown that supervised learning in multi-layer SNNs enables bio-inspired networks to recognize patterns of stimuli through hierarchical feature acquisition. Although gradient descent has shown impressive performance in multi-layer (and deep) SNNs, it is generally not considered biologically plausible and is also computationally expensive. This paper proposes a novel supervised learning approach based on an event-based spike-timing-dependent plasticity (STDP) rule embedded in a network of integrate-and-fire (IF) neurons. The proposed temporally local learning rule follows the backpropagation weight change updates applied at each time step. This approach enjoys benefits of both accurate gradient descent and temporally local, efficient STDP. Thus, this method is able to address some open questions regarding accurate and efficient computations that occur in the brain. The experimental results on the XOR problem, the Iris data, and the MNIST dataset demonstrate that the proposed SNN performs as successfully as the traditional NNs. Our approach also compares favorably with the state-of-the-art multi-layer SNNs.
Submitted 9 March, 2018; v1 submitted 11 November, 2017;
originally announced November 2017.
-
Bio-Inspired Spiking Convolutional Neural Network using Layer-wise Sparse Coding and STDP Learning
Authors:
Amirhossein Tavanaei,
Anthony S. Maida
Abstract:
Hierarchical feature discovery using non-spiking convolutional neural networks (CNNs) has attracted much recent interest in machine learning and computer vision. However, it is still not well understood how to create a biologically plausible network of brain-like, spiking neurons with multi-layer, unsupervised learning. This paper explores a novel bio-inspired spiking CNN that is trained in a greedy, layer-wise fashion. The proposed network consists of a spiking convolutional-pooling layer followed by a feature discovery layer extracting independent visual features. Kernels for the convolutional layer are trained using local learning. The learning is implemented using a sparse, spiking auto-encoder representing primary visual features. The feature discovery layer extracts independent features by probabilistic, leaky integrate-and-fire (LIF) neurons that are sparsely active in response to stimuli. The layer of the probabilistic, LIF neurons implicitly provides lateral inhibitions to extract sparse and independent features. Experimental results show that the convolutional layer is stack-admissible, enabling it to support a multi-layer learning. The visual features obtained from the proposed probabilistic LIF neurons in the feature discovery layer are utilized for training a classifier. Classification results contribute to the independent and informative visual features extracted in a hierarchy of convolutional and feature discovery layers. The proposed model is evaluated on the MNIST digit dataset using clean and noisy images. The recognition performance for clean images is above 98%. The performance loss for recognizing the noisy images is in the range 0.1% to 8.5% depending on noise types and densities. This level of performance loss indicates that the network is robust to additive noise.
Submitted 23 June, 2017; v1 submitted 9 November, 2016;
originally announced November 2016.
-
Acquisition of Visual Features Through Probabilistic Spike-Timing-Dependent Plasticity
Authors:
Amirhossein Tavanaei,
Timothee Masquelier,
Anthony S Maida
Abstract:
The final version of this paper has been published in IEEEXplore available at http://ieeexplore.ieee.org/document/7727213. Please cite this paper as: Amirhossein Tavanaei, Timothee Masquelier, and Anthony Maida, Acquisition of visual features through probabilistic spike-timing-dependent plasticity. IEEE International Joint Conference on Neural Networks. pp. 307-314, IJCNN 2016.
This paper explores modifications to a feedforward five-layer spiking convolutional network (SCN) of the ventral visual stream [Masquelier, T., Thorpe, S., Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Computational Biology, 3(2), 247-257]. The original model showed that a spike-timing-dependent plasticity (STDP) learning algorithm embedded in an appropriately selected SCN could perform unsupervised feature discovery. The discovered features were interpretable and could effectively be used to perform rapid binary decisions in a classifier. In order to study the robustness of the previous results, the present research examines the effects of modifying some of the components of the original model. For improved biological realism, we replace the original non-leaky integrate-and-fire neurons with Izhikevich-like neurons. We also replace the original STDP rule with a novel rule that has a probabilistic interpretation. The probabilistic STDP slightly but significantly improves the performance for both types of model neurons. Use of the Izhikevich-like neuron was not found to improve performance, although performance was still comparable to the IF neuron. This shows that the model is robust enough to handle more biologically realistic neurons. We also conclude that the underlying reasons for stable performance in the model are preserved despite the overt changes to the explicit components of the model.
Submitted 8 November, 2016; v1 submitted 3 June, 2016;
originally announced June 2016.
-
Training a Hidden Markov Model with a Bayesian Spiking Neural Network
Authors:
Amirhossein Tavanaei,
Anthony S Maida
Abstract:
It is of some interest to understand how statistically based mechanisms for signal processing might be integrated with biologically motivated mechanisms such as neural networks. This paper explores a novel hybrid approach for classifying segments of sequential data, such as individual spoken words. The approach combines a hidden Markov model (HMM) with a spiking neural network (SNN). The HMM, consisting of states and transitions, forms a fixed backbone with nonadaptive transition probabilities. The SNN, however, implements a biologically based Bayesian computation that derives from the spike timing-dependent plasticity (STDP) learning rule. The emission (observation) probabilities of the HMM are represented in the SNN and trained with the STDP rule. A separate SNN, each with the same architecture, is associated with each of the states of the HMM. Because of the STDP training, each SNN implements an expectation maximization algorithm to learn the emission probabilities for one HMM state. The model was studied on synthesized spike-train data and also on spoken word data. Preliminary results suggest its performance compares favorably with other biologically motivated approaches. Because of the model's uniqueness and initial promise, it warrants further study. It provides some new ideas on how the brain might implement the equivalent of an HMM in a neural circuit.
Submitted 20 July, 2016; v1 submitted 2 June, 2016;
originally announced June 2016.
-
A Spiking Network that Learns to Extract Spike Signatures from Speech Signals
Authors:
Amirhossein Tavanaei,
Anthony S Maida
Abstract:
Spiking neural networks (SNNs) with adaptive synapses reflect core properties of biological neural networks. Speech recognition, as an application involving audio coding and dynamic learning, provides a good test problem to study SNN functionality. We present a simple, novel, and efficient nonrecurrent SNN that learns to convert a speech signal into a spike train signature. The signature is distinguishable from signatures for other speech signals representing different words, thereby enabling digit recognition and discrimination in devices that use only spiking neurons. The method uses a small, nonrecurrent SNN consisting of Izhikevich neurons equipped with spike timing dependent plasticity (STDP) and biologically realistic synapses. This approach introduces an efficient and fast network without error-feedback training, although it does require supervised training. The new simulation results produce discriminative spike train patterns for spoken digits in which highly correlated spike trains belong to the same category and low correlated patterns belong to different categories. The proposed SNN is evaluated using a spoken digit recognition task where a subset of the Aurora speech dataset is used. The experimental results show that the network performs well in terms of accuracy rate and complexity.
Submitted 11 March, 2017; v1 submitted 2 June, 2016;
originally announced June 2016.