License: arXiv.org perpetual non-exclusive license
arXiv:2312.05643v1 [cs.NE] 09 Dec 2023

NiSNN-A: Non-iterative Spiking Neural Networks with Attention with Application to Motor Imagery EEG Classification

Chuhan Zhang, Wei Pan, Cosimo Della Santina

Chuhan Zhang and Cosimo Della Santina are with the Department of Cognitive Robotics, Faculty of Mechanical Engineering, Delft University of Technology, Delft, Netherlands (e-mail: [email protected]; [email protected]). Wei Pan is with the Department of Computer Science, The University of Manchester, Manchester, United Kingdom (e-mail: [email protected]).

Abstract

Motor imagery, an important category in electroencephalogram (EEG) research, often intersects with scenarios demanding low energy consumption, such as portable medical devices and isolated environment operations. Traditional deep learning algorithms, despite their effectiveness, are characterized by significant computational demands accompanied by high energy usage. As an alternative, spiking neural networks (SNNs), inspired by the biological functions of the brain, emerge as a promising energy-efficient solution. However, SNNs typically exhibit lower accuracy than their counterpart convolutional neural networks (CNNs). Although attention mechanisms successfully increase network accuracy by focusing on relevant features, their integration in the SNN framework remains an open question. In this work, we combine the SNN and the attention mechanisms for the EEG classification, aiming to improve precision and reduce energy consumption. To this end, we first propose a Non-iterative Leaky Integrate-and-Fire (LIF) neuron model, overcoming the gradient issues in the traditional SNNs using the Iterative LIF neurons. Then, we introduce the sequence-based attention mechanisms to refine the feature map. We evaluated the proposed Non-iterative SNN with Attention (NiSNN-A) model on OpenBMI, a large-scale motor imagery dataset. Experiment results demonstrate that 1) our model outperforms other SNN models by achieving higher accuracy, 2) our model increases energy efficiency compared to the counterpart CNN models (i.e., by 2.27 times) while maintaining comparable accuracy.

Index Terms:
Spiking neural networks, Attention mechanism, Motor imagery, EEG classification.

I Introduction

Electroencephalograms (EEGs), whether captured through non-invasive electrodes on the scalp or directly via invasive devices, are the cornerstone of the rapidly evolving domain of Brain-Computer Interfaces (BCI). Therefore, accurate classification of EEG signals has attracted substantial attention over the years - with applications ranging from advanced neurorehabilitation techniques [1] to diagnostics and real-time health monitoring [2]. Within the realm of BCI, motor imagery (MI) holds a distinctive place [3]. MI refers to the mental imagination of specific movements by subjects, leading to the generation of distinct EEG patterns. Accurately classifying these EEG signals opens up application possibilities in advanced fields like robot control and assistive technologies [4].

Recently, deep learning (DL) methods such as Convolutional Neural Networks (CNNs) [5], Recurrent Neural Networks (RNNs) [6], and Transformers [7] have received increasing interest as a means of classifying EEG signals [8, 9, 10]. At present, in addition to conventional CNNs and RNNs, DL for EEG incorporates many other technologies [11, 12, 13, 14, 15]. Attention mechanisms [16], inspired by human cognitive processes, enhance the model’s focus on relevant features, facilitating more efficient and accurate feature extraction [17]. Attention mechanisms have been successfully applied to EEG classification; more details are presented in Section II-B. However, all of these methods suffer from high energy consumption, which poses a significant barrier to deployment in low-energy scenarios such as edge devices for healthcare [18] or robot control [19].

In this work, we investigate the use of Spiking Neural Networks (SNNs) for EEG classification. This technique mimics how biological neurons operate, allowing it to be used to interpret natural neuronal signals [20]. SNNs also offer a promising avenue for reducing energy consumption due to their event-based nature [21, 22, 23], which is attractive in view of edge applications. More details on the state-of-the-art of SNN in EEG classification are provided in Section II-C. We propose a novel integration of SNNs and the attention mechanism, especially for EEG classification. At the core of our approach sits a newly proposed Non-iterative Leaky Integrate-and-Fire (LIF) neuron, which utilizes matrix operations to approximate the neural dynamics of the biological LIF neuron. In this way, we can avoid lengthy loop executions and mitigate the gradient vanishing problem during training, thereby boosting both the efficiency and accuracy of the execution process. The second key methodological contribution is a sequence-based attention model for EEG data, which can simultaneously obtain the attention scores of feature maps. It is worth noting that only one work has investigated attention SNNs, which, however, specifically targeted computer vision [24]. As we show in our experimental comparison, this method is not suitable for classifying long-term data such as EEG.

The contributions of this paper can be summarized as follows:

  1. We show for the first time the combination of SNNs and attention mechanisms for motor imagery EEG classification, simultaneously achieving high accuracy and reducing energy consumption.

  2. We propose a novel Non-iterative LIF neuron model for SNNs tackling the gradient problem during long-time step backpropagation in the traditional Iterative LIF neuron model.

  3. We introduce a sequence-based attention mechanism for SNNs, improving the classification accuracy.

The rest of the paper is organized as follows. Section II reviews related work. Section III introduces our proposed Non-iterative SNN with attention (NiSNN-A). Section IV gives the experimental details. The results and discussion are presented in Section V. Section VI concludes the work.

II Related Work

This section reviews related work. Section II-A covers deep learning techniques for EEG signal processing. Section II-B covers works that use attention mechanisms. Finally, Section II-C describes the applications of SNNs in EEG classification.

II-A Deep learning methods for EEG

In recent years, integrating deep learning techniques into EEG classification has gained significant traction [25]. CNNs stand out among these techniques, particularly due to their ability to identify spatial-temporal patterns within complex, multi-dimensional EEG data [26, 27, 28]. For instance, a temporal-spatial CNN was employed for EEG classification in [29, 30]. The work in [31] introduced EEGNet, which integrates a separable convolutional layer following the temporal and spatial modules. Enhancing the CNN architecture, [32, 33] incorporated residual blocks for classification. Another noteworthy approach is presented in [34], where an autoencoder is built upon the CNN framework. Furthermore, this work adopted a subject-independent training paradigm, emphasizing its scalability on varied EEG data from different subjects. Apart from CNNs, Long Short-Term Memory (LSTM) networks have also been recognized for EEG classification due to their inherent ability to process time sequences effectively [35, 36, 37]. However, these methods are limited by high energy consumption, making them difficult to use on edge devices with low-energy requirements.

II-B Attention mechanism

The Transformer architecture and attention mechanism, originally introduced in [16] for natural language processing tasks, have seen increasing adoption in the deep learning domain. These methods are now being explored in the field of EEG classification, given their ability to handle time sequence data. The attention mechanism is particularly important for EEG data analysis. It holds the potential to enhance classification accuracy and emphasizes specific segments of the data, offering deeper insights into EEG signal characteristics. Various attention models have emerged in the field of EEG classification. For instance, [12] presents a spatial and temporal attention model integrated with CNN. This approach leverages two distinct CNN modules to derive spatial and temporal attention scores separately, subsequently using four convolutional layers for classification. Meanwhile, [38] applies a multi-head attention module, as described in [16], combined with five convolutional layers to classify EEG signals. Direct applications of the Transformer attention structures from [16] for EEG classification are evident in [39, 40]. Beyond solely leveraging CNN and attention mechanisms, some studies have integrated additional techniques. For instance, [41] introduces a mirrored input approach combined with an attention model that operates across each data record. In a more intricate approach, [42] deploys spatial, spectral, and temporal Transformers, each catering to a different input data type. A popular EEG processing methodology, time-frequency Common Spatial Pattern (TFCSP), is highlighted in [43]. This method intertwines a two-layer CNN and an attention model, feeding their concatenated outputs into a classifier. A notable trend is the use of global attention in conjunction with three sequential models, as seen in [44, 45, 46]. These works employ three sequential attention models, each dedicated to a single dimension. By utilizing Global Average Pooling (GAP), they efficiently diminish unrelated dimensions and consolidate the attention scores across the three output dimensions. These models effectively achieved the goal of EEG signal classification. However, because they rely on attention-based CNNs, they consume considerable energy, which makes them difficult to use on edge devices with low-energy requirements. They also lack a dedicated focus on data from different channels and different time regions, which is important in EEG data.

II-C Spiking neural networks

In recent years, SNNs have garnered increasing attention within the neural computing community with broad applications such as computer vision and robot control [47, 48, 49, 50, 22]. Their growing significance can be attributed to their closer resemblance to biological neural systems than CNNs. SNNs, by simulating the discrete, spike-based communication found in actual neurons, promise enhanced efficiency and energy savings. However, this bio-inspired approach comes with challenges, particularly during training. Gradient backpropagation, a staple in training CNNs, presents difficulties for SNNs due to their non-differentiable spiking nature. To address this, various training methods have been proposed [51, 52]: from leveraging evolutionary algorithms to adjust synaptic weights [53], to employing biologically-inspired synaptic update rules like Spike-Timing-Dependent Plasticity (STDP) [54]. Some researchers have also explored converting a well-trained CNN into its SNN counterpart [55], while others have advocated for using surrogate gradients for continuing backpropagation [56]. Given the biology-inspired and efficient attributes of SNNs, several works have successfully applied SNNs in the domain of EEG signal classification, showcasing their potential in real-world applications. For example, [57] employed Particle Swarm Optimization as an evolutionary algorithm for weight updates, combined with an unsupervised classifier like K Nearest Neighbours and Multilayer Perceptron; however, their approach was not end-to-end and involved manual feature extraction. Studies that utilized STDP include [58], which integrated manual feature extraction methods like Fast Fourier Transform and Discrete Wavelet Transform with a 3D SNN reservoir and supervised classifiers. Similarly, [59] adopted an unsupervised learning framework, implementing a 3D SNN reservoir model. [60] combined the 3D reservoir with a Support Vector Machine as the classifier. Highlighting conversion techniques, [61] explored a tree structure, demonstrating the energy efficiency benefits of SNNs through a CNN-to-SNN conversion, while [62] utilized Power Spectral Density for feature extraction before such a conversion. Backpropagation-like methods have also been adapted to SNN-based EEG classification, with [63] using SpikeProp [64]. [14] were notably the first to employ directly-trained SNNs for EEG signal classification. However, these methods have limitations when deepening the networks and seeking to learn complex representations.

III Methods

In this section, we introduce the novel Non-iterative LIF neuron in Section III-A. Subsequently, the proposed attention models are delineated in Section III-B. Finally, Section III-C describes the network architecture of the proposed NiSNN-A.

III-A LIF neuron

Neurons serve as the fundamental components of neural networks. In this section, the Iterative LIF neuron model is presented in Section III-A1. Then, our proposed Non-iterative LIF neuron model is detailed in Section III-A2. Finally, the comparisons are discussed in Section III-A3.

III-A1 Background: Iterative LIF neuron model

In the SNN community, the Leaky Integrate-and-Fire (LIF) neuron model is widely used. It strikes a balance between simplicity and biologically inspired characteristics. The LIF model can be described using these equations:

$$
\begin{aligned}
&\text{membrane potential:} \\
&\quad \begin{cases}
\tau \dfrac{du(t_{\text{c}})}{dt_{\text{c}}} = -u(t_{\text{c}}) + w x(t_{\text{c}}), & \text{if } u(t_{\text{c}}) \leq V_{\text{th}}, \\
\lim\limits_{\Delta \to 0;\, \Delta > 0} u(t_{\text{c}} + \Delta) = u_{\text{reset}}, & \text{if } u(t_{\text{c}}) > V_{\text{th}},
\end{cases} \\
&\quad u_{\text{reset}} = \begin{cases}
u(t_{\text{c}}) - V_{\text{th}}, & \text{with soft reset mechanism,} \\
u_{\text{r}}, & \text{with hard reset mechanism,}
\end{cases} \\
&\text{spike generation:} \\
&\quad o(t_{\text{c}}) = g(u(t_{\text{c}})), \\
&\quad g(a) = \begin{cases}
0, & \text{if } a \leq V_{\text{th}}, \\
1, & \text{if } a > V_{\text{th}},
\end{cases}
\end{aligned}
\tag{1}
$$

where $\tau \in \mathbb{R}$ is the membrane time constant and $u(t_{\text{c}}) \in \mathbb{R}$ represents the neuron’s membrane potential at time $t_{\text{c}}$, where $t_{\text{c}}$ denotes continuous time. $wx(t_{\text{c}}) \in \mathbb{R}$ is the input stimulus at time $t_{\text{c}} \in \mathbb{R}$, that is, the weighted input of the present layer within the context of neural networks. $w$ is the trainable parameter. $V_{\text{th}} \in \mathbb{R}$ is the membrane potential threshold. Specifically, when the membrane potential exceeds the threshold $V_{\text{th}}$, a spike is produced. When the membrane potential remains below the threshold, no spike is generated. After generating a spike, the membrane potential decreases to a reset potential $u_{\text{reset}} \in \mathbb{R}$. Two types of reset mechanisms deal with the membrane potential after firing events: soft and hard reset. The soft reset mechanism resets the membrane potential by subtracting the threshold potential $V_{\text{th}}$ from it, while the hard reset mechanism resets the membrane potential to a defined potential value $u_{\text{r}} \in \mathbb{R}$. $o(t_{\text{c}}) \in \mathbb{R}$ represents the output spike at time $t_{\text{c}}$. The function $g(\cdot)$ is the Heaviside step function, which models the spike firing process.

To adapt to the requirements of backpropagation in neural networks, an Iterative LIF neuron model [56] with the soft reset mechanism is introduced:

$$
\begin{aligned}
u^{t+1} &= \lambda \left( u^{t} - V_{\text{th}}\, o^{t} \right) + w x^{t}, \\
o^{t} &= g(u^{t}),
\end{aligned}
\tag{2}
$$

where $\lambda \in \mathbb{R}$ denotes the decay rate of the membrane potential. We use $u^{t}$ to represent $u(t_{\text{c}} = t)$, where $t \in \mathbb{N}$ represents the discrete time step in the Iterative LIF neuron model. Similarly, $o^{t}$ means $o(t_{\text{c}} = t)$ and $x^{t}$ means $x(t_{\text{c}} = t)$. In this way, the membrane potential updates recurrently, step by step, which makes the LIF neuron dynamics trainable within a network. However, the Heaviside step function $g(\cdot)$ in the firing process is non-differentiable, which poses a challenge for gradient backpropagation. To address this limitation, surrogate functions have been proposed [65]. Sigmoid functions are commonly employed as surrogate functions because they can emulate the spike firing process, especially with a high value of $\alpha$: $\text{Sigmoid}(x) = \frac{1}{1 + e^{-\alpha x}}$. Therefore, in the Iterative LIF neuron model, the Sigmoid functions are adopted during the gradient backpropagation:

$$
\frac{\partial o}{\partial u} = \frac{\partial\, \text{Sigmoid}(u)}{\partial u} = \text{Sigmoid}(u) \left( 1 - \text{Sigmoid}(u) \right) \alpha.
\tag{3}
$$
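As an illustration of the update rule (2) and the surrogate gradient (3), the following is a minimal pure-Python sketch for a single neuron. It is not the authors' implementation: the function names, the initial potential `u0 = 0`, and the example values of `lam`, `v_th`, and `alpha` are assumptions made for this example only. The surrogate is written exactly as in (3), as a function of its argument; whether that argument is shifted by the threshold in practice is not specified here.

```python
import math


def heaviside(a, v_th):
    """Fire function g from (1): 1.0 if a exceeds the threshold, else 0.0."""
    return 1.0 if a > v_th else 0.0


def iterative_lif(weighted_input, v_th=1.0, lam=0.5, u0=0.0):
    """Run the soft-reset update (2) over a sequence of weighted inputs w*x^t.

    Returns the membrane potentials u^t and spikes o^t for each time step.
    The initial potential u0 is an assumption of this sketch.
    """
    u = u0
    potentials, spikes = [], []
    for wx_t in weighted_input:
        o = heaviside(u, v_th)           # o^t = g(u^t)
        potentials.append(u)
        spikes.append(o)
        u = lam * (u - v_th * o) + wx_t  # u^{t+1} = lam * (u^t - V_th * o^t) + w x^t
    return potentials, spikes


def sigmoid(x, alpha):
    """Sigmoid(x) = 1 / (1 + exp(-alpha * x))."""
    return 1.0 / (1.0 + math.exp(-alpha * x))


def surrogate_grad(u, alpha=4.0):
    """Surrogate derivative (3): Sigmoid(u) * (1 - Sigmoid(u)) * alpha."""
    s = sigmoid(u, alpha)
    return s * (1.0 - s) * alpha


potentials, spikes = iterative_lif([0.75] * 5, v_th=1.0, lam=0.5)
print(potentials)  # [0.0, 0.75, 1.125, 0.8125, 1.15625]
print(spikes)      # [0.0, 0.0, 1.0, 0.0, 1.0]
```

Each step needs the potential and spike from the previous step, so the loop cannot be parallelised over time in this form.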

III-A2 Non-iterative LIF neuron model

Figure 1: The Iterative LIF neuron model and the Non-iterative LIF neuron model. (a) Iterative LIF neuron model. The membrane potential $u^{t+1}$ is computed recurrently, with each time step depending on its previous state $u^{t}$ and the output spikes $o^{t}$ from the preceding time step. (b) Non-iterative LIF neuron model. The membrane potential $U$ is calculated using matrix operations, and the final membrane potential $U_{\text{final}}$ for all time steps is obtained simultaneously rather than iteratively.

As described in (2), the membrane potential $u$ in the Iterative LIF neuron model can only be updated, and spike generation determined, once the outcomes of the preceding time step are available. Therefore, for cases with multiple time steps, a long loop is required during the execution of the LIF neuron. This lengthens the computation and introduces a gradient problem across the time dimension [66]. Thus, we propose a Non-iterative LIF model designed to accelerate computation by no longer relying on loops and to avoid the gradient problems caused by long-time execution.

The core idea behind the Non-iterative LIF model is to treat the input $wx^{t}$ along all time steps $t_{\text{n}}$ as a matrix $X$ and utilize matrices to represent and approximate the neuron dynamics of the LIF neuron. First of all, given the assumption that input occurs only at time step $0$, the differential equation denoted by (1) has the following solution $u^{t} = x^{0}\left(1 - e^{-\frac{t}{\tau}}\right)$. The function $\hat{e}(\cdot)$ is utilized to denote the leaky component:

$$
\hat{e}(t) = 1 - e^{-\frac{t}{\tau}}.
\tag{4}
$$

Given the assumption that input only occurs at the time step $i \in \mathbb{N}$, the feasible solution could be reformulated as $u^{t} = x^{i}\, \hat{e}(t - i)$, where $i \in [0, \ldots, t]$. Consequently, when inputs over all time steps before the current time step $t$ are accounted for, the solution could be represented as:

$$
u^{t} = \sum_{i=0}^{t} w x^{i}\, \hat{e}(t - i).
\tag{5}
$$

By using this method, (5) can be expressed through matrix computation for all time steps $t_{\text{n}}$:

$$
U = X \hat{E},
\tag{6}
$$

where $U \in \mathbb{R}^{1 \times (t_{\text{n}} + 1)}$ is defined as the membrane potential matrix, $X \in \mathbb{R}^{1 \times (t_{\text{n}} + 1)}$ is defined as the input matrix, and $\hat{E} \in \mathbb{R}^{(t_{\text{n}} + 1) \times (t_{\text{n}} + 1)}$ is defined as the leaky matrix:

$$
\begin{aligned}
U &= \begin{bmatrix} u^{0} & u^{1} & \ldots & u^{t_{\text{n}}} \end{bmatrix} \\
X &= \begin{bmatrix} w x^{0} & w x^{1} & \ldots & w x^{t_{\text{n}}} \end{bmatrix} \\
\hat{E} &= \begin{bmatrix}
\hat{e}(0) & \hat{e}(1) & \ldots & \hat{e}(t_{\text{n}}) \\
0 & \hat{e}(0) & \ldots & \hat{e}(t_{\text{n}} - 1) \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ldots & \hat{e}(0)
\end{bmatrix}
\end{aligned}
\tag{7}
$$
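To make the step from the sum in (5) to the matrix product in (6) concrete, here is a small pure-Python sketch that builds the leaky matrix of (7) and checks that the row-vector product reproduces the per-step sum. The helper names and the example values of `tau` and the inputs are hypothetical; a practical implementation would presumably use a tensor library rather than nested lists.

```python
import math


def e_hat(t, tau):
    """Leaky component (4): e_hat(t) = 1 - exp(-t / tau)."""
    return 1.0 - math.exp(-t / tau)


def leaky_matrix(n_steps, tau):
    """Upper-triangular matrix from (7): entry [i][t] is e_hat(t - i) for t >= i, else 0."""
    return [[e_hat(t - i, tau) if t >= i else 0.0 for t in range(n_steps)]
            for i in range(n_steps)]


def row_times_matrix(row, matrix):
    """Multiply a 1 x n row vector by an n x m matrix."""
    n_cols = len(matrix[0])
    return [sum(row[i] * matrix[i][t] for i in range(len(row)))
            for t in range(n_cols)]


def potentials_matrix(weighted_input, tau):
    """Eq. (6): U = X E_hat, with X holding the weighted inputs w*x^t."""
    return row_times_matrix(weighted_input, leaky_matrix(len(weighted_input), tau))


def potentials_sum(weighted_input, tau):
    """Eq. (5) evaluated directly, one time step at a time."""
    return [sum(weighted_input[i] * e_hat(t - i, tau) for i in range(t + 1))
            for t in range(len(weighted_input))]


wx = [0.5, 1.0, 0.0, 2.0]  # example weighted inputs (n_steps = t_n + 1 = 4)
u_matrix = potentials_matrix(wx, tau=2.0)
u_direct = potentials_sum(wx, tau=2.0)
print(all(abs(a - b) < 1e-12 for a, b in zip(u_matrix, u_direct)))
```

Because $\hat{e}(0) = 0$ under definition (4), the diagonal of the leaky matrix is zero, so in this formulation an input at step $t$ first contributes to the potential at step $t + 1$.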

Additionally, we approximate the firing process of the LIF neuron through matrix calculations. Because $U$ in (5) accounts for all inputs but no resets, it is the upper limit of the membrane potentials. The difference between the real membrane potential and $U$ lies in the influence of the reset mechanism brought by the output spike matrix $O \in \mathbb{R}^{1 \times (t_{\text{n}} + 1)}$. We introduce the reset matrix $S \in \mathbb{R}^{(t_{\text{n}} + 1) \times (t_{\text{n}} + 1)}$ based on the soft reset mechanism (introduced in Section III-A1):

$$
S = \begin{bmatrix}
0 & V_{\text{th}} \hat{e}(0) & V_{\text{th}} \hat{e}(1) & \ldots & V_{\text{th}} \hat{e}(t_{\text{n}} - 1) \\
0 & 0 & V_{\text{th}} \hat{e}(0) & \ldots & V_{\text{th}} \hat{e}(t_{\text{n}} - 2) \\
0 & 0 & 0 & \ldots & V_{\text{th}} \hat{e}(t_{\text{n}} - 3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \ldots & 0
\end{bmatrix}.
\tag{8}
$$

Therefore, with $U$, $S$, $O$, and the Heaviside fire function $g(\cdot)$, we have the following identity:

$$
g(U - OS) = O,
\tag{9}
$$

where only $O$ is the unknown variable. To solve (9), we propose the following proposition:

Proposition 1

Given a LIF neuron with leaky and integrated dynamics (6), the inequality

$$
g(U-I_{\text{one}}S)\leq O\leq g(U) \tag{10}
$$

always holds, where $O\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ is the output spike matrix, $I_{\text{one}}$ is the all-ones matrix and $S$ is defined in (8).

Proof 1

Since $g(\cdot)$ is a step function, it is difficult to solve (9) by calculating the inverse of $g(\cdot)$. Thus, to calculate $O$, we approximate the reset part $OS$ by $\hat{O}S$. We use a reset matrix $U_{\text{reset}}\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ to denote the approximation $\hat{O}S$. Therefore, the extreme cases of $U_{\text{reset}}$ are as follows:

$$
\begin{aligned}
\max(U_{\text{reset}}) &= \max(\hat{O})S = I_{\text{one}}S,\\
\min(U_{\text{reset}}) &= \min(\hat{O})S = I_{\text{zero}}S = I_{\text{zero}},
\end{aligned}
$$

where $I_{\text{zero}}\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ is an all-zeros matrix. Therefore, (9) can be reformulated as:

$$
\begin{aligned}
g(U-\max(U_{\text{reset}})) &\leq g(U-OS)\leq g(U-\min(U_{\text{reset}})),\\
g(U-I_{\text{one}}S) &\leq O\leq g(U).
\end{aligned}
$$

$\square$
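The monotonicity argument behind (10) can be checked numerically. The following is a minimal sketch rather than the authors' code: the exponential kernel `e_hat`, the threshold value, and the random test data are assumptions chosen for illustration, and the check only relies on the entries of $S$ being non-negative and on the spike train being binary.

```python
import math
import random

def heaviside(row, v_th):
    """Fire function g: 1 where the potential exceeds the threshold, else 0."""
    return [1 if u > v_th else 0 for u in row]

def build_s(n, v_th, e_hat):
    """Strictly upper-triangular S of (8): S[i][j] = V_th * e_hat(j-1-i) for j > i."""
    return [[v_th * e_hat(j - 1 - i) if j > i else 0.0 for j in range(n)]
            for i in range(n)]

def vec_mat(row, mat):
    """Row vector times matrix, using plain lists."""
    return [sum(row[i] * mat[i][j] for i in range(len(row)))
            for j in range(len(mat[0]))]

def check_bounds(n=8, v_th=1.0, tau=2.0, trials=200, seed=0):
    """Return True if g(U - 1*S) <= g(U - O*S) <= g(U) holds on all random trials."""
    rng = random.Random(seed)
    e_hat = lambda k: math.exp(-k / tau)   # assumed stand-in kernel (non-negative)
    s = build_s(n, v_th, e_hat)
    ones_s = vec_mat([1] * n, s)           # I_one * S
    for _ in range(trials):
        u = [rng.uniform(-1.0, 4.0) for _ in range(n)]
        o = [rng.randint(0, 1) for _ in range(n)]   # arbitrary binary spike train
        reset = vec_mat(o, s)                        # O * S
        lower = heaviside([a - b for a, b in zip(u, ones_s)], v_th)
        middle = heaviside([a - b for a, b in zip(u, reset)], v_th)
        upper = heaviside(u, v_th)
        if not all(l <= m <= h for l, m, h in zip(lower, middle, upper)):
            return False
    return True
```

Because $0\leq OS\leq I_{\text{one}}S$ elementwise whenever $S$ is non-negative, the bounds hold for any binary $O$, not only for the true output of the neuron.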

Thus, we use Proposition 1 to estimate $O$ in $U_{\text{reset}}$ as $\hat{O}$:

$$
U_{\text{reset}}\approx\hat{O}S\leq g(U)S. \tag{11}
$$

The spiking neuron should generate as few spikes as possible to keep the sparsity and improve computation efficiency. Thus, the extreme case in (11) is adopted in the Non-iterative LIF neuron model to approximate $U_{\text{reset}}$:

$$
U_{\text{reset}}=g(U)S. \tag{12}
$$

This yields the final output matrix as follows:

$$
\begin{aligned}
U_{\text{final}} &= U-U_{\text{reset}},\\
O_{\text{final}} &= g(U_{\text{final}}).
\end{aligned} \tag{13}
$$

To show the neuron model in the same expression format as the Iterative LIF neuron model in Section III-A1, we also express (6) to (13) in the same format as (2). Therefore, the Non-iterative LIF neuron model with the soft reset mechanism can be represented as:

$$
\begin{aligned}
u^{\text{t}} &= \sum^{t}_{i=0} w x^{\rm i}\hat{e}(t-i) - \sum^{t-1}_{i=0} g\Big(\sum^{i}_{k=0} w x^{\rm k}\hat{e}(i-k)\Big) V_{\text{th}}\hat{e}(t-1-i),\\
o^{\text{t}} &= g(u^{\text{t}}).
\end{aligned} \tag{14}
$$
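The agreement between the matrix form (12)-(13) and the per-time-step form (14) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the kernel `e_hat` is a hypothetical exponential stand-in, the kernel matrix is assumed to have entries $\hat{e}(j-i)$ above and on the diagonal so that $U=wXE$, and the fire function is thresholded at `v_th` as in Algorithm 1.

```python
import math

def e_hat(k, tau=2.0):
    """Assumed stand-in kernel; the paper defines e_hat outside this section."""
    return math.exp(-k / tau)

def g(u, v_th):
    """Heaviside fire function, thresholded at v_th."""
    return 1 if u > v_th else 0

def vec_mat(row, mat):
    """Row vector times matrix, using plain lists."""
    return [sum(row[i] * mat[i][j] for i in range(len(row)))
            for j in range(len(mat[0]))]

def potentials_matrix_form(x, w, v_th):
    """U_final = U - g(U) S, following (12) and (13)."""
    n = len(x)
    e = [[e_hat(j - i) if j >= i else 0.0 for j in range(n)] for i in range(n)]
    s = [[v_th * e_hat(j - 1 - i) if j > i else 0.0 for j in range(n)]
         for i in range(n)]                              # reset matrix of (8)
    u = vec_mat([w * xi for xi in x], e)                 # U = w X E (assumed form)
    u_reset = vec_mat([g(ut, v_th) for ut in u], s)      # (12)
    return [a - b for a, b in zip(u, u_reset)]           # (13)

def potentials_sum_form(x, w, v_th):
    """u^t written out term by term as in (14)."""
    out = []
    for t in range(len(x)):
        drive = sum(w * x[i] * e_hat(t - i) for i in range(t + 1))
        reset = sum(
            g(sum(w * x[k] * e_hat(i - k) for k in range(i + 1)), v_th)
            * v_th * e_hat(t - 1 - i)
            for i in range(t)
        )
        out.append(drive - reset)
    return out
```

Both functions return the same membrane potentials for any input sequence, since row $t$ of the matrix products collects exactly the two sums in (14).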

We aim to preserve the original gradient values without introducing significant neuron changes. Therefore, we utilize the derivative of the ReLU function as the surrogate function during backpropagation:

$$
\frac{\partial o}{\partial u}=\begin{cases}1, & \text{if } u>0,\\ 0, & \text{if } u\leq 0.\end{cases} \tag{15}
$$
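How the surrogate (15) enters the chain rule can be illustrated with a single time step. This is a minimal sketch under assumptions that are not part of the paper: the potential is taken as $u=wx$, the loss is a hypothetical squared error on the spike, and the forward threshold `v_th` follows Algorithm 1, while the surrogate switches at $u=0$ exactly as (15) states.

```python
def fire(u, v_th=1.0):
    """Forward pass: Heaviside step, emits 1.0 when u exceeds the threshold."""
    return 1.0 if u > v_th else 0.0

def surrogate_grad(u):
    """Backward pass: derivative of ReLU, used in place of do/du as in (15)."""
    return 1.0 if u > 0 else 0.0

def grad_loss_wrt_w(x, w, target, v_th=1.0):
    """dl/dw = dl/do * do/du * du/dw for u = w * x and l = (o - target)^2.

    The single-step potential and the squared-error loss are illustrative
    assumptions; only the surrogate do/du is taken from (15).
    """
    u = w * x
    o = fire(u, v_th)
    dl_do = 2.0 * (o - target)
    du_dw = x
    return dl_do * surrogate_grad(u) * du_dw
```

With a positive potential the gradient passes through unchanged, and with a non-positive potential it is blocked, which mirrors the behavior of ReLU during backpropagation.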

The pseudo-code of the Non-iterative LIF neuron model is shown in Algorithm 1.

Algorithm 1 Pseudocode for the Non-iterative LIF neuron model
function e_matrix(time_step, τ)
     e ← zeros((time_step, time_step))
     for i ∈ range(time_step) do
          for j ∈ range(time_step) do
               k ← j - i
               if k ≥ 0 then
                    e[i][j] ← 1 - exp(-((time_step - k)/τ))
               end if
          end for
     end for
     return e
end function
function s_matrix(e, v)
     s ← zeros_like(e)
     s[:, 1:] ← e[:, :-1] · v
     return s
end function
procedure Ni_LIF
     function __init__(time_step, v, τ)
          self.v ← v
          self.e ← e_matrix(time_step, τ)
          self.s ← s_matrix(self.e, v)
     end function
     function forward(x)
          u ← x × self.e
          o_hat ← (u > self.v) × self.s
          u ← u - o_hat
          o ← (u > self.v)
          return o
     end function
end procedure
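Algorithm 1 can be transcribed into runnable code. The following is a sketch using plain Python lists so that it runs without third-party packages; passing `tau` to the constructor, storing `v` on the instance, and treating `x` as the already weighted input row of a single neuron are assumptions made so that the code is self-contained. A practical implementation would more likely use a tensor library.

```python
import math

def e_matrix(time_step, tau):
    """Upper-triangular kernel matrix with the entries given in Algorithm 1."""
    e = [[0.0] * time_step for _ in range(time_step)]
    for i in range(time_step):
        for j in range(i, time_step):      # k = j - i >= 0
            k = j - i
            e[i][j] = 1.0 - math.exp(-(time_step - k) / tau)
    return e

def s_matrix(e, v):
    """Reset matrix: e shifted one column to the right and scaled by v."""
    return [[0.0] + [v * val for val in row[:-1]] for row in e]

def vec_mat(row, mat):
    """Row vector times matrix."""
    return [sum(row[i] * mat[i][j] for i in range(len(row)))
            for j in range(len(mat[0]))]

class NiLIF:
    """Non-iterative LIF neuron for a single input row (illustrative)."""

    def __init__(self, time_step, v, tau):
        self.v = v
        self.e = e_matrix(time_step, tau)
        self.s = s_matrix(self.e, v)

    def forward(self, x):
        u = vec_mat(x, self.e)                             # potentials, all steps at once
        fired = [1.0 if ut > self.v else 0.0 for ut in u]  # g(U)
        u_reset = vec_mat(fired, self.s)                   # U_reset = g(U) S
        u = [a - b for a, b in zip(u, u_reset)]            # U_final
        return [1 if ut > self.v else 0 for ut in u]       # O_final
```

For instance, `NiLIF(4, 1.0, 2.0).forward([2.0, 0.0, 0.0, 0.0])` computes all four time steps without a loop over time inside `forward`; the only loops are the list-based matrix products.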

III-A3 Comparison: Iterative LIF neuron and Non-iterative LIF neuron

Figure 2: Illustrative comparison between the Iterative LIF neuron and the Non-iterative LIF neuron. The uppermost panel shows the input spike train, the second panel shows the membrane potential, and the third panel shows the output spikes. Both neurons receive the same input spike train. As the third panel shows, both models produce an output spike at time step 4. Notably, the Non-iterative neuron does not generate a second spike at time step 9 due to inhibitory effects, resulting in greater sparsity.

The operational flow diagrams for the two neurons are shown in Figure 1. The Iterative LIF neuron operates in a recurrent manner, relying on the output from the preceding time step for its next computation. Conversely, the Non-iterative LIF model processes data from all time steps simultaneously, requiring only a few matrix operations to determine the membrane potential and output spikes for all time steps. This loop-free characteristic of the Non-iterative LIF neuron can avoid the gradient issues caused by executing loops over many time steps. We assume the loss is $L\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$, which is equal to the summation of the loss $l^{\rm t}\in\mathbb{R}$ of each time step $t$. The gradient derivation for both LIF neuron models during backpropagation is:

$$
\frac{\partial L}{\partial w}=\sum^{t_{\rm n}}_{t=0}\frac{\partial l^{\rm t}}{\partial w}=\sum^{t_{\rm n}}_{t=0}\frac{\partial l^{\rm t}}{\partial o^{\rm t}}\frac{\partial o^{\rm t}}{\partial u^{\rm t}}\frac{\partial u^{\rm t}}{\partial w}. \tag{16}
$$

Therefore, we introduce two propositions regarding $\frac{\partial u^{\rm t}}{\partial w}$ in the two LIF neuron models, respectively, to compare their gradient issues. Proposition 2 gives the gradient equation of the Iterative LIF neuron, and Proposition 3 gives the gradient equation of the Non-iterative LIF neuron.

Proposition 2

Given an Iterative LIF neuron with the dynamics in (2), the limit condition

$$
\forall u^{\rm t}_{\text{I}},\ \lim\limits_{t\to\infty}\frac{\partial u^{\rm t}_{\text{I}}}{\partial w}=0 \tag{17}
$$

always holds, where $u^{\rm t}_{\text{I}}$ is the membrane potential of the Iterative LIF neuron at time step $t$.

Proof 2

In the Iterative LIF neuron model, $\frac{\partial u^{\rm t_{\text{n}}}_{\text{I}}}{\partial w}$ is derived in (20), which is composed of a summation of two parts of accumulated gradient multiplication $\prod^{t_{\text{n}}-1}_{i=0}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$. According to (2), $\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$ is equal to the decay rate $\lambda\in(0,1)$, which is less than 1. Therefore, $\lim\limits_{t\to\infty}\prod^{t-1}_{i=0}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}=\lim\limits_{t\to\infty}\lambda^{t}=0$, which causes $\lim\limits_{t\to\infty}\frac{\partial u^{\text{t}}_{\text{I}}}{\partial w}=0$. $\square$
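The decay of the accumulated product used in this proof can be illustrated numerically. The snippet below is a minimal sketch, assuming every factor $\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$ equals the same decay rate; the function name and the example rates are hypothetical choices, not values from the paper.

```python
def accumulated_product(decay, steps):
    """Product of `steps` identical factors du^{i+1}/du^i = decay."""
    prod = 1.0
    for _ in range(steps):
        prod *= decay
    return prod

def steps_until_below(decay, tol):
    """Smallest number of steps after which the product drops below `tol`."""
    steps, prod = 0, 1.0
    while prod >= tol:
        prod *= decay
        steps += 1
    return steps
```

For any decay rate strictly between 0 and 1 the product shrinks geometrically, so `steps_until_below` always terminates; the closer the rate is to 1, the more time steps it takes.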

Proposition 3

Given a Non-iterative LIF neuron with the dynamics in (14), the limit condition

$$
\exists u^{\rm t}_{\text{Ni}},\ \text{s.t.}\ \lim\limits_{t\to\infty}\frac{\partial u^{\rm t}_{\text{Ni}}}{\partial w}\neq 0 \tag{18}
$$

always holds, where $u^{\rm t}_{\text{Ni}}$ is the membrane potential of the Non-iterative LIF neuron at time step $t$.

Proof 3

In the Non-iterative LIF neuron model, $\frac{\partial u^{\rm t_{\text{n}}}_{\text{Ni}}}{\partial w}$ is derived from (14) as:

$$
\begin{aligned}
\frac{\partial u^{\rm t_{\text{n}}}_{\text{Ni}}}{\partial w} ={}& \sum^{t_{\text{n}}}_{i=0}x^{\rm i}\hat{e}(t_{\text{n}}-i)\\
&-\sum^{t_{\text{n}}-1}_{i=0}\left[V_{\text{th}}\hat{e}(t_{\text{n}}-1-i)\left.\frac{\partial o}{\partial u}\right|_{u=\sum^{i}_{k=0}wx^{\rm k}\hat{e}(i-k)}\cdot\sum^{i}_{k=0}x^{\rm k}\hat{e}(i-k)\right],
\end{aligned} \tag{19}
$$

which is composed only of summations. We construct a specific example to prove that $\exists u^{\rm t_{\text{n}}}_{\text{Ni}}$, s.t. $\lim\limits_{t_{\text{n}}\to\infty}\frac{\partial u^{\rm t_{\text{n}}}_{\text{Ni}}}{\partial w}\neq 0$. Take $w>0$ and $x^{\rm i}<0$ for all $i$, with $\hat{e}(\cdot)>0$. Then $\forall i,\ \sum^{i}_{k=0}wx^{\rm k}\hat{e}(i-k)<0$. According to (15), $\forall i,\ \left.\frac{\partial o}{\partial u}\right|_{u=\sum^{i}_{k=0}wx^{\rm k}\hat{e}(i-k)}=0$.
Thus, the right-hand side of (19) reduces to $\sum^{t_{\text{n}}}_{i=0}x^{\rm i}\hat{e}(t_{\text{n}}-i)$. Since $x^{\rm i}<0$ for all $i$, we have $\sum^{t_{\text{n}}}_{i=0}x^{\rm i}\hat{e}(t_{\text{n}}-i)<0$, which is nonzero. This way, we prove that $\exists u^{\rm t_{\text{n}}}_{\text{Ni}}$, s.t. $\lim\limits_{t_{\text{n}}\to\infty}\frac{\partial u^{\rm t_{\text{n}}}_{\text{Ni}}}{\partial w}\neq 0$. $\square$
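The constructed case in this proof can be evaluated numerically. The following is a sketch under assumptions: the kernel `e_hat` is a hypothetical positive exponential, and the weight is taken positive with strictly negative inputs so that every pre-reset potential is negative and the surrogate derivative (15) vanishes. The function evaluates the right-hand side of (19) at a chosen time step.

```python
import math

def e_hat(k, tau=2.0):
    """Assumed positive stand-in kernel; not the paper's definition."""
    return math.exp(-k / tau)

def surrogate_grad(u):
    """Derivative of ReLU, as in (15)."""
    return 1.0 if u > 0 else 0.0

def grad_u_wrt_w(x, w, v_th, t):
    """Right-hand side of (19) evaluated at time step t."""
    drive = sum(x[i] * e_hat(t - i) for i in range(t + 1))
    reset = 0.0
    for i in range(t):
        u_i = sum(w * x[k] * e_hat(i - k) for k in range(i + 1))   # pre-reset potential
        du_i = sum(x[k] * e_hat(i - k) for k in range(i + 1))      # its derivative w.r.t. w
        reset += v_th * e_hat(t - 1 - i) * surrogate_grad(u_i) * du_i
    return drive - reset
```

With `x = [-1.0] * 201` and `w = 0.5`, the reset term is zero at every step and the gradient at `t = 200` equals the negative kernel sum, so it stays bounded away from zero however many time steps are used.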

Remark 1

Propositions 2 and 3 show that when the number of time steps is large, the Iterative LIF neuron has the vanishing gradient problem, whereas the Non-iterative LIF neuron does not. As is well known [66], the presence of a continuously accumulated gradient multiplication part, $\prod^{t_{\text{n}}-1}_{i}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$, can cause gradient vanishing issues during training. Conversely, the gradient equation of the proposed Non-iterative LIF neuron model, given in (19), depends entirely on summations rather than on the product $\prod^{t_{\text{n}}-1}_{i}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$ that appears in the Iterative LIF neuron model, which avoids the gradient issue caused by accumulated multiplication. Consequently, the Non-iterative LIF neuron model avoids the gradient problem caused by the recurrent execution of loops over many time steps in the Iterative LIF neuron model.
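The role of the accumulated product in (20) can be made concrete by unrolling the recursion numerically. This is a simplified sketch, not the full derivation: it assumes every factor $\frac{\partial u^{\rm k+1}_{\text{I}}}{\partial u^{\rm k}_{\text{I}}}$ equals a constant decay rate, drops the spike-dependent terms $\frac{\partial o}{\partial w}$, and starts from a zero initial gradient. Under these assumptions the recursion $d^{t}=\lambda d^{t-1}+x^{t-1}$ should match the first sum in the last line of (20).

```python
def grad_recursive(x, decay):
    """Unroll d_t = decay * d_{t-1} + x^{t-1} with d_0 = 0 (spike terms omitted)."""
    d = 0.0
    for x_prev in x:          # x = [x^0, ..., x^{t-1}]
        d = decay * d + x_prev
    return d

def grad_closed_form(x, decay):
    """Sum over i = 1..t of decay^(t-i) * x^{i-1}, the product-weighted sum in (20)."""
    t = len(x)
    return sum(decay ** (t - i) * x[i - 1] for i in range(1, t + 1))

def weight_on_input(decay, t, i):
    """Factor multiplying x^{i-1} after t steps: the product of t - i decay terms."""
    return decay ** (t - i)
```

The weight on the earliest input, `weight_on_input(decay, t, 1)`, shrinks geometrically as `t` grows, which is the accumulated-multiplication effect the remark refers to.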

$$
\begin{aligned}
\frac{\partial u^{\text{t}}_{\text{I}}}{\partial w}
&=\frac{\partial u^{\text{t}}_{\text{I}}}{\partial u^{\rm t-1}_{\text{I}}}\frac{\partial u^{\rm t-1}_{\text{I}}}{\partial w}+\frac{\partial u^{\text{t}}_{\text{I}}}{\partial o^{\rm t-1}}\frac{\partial o^{\rm t-1}}{\partial w}+x^{\rm t-1}\\
&=\frac{\partial u^{\text{t}}_{\text{I}}}{\partial u^{\rm t-1}_{\text{I}}}\left[\frac{\partial u^{\rm t-1}_{\text{I}}}{\partial u^{\rm t-2}_{\text{I}}}\frac{\partial u^{\rm t-2}_{\text{I}}}{\partial w}+\frac{\partial u^{\rm t-1}_{\text{I}}}{\partial o^{\rm t-2}}\frac{\partial o^{\rm t-2}}{\partial w}+x^{\rm t-2}\right]+\frac{\partial u^{\text{t}}_{\text{I}}}{\partial o^{\rm t-1}}\frac{\partial o^{\rm t-1}}{\partial w}+x^{\rm t-1}\\
&=\frac{\partial u^{\text{t}}_{\text{I}}}{\partial u^{\rm t-1}_{\text{I}}}\Bigg[\frac{\partial u^{\rm t-1}_{\text{I}}}{\partial u^{\rm t-2}_{\text{I}}}\bigg[\frac{\partial u^{\rm t-2}_{\text{I}}}{\partial u^{\rm t-3}_{\text{I}}}\Big[\ldots\big[\frac{\partial u^{2}_{\text{I}}}{\partial u^{1}_{\text{I}}}\big[\frac{\partial u^{1}_{\text{I}}}{\partial u^{0}_{\text{I}}}\frac{\partial u^{0}_{\text{I}}}{\partial w}+\frac{\partial u^{1}_{\text{I}}}{\partial o^{0}}\frac{\partial o^{0}}{\partial w}+x^{0}\big]+\frac{\partial u^{2}_{\text{I}}}{\partial o^{1}}\frac{\partial o^{1}}{\partial w}+x^{1}\big]+\ldots\Big]\\
&\qquad+\frac{\partial u^{\rm t-2}_{\text{I}}}{\partial o^{\rm t-3}}\frac{\partial o^{\rm t-3}}{\partial w}+x^{\rm t-3}\bigg]+\frac{\partial u^{\rm t-1}_{\text{I}}}{\partial o^{\rm t-2}}\frac{\partial o^{\rm t-2}}{\partial w}+x^{\rm t-2}\Bigg]+\frac{\partial u^{\text{t}}_{\text{I}}}{\partial o^{\rm t-1}}\frac{\partial o^{\rm t-1}}{\partial w}+x^{\rm t-1}\\
&\overset{t=t_{\text{n}}}{=}\sum^{t_{\text{n}}}_{i=1}\left[\prod^{t_{\text{n}}-1}_{k=i}\frac{\partial u^{\rm k+1}_{\text{I}}}{\partial u^{\rm k}_{\text{I}}}\right]x^{i-1}+\sum^{t_{\text{n}}}_{i=2}\left[\prod^{t_{\text{n}}-1}_{k=i}\frac{\partial u^{\rm k+1}_{\text{I}}}{\partial u^{\rm k}_{\text{I}}}\right]\frac{\partial u^{\rm i}_{\text{I}}}{\partial o^{\rm i-1}}\frac{\partial o^{\rm i-1}}{\partial w}\\
&=\sum^{t_{\text{n}}}_{i=2}\underbrace{\left[\prod^{t_{\text{n}}-1}_{k=i}\frac{\partial u^{\rm k+1}_{\text{I}}}{\partial u^{\rm k}_{\text{I}}}\right]}_{\substack{\text{Accumulating}\\ \text{gradient mult.}}}\left[x^{i-1}+\frac{\partial u^{\rm i}_{\text{I}}}{\partial o^{\rm i-1}}\underbrace{\frac{\partial o^{\rm i-1}}{\partial w}}_{\substack{\text{Accumulating}\\ \text{gradient mult.}}}\right]+\underbrace{\prod^{t_{\text{n}}-1}_{i=1}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}}_{\substack{\text{Accumulating}\\ \text{gradient mult.}}}x^{0}.
\end{aligned} \tag{20}
$$
start_POSTSUBSCRIPT n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT [ ∏ start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = italic_i end_POSTSUBSCRIPT divide start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_k + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG ] italic_x start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT + ∑ start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 2 end_POSTSUBSCRIPT [ ∏ start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = italic_i end_POSTSUBSCRIPT divide start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_k + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_k end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG ] divide start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_o start_POSTSUPERSCRIPT roman_i - 1 end_POSTSUPERSCRIPT end_ARG divide start_ARG ∂ italic_o start_POSTSUPERSCRIPT roman_i - 1 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ italic_w end_ARG = ∑ start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT n end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 2 end_POSTSUBSCRIPT under⏟ start_ARG [ ∏ start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k = italic_i end_POSTSUBSCRIPT divide start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_k + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_k 
end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG ] end_ARG start_POSTSUBSCRIPT start_ARG start_ROW start_CELL Accumulating end_CELL end_ROW start_ROW start_CELL gradient mult. end_CELL end_ROW end_ARG end_POSTSUBSCRIPT [ italic_x start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT + divide start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_o start_POSTSUPERSCRIPT roman_i - 1 end_POSTSUPERSCRIPT end_ARG under⏟ start_ARG divide start_ARG ∂ italic_o start_POSTSUPERSCRIPT roman_i - 1 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ italic_w end_ARG end_ARG start_POSTSUBSCRIPT start_ARG start_ROW start_CELL Accumulating end_CELL end_ROW start_ROW start_CELL gradient mult. end_CELL end_ROW end_ARG end_POSTSUBSCRIPT ] + under⏟ start_ARG ∏ start_POSTSUPERSCRIPT italic_t start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT divide start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_i + 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_u start_POSTSUPERSCRIPT roman_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT I end_POSTSUBSCRIPT end_ARG end_ARG start_POSTSUBSCRIPT start_ARG start_ROW start_CELL Accumulating end_CELL end_ROW start_ROW start_CELL gradient mult. end_CELL end_ROW end_ARG end_POSTSUBSCRIPT italic_x start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT .
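
As a rough illustration of why the accumulated products in the expression above matter, consider the simplifying assumption (made here only for intuition, and not taken from the model) that every factor has the same magnitude $\gamma$. Then

\begin{equation*}
\left|\prod^{t_{n}-1}_{k=i}\frac{\partial u^{k+1}_{\text{I}}}{\partial u^{k}_{\text{I}}}\right| = \gamma^{\,t_{n}-i}.
\end{equation*}

For $t_{n}-i=100$, a value of $\gamma=0.9$ gives roughly $2.7\times 10^{-5}$, whereas $\gamma=1.1$ gives roughly $1.4\times 10^{4}$. Under this assumption, contributions from early time steps either fade or dominate as the sequence grows, which is the kind of behavior long EEG sequences make likely.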

In our Non-iterative LIF neuron model, the final membrane potential $U_{\text{final}}$ is determined by the reset matrix $U_{\text{reset}}$, which only approximates the true reset behavior. This is because the matrix $\hat{O}$ overestimates the output spikes, which makes $U_{\text{reset}}$ larger than it should be and, in turn, yields a reduced final membrane potential $U_{\text{final}}$ and fewer output spikes. Owing to this characteristic of the firing process, the Non-iterative LIF neuron exhibits greater sparsity than the Iterative LIF model, as shown in Figure 2: given an identical input spike train, the Non-iterative LIF neuron produces fewer spikes than its iterative counterpart. This sparsity improves energy efficiency during execution.

III-B Attention model

In this section, we present the proposed attention models. The fundamental goal of the attention mechanism is to employ neural networks to compute an attention score. This score is applied across the entire feature map, assigning weights to the features. In this way, relevant features are emphasized while irrelevant features are downplayed. In the context of EEG signals, the original input data has shape $\mathbb{R}^{B\times C\times D}$, where $B\in\mathbb{N}$ is the batch size, $C\in\mathbb{N}$ is the number of channels, and $D\in\mathbb{N}$ is the length of the data in each channel. We segment the data into timepieces, resulting in a new shape of $\mathbb{R}^{B\times C\times S\times T}$. Here, $S\in\mathbb{N}$ is the number of timepieces and $T\in\mathbb{N}$ is the number of time steps in each timepiece. Note that the product $S\times T$ must equal $D$. Given that EEG data typically presents as long temporal sequences, it is important to determine which timepieces are crucial for classification. Thus, our attention model places particular emphasis on the dimension $S$. We introduce two distinct attention mechanisms: Sequence attention (Seq-attention), described in Section III-B1, and Channel Sequence attention (ChanSeq-attention), detailed in Section III-B2. Section III-B3 presents Global-attention, a special case of ChanSeq-attention.
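
The segmentation step can be sketched in a few lines of NumPy. This is a minimal illustration that assumes contiguous, non-overlapping timepieces; the function name `segment_timepieces` and the toy sizes are hypothetical and chosen only for the example.

```python
import numpy as np


def segment_timepieces(x: np.ndarray, num_pieces: int) -> np.ndarray:
    """Split (B, C, D) data into (B, C, S, T) with S * T == D.

    Assumption: timepieces are contiguous and do not overlap.
    """
    B, C, D = x.shape
    if D % num_pieces != 0:
        raise ValueError("D must be divisible by the number of timepieces S")
    T = D // num_pieces
    # Timepiece s holds samples s*T .. (s+1)*T - 1 of each channel.
    return x.reshape(B, C, num_pieces, T)


# Toy sizes for illustration only: B=2, C=3, D=12, S=4, so T=3.
x = np.arange(2 * 3 * 12, dtype=float).reshape(2, 3, 12)
segments = segment_timepieces(x, num_pieces=4)
print(segments.shape)
```

Because the reshape only splits the last axis, flattening the last two axes of the result recovers the original signal.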

In contrast to sequential attention models such as those in [24], we use a single model to compute the attention scores simultaneously. We introduce two model architectures for each attention mechanism: the linear architecture, illustrated in Figure 3, and the convolutional architecture, illustrated in Figure 4. The distinction between these architectures lies in how they compute the attention score: the linear architecture uses fully connected layers, whereas the convolutional architecture uses convolutional layers. Notably, both Seq-attention and ChanSeq-attention have linear and convolutional versions.

III-B1 Seq-attention mechanism

The Sequence attention mechanism focuses on identifying which timepieces require attention. It first reshapes the input data into the format $\mathbb{R}^{B\times S\times (C\times T)}$, which the attention model then processes.

Figure 3: Illustration of the linear attention model. The linear attention model integrates position embedding with the input data. This model employs three linear layers to obtain the Query, Key, and Value matrices. The attention score is computed by multiplying the query and key matrices and is then normalized by a Softmax function. An additional linear layer in the final step produces the enhanced output.

The linear Seq-attention (Linear-Seq-attention) model is given as follows:

\begin{equation}\tag{21}
\begin{aligned}
q &= Q_{\text{fc}}(x+p),\quad Q_{\text{fc}}:\mathbb{R}^{B\times S\times (C\times T)}\rightarrow\mathbb{R}^{B\times d_{1}\times S\times d_{2}},\\
k &= K_{\text{fc}}(x+p),\quad K_{\text{fc}}:\mathbb{R}^{B\times S\times (C\times T)}\rightarrow\mathbb{R}^{B\times d_{1}\times S\times d_{2}},\\
v &= V_{\text{fc}}(x+p),\quad V_{\text{fc}}:\mathbb{R}^{B\times S\times (C\times T)}\rightarrow\mathbb{R}^{B\times d_{1}\times S\times d_{2}},\\
A &= \text{Softmax}\left(\frac{qk^{T}}{\sqrt{d_{k}}}\right),\quad A\in\mathbb{R}^{B\times d_{1}\times S\times S},\\
\hat{x} &= Av,\quad \hat{x}\in\mathbb{R}^{B\times d_{1}\times S\times d_{2}},\\
h_{\text{linear-seq}}(x) &= FC_{\text{seq}}(\hat{x}),\quad FC_{\text{seq}}:\mathbb{R}^{B\times d_{1}\times S\times d_{2}}\rightarrow\mathbb{R}^{B\times C\times S\times T},
\end{aligned}
\end{equation}

where $Q_{\text{fc}}$, $K_{\text{fc}}$, and $V_{\text{fc}}$ are three fully connected layers, each followed by a reshape. They generate the query matrix $q$, the key matrix $k$, and the value matrix $v$, respectively. These matrices have dimensions $\mathbb{R}^{B\times d_{1}\times S\times d_{2}}$, where $d_{1}$ and $d_{2}$ are hyperparameters, and $p$ is the position embedding [16]. The attention score $A$ is obtained by multiplying $q$ with the transpose of $k$, scaling the result, and normalizing it with the Softmax function. Multiplying $A$ by $v$ then assigns weights to the values in $v$. Finally, a fully connected layer produces the output $h_{\text{linear-seq}}(x)$, which represents the input data enhanced with attention.
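
The following NumPy sketch traces the tensor shapes through Eq. (21). It is a simplified illustration under stated assumptions, not the authors' implementation: the function name `linear_seq_attention` is hypothetical, biases are omitted, the scaling constant is taken as $d_{k}=d_{2}$, the position embedding `p` has shape `(S, C*T)`, and the final layer $FC_{\text{seq}}$ is modeled as a single matrix applied to each timepiece.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def linear_seq_attention(x, p, Wq, Wk, Wv, Wo, d1, d2):
    """x: (B, C, S, T); p: (S, C*T); Wq, Wk, Wv: (C*T, d1*d2); Wo: (d1*d2, C*T)."""
    B, C, S, T = x.shape
    # Reshape to (B, S, C*T) so that each timepiece is one token, then add p.
    tokens = x.transpose(0, 2, 1, 3).reshape(B, S, C * T) + p

    def project(W):
        # Fully connected layer (bias omitted) followed by a reshape to (B, d1, S, d2).
        return (tokens @ W).reshape(B, S, d1, d2).transpose(0, 2, 1, 3)

    q, k, v = project(Wq), project(Wk), project(Wv)
    # Attention score over timepieces, shape (B, d1, S, S). Assumption: d_k = d2.
    A = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d2), axis=-1)
    x_hat = A @ v  # (B, d1, S, d2)
    merged = x_hat.transpose(0, 2, 1, 3).reshape(B, S, d1 * d2)
    out = merged @ Wo  # (B, S, C*T)
    return out.reshape(B, S, C, T).transpose(0, 2, 1, 3), A  # output is (B, C, S, T)


# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
B, C, S, T, d1, d2 = 2, 3, 4, 5, 2, 6
x = rng.normal(size=(B, C, S, T))
p = rng.normal(size=(S, C * T))
Wq, Wk, Wv = (rng.normal(size=(C * T, d1 * d2)) * 0.1 for _ in range(3))
Wo = rng.normal(size=(d1 * d2, C * T)) * 0.1
h, A = linear_seq_attention(x, p, Wq, Wk, Wv, Wo, d1, d2)
print(h.shape, A.shape)
```

Each row of `A` sums to one, so every output timepiece is a convex combination of the value vectors of all timepieces.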

The convolutional Seq-attention (Conv-Seq-attention) model is presented as follows:

\begin{equation}\tag{22}
\begin{aligned}
q &= Q_{\text{conv}}(x),\quad Q_{\text{conv}}:\mathbb{R}^{B\times S\times C\times T}\rightarrow\mathbb{R}^{B\times S\times (d\times C\times T)},\\
k &= K_{\text{conv}}(x),\quad K_{\text{conv}}:\mathbb{R}^{B\times S\times C\times T}\rightarrow\mathbb{R}^{B\times S\times (d\times C\times T)},\\
A &= \text{Softmax}(qk^{T}),\quad A\in\mathbb{R}^{B\times S\times S},\\
\hat{x} &= Ax,\quad \hat{x}\in\mathbb{R}^{B\times C\times S\times T},\\
h_{\text{conv-seq}}(x) &= \alpha\hat{x}+x,
\end{aligned}
\end{equation}

where $Q_{\text{conv}}$ and $K_{\text{conv}}$ are two convolutional layers, each followed by a reshape. As in the Linear-Seq-attention mechanism, they generate the query matrix $q$ and the key matrix $k$, respectively. Within this model, $d$ is a hyperparameter. Unlike its linear counterpart, the Conv-Seq-attention model omits the value matrix and computes the attention score $A$ directly by multiplying $q$ with the transpose of $k$. After the Softmax function, these weights are combined with the input data via matrix multiplication. Finally, the Conv-Seq-attention introduces a trainable parameter $\alpha$ (denoted $\beta$ in the caption of Figure 4) to balance the attention-enhanced result against the original input data.
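
A minimal NumPy sketch of Eq. (22) is given below. The kernel sizes of $Q_{\text{conv}}$ and $K_{\text{conv}}$ are not specified in this section, so the sketch assumes 1×1 convolutions from one input feature map to $d$ output feature maps, which reduce to a per-map scaling by the weights `wq` and `wk`. The function name and the toy sizes are likewise hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv_seq_attention(x, wq, wk, alpha):
    """x: (B, C, S, T); wq, wk: (d,) weights of assumed 1x1 convolutions; alpha: scalar."""
    B, C, S, T = x.shape
    xs = x.transpose(0, 2, 1, 3)  # (B, S, C, T)
    # Assumed 1x1 convolution: scale each timepiece by d weights, then flatten
    # (B, S, d, C, T) to (B, S, d*C*T).
    q = (wq[None, None, :, None, None] * xs[:, :, None]).reshape(B, S, -1)
    k = (wk[None, None, :, None, None] * xs[:, :, None]).reshape(B, S, -1)
    A = softmax(q @ k.transpose(0, 2, 1), axis=-1)  # (B, S, S)
    # x_hat[b, c, s, t] = sum_r A[b, s, r] * x[b, c, r, t]
    x_hat = np.einsum("bsr,bcrt->bcst", A, x)
    # Residual combination weighted by alpha (trainable in the model, fixed here).
    return alpha * x_hat + x, A


# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
B, C, S, T, d = 2, 3, 4, 5, 2
x = rng.normal(size=(B, C, S, T))
wq, wk = rng.normal(size=d), rng.normal(size=d)
h, A = conv_seq_attention(x, wq, wk, alpha=0.5)
print(h.shape, A.shape)
```

Setting `alpha` to zero returns the input unchanged, so the residual form lets the network fall back to the original features when attention does not help.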

Figure 4: Illustration of the convolutional attention model. Two convolutional layers are employed to obtain the Query and Key matrices. The attention score is computed through matrix multiplication and subsequently normalized by the Softmax function. A matrix operation then integrates the attention score with the input data. Depending on the model variant, this is a matrix multiplication in the Seq-attention model (Section III-B1) and the ChanSeq-attention model (Section III-B2), or an element-wise product in the Global-attention model (Section III-B3). In the final step, a trainable parameter, denoted as $\beta$ here and as $\alpha$ in Eqs. (22), (24), and (25), is introduced to balance the original and refined features.

Both the Linear-Seq-attention and Conv-Seq-attention mechanisms yield attention scores of size $\mathbb{R}^{S\times S}$ in the last two dimensions. They then use matrix multiplication to produce the final enhanced feature map. In this way, attention is directed exclusively toward the different timepieces and their interactions, without considering information from other dimensions.

III-B2 ChanSeq-attention mechanism

Typically, EEG signals are collected from multiple channels. For example, the OpenBMI dataset that we use in this work has 62 channels [67]. Beyond directing attention to timepieces, it is also valuable to determine which channels deserve the most attention. Thus, we introduce the Channel Sequence attention mechanism, designed to determine both when and where the features must be focused.

The equations for linear Channel Sequence attention (Linear-ChanSeq-attention) are as follows:

\begin{equation}\tag{23}
\begin{aligned}
q &= Q_{\text{fc}}(x+p),\quad Q_{\text{fc}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},\\
k &= K_{\text{fc}}(x+p),\quad K_{\text{fc}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},\\
v &= V_{\text{fc}}(x+p),\quad V_{\text{fc}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},\\
A &= \text{Softmax}\left(\frac{qk^{T}}{\sqrt{d_{k}}}\right),\quad A\in\mathbb{R}^{B\times C\times d_{1}\times S\times S},\\
\hat{x} &= Av,\quad \hat{x}\in\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},\\
h_{\text{linear-chanseq}}(x) &= FC_{\text{chanseq}}(\hat{x}),\quad FC_{\text{chanseq}}:\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}}\rightarrow\mathbb{R}^{B\times C\times S\times T},
\end{aligned}
\end{equation}

where the parameters have the same meaning as those in the Linear-Seq-attention. However, to implement attention across both channels and timepieces simultaneously, the dimensions of the model are modified. The outputs of $Q_{\text{fc}}$, $K_{\text{fc}}$, and $V_{\text{fc}}$ now have dimensions $\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}}$ instead of $\mathbb{R}^{B\times d_{1}\times S\times d_{2}}$. This new channel dimension is maintained throughout the entire model.

The equations of the corresponding convolutional Channel Sequence attention (Conv-ChanSeq-attention) are as follows:

\begin{equation}\tag{24}
\begin{aligned}
q &= Q_{\text{conv}}(x),\quad Q_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times S\times (d\times T)},\\
k &= K_{\text{conv}}(x),\quad K_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times S\times (d\times T)},\\
A &= \text{Softmax}(qk^{T}),\quad A\in\mathbb{R}^{B\times C\times S\times S},\\
\hat{x} &= Ax,\quad \hat{x}\in\mathbb{R}^{B\times C\times S\times T},\\
h_{\text{conv-chanseq}}(x) &= \alpha\hat{x}+x,
\end{aligned}
\end{equation}

where the parameters have the same meaning as those in the Conv-Seq-attention. Mirroring the adjustments in Linear-ChanSeq-attention, the dimensions are modified here as well. Specifically, the outputs of $Q_{\text{conv}}$ and $K_{\text{conv}}$ have dimensions $\mathbb{R}^{B\times C\times S\times (d\times T)}$, in contrast to the previous approach of merging the $C$ and $T$ dimensions. As a result, the attention score $A$ has size $\mathbb{R}^{B\times C\times S\times S}$, which provides a separate attention map over timepieces for each channel.
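
The effect of keeping $C$ as a separate axis in Eq. (24) can be sketched in NumPy as follows. As before, this is an illustration under assumptions rather than the authors' code: the convolutions are assumed to be 1×1 with $d$ output feature maps (weights `wq`, `wk`), and the function name and toy sizes are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv_chanseq_attention(x, wq, wk, alpha):
    """x: (B, C, S, T); wq, wk: (d,) weights of assumed 1x1 convolutions; alpha: scalar."""
    B, C, S, T = x.shape
    # (B, C, S, d, T) flattened to (B, C, S, d*T): the channel axis is not merged.
    q = (wq[None, None, None, :, None] * x[:, :, :, None, :]).reshape(B, C, S, -1)
    k = (wk[None, None, None, :, None] * x[:, :, :, None, :]).reshape(B, C, S, -1)
    A = softmax(q @ k.transpose(0, 1, 3, 2), axis=-1)  # (B, C, S, S)
    x_hat = A @ x  # one S x S attention map per channel, result (B, C, S, T)
    return alpha * x_hat + x, A


# Toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
B, C, S, T, d = 2, 3, 4, 5, 2
x = rng.normal(size=(B, C, S, T))
wq, wk = rng.normal(size=d), rng.normal(size=d)
h, A = conv_chanseq_attention(x, wq, wk, alpha=0.5)
print(h.shape, A.shape)
```

Under these assumptions, the attention map of one channel depends only on that channel's data, whereas in the Seq-attention sketch a single map is shared by all channels.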

III-B3 Global-attention mechanism

In this section, we go one step beyond the ChanSeq-attention. Given input data with dimensions $\mathbb{R}^{B\times C\times S\times T}$, not only the channels and timepieces but also the specific time steps are important. To address this, we introduce the Global-attention mechanism, which applies attention across all three dimensions: $C$, $S$, and $T$. It decides when, where, and which feature is essential.

The Global-attention can be viewed as a special case of the Conv-ChanSeq-attention in which the $S$ and $T$ dimensions have equal size. The corresponding equations are presented below:

\begin{equation}\tag{25}
\begin{aligned}
q &= Q_{\text{conv}}(x),\quad Q_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times S\times (d\times T)},\\
k &= K_{\text{conv}}(x),\quad K_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times T\times (d\times S)},\\
A &= \text{Softmax}(qk^{T}),\quad A\in\mathbb{R}^{B\times C\times S\times T},\\
\hat{x} &= A\odot x,\quad \hat{x}\in\mathbb{R}^{B\times C\times S\times T},\\
h_{\text{conv-global}}(x) &= \alpha\hat{x}+x.
\end{aligned}
\end{equation}

In this context, the parameters align with those defined in the Conv-ChanSeq-attention. In particular, when $S$ and $T$ have identical sizes, both $A$ and $x$ have the size $\mathbb{R}^{B\times C\times S\times T}$. In contrast to the Conv-ChanSeq-attention, the Global-attention employs an element-wise product instead of matrix multiplication when calculating the enhanced feature map $\hat{x}$ in (25). Consequently, with $A$ having dimensions $\mathbb{R}^{B\times C\times S\times T}$, each time step in each timepiece of each channel receives its own attention score to improve the feature map. This method differs from the other attention models we propose in that it scores every individual element of the feature map and enhances the data directly through these scores.
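As a minimal sketch of (25), the following pure-Python code computes the Global-attention output for a single (batch, channel) slice of the feature map. The learned convolutions $Q_{\text{conv}}$ and $K_{\text{conv}}$ are replaced here by plain weight matrices `w_q` and `w_k`, and the softmax is taken over the last axis; both choices, as well as all function names, are assumptions made for illustration and are not specified above.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax (assumed normalization axis)."""
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def global_attention_slice(x, w_q, w_k, alpha):
    """x: S x T slice with S == T; w_q: T x (d*T); w_k: S x (d*S)."""
    if len(x) != len(x[0]):
        raise ValueError("Global-attention requires S == T")
    q = matmul(x, w_q)                         # S x (d*T)
    k = matmul(transpose(x), w_k)              # T x (d*S)
    A = softmax_rows(matmul(q, transpose(k)))  # S x T attention scores
    # Element-wise product instead of a matrix product.
    x_hat = [[a * v for a, v in zip(ra, rx)] for ra, rx in zip(A, x)]
    # Residual connection: h = alpha * x_hat + x.
    h = [[alpha * xh + v for xh, v in zip(rh, rx)] for rh, rx in zip(x_hat, x)]
    return h, A

# Toy slice with S = T = 2 and d = 2 (hypothetical values).
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.1, 0.0, 0.2]]
h, A = global_attention_slice(x, w, w, alpha=0.5)
```

Because $A$ has the same shape as $x$, every entry of the slice is scaled by its own score, and setting `alpha` to zero recovers the input unchanged.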

III-C Network Architecture

Figure 5: Network architecture overview. (a) NiSNN-A architecture. The SNN has a residual block containing two spiking layers. Instead of traditional batch normalization layers and activation functions, the model employs LIF neurons to transform real-valued data streams into spikes. Notably, the proposed Non-iterative LIF neurons are utilized, leveraging the Heaviside function during the feed-forward phase and the derivative of the ReLU function during backpropagation. Max pooling layers are adopted after spiking layers to maintain a binary data flow. After the second spiking layer, an attention mechanism is integrated to refine the feature maps. The final stage of the model consists of two fully connected layers for classification. (b) Attention CNN architecture. The overarching architecture of the CNN closely mirrors that of the SNN, encompassing a single residual block with two 2D convolutional layers. A batch normalization layer follows the convolutional layer and utilizes the ReLU function as its activation function. The network employs average pooling layers for down-sampling.

In this section, we present the network architectures. Figure 5 gives an overview of these architectures. Figure 5a shows the architecture of the NiSNN-A. We utilize a two-layer residual spiking convolutional framework to process the data, where the first spiking layer can be seen as a spiking encoder as proposed in [14]. Furthermore, we have integrated the membrane potential batch normalization introduced in [68] into the LIF neuron. The LIF neuron yields a binary sequence as its output. To preserve the binary nature of the data stream, a max pooling layer is utilized to reduce dimensionality. After the second spiking layer, our proposed attention model is integrated. Finally, two linear layers without activation functions are employed to produce the class predictions.

To verify that the proposed attention mechanism can also be applied to CNNs, we show the attention CNN counterpart of the NiSNN-A in Figure 5b. The CNN architecture mirrors that of the SNN in order to control for extraneous variables, encompassing a two-layer convolutional residual framework. After each convolutional layer, a batch normalization layer is applied, followed by the ReLU activation function. An average pooling layer is then used to reduce the dimension.

In particular, since the SNN has a binary data flow, the inputs to both the second spiking convolutional layer and the first linear layer are binary, so these layers only require accumulate (AC) operations. In contrast, all operations within the CNN are multiply-accumulate (MAC) operations.
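The difference between the two operation types can be illustrated with a small sketch. It assumes a single fully connected layer with hypothetical weights; the function names `mac_linear` and `ac_linear` are introduced here for illustration only. With binary inputs, the layer output is obtained by summing the weights of the active inputs, so no multiplication is needed and the operation count scales with the number of spikes.

```python
def mac_linear(weights, inputs):
    """Dense layer on real-valued inputs: one multiply-accumulate per weight."""
    out = [sum(w * v for w, v in zip(row, inputs)) for row in weights]
    macs = len(weights) * len(inputs)
    return out, macs

def ac_linear(weights, spikes):
    """Same layer on binary spikes: accumulate the weights of active inputs only."""
    active = [j for j, s in enumerate(spikes) if s == 1]
    out = [sum(row[j] for j in active) for row in weights]
    acs = len(weights) * len(active)
    return out, acs

# Hypothetical 2 x 4 weight matrix and a binary input with a spike rate of 0.5.
weights = [[0.5, -1.0, 2.0, 0.25], [1.5, 0.5, -0.5, 1.0]]
spikes = [1, 0, 1, 0]

mac_out, mac_count = mac_linear(weights, spikes)
ac_out, ac_count = ac_linear(weights, spikes)
```

Both functions return the same output for binary inputs, while the AC variant performs only as many additions as there are spikes per output neuron.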

IV Experiments

In this section, we outline the details of the experiment. Details about the dataset and its processing can be found in Section IV-A. The network configuration is elaborated in Section IV-B. Lastly, the approach to energy analysis is presented in Section IV-C.

IV-A Dataset

In our study, we utilize the OpenBMI dataset [67] to validate the efficacy of the proposed attention-based neural networks. OpenBMI is a large-scale motor imagery EEG dataset encompassing data from 54 participants recorded with 62 channels. Each participant has 400 data records, each lasting 4 seconds at a sampling frequency of 1000 Hz. Our aim is to assess the ability of the network to recognize common features within these EEG signals. To achieve this, we employ the subject-independent approach for training and testing used in [34]. Specifically, data from 53 subjects serve as the training set, with data from the remaining subject reserved for testing. This approach guarantees that all results are obtained on a previously unseen subject, so that the test data remain unknown to the trained network. This subject-independent method validates the practical scalability of our model. The training set therefore comprises the records of 53 subjects, totaling 21,200 data records. Each of these records contains 4,000 time steps with 62 channels. To reduce this size, we downsample each record to 400 time steps. The downsampled EEG data are then normalized to zero mean and unit standard deviation. Following [34], we select 20 channels: $FC_5$, $FC_3$, $FC_1$, $FC_2$, $FC_4$, $FC_6$, $C_5$, $C_3$, $C_1$, $C_z$, $C_2$, $C_4$, $C_6$, $CP_5$, $CP_3$, $CP_1$, $CP_z$, $CP_2$, $CP_4$, $CP_6$. For testing, we utilize the packaged test set available within the OpenBMI interface, which offers 200 records for each test subject. We cycle through all subjects, from the first to the last, using each in turn as the test case. The final performance is determined by averaging the accuracy across these iterations. In other words, each model undergoes training and testing 54 times to obtain the final result, which reduces the uncertainty of the estimate.
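The preprocessing and evaluation protocol described above can be sketched as follows. This is only an illustration under stated assumptions: the downsampling is taken to be plain decimation by a factor of 10 (the exact technique is not specified), the normalization is applied per signal, and the helper names `downsample`, `zscore`, and `leave_one_subject_out` are hypothetical.

```python
from statistics import mean, pstdev

def downsample(signal, factor=10):
    """Keep every `factor`-th sample (assumed decimation, no filtering)."""
    return signal[::factor]

def zscore(signal):
    """Normalize a signal to zero mean and unit standard deviation."""
    mu, sigma = mean(signal), pstdev(signal)
    return [(v - mu) / sigma for v in signal]

def leave_one_subject_out(subject_ids):
    """Yield (training subjects, test subject) pairs, one per subject."""
    for test in subject_ids:
        yield [s for s in subject_ids if s != test], test

# 54 subjects give 54 train/test rounds with 53 training subjects each.
folds = list(leave_one_subject_out(list(range(1, 55))))

# Synthetic single-channel record of 4,000 time steps (placeholder data).
raw = [float(i % 7) for i in range(4000)]
processed = zscore(downsample(raw))
```

Each fold trains on 53 subjects and tests on the held-out one, and the processed record has 400 time steps.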

IV-B Network Setups

TABLE I: Architecture of networks during training. B is the batch size, C is channel size, S is the number of timepieces, and T is the number of time points in each timepiece.

| Block | SNN layer | CNN layer | Filters | Size/padding | Output |
|---|---|---|---|---|---|
| Spike Encoding | Input | Input | - | - | (B, C, S, T) |
| | Clone residual | Clone residual | - | - | (B, C, S, T) |
| | Conv2d | Conv2d | C | (1, 5)/same | (B, C, S, T) |
| | - | BatchNorm2d | C | - | (B, C, S, T) |
| | - | ReLU | - | - | (B, C, S, T) |
| | LIF | - | - | - | (B, C, S, T) |
| | MaxPool2d | AvgPool2d | - | (2, 2) | (B, C, S/2, T/2) |
| Classifier | AC-Conv | MAC-Conv2d | C | (10, 10)/same | (B, C, S/2, T/2) |
| | Attention | Attention | - | - | (B, C, S/2, T/2) |
| | Add residual | Add residual | - | - | (B, C, S/2, T/2) |
| | - | BatchNorm2d | C | - | (B, C, S/2, T/2) |
| | - | ReLU | - | - | (B, C, S/2, T/2) |
| | LIF | - | - | - | (B, C, S/2, T/2) |
| | MaxPool2d | AvgPool2d | - | (2, 2) | (B, C, S/4, T/4) |
| | Flatten | Flatten | - | - | (B, C × S/4 × T/4) |
| | AC-Linear | MAC-Linear | - | - | (B, 20) |
| | Linear | Linear | - | - | (B, 2) |

As described in Section IV-A, the processed input data has dimensions $\mathbb{R}^{B\times C\times S\times T}$. In this paper, $B$ denotes the batch size, $C$ is the channel size, $S$ indicates the number of timepieces, and $T$ is the number of time steps within each timepiece. In the experiment, we set a batch size of 64 and a channel size of 20. After downsampling, each record has a total of 400 time steps. We segment each record into 20 timepieces, so that both $S$ and $T$ equal 20. We found that involving $\hat{e}(\cdot)$ in (8) has no noticeable effect on the results. Therefore, to simplify the computation, we uniformly assign a value of 1 to all $\hat{e}(\cdot)$ within (8). Also, we set the threshold $V_{th}$ in the Non-iterative LIF neuron to 1.0.
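The segmentation into timepieces can be sketched as a simple reshaping step. The sketch assumes contiguous, non-overlapping timepieces, which is consistent with 400 time steps yielding $S = T = 20$ but is not stated explicitly; the function name `segment` is hypothetical.

```python
def segment(record, num_pieces):
    """Split each channel into `num_pieces` equal, contiguous timepieces.

    record: list of C channels, each a list of S*T samples.
    Returns a nested list of shape (C, S, T).
    """
    out = []
    for channel in record:
        if len(channel) % num_pieces != 0:
            raise ValueError("record length must be divisible by num_pieces")
        t = len(channel) // num_pieces
        out.append([channel[i * t:(i + 1) * t] for i in range(num_pieces)])
    return out

# Placeholder record: 20 channels with 400 time steps each.
record = [[float(c * 400 + n) for n in range(400)] for c in range(20)]
pieces = segment(record, num_pieces=20)
shape = (len(pieces), len(pieces[0]), len(pieces[0][0]))
```

Stacking 64 such records would give the $(B, C, S, T)$ input of Table I with $B = 64$.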

The network parameters are shown in Table I. The first convolutional layer acts as a spiking encoder and only considers the information within each timepiece. Therefore, its kernel size is set to $(1,5)$, with padding that keeps the output the same size as the input and a stride of 1. The second convolutional layer plays the role of a classifier, taking into account both intra-timepiece and inter-timepiece information. Therefore, this layer has a kernel size of $(10,10)$, again with padding to the same size as the input and a stride of 1. All pooling layers in the network employ a kernel size of $(2,2)$. The first linear layer reduces the flattened data dimension to 20, and the final linear layer reduces it further to 2, yielding the final classification result. Within the attention layer, the hyperparameters $d_1$ and $d_2$ are set to 6 and 20, respectively, for all linear attention models. For convolutional attention models, the hyperparameter $d$ is set to 8. The models are trained for 20 epochs. We employ a well-trained CNN model as a pre-trained network for its corresponding SNN model to accelerate the training procedure. During training, we adopt the Adam optimizer with a learning rate of 0.001. The cross-entropy loss is utilized as the loss function:

$$
\text{CE\_Loss}(y,p) = -\sum_{n} y_{n}\log(p_{n}), \tag{26}
$$

where $y$ represents the label of the data and $p$ is the network’s output.
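For a two-class problem, (26) can be evaluated as in the following sketch. It assumes one-hot labels and that the two raw network outputs are converted to probabilities with a softmax before the logarithm is taken; the helper names and the clipping constant `eps` are illustrative choices rather than details given in the text.

```python
import math

def softmax(logits):
    """Convert raw outputs to probabilities (assumed preprocessing step)."""
    mx = max(logits)
    e = [math.exp(v - mx) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(y_onehot, p, eps=1e-12):
    """Cross-entropy of (26): minus the sum of y_n * log(p_n) over n."""
    return -sum(y * math.log(max(q, eps)) for y, q in zip(y_onehot, p))

# Hypothetical network output for one record whose true class is the first one.
probs = softmax([2.0, 0.0])
loss = cross_entropy([1, 0], probs)
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1.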

IV-C Energy analysis

To analyze the energy consumption of the CNN and SNN models, we adopt the same energy analysis method as in [24], which counts the network’s floating point operations (FLOPs).

The main operations within the neural networks considered here fall into three types: the convolutional layer, the linear layer, and matrix multiplication. The FLOPs associated with each of these operations can be described as:

$$
\begin{aligned}
\text{FLOPs}_{\text{conv}}^{\mathrm{n}} &= k^{\mathrm{n}}_{0} k^{\mathrm{n}}_{1} h^{\mathrm{n}} w^{\mathrm{n}} c^{\mathrm{n}} c^{\mathrm{n}-1}, \\
\text{FLOPs}_{\text{fc}}^{\mathrm{n}} &= i^{\mathrm{n}} o^{\mathrm{n}}, \\
\text{FLOPs}_{\text{mm}} &= m_{1} n m_{2},
\end{aligned}
\tag{27}
$$

where for the $\mathrm{n}^{\text{th}}$ convolutional layer, $k^{\mathrm{n}}_{0}$ and $k^{\mathrm{n}}_{1}$ denote the dimensions of the convolutional kernel, while $h^{\mathrm{n}}$ and $w^{\mathrm{n}}$ represent the dimensions of the output map. In addition, $c^{\mathrm{n}-1}$ and $c^{\mathrm{n}}$ specify the number of input and output channels, respectively. For the $\mathrm{n}^{\text{th}}$ linear layer, $i^{\mathrm{n}}$ represents the input size, and $o^{\mathrm{n}}$ denotes the output size. Finally, for matrix multiplication involving two matrices of dimensions $[m_{1}, n]$ and $[n, m_{2}]$, the FLOPs are determined as the product of these three dimensions.

In the CNN models, all network data consist of real-valued numbers, which makes all operations MAC. Table II analyzes the FLOPs associated with the CNN models. This analysis can be divided into the FLOPs corresponding to the standard vanilla CNN model, denoted by $x$, and the FLOPs for the additional attention model. Given our consistent network architecture for all models, the value of $x$ remains constant at 4,810,040.
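As a worked check of (27), the sketch below applies the three formulas to the layer sizes of Table I with $C = S = T = 20$. It assumes that only the two convolutional layers and the two linear layers contribute to the vanilla count; under that assumption the sum coincides with the constant $x$ quoted above. The function names are introduced here for illustration.

```python
def conv_flops(k0, k1, h, w, c_out, c_in):
    """FLOPs of a convolutional layer: kernel dims x output map dims x channels."""
    return k0 * k1 * h * w * c_out * c_in

def fc_flops(n_in, n_out):
    """FLOPs of a linear layer: input size x output size."""
    return n_in * n_out

def mm_flops(m1, n, m2):
    """FLOPs of multiplying an [m1, n] matrix by an [n, m2] matrix."""
    return m1 * n * m2

C, S, T = 20, 20, 20

# First convolution: (1, 5) kernel, output map of size S x T.
conv1 = conv_flops(1, 5, S, T, C, C)
# Second convolution: (10, 10) kernel, output map of size S/2 x T/2.
conv2 = conv_flops(10, 10, S // 2, T // 2, C, C)
# First linear layer: flattened C x S/4 x T/4 input reduced to 20 units.
fc1 = fc_flops(C * (S // 4) * (T // 4), 20)
# Final linear layer: 20 units reduced to 2 classes.
fc2 = fc_flops(20, 2)

x_total = conv1 + conv2 + fc1 + fc2
```

The second convolution dominates the count, which is why replacing its MAC operations with AC operations matters most for the SNN.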

TABLE II: FLOPs of CNN model execution

| Method | MAC |
|---|---|
| Autthasan et al. [34] | $x$ |
| Huang et al. [32] | $x$ |
| Liu et al. [12] | $x$ + 1,644,200 |
| Luo et al. [41] | $x$ + 2,469,600 |
| Fan et al. [46], Wang et al. [44] | $x$ + 45,200 |
| Zhang et al. [43] | $x$ + 180,000 |
| Linear-Seq-attention | $x$ + 97,800 |
| Conv-Seq-attention | $x$ + 822,100 |
| Linear-ChanSeq-attention | $x$ + 496,800 |
| Conv-ChanSeq-attention | $x$ + 824,000 |
| Global-attention | $x$ + 806,000 |

In the SNN models, the FLOPs analysis is more complicated. As elaborated in Section III-C, the inputs to both the second convolutional layer and the first linear layer are binary. Consequently, these layers employ AC operations, which accumulate weights directly without involving multiplications. It is worth noting that the number of AC operations depends on the spike rate, given that accumulation happens only when the input is 1. Thus, the spike rates of these two layers are important parameters when quantifying the AC operations within SNN models. The rest of the SNN retains the use of MAC operations, mirroring the corresponding CNN models. Therefore, the FLOPs assessment for SNN models can be partitioned into three parts: the AC of the binary layers, the MAC of the rest of the SNN model, and the MAC of the additional attention model. The MAC of the rest of the SNN model can be further divided into two parts: the MAC of the LIF neurons and the MAC of the remaining layers. We denote the MAC of the rest of the SNN model with Non-iterative LIF neurons by $y$, which is constant across all SNN models with Non-iterative LIF neurons and equals 1,170,040. The MAC of the SNN with Iterative LIF neurons is lower, namely $y - 180{,}000$. It should be highlighted that while the FLOPs associated with Iterative LIF neurons are fewer than those of Non-iterative LIF neurons, the latter are implemented using matrix operations, whereas the former rely on loop structures, which increases their execution time. The original AC for the second convolutional layer is 4,000,000, denoted by $a$, while for the first linear layer it is 10,000, denoted by $b$. The final number of AC operations is the product of the original AC and the spike rate. Details of the FLOPs for the SNN models are presented in Table III.
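The dependence of the AC count on the spike rate can be made concrete with a short sketch. The constants $a$ and $b$ are taken from the text; the function name `effective_ac` and the example spike rates (those listed for the Global-attention model in Table III) are used purely for illustration.

```python
A_CONV = 4_000_000   # original AC of the second convolutional layer (a)
B_LINEAR = 10_000    # original AC of the first linear layer (b)

def effective_ac(rate_conv, rate_linear):
    """Spike-rate-weighted AC counts of the two binary-input layers."""
    return rate_conv * A_CONV, rate_linear * B_LINEAR

# Example: spike rates of 0.308 and 0.295 for the two layers.
ac_conv, ac_linear_count = effective_ac(0.308, 0.295)
total_ac = ac_conv + ac_linear_count
```

A lower spike rate therefore translates directly into fewer accumulate operations, which is why sparse firing reduces the cost of the binary layers.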

TABLE III: FLOPs of SNN model execution. Several SNNs with attention models were proposed in [24]: Channel Attention (CA), Spatial Attention (SA), Temporal Attention (TA), Channel-Temporal Attention (CTA), Channel-Spatial Attention (CSA), Spatial-Temporal Attention (STA), and Channel-Spatial-Temporal Attention (CSTA).

| Method | MAC | AC_Conv | AC_Linear |
|---|---|---|---|
| Wu et al. [56] (Iterative LIF) | $y$ - 180,000 | 0.811$a$ | 0.404$b$ |
| Wu et al. [56] | $y$ | 0.320$a$ | 0.297$b$ |
| Hu et al. [69] | $y$ | 0.292$a$ | 0.317$b$ |
| Yao et al. [24] (CA) | $y$ + 3,600 | 0.307$a$ | 0.299$b$ |
| Yao et al. [24] (SA) | $y$ + 2,400 | 0.364$a$ | 0.240$b$ |
| Yao et al. [24] (TA) | $y$ + 2,400 | 0.376$a$ | 0.310$b$ |
| Yao et al. [24] (CTA) | $y$ + 6,000 | 0.344$a$ | 0.293$b$ |
| Yao et al. [24] (CSA) | $y$ + 6,000 | 0.362$a$ | 0.296$b$ |
| Yao et al. [24] (STA) | $y$ + 4,800 | 0.430$a$ | 0.293$b$ |
| Yao et al. [24] (CSTA) | $y$ + 8,400 | 0.398$a$ | 0.295$b$ |
| Linear-Seq-attention | $y$ + 97,800 | 0.297$a$ | 0.297$b$ |
| Conv-Seq-attention | $y$ + 822,100 | 0.308$a$ | 0.293$b$ |
| Linear-ChanSeq-attention | $y$ + 496,800 | 0.305$a$ | 0.268$b$ |
| Conv-ChanSeq-attention | $y$ + 824,000 | 0.308$a$ | 0.290$b$ |
| Global-attention | $y$ + 806,000 | 0.308$a$ | 0.295$b$ |

After the FLOPs analysis, we can calculate the total energy cost. We adopt the same assumption as in [24] that the operations are implemented in 32-bit floating point arithmetic in 45 nm technology, where the energy of a MAC operation is 4.6 pJ and the energy of an AC operation is 0.9 pJ.
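The conversion from operation counts to energy can be sketched as below, using only the per-operation costs stated above and the counts $x$, $y$, $a$, and $b$ from the FLOPs analysis. The function name `energy_uj` is hypothetical, and the sketch is not claimed to reproduce the tabulated energies exactly, since it includes nothing beyond the itemized counts.

```python
E_MAC_PJ = 4.6  # energy per MAC operation in pJ (32-bit float, 45 nm)
E_AC_PJ = 0.9   # energy per AC operation in pJ

def energy_uj(mac_ops, ac_ops=0.0):
    """Total energy in microjoules for given MAC and AC operation counts."""
    return (mac_ops * E_MAC_PJ + ac_ops * E_AC_PJ) * 1e-6

# Vanilla CNN: all x = 4,810,040 operations are MAC.
cnn_energy = energy_uj(4_810_040)

# Vanilla SNN with Non-iterative LIF neurons: y MAC operations plus
# spike-rate-weighted AC operations (rates 0.320 and 0.297 from Table III).
snn_energy = energy_uj(1_170_040, 0.320 * 4_000_000 + 0.297 * 10_000)
```

Because an AC operation costs roughly one fifth of a MAC operation and is further scaled by the spike rate, moving the large second convolution to AC accounts for most of the saving in this estimate.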

V Result and Discussion

In this section, we describe the results of the proposed attention models. The results are delineated in Section V-A. Then, a discussion along with the result visualization is provided in Section V-B.

TABLE IV: Accuracy and energy performance of Left H. vs Right H. subject-independent classification using the OpenBMI dataset for SNNs with attention mechanisms.

| Reference | Method | Type | Accuracy | Energy ($\mu$J) |
|---|---|---|---|---|
| Autthasan et al. [34] | vanilla CNN | CNN | 0.73679 +/- 0.13201 | 23.569 |
| Huang et al. [32] | vanilla CNN with residual blocks | CNN | 0.73698 +/- 0.13137 | 23.569 |
| Liu et al. [12] | Sequence + Temporal attention | CNN | 0.73939 +/- 0.12799 | 31.626 |
| Luo et al. [41] | Sequence + Temporal attention | CNN | 0.71383 +/- 0.14112 | 35.670 |
| Fan et al. [46], Wang et al. [44] | Channel + Sequence + Temporal attention | CNN | 0.73862 +/- 0.12788 | 23.791 |
| Zhang et al. [43] | Channel attention | CNN | 0.74045 +/- 0.13622 | 24.451 |
| Wu et al. [56] | vanilla SNN (with the Iterative LIF neuron) | SNN | 0.50357 +/- 0.00529 | 7.774 |
| Wu et al. [56] | vanilla SNN | SNN | 0.70669 +/- 0.11510 | 6.888 |
| Hu et al. [69] | vanilla SNN with residual blocks | SNN | 0.70129 +/- 0.11720 | 6.787 |
| Yao et al. [24] | Channel attention | SNN | 0.68673 +/- 0.11113 | 6.859 |
| Yao et al. [24] | Sequence attention | SNN | 0.68798 +/- 0.10939 | 7.058 |
| Yao et al. [24] | Temporal attention | SNN | 0.68056 +/- 0.10473 | 7.101 |
| Yao et al. [24] | Channel + Temporal attention | SNN | 0.67805 +/- 0.11100 | 7.004 |
| Yao et al. [24] | Channel + Sequence attention | SNN | 0.67284 +/- 0.10830 | 7.068 |
| Yao et al. [24] | Sequence + Temporal attention | SNN | 0.66262 +/- 0.09970 | 7.307 |
| Yao et al. [24] | Channel + Sequence + Temporal attention | SNN | 0.64574 +/- 0.09904 | 7.210 |
| Ours | Linear-Seq-attention | SNN | 0.70689 +/- 0.12624 | 7.284 |
| Ours | Conv-Seq-attention | SNN | 0.72222 +/- 0.12551 | 10.873 |
| Ours | Linear-ChanSeq-attention | SNN | 0.71354 +/- 0.12620 | 9.268 |
| Ours | Conv-ChanSeq-attention | SNN | 0.72791 +/- 0.12791 | 10.882 |
| Ours | Global-attention | SNN | 0.72830 +/- 0.12700 | 10.794 |

* Only the first vanilla SNN uses the Iterative LIF neurons (as indicated); all other SNN models use the proposed Non-iterative LIF neuron model.

TABLE V: Accuracy and energy performance of Left H. vs Right H. subject-independent classification using the OpenBMI dataset for CNNs with proposed attention mechanisms.

| Method | Type | Accuracy | Energy ($\mu$J) |
|---|---|---|---|
| CNN with Linear-Seq-attention | CNN | 0.73582 +/- 0.13443 | 24.048 |
| CNN with Linear-ChanSeq-attention | CNN | 0.73708 +/- 0.13239 | 27.597 |
| CNN with Conv-Seq-attention | CNN | 0.74267 +/- 0.12989 | 26.004 |
| CNN with Conv-ChanSeq-attention | CNN | 0.74055 +/- 0.13320 | 27.607 |
| CNN with Global-attention | CNN | 0.74122 +/- 0.13039 | 27.519 |

V-A Comparison with state-of-the-art methods

Table IV compares the performance of various attention mechanisms with CNNs and SNNs. We have chosen seven CNN models for EEG classification as our baselines. Autthasan et al. [34] proposed a vanilla CNN model, employing a subject-independent approach for the training and testing phases. Furthermore, Huang et al. [32] incorporated residual blocks into the CNN model. Of the seven baseline models, five employ attention mechanisms. It is worth noting that [46] and [44] employ a global attention mechanism that encompasses all three dimensions. However, their approach extracts attention scores individually for each dimension after using pooling to collapse the unrelated dimensions. This differs from the methodology of our proposed Global-attention model. A total of ten models were chosen as the SNN baselines. It is worth noting that in the context of EEG signal processing, the Iterative LIF model struggles with the long time steps, potentially leading to gradient issues. Therefore, we adopted the proposed Non-iterative LIF neuron in the other baseline models. Concerning the attention mechanism, the baseline models are taken from the image attention SNN model developed for computer vision, as introduced in [24]. This approach composes attention components for the various dimensions sequentially to obtain the attention score, in contrast to our proposed models, which compute it within a single model.

The accuracy metric was determined by the ratio of correctly classified samples to the total number of samples, as given by:

$$
\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}, \tag{28}
$$

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. For energy analysis, we adopted the unit $\mu$J to quantify the energy consumption of the networks as discussed in Section IV-C.
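A minimal sketch of (28), together with the aggregation over the test subjects, is given below. It assumes that the reported spread is the standard deviation of the per-subject accuracies, which is not stated explicitly; the counts and fold accuracies in the example are hypothetical, as are the function names.

```python
from statistics import mean, pstdev

def accuracy(tp, tn, fp, fn):
    """Accuracy of (28): correctly classified samples over all samples."""
    return (tp + tn) / (tp + tn + fp + fn)

def summarize(fold_accuracies):
    """Mean and (population) standard deviation across test subjects."""
    return mean(fold_accuracies), pstdev(fold_accuracies)

# Hypothetical confusion counts for one test subject with 200 records.
acc = accuracy(tp=80, tn=70, fp=30, fn=20)

# Hypothetical per-subject accuracies from three leave-one-subject-out rounds.
avg, spread = summarize([0.5, 0.75, 1.0])
```

In the protocol of Section IV-A, the list passed to `summarize` would contain one accuracy per held-out subject, that is, 54 values.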

All three proposed convolution-based NiSNN-A models achieve an accuracy above 0.72. Our Global NiSNN-A achieves the highest accuracy among the SNN models, 0.7283. This accuracy is comparable to that of the CNN models, with the additional advantage of consuming roughly half the energy.

Notably, compared with the Iterative LIF neuron model, the proposed Non-iterative LIF neuron model improves the accuracy by approximately 0.2 on the same network architecture while reducing the energy cost. In Table III, the firing rate of Non-iterative LIF neurons is much lower than that of Iterative LIF neurons, which illustrates the sparsity of our proposed method.

To verify that our attention mechanisms also work on CNN models, we present the comparison results of the attention CNNs in Table V. The results suggest that our Conv-ChanSeq-attention mechanism outperforms the other CNN models, demonstrating the feasibility of the proposed methods.

V-B Discussion

Figure 6: An example of the generated attention visualization, using our Conv-Channel-Sequence SNN attention model. The top figure shows the input EEG signals. In the bottom figure, black dots represent the spikes after encoding. The shade of the orange blocks represents the degree of attention: a darker orange block indicates a higher attention score, and a lighter orange block a lower one.

The comparative results of the CNN and SNN models underscore the potential of the attention mechanism to improve the accuracy of EEG signal classification, especially for SNN models. To provide a clearer understanding of how the attention mechanism works, we present a straightforward visualization that illustrates the attention score $A$ across the entire feature map. The Global-attention model employs an element-wise product to derive the enhanced feature map, facilitating a more intuitive representation. In this model, the attention score at each position directly indicates the relative importance allocated to the corresponding position. Figure 6 visually represents the Global-attention model’s attention score $A$. The upper part of the figure shows the input 20-channel EEG signals as real values. The lower part presents the spikes produced by the spiking encoder in the network, shown as a black raster. The orange blocks represent the attention scores. Darker shades of orange represent higher attention scores, whereas lighter shades indicate lower ones. The figure illustrates the internal operations of the attention mechanism, highlighting the model’s capacity to assign variable attention across different regions of the EEG signals, thereby answering when, where, and which information is relevant. By employing such a dynamic weighting method, the model ensures that the most important regions of the input are prioritized, potentially contributing to the high accuracy in classification tasks.

From an energy consumption point of view, the results highlight the significance of employing SNNs, which achieve accuracy comparable to CNN models and offer a 2-fold reduction in energy use. This efficiency can be important in applications where power consumption is a concern, such as portable EEG devices or real-time EEG monitoring systems. Therefore, SNNs present promising potential for these energy-conscious edge devices.

VI Conclusion

This paper introduced the NiSNN-A model, encompassing a novel Non-iterative LIF neuron and diverse attention mechanisms. The newly proposed Non-iterative LIF neuron retains the biological attributes of traditional LIF neurons while efficiently handling long temporal data. This design avoids long loops in execution and gradient issues by leveraging matrix operations within neurons. Subsequently, the attention mechanism emphasizes important parts of the feature map. Notably, all our proposed attention models integrate their computations within a single model instead of using sequential architectures. We employed the OpenBMI dataset for validation, adopting a subject-independent approach to demonstrate the model’s capability to extract common features for unfamiliar participants. The results indicate that our approach surpasses other SNN models in accuracy. It achieves accuracy comparable to its CNN counterparts while improving energy efficiency. Furthermore, our attention visualization results suggest that our model not only improves classification accuracy but also offers deeper insights into EEG signal interpretation.

This research paves the way for novel methodologies in EEG classification that combine attention mechanisms with spiking neural network architectures. As the field of EEG signal processing continues to evolve, our findings call for continued innovation and adaptive strategies to address the remaining challenges.

References

  • [1] A. Tsiamalou, E. Dardiotis, K. Paterakis, G. Fotakopoulos, I. Liampas, M. Sgantzos, V. Siokas, and A. G. Brotis, “EEG in neurorehabilitation: a bibliometric analysis and content review,” Neurology International, vol. 14, no. 4, pp. 1046–1061, 2022.
  • [2] C. Del Percio, S. Lopez, G. Noce, R. Lizio, F. Tucci, A. Soricelli, R. Ferri, F. Nobili, D. Arnaldi, F. Famà et al., “What a single electroencephalographic (EEG) channel can tell us about Alzheimer’s disease patients with mild cognitive impairment,” Clinical EEG and Neuroscience, vol. 54, no. 1, pp. 21–35, 2023.
  • [3] A. Singh, A. A. Hussain, S. Lal, and H. W. Guesgen, “A comprehensive review on critical issues and possible solutions of motor imagery based electroencephalography brain-computer interface,” Sensors, vol. 21, no. 6, p. 2173, 2021.
  • [4] J. Zhang and M. Wang, “A survey on robots controlled by motor imagery brain-computer interfaces,” Cognitive Robotics, vol. 1, pp. 12–24, 2021.
  • [5] S. Rajwal and S. Aggarwal, “Convolutional neural network-based EEG signal analysis: A systematic review,” Archives of Computational Methods in Engineering, pp. 1–31, 2023.
  • [6] S. Alhagry, A. A. Fahmy, and R. A. El-Khoribi, “Emotion recognition based on EEG using LSTM recurrent neural network,” International Journal of Advanced Computer Science and Applications, vol. 8, no. 10, 2017.
  • [7] Y. Song, Q. Zheng, B. Liu, and X. Gao, “EEG conformer: Convolutional transformer for EEG decoding and visualization,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 710–719, 2022.
  • [8] O.-Y. Kwon, M.-H. Lee, C. Guan, and S.-W. Lee, “Subject-independent brain–computer interfaces based on deep convolutional neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 10, pp. 3839–3852, Oct. 2020.
  • [9] J.-S. Bang, M.-H. Lee, S. Fazli, C. Guan, and S.-W. Lee, “Spatio-spectral feature representation for motor imagery classification using convolutional neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, pp. 3038–3049, Jul. 2022.
  • [10] S. Zhang, L. Wu, S. Yu, E. Shi, N. Qiang, H. Gao, J. Zhao, and S. Zhao, “An explainable and generalizable recurrent neural network approach for differentiating human brain states on EEG dataset,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2022.
  • [11] C. Ju and C. Guan, “Tensor-CSPNet: a novel geometric deep learning framework for motor imagery classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 10955–10969, Dec. 2023.
  • [12] X. Liu, Y. Shen, J. Liu, J. Yang, P. Xiong, and F. Lin, “Parallel spatial–temporal self-attention CNN-based motor imagery classification for BCI,” Frontiers in Neuroscience, vol. 14, p. 587520, 2020.
  • [13] E. Eldele, Z. Chen, C. Liu, M. Wu, C.-K. Kwoh, X. Li, and C. Guan, “An attention-based deep learning approach for sleep stage classification with single-channel EEG,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 29, pp. 809–818, 2021.
  • [14] X. Liao, Y. Wu, Z. Wang, D. Wang, and H. Zhang, “A convolutional spiking neural network with adaptive coding for motor imagery classification,” Neurocomputing, p. 126470, 2023.
  • [15] Y. Zhang, T. Zhou, W. Wu, H. Xie, H. Zhu, G. Zhou, and A. Cichocki, “Improving EEG decoding via clustering-based multitask feature learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 8, pp. 3587–3597, Aug. 2022.
  • [16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  • [17] S. Li, Q. Yan, and P. Liu, “An efficient fire detection method based on multiscale feature extraction, implicit deep supervision and channel attention mechanism,” IEEE Transactions on Image Processing, vol. 29, pp. 8467–8475, 2020.
  • [18] S. Abirami and P. Chitra, “Energy-efficient edge based real-time healthcare support system,” in Advances in Computers. Elsevier, 2020, vol. 117, no. 1, pp. 339–368.
  • [19] Z. Huang and M. Wang, “A review of electroencephalogram signal processing methods for brain-controlled robots,” Cognitive Robotics, vol. 1, pp. 111–124, 2021.
  • [20] S. Woźniak, A. Pantazi, T. Bohnstingl, and E. Eleftheriou, “Deep learning incorporating biologically inspired neural dynamics and in-memory computing,” Nature Machine Intelligence, vol. 2, no. 6, pp. 325–336, 2020.
  • [21] C. D. Schuman, S. R. Kulkarni, M. Parsa, J. P. Mitchell, P. Date, and B. Kay, “Opportunities for neuromorphic computing algorithms and applications,” Nature Computational Science, vol. 2, no. 1, pp. 10–19, 2022.
  • [22] Y. Liu and W. Pan, “Spiking neural-networks-based data-driven control,” Electronics, vol. 12, no. 2, p. 310, 2023.
  • [23] M. A. Siddiqi, D. Vrijenhoek, L. P. Landsmeer, J. van der Kleij, A. Gebregiorgis, V. Romano, R. Bishnoi, S. Hamdioui, and C. Strydis, “A lightweight architecture for real-time neuronal-spike classification,” arXiv preprint arXiv:2311.04808, 2023.
  • [24] M. Yao, G. Zhao, H. Zhang, Y. Hu, L. Deng, Y. Tian, B. Xu, and G. Li, “Attention spiking neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  • [25] A. Al-Saegh, S. A. Dawwd, and J. M. Abdul-Jabbar, “Deep learning for motor imagery EEG-based classification: A review,” Biomedical Signal Processing and Control, vol. 63, p. 102172, 2021.
  • [26] J. Cui, Z. Lan, O. Sourina, and W. Müller-Wittig, “EEG-based cross-subject driver drowsiness recognition with an interpretable convolutional neural network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 10, pp. 7921–7933, Oct. 2023.
  • [27] H. Dong, A. Supratak, W. Pan, C. Wu, P. M. Matthews, and Y. Guo, “Mixed neural network approach for temporal sleep stage classification,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, no. 2, pp. 324–333, 2017.
  • [28] J. Wang, R. Gao, H. Zheng, H. Zhu, and C.-J. R. Shi, “SSGCNet: a sparse spectra graph convolutional network for epileptic EEG signal classification,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2023.
  • [29] H. Dose, J. S. Møller, H. K. Iversen, and S. Puthusserypady, “An end-to-end deep learning approach to MI-EEG signal classification for BCIs,” Expert Systems with Applications, vol. 114, pp. 532–542, 2018.
  • [30] Y. Li, L. Guo, Y. Liu, J. Liu, and F. Meng, “A temporal-spectral-based squeeze-and-excitation feature fusion network for motor imagery EEG decoding,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 29, pp. 1534–1545, 2021.
  • [31] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces,” Journal of Neural Engineering, vol. 15, no. 5, p. 056013, 2018.
  • [32] J.-S. Huang, W.-S. Liu, B. Yao, Z.-X. Wang, S.-F. Chen, and W.-F. Sun, “Electroencephalogram-based motor imagery classification using deep residual convolutional networks,” Frontiers in Neuroscience, vol. 15, p. 774857, 2021.
  • [33] A. I. Humayun, A. S. Sushmit, T. Hasan, and M. I. H. Bhuiyan, “End-to-end sleep staging with raw single channel EEG using deep residual convnets,” in 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). IEEE, 2019, pp. 1–5.
  • [34] P. Autthasan, R. Chaisaen, T. Sudhawiyangkul, P. Rangpong, S. Kiatthaveephong, N. Dilokthanakul, G. Bhakdisongkhram, H. Phan, C. Guan, and T. Wilaiprasitporn, “MIN2Net: End-to-end multi-task learning for subject-independent motor imagery EEG classification,” IEEE Transactions on Biomedical Engineering, vol. 69, no. 6, pp. 2105–2118, 2021.
  • [35] X. Ma, S. Qiu, C. Du, J. Xing, and H. He, “Improving EEG-based motor imagery classification via spatial and temporal recurrent neural networks,” in 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2018, pp. 1903–1906.
  • [36] L.-M. Zhao, X. Yan, and B.-L. Lu, “Plug-and-play domain adaptation for cross-subject EEG-based emotion recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 1, 2021, pp. 863–870.
  • [37] L. Xia, Y. Feng, Z. Guo, J. Ding, Y. Li, Y. Li, M. Ma, G. Gan, Y. Xu, J. Luo, Z. Shi, and Y. Guan, “MuLHiTA: a novel multiclass classification framework with multibranch LSTM and hierarchical temporal attention for early detection of mental stress,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 9657–9670, Dec. 2023.
  • [38] L. Zhang, F. Xiao, and Z. Cao, “Multi-channel EEG signals classification via CNN and multi-head self-attention on evidence theory,” Information Sciences, vol. 642, p. 119107, 2023.
  • [39] X. Wang and Z. Wang, “CNN with self-attention in EEG classification,” in International Conference on Human-Computer Interaction. Springer, 2022, pp. 512–526.
  • [40] Y. Wen, W. He, and Y. Zhang, “A new attention-based 3D densely connected cross-stage-partial network for motor imagery classification in BCI,” Journal of Neural Engineering, vol. 19, no. 5, p. 056026, 2022.
  • [41] J. Luo, Y. Wang, S. Xia, N. Lu, X. Ren, Z. Shi, and X. Hei, “A shallow mirror transformer for subject-independent motor imagery BCI,” Computers in Biology and Medicine, vol. 164, p. 107254, 2023.
  • [42] Y. Ma, Y. Song, and F. Gao, “A novel hybrid CNN-transformer model for EEG motor imagery classification,” in 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022, pp. 1–8.
  • [43] R. Zhang, G. Liu, Y. Wen, and W. Zhou, “Self-attention-based convolutional neural network and time-frequency common spatial pattern for enhanced motor imagery classification,” Journal of Neuroscience Methods, vol. 398, p. 109953, 2023.
  • [44] T. Wang, J. Mao, R. Xiao, W. Wang, G. Ding, and Z. Zhang, “Residual learning attention CNN for motion intention recognition based on EEG data,” in 2021 IEEE Biomedical Circuits and Systems Conference (BioCAS). IEEE, 2021, pp. 1–6.
  • [45] H. Li, H. Chen, Z. Jia, R. Zhang, and F. Yin, “A parallel multi-scale time-frequency block convolutional neural network based on channel attention module for motor imagery classification,” Biomedical Signal Processing and Control, vol. 79, p. 104066, 2023.
  • [46] C.-C. Fan, H. Yang, Z.-G. Hou, Z.-L. Ni, S. Chen, and Z. Fang, “Bilinear neural network with 3-d attention for brain decoding of motor imagery movements from the human EEG,” Cognitive Neurodynamics, vol. 15, pp. 181–189, 2021.
  • [47] A. Barton, E. Volna, M. Kotyrba, and R. Jarusek, “Proposal of a control algorithm for multiagent cooperation using spiking neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 4, pp. 2016–2027, Apr. 2023.
  • [48] Q. Liu, G. Pan, H. Ruan, D. Xing, Q. Xu, and H. Tang, “Unsupervised AER object recognition based on multiscale spatio-temporal features and spiking neurons,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 12, pp. 5300–5311, Dec. 2020.
  • [49] D. Liu, N. Bellotto, and S. Yue, “Deep spiking neural network for video-based disguise face recognition based on dynamic facial movements,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 6, pp. 1843–1855, Jun. 2020.
  • [50] A. Safa, F. Corradi, L. Keuninckx, I. Ocket, A. Bourdoux, F. Catthoor, and G. G. E. Gielen, “Improving the accuracy of spiking neural networks for radar gesture recognition through preprocessing,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 6, pp. 2869–2881, Jun. 2023.
  • [51] M. Dampfhoffer, T. Mesquida, A. Valentian, and L. Anghel, “Backpropagation-based learning techniques for deep spiking neural networks: A survey,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
  • [52] T. Zhang, S. Jia, X. Cheng, and B. Xu, “Tuning convolutional spiking neural network with biologically plausible reward propagation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12, pp. 7621–7631, Dec. 2022.
  • [53] S. Schliebs and N. Kasabov, “Evolving spiking neural network—a survey,” Evolving Systems, vol. 4, pp. 87–98, 2013.
  • [54] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, and T. Masquelier, “STDP-based spiking deep convolutional neural networks for object recognition,” Neural Networks, vol. 99, pp. 56–67, 2018.
  • [55] B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer, and S.-C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Frontiers in Neuroscience, vol. 11, p. 682, 2017.
  • [56] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Spatio-temporal backpropagation for training high-performance spiking neural networks,” Frontiers in Neuroscience, vol. 12, p. 331, 2018.
  • [57] J. M. Antelis, L. E. Falcón et al., “Spiking neural networks applied to the classification of motor tasks in EEG signals,” Neural Networks, vol. 122, pp. 130–143, 2020.
  • [58] Y. Luo, Q. Fu, J. Xie, Y. Qin, G. Wu, J. Liu, F. Jiang, Y. Cao, and X. Ding, “EEG-based emotion classification using spiking neural networks,” IEEE Access, vol. 8, pp. 46007–46016, 2020.
  • [59] N. Kasabov and E. Capecci, “Spiking neural network methodology for modelling, classification and understanding of EEG spatio-temporal data measuring cognitive processes,” Information Sciences, vol. 294, pp. 565–575, 2015.
  • [60] X. Wu, Y. Feng, S. Lou, H. Zheng, B. Hu, Z. Hong, and J. Tan, “Improving NeuCube spiking neural network for EEG-based pattern recognition using transfer learning,” Neurocomputing, vol. 529, pp. 222–235, 2023.
  • [61] Z. Yan, J. Zhou, and W.-F. Wong, “Energy efficient ECG classification with spiking neural network,” Biomedical Signal Processing and Control, vol. 63, p. 102170, 2021.
  • [62] ——, “EEG classification with spiking neural network: Smaller, better, more energy efficient,” Smart Health, vol. 24, p. 100261, 2022.
  • [63] S. Ghosh-Dastidar and H. Adeli, “Improved spiking neural networks for EEG classification and epilepsy and seizure detection,” Integrated Computer-Aided Engineering, vol. 14, no. 3, pp. 187–212, 2007.
  • [64] S. M. Bohte, J. N. Kok, and H. La Poutre, “Error-backpropagation in temporally encoded networks of spiking neurons,” Neurocomputing, vol. 48, no. 1-4, pp. 17–37, 2002.
  • [65] E. O. Neftci, H. Mostafa, and F. Zenke, “Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 51–63, 2019.
  • [66] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber et al., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” 2001.
  • [67] M.-H. Lee, O.-Y. Kwon, Y.-J. Kim, H.-K. Kim, Y.-E. Lee, J. Williamson, S. Fazli, and S.-W. Lee, “EEG dataset and OpenBMI toolbox for three BCI paradigms: An investigation into BCI illiteracy,” GigaScience, vol. 8, no. 5, p. giz002, 2019.
  • [68] Y. Guo, Y. Zhang, Y. Chen, W. Peng, X. Liu, L. Zhang, X. Huang, and Z. Ma, “Membrane potential batch normalization for spiking neural networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 19420–19430.
  • [69] Y. Hu, Y. Wu, L. Deng, and G. Li, “Advancing residual learning towards powerful deep spiking neural networks,” arXiv e-prints, pp. arXiv–2112, 2021.