NiSNN-A: Non-iterative Spiking Neural Networks with Attention with Application to Motor Imagery EEG Classification

Chuhan Zhang, Wei Pan, Cosimo Della Santina

Chuhan Zhang and Cosimo Della Santina are with the Department of Cognitive Robotics, Faculty of Mechanical Engineering, Delft University of Technology, Delft, Netherlands (e-mail: [email protected]; [email protected]). Wei Pan is with the Department of Computer Science, The University of Manchester, Manchester, United Kingdom (e-mail: [email protected]).

Abstract

Motor imagery, an important category in electroencephalogram (EEG) research, often intersects with scenarios demanding low energy consumption, such as portable medical devices and isolated environment operations. Traditional deep learning algorithms, despite their effectiveness, are characterized by significant computational demands accompanied by high energy usage. As an alternative, spiking neural networks (SNNs), inspired by the biological functions of the brain, emerge as a promising energy-efficient solution. However, SNNs typically exhibit lower accuracy than their counterpart convolutional neural networks (CNNs). Although attention mechanisms successfully increase network accuracy by focusing on relevant features, their integration in the SNN framework remains an open question. In this work, we combine the SNN and the attention mechanisms for the EEG classification, aiming to improve precision and reduce energy consumption. To this end, we first propose a Non-iterative Leaky Integrate-and-Fire (LIF) neuron model, overcoming the gradient issues in the traditional SNNs using the Iterative LIF neurons. Then, we introduce the sequence-based attention mechanisms to refine the feature map. We evaluated the proposed Non-iterative SNN with Attention (NiSNN-A) model on OpenBMI, a large-scale motor imagery dataset. Experiment results demonstrate that 1) our model outperforms other SNN models by achieving higher accuracy, 2) our model increases energy efficiency compared to the counterpart CNN models (i.e., by 2.27 times) while maintaining comparable accuracy.

Index Terms:

Spiking neural networks, Attention mechanism, Motor imagery, EEG classification.

I Introduction

Electroencephalograms (EEGs), whether captured through non-invasive electrodes on the scalp or directly via invasive devices, are the cornerstone of the rapidly evolving domain of Brain-Computer Interfaces (BCI). Therefore, accurate classification of EEG signals has attracted substantial attention over the years - with applications ranging from advanced neurorehabilitation techniques [1] to diagnostics and real-time health monitoring [2]. Within the realm of BCI, motor imagery (MI) holds a distinctive place [3]. MI refers to the mental imagination of specific movements by subjects, leading to the generation of distinct EEG patterns. Accurately classifying these EEG signals opens up application possibilities in advanced fields like robot control and assistive technologies [4].

Recently, deep learning (DL) methods such as Convolutional Neural Networks (CNNs) [5], Recurrent Neural Networks (RNNs) [6], and Transformers [7] have received increasing interest as a means of classifying EEG signals [8, 9, 10]. In addition to conventional CNNs and RNNs, DL for EEG now incorporates many other techniques [11, 12, 13, 14, 15]. Attention mechanisms [16], inspired by human cognitive processes, enhance the model’s focus on relevant features, facilitating more efficient and accurate feature extraction [17]. Attention mechanisms have been successfully applied to EEG classification; more details are presented in Section II-B. However, all of these methods suffer from high energy consumption, which poses a significant barrier to deployment in low-energy scenarios such as edge devices for healthcare [18] or robot control [19].

In this work, we investigate the use of Spiking Neural Networks (SNNs) for EEG classification. This technique mimics how biological neurons operate, allowing it to be used to interpret natural neuronal signals [20]. SNNs also offer a promising avenue for reducing energy consumption due to their event-based nature [21, 22, 23], which is attractive in view of edge applications. More details on the state-of-the-art of SNN in EEG classification are provided in Section II-C. We propose a novel integration of SNNs and the attention mechanism, especially for EEG classification. At the core of our approach sits a newly proposed Non-iterative Leaky Integrate-and-Fire (LIF) neuron, which utilizes matrix operations to approximate the neural dynamics of the biological LIF neuron. In this way, we can avoid lengthy loop executions and mitigate the gradient vanishing problem during training, thereby boosting both the efficiency and accuracy of the execution process. The second key methodological contribution is a sequence-based attention model for EEG data, which can simultaneously obtain the attention scores of feature maps. It is worth noting that only one work has investigated attention SNNs, which, however, specifically targeted computer vision [24]. As we show in our experimental comparison, this method is not suitable for classifying long-term data such as EEG.

The contributions of this paper can be summarized as follows:

1. We show for the first time the combination of SNNs and attention mechanisms for motor imagery EEG classification, simultaneously achieving high accuracy and reducing energy consumption.

2. We propose a novel Non-iterative LIF neuron model for SNNs, tackling the gradient problem that arises during backpropagation over long time steps in the traditional Iterative LIF neuron model.

3. We introduce a sequence-based attention mechanism for SNNs, improving the classification accuracy.

The rest of the paper is organized as follows. Section II reviews related work. Section III introduces our proposed Non-iterative SNN with attention (NiSNN-A). Section IV gives the experiment details. The results and discussion are presented in Section V. Section VI concludes the work.

II Related Work

In this section, the related works are introduced. Section II-A illustrates the application of deep learning techniques specific to EEG signal processing. Subsequently, Section II-B illustrates the works with attention mechanisms. Finally, the applications of SNN in EEG classification are described in Section II-C.

II-A Deep learning methods for EEG

In recent years, integrating deep learning techniques into EEG classification has gained significant traction [25]. CNN stands out among these techniques, particularly due to its ability to identify spatial-temporal patterns within the complex, multi-dimensional EEG data [26, 27, 28]. For instance, a temporal-spatial CNN was employed for EEG classification in [29, 30]. The work in [31] introduced EEGNet, which integrates a separable convolutional layer following the temporal and spatial modules. Enhancing the CNN architecture, [32, 33] incorporated residual blocks for classification. Another noteworthy approach is presented in [34], where an autoencoder is built upon the CNN framework. Furthermore, this work adopted a subject-independent training paradigm, emphasizing its scalability on varied EEG data from different subjects. Apart from CNNs, Long Short-Term Memory (LSTM) networks have also been recognized for EEG classification due to their inherent ability to process time sequences effectively [35, 36, 37]. However, these methods are limited by high energy consumption, which makes them difficult to use on edge devices with low-energy requirements.

II-B Attention mechanism

The Transformer architecture and attention mechanism, originally introduced in [16] for natural language processing tasks, have seen increasing adoption in the deep learning domain. These methods are now being explored in the field of EEG classification, given their ability to handle time sequence data. The attention mechanism is particularly important for EEG data analysis. It holds the potential to enhance classification accuracy and emphasizes specific segments of the data, offering deeper insights into EEG signal characteristics. Various attention models have emerged in the field of EEG classification. For instance, [12] presents a spatial and temporal attention model integrated with CNN. This approach leverages two distinct CNN modules to derive spatial and temporal attention scores separately, subsequently using four convolutional layers for classification. Meanwhile, [38] applies a multi-head attention module, as described in [16], combined with five convolutional layers to classify EEG signals. Direct applications of the Transformer attention structures from [16] for EEG classification are evident in [39, 40]. Beyond solely leveraging CNN and attention mechanisms, some studies have integrated additional techniques. For instance, [41] introduces a mirrored input approach combined with an attention model that operates across each data record. In a more intricate approach, [42] deploys spatial, spectral, and temporal Transformers, each catering to a different input data type. A popular EEG processing methodology, time-frequency Common Spatial Pattern (TFCSP), is highlighted in [43]. This method intertwines a two-layer CNN and an attention model, feeding their concatenated outputs into a classifier. A notable trend is the use of global attention in conjunction with three sequential models, as seen in [44, 45, 46]. These works employ three sequential attention models, each dedicated to a single dimension. By utilizing Global Average Pooling (GAP), they efficiently diminish unrelated dimensions and consolidate the attention scores across the three output dimensions. These models effectively achieved the goal of EEG signal classification. However, they are limited by the high energy consumption of attention-based CNNs, which makes them difficult to use on edge devices with low-energy requirements. They also lack a dedicated focus on data from different channels and different time segments, which is important in EEG data.

II-C Spiking neural networks

In recent years, SNNs have garnered increasing attention within the neural computing community with broad applications such as computer vision and robot control [47, 48, 49, 50, 22]. Their growing significance can be attributed to their closer resemblance to biological neural systems than CNNs. SNNs, by simulating the discrete, spike-based communication found in actual neurons, promise enhanced efficiency and energy savings. However, this bio-inspired approach comes with challenges, particularly during training. Gradient backpropagation, a staple in training CNNs, presents difficulties for SNNs due to their non-differentiable spiking nature. To address this, various training methods have been proposed [51, 52]: from leveraging evolutionary algorithms to adjust synaptic weights [53], to employing biologically-inspired synaptic update rules like Spike-Timing-Dependent Plasticity (STDP) [54]. Some researchers have also explored converting a well-trained CNN into its SNN counterpart [55], while others have advocated for using surrogate gradients for continuing backpropagation [56]. Given the biology-inspired and efficient attributes of SNNs, several works have successfully applied SNNs in the domain of EEG signal classification, showcasing their potential in real-world applications. For example, [57] employed Particle Swarm Optimization as an evolutionary algorithm for weight updates, combined with an unsupervised classifier like K Nearest Neighbours and Multilayer Perceptron; however, their approach was not end-to-end and involved manual feature extraction. Studies that utilized STDP include [58], which integrated manual feature extraction methods like Fast Fourier Transform and Discrete Wavelet Transform with a 3D SNN reservoir and supervised classifiers. Similarly, [59] adopted an unsupervised learning framework, implementing a 3D SNN reservoir model. [60] combined the 3D reservoir with a Support Vector Machine as the classifier. Highlighting conversion techniques, [61] explored a tree structure, demonstrating the energy efficiency benefits of SNNs through a CNN-to-SNN conversion, while another work by [62] utilized Power Spectral Density for feature extraction before such a conversion. Backpropagation-like methodologies have also been adapted to SNNs for EEG classification, with [63] using SpikeProp [64]. [14] were notably the first to employ directly-trained SNNs for EEG signal classification. However, these methods have limitations when deepening the networks and seeking to learn complex representations.

III Methods

In this section, we introduce the novel Non-iterative LIF neuron in Section III-A. Subsequently, the proposed attention models are described in Section III-B. Finally, Section III-C describes the network architecture of the proposed NiSNN-A.

III-A LIF neuron

Neurons serve as the fundamental components of neural networks. In this section, the Iterative LIF neuron model is presented in Section III-A1. Then, our proposed Non-iterative LIF neuron model is detailed in Section III-A2. Finally, the two models are compared in Section III-A3.

III-A1 Background: Iterative LIF neuron model

In the SNN community, the Leaky Integrate-and-Fire (LIF) neuron model is widely used. It strikes a balance between simplicity and biologically inspired characteristics. The LIF model can be described using these equations:

$$
\begin{aligned}
&\text{membrane potential:}\\
&\quad\begin{cases}
\tau\dfrac{du(t_{\text{c}})}{dt_{\text{c}}}=-u(t_{\text{c}})+wx(t_{\text{c}}), & \text{if } u(t_{\text{c}})\leq V_{\text{th}},\\
\lim\limits_{\Delta\to 0;\,\Delta>0}u(t_{\text{c}}+\Delta)=u_{\text{reset}}, & \text{if } u(t_{\text{c}})>V_{\text{th}},
\end{cases}\\
&\quad u_{\text{reset}}=\begin{cases}
u(t_{\text{c}})-V_{\text{th}}, & \text{with soft reset mechanism,}\\
u_{\text{r}}, & \text{with hard reset mechanism,}
\end{cases}\\
&\text{spike generation:}\\
&\quad o(t_{\text{c}})=g(u(t_{\text{c}})),\\
&\quad g(a)=\begin{cases}
0, & \text{if } a\leq V_{\text{th}},\\
1, & \text{if } a>V_{\text{th}},
\end{cases}
\end{aligned}
\tag{1}
$$

where $\tau\in\mathbb{R}$ is the membrane time constant and $u(t_{\text{c}})\in\mathbb{R}$ represents the neuron’s membrane potential at continuous time $t_{\text{c}}\in\mathbb{R}$. $wx(t_{\text{c}})\in\mathbb{R}$ is the input stimulus at time $t_{\text{c}}$, i.e., the weighted input of the present layer in the context of neural networks, and $w$ is the trainable parameter. $V_{\text{th}}\in\mathbb{R}$ is the membrane potential threshold: when the membrane potential exceeds $V_{\text{th}}$, a spike is produced; while it remains at or below the threshold, no spike is generated. After generating a spike, the membrane potential drops to a reset potential $u_{\text{reset}}\in\mathbb{R}$. Two reset mechanisms handle the membrane potential after a firing event. The soft reset mechanism subtracts the threshold potential $V_{\text{th}}$ from the membrane potential, while the hard reset mechanism sets the membrane potential to a predefined value $u_{\text{r}}\in\mathbb{R}$. $o(t_{\text{c}})\in\mathbb{R}$ represents the output spike at time $t_{\text{c}}$. The function $g(\cdot)$ is the Heaviside step function, which models the spike firing process.
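To make the two reset mechanisms in (1) concrete, the following is a minimal sketch that integrates the sub-threshold dynamics with a forward-Euler step. The step size `dt`, the function name `simulate_lif`, and the choice to pass the already weighted stimulus $wx$ as a list are assumptions of this sketch, not part of the model definition.

```python
def simulate_lif(weighted_inputs, tau=10.0, v_th=1.0, dt=1.0, reset="soft", u_r=0.0):
    """Forward-Euler integration of tau * du/dt = -u + w*x with a reset after each spike.

    `weighted_inputs` holds w*x at each step (an assumption of this sketch).
    """
    u = 0.0
    potentials, spikes = [], []
    for wx in weighted_inputs:
        # Sub-threshold dynamics: du = (-u + w*x) * dt / tau
        u += (-u + wx) * dt / tau
        fired = u > v_th
        potentials.append(u)
        spikes.append(1 if fired else 0)
        if fired:
            # Soft reset subtracts the threshold; hard reset jumps to u_r.
            u = u - v_th if reset == "soft" else u_r
    return potentials, spikes


if __name__ == "__main__":
    _, soft = simulate_lif([2.0] * 4, tau=2.0, reset="soft")
    _, hard = simulate_lif([2.0] * 4, tau=2.0, reset="hard")
    print("soft reset spikes:", soft)
    print("hard reset spikes:", hard)
```

With a constant supra-threshold input, the soft reset keeps the residual potential above the threshold and therefore tends to fire more often than the hard reset.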

To adapt to the requirements of backpropagation in neural networks, an Iterative LIF neuron model [56] with the soft reset mechanism is introduced:

$$
\begin{aligned}
u^{\text{t}+1}&=\lambda(u^{\text{t}}-V_{\text{th}}o^{\text{t}})+wx^{\text{t}},\\
o^{\text{t}}&=g(u^{\text{t}}),
\end{aligned}
\tag{2}
$$

where $\lambda\in\mathbb{R}$ denotes the decay rate of the membrane potential. We use $u^{\text{t}}$ to represent $u(t_{\text{c}}=t)$, where $t\in\mathbb{N}$ is the discrete time step in the Iterative LIF neuron model. Similarly, $o^{\text{t}}$ means $o(t_{\text{c}}=t)$ and $x^{\text{t}}$ means $x(t_{\text{c}}=t)$. In this way, the membrane potential is updated recurrently, step by step, which makes the LIF neuron dynamics trainable within a network. However, the Heaviside step function $g(\cdot)$ in the firing process is non-differentiable, which poses a challenge for gradient backpropagation. To address this limitation, surrogate functions have been proposed [65]. Sigmoid functions are commonly employed as surrogates because they emulate the spike firing process closely when $\alpha$ is large: $\text{Sigmoid}(x)=\frac{1}{1+e^{-\alpha x}}$. Therefore, in the Iterative LIF neuron model, the Sigmoid function is adopted during gradient backpropagation:

$$
\frac{\partial o}{\partial u}=\frac{\partial\text{Sigmoid}(u)}{\partial u}=\text{Sigmoid}(u)(1-\text{Sigmoid}(u))\alpha.
\tag{3}
$$
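The recurrence in (2) and the surrogate derivative in (3) can be sketched as follows. This is an illustrative, framework-free version: the initial potential $u^{0}=0$, the default values of `lam`, `v_th`, and `alpha`, and the function names are assumptions made for the example.

```python
import math


def heaviside(u, v_th):
    """Spike function g: 1 if the potential exceeds the threshold, else 0."""
    return 1 if u > v_th else 0


def iterative_lif(weighted_inputs, lam=0.5, v_th=1.0):
    """Run o^t = g(u^t), u^{t+1} = lam * (u^t - v_th * o^t) + w*x^t from u^0 = 0."""
    u = 0.0
    spikes = []
    for wx in weighted_inputs:
        o = heaviside(u, v_th)            # o^t = g(u^t)
        spikes.append(o)
        u = lam * (u - v_th * o) + wx     # soft reset, decay, then integrate
    return spikes


def sigmoid_surrogate_grad(u, alpha=4.0):
    """Derivative of Sigmoid(u) = 1 / (1 + exp(-alpha * u)), used in place of dg/du."""
    s = 1.0 / (1.0 + math.exp(-alpha * u))
    return alpha * s * (1.0 - s)


if __name__ == "__main__":
    print(iterative_lif([1.5, 1.5, 1.5]))
    print(sigmoid_surrogate_grad(0.0))
```

The explicit `for` loop over time steps is the part that the Non-iterative model aims to remove.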

III-A2 Non-iterative LIF neuron model

[Figure: (a) Iterative LIF neuron]

As described in (2), the membrane potential $u$ in the Iterative LIF neuron model depends on the outcomes of the preceding time step before it can be updated and determine spike generation. Therefore, for cases with multiple time steps, a long-step loop is required during the execution of the LIF neuron. This lengthens the computation duration and introduces a gradient problem across the time dimension [66]. Thus, we propose a Non-iterative LIF model designed to accelerate computation by no longer relying on loops and avoiding gradient problems caused by long-time execution.

The core idea behind the Non-iterative LIF model is to treat the input $wx^{\text{t}}$ over all time steps $0,\ldots,t_{\text{n}}$ as a matrix $X$ and to use matrices to represent and approximate the dynamics of the LIF neuron. First, under the assumption that input occurs only at time step $0$, the differential equation in (1) has the solution $u^{\text{t}}=wx^{0}(1-e^{-\frac{t}{\tau}})$. The function $\hat{e}(\cdot)$ denotes the leaky component:

$$
\hat{e}(t)=1-e^{-\frac{t}{\tau}}.
\tag{4}
$$

Under the assumption that input occurs only at time step $i\in\mathbb{N}$, the solution becomes $u^{\text{t}}=wx^{\rm i}\hat{e}(t-i)$, where $i\in[0,\ldots,t]$. Consequently, when the inputs at all time steps up to the current time step $t$ are accounted for, the solution is:

$$
u^{\text{t}}=\sum^{t}_{i=0}wx^{\rm i}\hat{e}(t-i).
\tag{5}
$$

In this form, (5) can be expressed as a matrix computation over all time steps up to $t_{\text{n}}$:

$$
U=X\hat{E},
\tag{6}
$$

where $U\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ is defined as the membrane potential matrix, $X\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ is defined as the input matrix, and $\hat{E}\in\mathbb{R}^{(t_{\text{n}}+1)\times(t_{\text{n}}+1)}$ is defined as the leaky matrix:

$$
\begin{aligned}
U&=\begin{bmatrix}u^{0}&u^{1}&\ldots&u^{t_{\text{n}}}\end{bmatrix},\\
X&=\begin{bmatrix}wx^{0}&wx^{1}&\ldots&wx^{t_{\text{n}}}\end{bmatrix},\\
\hat{E}&=\begin{bmatrix}
\hat{e}(0)&\hat{e}(1)&\ldots&\hat{e}(t_{\text{n}})\\
0&\hat{e}(0)&\ldots&\hat{e}(t_{\text{n}}-1)\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\ldots&\hat{e}(0)
\end{bmatrix}.
\end{aligned}
\tag{7}
$$
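As a sanity check on (5) to (7), the sketch below builds the leaky matrix $\hat{E}$ and compares the matrix product $X\hat{E}$ with the direct sum in (5). It uses plain Python lists so that it runs without dependencies; a tensor library would normally be used instead. The helper names and the example values are assumptions of this sketch.

```python
import math


def e_hat(t, tau):
    """Leaky component (4): 1 - exp(-t / tau)."""
    return 1.0 - math.exp(-t / tau)


def leaky_matrix(n_steps, tau):
    """Upper-triangular matrix with entry [i][j] = e_hat(j - i) for j >= i, as in (7)."""
    return [[e_hat(j - i, tau) if j >= i else 0.0 for j in range(n_steps)]
            for i in range(n_steps)]


def row_times_matrix(row, matrix):
    """Product of a 1 x n row vector with an n x n matrix."""
    n = len(row)
    return [sum(row[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def membrane_by_sum(weighted_inputs, tau):
    """Direct evaluation of (5) for every time step t."""
    return [sum(weighted_inputs[i] * e_hat(t - i, tau) for i in range(t + 1))
            for t in range(len(weighted_inputs))]


if __name__ == "__main__":
    x = [0.5, 1.0, -0.3, 0.8]   # weighted inputs w*x^t, chosen arbitrarily
    print(row_times_matrix(x, leaky_matrix(len(x), tau=2.0)))
    print(membrane_by_sum(x, tau=2.0))
```

Both computations give the same membrane potentials, which is what allows the loop over time steps to be replaced by a single matrix product.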

Additionally, we approximate the firing process of the LIF neuron through matrix calculations. Since $U$ in (6) accounts for all inputs but ignores resets, it is an upper bound on the membrane potential. The difference between the real membrane potential and $U$ lies in the influence of the reset mechanism brought by the output spike matrix $O\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$. We introduce the reset matrix $S\in\mathbb{R}^{(t_{\text{n}}+1)\times(t_{\text{n}}+1)}$ based on the soft reset mechanism (introduced in Section III-A1):

$$
S=\begin{bmatrix}
0&V_{\mathrm{th}}\hat{e}(0)&V_{\mathrm{th}}\hat{e}(1)&\ldots&V_{\mathrm{th}}\hat{e}(t_{\text{n}}-1)\\
0&0&V_{\mathrm{th}}\hat{e}(0)&\ldots&V_{\mathrm{th}}\hat{e}(t_{\text{n}}-2)\\
0&0&0&\ldots&V_{\mathrm{th}}\hat{e}(t_{\text{n}}-3)\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
0&0&0&\ldots&0
\end{bmatrix}.
\tag{8}
$$

Therefore, with $U$, $S$, $O$, and the Heaviside fire function $g(\cdot)$, we have the following identity:

$$
g(U-OS)=O,
\tag{9}
$$

where only $O$ is the unknown variable. To solve (9), we propose the following proposition:

Proposition 1

Given a LIF neuron with leaky and integrated dynamics (6), the inequality

$$
g(U-I_{\text{one}}S)\leq O\leq g(U)
\tag{10}
$$

always holds, where $O\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ is the output spike matrix, $I_{\text{one}}$ is the all-ones matrix, and $S$ is defined in (8).

Proof 1

Since $g(\cdot)$ is a step function, it is difficult to solve (9) by computing the inverse of $g(\cdot)$. Thus, we approximate the reset part $OS$ by ${\hat{O}}S$ in order to calculate $O$, and we use a reset matrix $U_{\text{reset}}\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ to denote the approximation ${\hat{O}}S$. The extreme cases of $U_{\text{reset}}$ are as follows:

$$
\begin{aligned}
\max(U_{\text{reset}})&=\max({\hat{O}})S=I_{\text{one}}S,\\
\min(U_{\text{reset}})&=\min({\hat{O}})S=I_{\text{zero}}S=I_{\text{zero}},
\end{aligned}
$$

where $I_{\text{zero}}\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ is an all-zeros matrix. Since $g(\cdot)$ is non-decreasing, (9) can be bounded as:

$$
\begin{aligned}
g(U-\max(U_{\text{reset}}))&\leq g(U-OS)\leq g(U-\min(U_{\text{reset}})),\\
g(U-I_{\text{one}}S)&\leq O\leq g(U).
\end{aligned}
$$

$\square$
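The bounds in Proposition 1 can be checked numerically on a small example. Because $S$ in (8) is strictly upper triangular, the identity (9) can be solved forward in time, one step after another; this forward substitution is a choice made for this sketch and is not a procedure stated above. The function name `check_bounds` and the default parameters are likewise assumptions.

```python
import math


def e_hat(t, tau):
    """Leaky component (4)."""
    return 1.0 - math.exp(-t / tau)


def g(values, v_th):
    """Element-wise Heaviside step with threshold v_th."""
    return [1 if v > v_th else 0 for v in values]


def check_bounds(weighted_inputs, tau=2.0, v_th=1.0):
    """Return (lower, O, upper) with lower = g(U - I_one S) and upper = g(U)."""
    n = len(weighted_inputs)
    # U from (5); S from (8): S[i][j] = v_th * e_hat(j - 1 - i) for j > i.
    U = [sum(weighted_inputs[i] * e_hat(t - i, tau) for i in range(t + 1))
         for t in range(n)]
    S = [[v_th * e_hat(j - 1 - i, tau) if j > i else 0.0 for j in range(n)]
         for i in range(n)]
    # Column t of O S only involves O[0..t-1], so (9) can be solved step by step.
    O = []
    for t in range(n):
        reset = sum(O[i] * S[i][t] for i in range(t))
        O.append(1 if U[t] - reset > v_th else 0)
    lower = g([U[t] - sum(S[i][t] for i in range(n)) for t in range(n)], v_th)
    upper = g(U, v_th)
    return lower, O, upper


if __name__ == "__main__":
    lower, spikes, upper = check_bounds([3.0] * 8)
    print(lower, spikes, upper, sep="\n")
```

For every time step the solved spike train lies between the two bounds, as (10) states.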

Thus, we use Proposition 1 to estimate ${O}$ in $U_{\text{reset}}$ as ${\hat{O}}$:

$$
U_{\text{reset}}\approx{\hat{O}}S\leq g(U)S.
\tag{11}
$$

The spiking neuron should generate as few spikes as possible to preserve sparsity and improve computational efficiency. Thus, the extreme case in (11) is adopted in the Non-iterative LIF neuron model to approximate $U_{\text{reset}}$:

$$
U_{\text{reset}}=g(U)S.
\tag{12}
$$

This yields the final output matrix as follows:

$$
\begin{aligned}
U_{\text{final}}&=U-U_{\text{reset}},\\
O_{\text{final}}&=g(U_{\text{final}}).
\end{aligned}
\tag{13}
$$

To show the neuron model in the same expression format as the Iterative LIF neuron model in Section III-A1, we also express (6) to (13) in the same format of (2). Therefore, the Non-iterative LIF neuron model with the soft reset mechanism could be represented as:

$$
\begin{aligned}
u^{\text{t}}&=\sum^{t}_{i=0}wx^{\rm i}\hat{e}(t-i)-\sum^{t-1}_{i=0}g\!\left(\sum^{i}_{k=0}wx^{\rm k}\hat{e}(i-k)\right)V_{\text{th}}\hat{e}(t-1-i),\\
o^{\text{t}}&=g(u^{\text{t}}).
\end{aligned}
\tag{14}
$$

We aim to preserve the original gradient values without introducing significant neuron changes. Therefore, we utilize the derivative of the ReLU function as the surrogate function during backpropagation:

$$
\frac{\partial o}{\partial u}=\begin{cases}
1, & \text{if } u>0,\\
0, & \text{if } u\leq 0.
\end{cases}
\tag{15}
$$

The pseudo-code of the Non-iterative LIF neuron model is shown in Algorithm 1.

Algorithm 1 Pseudocode for the Non-iterative LIF neuron model

function e_matrix(time_step, τ)
    e ← zeros((time_step, time_step))
    for i ∈ range(time_step) do
        for j ∈ range(time_step) do
            k ← j - i
            if k ≥ 0 then
                e[i][j] ← 1 - exp(-(k/τ))
            end if
        end for
    end for
    return e
end function

function s_matrix(e, v)
    s ← zeros_like(e)
    s[:, 1:] ← e[:, :-1] · v
    return s
end function

procedure Ni_LIF
    function __init__(time_step, τ, v)
        self.v ← v
        self.e ← e_matrix(time_step, τ)
        self.s ← s_matrix(self.e, v)
    end function

    function forward(x)
        u ← x × self.e
        u_reset ← (u > self.v) × self.s
        u ← u - u_reset
        o ← (u > self.v)
        return o
    end function
end procedure
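A runnable counterpart of the forward pass defined by (6), (12), and (13) is sketched below for a single neuron and a single input sequence. The kernel follows (4) and (7); the class name, the pure-Python matrix product, and the default values of `tau` and `v_th` are assumptions of this sketch, and batching and gradients are omitted.

```python
import math


class NonIterativeLIF:
    """Single-neuron sketch of the Non-iterative LIF forward pass."""

    def __init__(self, n_steps, tau=2.0, v_th=1.0):
        self.v_th = v_th
        # Leaky matrix (7): entry [i][j] = e_hat(j - i) for j >= i, else 0.
        self.e = [[1.0 - math.exp(-(j - i) / tau) if j >= i else 0.0
                   for j in range(n_steps)] for i in range(n_steps)]
        # Reset matrix (8): leaky matrix shifted one column right, scaled by v_th.
        self.s = [[0.0] + [v_th * val for val in row[:-1]] for row in self.e]

    @staticmethod
    def _matmul(row, matrix):
        """Product of a 1 x n row vector with an n x n matrix."""
        return [sum(r * m[j] for r, m in zip(row, matrix)) for j in range(len(row))]

    def forward(self, weighted_inputs):
        u = self._matmul(weighted_inputs, self.e)             # U = X E, eq. (6)
        o_hat = [1.0 if v > self.v_th else 0.0 for v in u]    # g(U)
        u_reset = self._matmul(o_hat, self.s)                 # U_reset = g(U) S, eq. (12)
        u_final = [a - b for a, b in zip(u, u_reset)]         # eq. (13)
        return [1 if v > self.v_th else 0 for v in u_final]   # O_final


if __name__ == "__main__":
    lif = NonIterativeLIF(n_steps=4, tau=2.0, v_th=1.0)
    print(lif.forward([3.0, 3.0, 3.0, 3.0]))
```

The two matrices depend only on the number of time steps, $\tau$, and $V_{\text{th}}$, so they can be built once and reused for every input.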

III-A3 Comparison: Iterative LIF neuron and Non-iterative LIF neuron

The operational flow diagrams for the two neurons are shown in Figure 1. The Iterative LIF neuron operates in a recurrent manner, relying on the output from the preceding time step for its next computation. Conversely, the Non-iterative LIF model simultaneously processes data from all time steps, requiring only a few matrix operations to determine the membrane potential and output spikes for all time steps. The non-loop characteristic of the Non-iterative LIF neuron could avoid the gradient issues caused by long time step loop executions. We assume the loss is $L\in\mathbb{R}^{1\times(t_{\text{n}}+1)}$ , which is equal to the summation of the loss $l^{\rm t}\in\mathbb{R}$ of each time step $t$ . The gradient derivation for both LIF neuron models during backpropagation is shown as:

$$
\frac{\partial L}{\partial w}=\sum^{t_{\rm n}}_{t=0}\frac{\partial l^{\rm t}}{\partial w}=\sum^{t_{\rm n}}_{t=0}\frac{\partial l^{\rm t}}{\partial o^{\rm t}}\frac{\partial o^{\rm t}}{\partial u^{\rm t}}\frac{\partial u^{\rm t}}{\partial w}.
\tag{16}
$$

Therefore, we introduce two Propositions regarding $\frac{\partial u^{\rm t}}{\partial w}$ in two LIF neuron models respectively to compare the gradient issues. Proposition 2 shows the gradient equation of the Iterative LIF neuron, and Proposition 3 introduces the gradient equation of the Non-iterative LIF neuron.

Proposition 2

Given an Iterative LIF neuron with the dynamics in (2), the limit condition

$$
\forall u^{\rm t}_{\text{I}}\text{, }\lim\limits_{t\to\infty}\frac{\partial u^{\rm t}_{\text{I}}}{\partial w}=0
\tag{17}
$$

always holds, where $u^{\rm t}_{\text{I}}$ is the membrane potential of the Iterative LIF neuron at the time step $t$.

Proof 2

In the Iterative LIF neuron model, $\frac{\partial u^{\rm t_{\text{n}}}_{\text{I}}}{\partial w}$ is derived in (20), which is composed of a summation of two parts of accumulated gradient multiplication $\prod^{t_{\text{n}}-1}_{i=0}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^% {\rm i}_{\text{I}}}$ . According to (2), $\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$ is equal to the decay rate $\lambda\in(0,1)$ which is less than 1. Therefore, $\lim\limits_{t\to\infty}\prod^{t-1}_{i=0}\frac{\partial u^{\rm i+1}_{\text{I}}% }{\partial u^{\rm i}_{\text{I}}}=\lim\limits_{t\to\infty}\lambda^{t}=0$ , which causes $\lim\limits_{t\to\infty}\frac{\partial u^{\text{t}}_{\text{I}}}{\partial w}=0$ . $\square$

Proposition 3

Given a Non-iterative LIF neuron with the dynamics in (14), the limit condition

$$
\exists u^{\rm t}_{\text{Ni}}\text{, s.t. }\lim\limits_{t\to\infty}\frac{\partial u^{\rm t}_{\text{Ni}}}{\partial w}\neq 0
\tag{18}
$$

always holds, where $u^{\rm t}_{\text{Ni}}$ is the membrane potential of the Non-iterative LIF neuron at the time step $t$.

Proof 3

In the Non-iterative LIF neuron model, $\frac{\partial u^{\rm t_{\text{n}}}_{\text{Ni}}}{\partial w}$ is derived as:

$$
\begin{aligned}
\frac{\partial u^{t_{\text{n}}}_{\text{Ni}}}{\partial w}={}&\sum^{t_{\text{n}}}_{i=0}x^{\rm i}\hat{e}(t_{\text{n}}-i)\\
&-\sum^{t_{\text{n}}-1}_{i=0}\left[V_{\text{th}}\hat{e}(t_{\text{n}}-1-i)\left.\frac{\partial o}{\partial u}\right|_{u=\sum^{i}_{k=0}wx^{\rm k}\hat{e}(i-k)}\cdot\sum^{i}_{k=0}x^{\rm k}\hat{e}(i-k)\right],
\end{aligned}
\tag{19}
$$

which is composed only of summations. We construct a specific example to prove that $\exists u^{t_{\text{n}}}_{\text{Ni}}\text{, s.t. }\lim\limits_{t_{\text{n}}\to\infty}\frac{\partial u^{t_{\text{n}}}_{\text{Ni}}}{\partial w}\neq 0$. Choose $w>0$ and $x^{\rm i}<0$ for all $i$. Then $\forall i,\sum^{i}_{k=0}wx^{\rm k}\hat{e}(i-k)\leq 0$, and according to (15), $\forall i,\left.\frac{\partial o}{\partial u}\right|_{u=\sum^{i}_{k=0}wx^{\rm k}\hat{e}(i-k)}=0$. Thus, (19) reduces to $\sum^{t_{\text{n}}}_{i=0}x^{\rm i}\hat{e}(t_{\text{n}}-i)$. Since $\forall i,x^{\rm i}<0$, this sum is strictly negative for $t_{\text{n}}\geq 1$ and hence nonzero. This proves that $\exists u^{t_{\text{n}}}_{\text{Ni}}\text{, s.t. }\lim\limits_{t_{\text{n}}\to\infty}\frac{\partial u^{t_{\text{n}}}_{\text{Ni}}}{\partial w}\neq 0$. $\square$

Remark 1

Propositions 2 and 3 show that when the number of time steps is large, the Iterative LIF neuron suffers from the vanishing gradient problem, whereas the Non-iterative LIF neuron does not. As is well known [66], the presence of a continuously accumulated gradient product, $\prod^{t_{\text{n}}-1}_{i}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$, can cause gradient vanishing issues during training. Conversely, the gradient of the proposed Non-iterative LIF neuron model, shown in (19), depends entirely on summations instead of the product $\prod^{t_{\text{n}}-1}_{i}\frac{\partial u^{\rm i+1}_{\text{I}}}{\partial u^{\rm i}_{\text{I}}}$ that appears in the Iterative LIF neuron model, which avoids the gradient issue caused by accumulated multiplication. Accordingly, the Non-iterative LIF neuron model avoids the gradient problem caused by the recurrent execution of long loops in the Iterative LIF neuron model.

$$
\begin{aligned}
\frac{\partial u^{t}_{\text{I}}}{\partial w}
&=\frac{\partial u^{t}_{\text{I}}}{\partial u^{t-1}_{\text{I}}}\frac{\partial u^{t-1}_{\text{I}}}{\partial w}+\frac{\partial u^{t}_{\text{I}}}{\partial o^{t-1}}\frac{\partial o^{t-1}}{\partial w}+x^{t-1}\\
&=\frac{\partial u^{t}_{\text{I}}}{\partial u^{t-1}_{\text{I}}}\left[\frac{\partial u^{t-1}_{\text{I}}}{\partial u^{t-2}_{\text{I}}}\frac{\partial u^{t-2}_{\text{I}}}{\partial w}+\frac{\partial u^{t-1}_{\text{I}}}{\partial o^{t-2}}\frac{\partial o^{t-2}}{\partial w}+x^{t-2}\right]+\frac{\partial u^{t}_{\text{I}}}{\partial o^{t-1}}\frac{\partial o^{t-1}}{\partial w}+x^{t-1}\\
&=\frac{\partial u^{t}_{\text{I}}}{\partial u^{t-1}_{\text{I}}}\left[\frac{\partial u^{t-1}_{\text{I}}}{\partial u^{t-2}_{\text{I}}}\left[\frac{\partial u^{t-2}_{\text{I}}}{\partial u^{t-3}_{\text{I}}}\left[\ldots\left[\frac{\partial u^{2}_{\text{I}}}{\partial u^{1}_{\text{I}}}\left[\frac{\partial u^{1}_{\text{I}}}{\partial u^{0}_{\text{I}}}\frac{\partial u^{0}_{\text{I}}}{\partial w}+\frac{\partial u^{1}_{\text{I}}}{\partial o^{0}}\frac{\partial o^{0}}{\partial w}+x^{0}\right]+\frac{\partial u^{2}_{\text{I}}}{\partial o^{1}}\frac{\partial o^{1}}{\partial w}+x^{1}\right]+\ldots\right]+\frac{\partial u^{t-2}_{\text{I}}}{\partial o^{t-3}}\frac{\partial o^{t-3}}{\partial w}+x^{t-3}\right]+\frac{\partial u^{t-1}_{\text{I}}}{\partial o^{t-2}}\frac{\partial o^{t-2}}{\partial w}+x^{t-2}\right]+\frac{\partial u^{t}_{\text{I}}}{\partial o^{t-1}}\frac{\partial o^{t-1}}{\partial w}+x^{t-1}\\
&=\sum^{t}_{i=1}\left[\prod^{t-1}_{k=i}\frac{\partial u^{k+1}_{\text{I}}}{\partial u^{k}_{\text{I}}}\right]x^{i-1}+\sum^{t}_{i=2}\left[\prod^{t-1}_{k=i}\frac{\partial u^{k+1}_{\text{I}}}{\partial u^{k}_{\text{I}}}\right]\frac{\partial u^{i}_{\text{I}}}{\partial o^{i-1}}\frac{\partial o^{i-1}}{\partial w}\\
&=\sum^{t}_{i=2}\underbrace{\left[\prod^{t-1}_{k=i}\frac{\partial u^{k+1}_{\text{I}}}{\partial u^{k}_{\text{I}}}\right]}_{\text{Accumulating gradient mult.}}\left[x^{i-1}+\frac{\partial u^{i}_{\text{I}}}{\partial o^{i-1}}\underbrace{\frac{\partial o^{i-1}}{\partial w}}_{\text{Accumulating gradient mult.}}\right]+\underbrace{\prod^{t-1}_{i=1}\frac{\partial u^{i+1}_{\text{I}}}{\partial u^{i}_{\text{I}}}}_{\text{Accumulating gradient mult.}}x^{0}.
\end{aligned}
\tag{20}
$$
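The contrast between the two gradient expressions can be illustrated numerically. The sketch below evaluates the accumulated factor $\lambda^{t}$ from Proof 2 and the summation-only gradient (19) with the surrogate (15). The input sequence (all $x^{\rm i}=-1$ with $w=0.5$, so that every pre-reset potential is non-positive), the parameter values, and the function names are assumptions chosen for this example.

```python
import math


def iterative_chain_factor(n_steps, lam):
    """Product of n_steps factors du^{i+1}/du^i, each equal to the decay rate lam."""
    return lam ** n_steps


def non_iterative_grad(x, w, tau, v_th=1.0):
    """Evaluate (19) at the last time step, using the ReLU-derivative surrogate (15)."""
    def e_hat(k):
        return 1.0 - math.exp(-k / tau)

    t = len(x) - 1
    grad = sum(x[i] * e_hat(t - i) for i in range(t + 1))
    for i in range(t):
        pre = sum(x[k] * e_hat(i - k) for k in range(i + 1))
        surrogate = 1.0 if w * pre > 0 else 0.0   # (15) evaluated at u = w * pre
        grad -= v_th * e_hat(t - 1 - i) * surrogate * pre
    return grad


if __name__ == "__main__":
    print("iterative factor after 200 steps:", iterative_chain_factor(200, 0.9))
    print("non-iterative gradient:", non_iterative_grad([-1.0] * 200, w=0.5, tau=2.0))
```

The product shrinks geometrically with the number of steps, while the summation-based gradient stays well away from zero for this input.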

In our Non-iterative LIF neuron model, the final membrane potential $U_{\text{final}}$ is determined by the reset matrix $U_{\text{reset}}$, which only approximates the true reset behavior. This is because the matrix $\hat{O}$ overestimates the output spikes, which makes $U_{\text{reset}}$ larger than it should be and in turn yields a reduced final membrane potential $U_{\text{final}}$ and fewer output spikes. Owing to this characteristic of the firing process, the Non-iterative LIF neuron exhibits greater sparsity than the Iterative LIF neuron, as shown in Figure 2: given an identical input spike train, the Non-iterative LIF neuron produces fewer spikes than its iterative counterpart. This sparsity improves energy efficiency during execution.

III-B Attention model

In this section, we present the proposed attention models. The fundamental goal of the attention mechanism is to employ neural networks to compute the attention score. This score is applied across the entire feature map, assigning weights to the features. In this way, relevant features are emphasized while irrelevant features are downplayed. In the context of EEG signals, the original input data is shaped as $\mathbb{R}^{B\times C\times D}$, where $B\in\mathbb{N}$ is the batch size, $C\in\mathbb{N}$ represents the channel size and $D\in\mathbb{N}$ is the length of data for each channel. We segment the data into timepieces, resulting in a new size of $\mathbb{R}^{B\times C\times S\times T}$. Here, $S\in\mathbb{N}$ is the number of timepieces and $T\in\mathbb{N}$ is the number of time steps in each timepiece. Note that the product of $S$ and $T$ must equal $D$. Given that EEG data typically presents as long temporal sequences, it is important to determine which timepieces are crucial for classification. Thus, our attention model places particular emphasis on the dimension $S$. We introduce two distinct attention mechanisms: Sequence attention (Seq-attention), described in Section III-B1, and Channel Sequence attention (ChanSeq-attention), detailed in Section III-B2. Section III-B3 presents Global-attention, a special case of ChanSeq-attention.
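The segmentation step can be sketched with a plain reshape. The snippet below is a minimal illustration with assumed toy sizes (the experiments later use $D=400$ and $S=T=20$); the variable names are illustrative only.

```python
import numpy as np

# Assumed toy sizes for illustration; S * T must equal D.
B, C, S, T = 2, 3, 4, 5
D = S * T

# Dummy input of shape (B, C, D).
x = np.arange(B * C * D, dtype=np.float32).reshape(B, C, D)

# Split the last axis into S consecutive timepieces of T time steps each.
x_seg = x.reshape(B, C, S, T)

# Timepiece s of a channel covers samples [s*T, (s+1)*T) of the original signal.
assert np.array_equal(x_seg[0, 1, 2], x[0, 1, 2 * T:3 * T])
```

Because the reshape only splits the last axis, flattening the last two axes of `x_seg` recovers the original signal unchanged.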

Contrary to the sequential attention models like those in [24], our intention is to utilize a single model to capture the attention score simultaneously. We introduce two model architectures for each attention mechanism: the linear architecture and the convolutional architecture. Figure 3 illustrates how the architecture of the linear attention differs from the attention model described in Figure 4. The distinction between these architectures lies in their methods for attention score computation: the linear architecture incorporates fully connected layers, whereas the convolutional architecture employs convolutional layers. Notably, both Seq-attention and ChanSeq-attention have linear and convolutional versions.

III-B1 Seq-attention mechanism

The Sequence attention mechanism mainly focuses on identifying which timepieces require attention. It firstly reshapes the input data into the format of $\mathbb{R}^{B\times S\times(C\times T)}$ , which the attention model then processes.

The linear Seq-attention (Linear-Seq-attention) model is given as follows:

		$\displaystyle q=Q_{\text{fc}}(x+p),\;Q_{\text{fc}}:\mathbb{R}^{B\times S\times(C\times T)}\rightarrow\mathbb{R}^{B\times d_{1}\times S\times d_{2}},$		(21)
		$\displaystyle k=K_{\text{fc}}(x+p),\;K_{\text{fc}}:\mathbb{R}^{B\times S\times(C\times T)}\rightarrow\mathbb{R}^{B\times d_{1}\times S\times d_{2}},$
		$\displaystyle v=V_{\text{fc}}(x+p),\;V_{\text{fc}}:\mathbb{R}^{B\times S\times(C\times T)}\rightarrow\mathbb{R}^{B\times d_{1}\times S\times d_{2}},$
		$\displaystyle A=\text{Softmax}\left(\frac{qk^{T}}{\sqrt{d_{k}}}\right),\;A\in\mathbb{R}^{B\times d_{1}\times S\times S},$
		$\displaystyle\hat{x}=Av,\;\hat{x}\in\mathbb{R}^{B\times d_{1}\times S\times d_{2}},$
		$\displaystyle h_{\text{linear-seq}}(x)=FC_{\text{seq}}(\hat{x}),\;$
		$\displaystyle FC_{\text{seq}}:\mathbb{R}^{B\times d_{1}\times S\times d_{2}}\rightarrow\mathbb{R}^{B\times C\times S\times T},$

where $Q_{\text{fc}}$, $K_{\text{fc}}$ and $V_{\text{fc}}$ are three fully connected layers followed by reshape operations. Their respective functions are to generate the query matrix $q$, the key matrix $k$, and the value matrix $v$. These matrices have dimensions $\mathbb{R}^{B\times d_{1}\times S\times d_{2}}$, where $d_{1}$ and $d_{2}$ are hyperparameters, and $p$ is the position embedding [16]. Subsequently, the attention score $A$ is obtained by multiplying $q$ with the transpose of $k$, scaling the result, and applying the Softmax function. The resulting weights are then assigned to the values in $v$. Finally, a fully connected layer produces the output $h_{\text{linear-seq}}(x)$, which represents the input data enhanced with attention.
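The shape bookkeeping in (21) can be traced with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the fully connected layers are bias-free random matrices, the position embedding `p` is a random array, and the scaling factor $d_{k}$ is assumed to equal $d_{2}$. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def linear_seq_attention(x, p, w_q, w_k, w_v, w_out, d1, d2):
    """x: (B, C, S, T) -> attention-enhanced output of shape (B, C, S, T)."""
    B, C, S, T = x.shape
    # Reshape to (B, S, C*T) so that each timepiece is one token, then add p.
    tokens = x.transpose(0, 2, 1, 3).reshape(B, S, C * T) + p

    def project(w):
        # Fully connected layer followed by a reshape to (B, d1, S, d2).
        return (tokens @ w).reshape(B, S, d1, d2).transpose(0, 2, 1, 3)

    q, k, v = project(w_q), project(w_k), project(w_v)
    # Assumption: the scaling factor d_k equals d2.
    a = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d2))  # (B, d1, S, S)
    x_hat = a @ v                                            # (B, d1, S, d2)
    # FC_seq: merge (d1, d2), map back to C*T features, restore (B, C, S, T).
    merged = x_hat.transpose(0, 2, 1, 3).reshape(B, S, d1 * d2)
    out = (merged @ w_out).reshape(B, S, C, T).transpose(0, 2, 1, 3)
    return out, a

# Toy sizes (assumed); the experiments use d1 = 6 and d2 = 20.
rng = np.random.default_rng(0)
B, C, S, T, d1, d2 = 2, 3, 4, 5, 2, 6
x = rng.standard_normal((B, C, S, T))
p = rng.standard_normal((S, C * T))
w_q, w_k, w_v = (0.1 * rng.standard_normal((C * T, d1 * d2)) for _ in range(3))
w_out = 0.1 * rng.standard_normal((d1 * d2, C * T))
out, a = linear_seq_attention(x, p, w_q, w_k, w_v, w_out, d1, d2)
```

Each row of `a` sums to one, so every timepiece in the output is a convex combination of the value vectors of all timepieces.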

The convolutional Seq-attention (Conv-Seq-attention) model is presented as follows:

		$\displaystyle q=Q_{\text{conv}}(x),\;Q_{\text{conv}}:\mathbb{R}^{B\times S\times C\times T}\rightarrow\mathbb{R}^{B\times S\times(d\times C\times T)},$		(22)
		$\displaystyle k=K_{\text{conv}}(x),\;K_{\text{conv}}:\mathbb{R}^{B\times S\times C\times T}\rightarrow\mathbb{R}^{B\times S\times(d\times C\times T)},$
		$\displaystyle A=\text{Softmax}(qk^{T}),\;A\in\mathbb{R}^{B\times S\times S},$
		$\displaystyle\hat{x}=Ax,\;\hat{x}\in\mathbb{R}^{B\times C\times S\times T},$
		$\displaystyle h_{\text{conv-seq}}(x)=\alpha\hat{x}+x,$

where $Q_{\text{conv}}$ and $K_{\text{conv}}$ are two convolutional layers followed by reshape operations. As in the Linear-Seq-attention mechanism, they generate the query matrix $q$ and the key matrix $k$, respectively. Within this model, $d$ serves as a hyperparameter. Unlike its linear counterpart, the Conv-Seq-attention model omits the computation of the value matrix and instead directly calculates the attention score $A$ by multiplying the query matrix with the transposed key matrix. After the Softmax function, these weights are applied to the input data via matrix multiplication. Finally, the Conv-Seq-attention introduces a trainable parameter $\alpha$ to modulate the balance between the attention-enhanced result and the original input data.
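A minimal NumPy sketch of (22) follows. As a simplifying assumption, the convolutional layers $Q_{\text{conv}}$ and $K_{\text{conv}}$ are replaced by pointwise ($1\times 1$) filters that produce $d$ feature maps, which keeps the example short while preserving the shapes; the function and variable names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conv_seq_attention(x, w_q, w_k, alpha):
    """x: (B, C, S, T); w_q, w_k: (d,) pointwise filters standing in for the conv layers."""
    B, C, S, T = x.shape
    xs = x.transpose(0, 2, 1, 3)                                  # (B, S, C, T)
    # Stand-in convolution: d feature maps, flattened to (B, S, d*C*T).
    q = (xs[:, :, None] * w_q[:, None, None]).reshape(B, S, -1)
    k = (xs[:, :, None] * w_k[:, None, None]).reshape(B, S, -1)
    a = softmax(q @ k.transpose(0, 2, 1))                         # (B, S, S)
    # Apply the timepiece weights to the input itself (no value matrix).
    x_hat = (a @ xs.reshape(B, S, C * T)).reshape(B, S, C, T).transpose(0, 2, 1, 3)
    return alpha * x_hat + x, a                                   # residual blend

rng = np.random.default_rng(0)
B, C, S, T, d = 2, 3, 4, 5, 8
x = rng.standard_normal((B, C, S, T))
w_q, w_k = rng.standard_normal(d), rng.standard_normal(d)
out, a = conv_seq_attention(x, w_q, w_k, alpha=0.5)
identity, _ = conv_seq_attention(x, w_q, w_k, alpha=0.0)
```

With `alpha=0.0` the block reduces to the identity, which illustrates how the trainable scalar lets the network fall back to the unmodified feature map.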

Both the Linear-Seq-attention and Conv-Seq-attention mechanisms yield attention scores of size $\mathbb{R}^{S\times S}$ in the last two dimensions. Then they utilize matrix multiplication to produce the final enhanced feature map. In this way, attention is exclusively directed toward different timepieces and their interaction without consideration of information from other dimensions.

III-B2 ChanSeq-attention mechanism

Typically, EEG signals are collected from multiple channels. For example, the OpenBMI dataset that we use in this work has 62 channels [67]. Beyond directing attention to timepieces, it is also valuable to determine which channels deserve the most attention. Thus, we introduce the Channel Sequence attention mechanism, designed to determine when and where the features must be focused.

The equations for linear Channel Sequence attention (Linear-ChanSeq-attention) are as follows:

		$\displaystyle q=Q_{\text{fc}}(x+p),\;Q_{\text{fc}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},$		(23)
		$\displaystyle k=K_{\text{fc}}(x+p),\;K_{\text{fc}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},$
		$\displaystyle v=V_{\text{fc}}(x+p),\;V_{\text{fc}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},$
		$\displaystyle A=\text{Softmax}\left(\frac{qk^{T}}{\sqrt{d_{k}}}\right),\;A\in\mathbb{R}^{B\times C\times d_{1}\times S\times S},$
		$\displaystyle\hat{x}=Av,\;\hat{x}\in\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}},$
		$\displaystyle h_{\text{linear-chanseq}}(x)=FC_{\text{chanseq}}(\hat{x}),\;$
		$\displaystyle FC_{\text{chanseq}}:\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}}\rightarrow\mathbb{R}^{B\times C\times S\times T},$

where the parameters have the same meaning as those in the Linear-Seq-attention. However, it is worth noting that to implement attention across both channels and timepieces simultaneously, there are modifications to the dimensions of the model. The resultant dimensions of $Q_{\text{fc}}$ , $K_{\text{fc}}$ , and $V_{\text{fc}}$ are adjusted to $\mathbb{R}^{B\times C\times d_{1}\times S\times d_{2}}$ instead of $\mathbb{R}^{B\times d_{1}\times S\times d_{2}}$ . This new channel dimension is consistently maintained throughout the entirety of the model.

The equations of corresponding convolutional Channel Sequence attention (Conv-ChanSeq-attention) are as follows:

		$\displaystyle q=Q_{\text{conv}}(x),\;Q_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times S\times(d\times T)},$		(24)
		$\displaystyle k=K_{\text{conv}}(x),\;K_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times S\times(d\times T)},$
		$\displaystyle A=\text{Softmax}(qk^{T}),\;A\in\mathbb{R}^{B\times C\times S\times S},$
		$\displaystyle\hat{x}=Ax,\;\hat{x}\in\mathbb{R}^{B\times C\times S\times T},$
		$\displaystyle h_{\text{conv-chanseq}}(x)=\alpha\hat{x}+x,$

where the parameters have the same meaning as those in the Conv-Seq-attention. Mirroring the adjustments in Linear-ChanSeq-attention, the dimensions are modified here as well. Specifically, the outputs of $Q_{\text{conv}}$ and $K_{\text{conv}}$ have dimensions $\mathbb{R}^{B\times C\times S\times(d\times T)}$, in contrast to the previous approach of merging the $C$ and $T$ dimensions. As a result, the attention score $A$ has a size of $\mathbb{R}^{B\times C\times S\times S}$, which enables attention across channels while accounting for different timepieces.

III-B3 Global-attention mechanism

In this section, we go one step beyond the ChanSeq-attention. Given input data with dimensions $\mathbb{R}^{B\times C\times S\times T}$, not only the channels and timepieces but also the specific time steps are important. To address this, we introduce the Global-attention mechanism, which applies attention across all three dimensions: $C$, $S$, and $T$. It decides when, where, and which feature is essential.

The Global-attention can be viewed as a special case of the Conv-ChanSeq-attention when the dimensions of $S$ and $T$ are equivalent. The corresponding equations are presented below:

		$\displaystyle q=Q_{\text{conv}}(x),\;Q_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times S\times(d\times T)},$		(25)
		$\displaystyle k=K_{\text{conv}}(x),\;K_{\text{conv}}:\mathbb{R}^{B\times C\times S\times T}\rightarrow\mathbb{R}^{B\times C\times T\times(d\times S)},$
		$\displaystyle A=\text{Softmax}(qk^{T}),\;A\in\mathbb{R}^{B\times C\times S\times T},$
		$\displaystyle\hat{x}=A\odot x,\;\hat{x}\in\mathbb{R}^{B\times C\times S\times T},$
		$\displaystyle h_{\text{conv-global}}(x)=\alpha\hat{x}+x.$

In this context, the parameters align with those defined in the Conv-ChanSeq-attention. In particular, when $S$ and $T$ have identical dimension sizes, both $A$ and $x$ have the size of $\mathbb{R}^{B\times C\times S\times T}$ . Contrary to the Conv-ChanSeq-attention, the Global-attention employs element-wise product instead of matrix multiplication when calculating enhanced feature map $\hat{x}$ in (25). Consequently, with $A$ having dimensions $\mathbb{R}^{B\times C\times S\times T}$ , each time step in each timepiece of each channel receives a specific attention score to improve the feature map. This method distinguishes itself from other attention models we proposed by its careful consideration of data and its direct approach to data enhancement.
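The role of the condition $S=T$ in (25) can be seen in a short NumPy sketch. As simplifying assumptions, the convolutional layers are replaced by pointwise filters with $d$ feature maps, and the Softmax is taken over the last axis (the text does not state the axis). Names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def global_attention(x, w_q, w_k, alpha):
    """x: (B, C, S, T) with S == T; w_q, w_k: (d,) pointwise stand-ins for the conv layers."""
    B, C, S, T = x.shape
    if S != T:
        # q k^T multiplies (S, d*T) by (d*S, T), so the inner sizes only match if S == T.
        raise ValueError("Global-attention requires S == T")
    q = (x[:, :, :, None, :] * w_q[:, None]).reshape(B, C, S, -1)   # (B, C, S, d*T)
    xt = x.transpose(0, 1, 3, 2)                                     # (B, C, T, S)
    k = (xt[:, :, :, None, :] * w_k[:, None]).reshape(B, C, T, -1)  # (B, C, T, d*S)
    a = softmax(q @ k.transpose(0, 1, 3, 2))                         # (B, C, S, T)
    x_hat = a * x                                                    # element-wise product
    return alpha * x_hat + x, a

rng = np.random.default_rng(0)
B, C, S, T, d = 2, 3, 4, 4, 8
x = rng.standard_normal((B, C, S, T))
w_q, w_k = rng.standard_normal(d), rng.standard_normal(d)
out, a = global_attention(x, w_q, w_k, alpha=0.5)
```

Since `a` has the same shape as `x`, each entry of the score can be read directly as the weight given to one time step of one timepiece of one channel.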

III-C Network Architecture

In this section, we present the network architectures. Figure 5 gives an overview of these architectures. Figure 5a illustrates the architecture of the NiSNN-A. We utilize a two-layer residual spiking convolutional framework to process the data, where the first spiking layer can be seen as a spiking encoder as proposed in [14]. Furthermore, we have integrated the membrane potential batch normalization introduced in [68] into the LIF neuron. The LIF neuron yields a binary sequence as its output. To preserve the binary nature of the data stream, a max pooling layer is utilized to reduce dimensionality. After the second spiking layer, our proposed attention model is integrated. Finally, two linear layers without activation functions are employed to classify the output labels.

To verify that the proposed attention mechanism can also be applied to CNNs, we show the attention CNN counterpart of the NiSNN-A in Figure 5b. The CNN architecture mirrors that of the SNN to control for extraneous variables, comprising a two-layer convolutional residual framework. After each convolutional layer, a batch normalization layer is applied, followed by the ReLU activation function. An average pooling layer is then used to reduce the dimension.

In particular, since the data flow in the SNN is binary, the second spiking layer and the first linear layer receive binary inputs and therefore require only accumulate (AC) operations. In contrast, all operations within the CNN are multiply-and-accumulate (MAC) operations.
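The difference between the two operation types can be illustrated with a single weighted sum. The weights and spikes below are made-up values for illustration only.

```python
# Made-up weights and a binary spike vector for one output unit.
weights = [0.5, -1.2, 0.3, 0.8]
spikes = [1, 0, 0, 1]

# MAC: every input costs one multiplication and one addition.
mac_result = sum(w * s for w, s in zip(weights, spikes))
mac_ops = len(weights)

# AC: with binary inputs, the weights at spiking positions are simply added up.
ac_result = sum(w for w, s in zip(weights, spikes) if s)
ac_ops = sum(spikes)  # accumulations happen only where a spike occurred
```

Both routes give the same weighted sum, but the AC count scales with the number of spikes rather than with the number of inputs, which is why the spike rate enters the energy analysis.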

IV Experiments

In this section, we outline the details of the experiment. Details about the dataset and its processing can be found in Section IV-A. The network configuration is elaborated in Section IV-B. Lastly, the approach to energy analysis is presented in Section IV-C.

IV-A Dataset

In our study, we utilize the OpenBMI dataset [67] to validate the efficacy of the proposed attention-based neural networks. OpenBMI is a large-scale motor imagery EEG dataset encompassing data from 54 participants recorded over 62 channels. Each participant has 400 data records, each lasting 4 seconds at a sampling frequency of 1000 Hz. Our aim is to assess the ability of the models to recognize features that are common across subjects within these EEG signals. To achieve this, we employ the subject-independent approach for training and testing the network that is used in [34]. Specifically, data from 53 subjects serve as the training set, with data from the remaining subject reserved for testing. This approach guarantees that every result is obtained on a previously unseen subject, so the test data remain unknown to the trained network. This subject-independent method validates the practical scalability of our model. The training dataset therefore comprises 53 subjects, totaling 21,200 data records. Each of these records contains 4,000 time steps with 62 channels. To manage this, we downsample each record, reducing the number of time steps to a more manageable 400. The downsampled EEG data are then normalized to zero mean and unit standard deviation. Following [34], we select 20 channels: $FC_{5}$, $FC_{3}$, $FC_{1}$, $FC_{2}$, $FC_{4}$, $FC_{6}$, $C_{5}$, $C_{3}$, $C_{1}$, $C_{z}$, $C_{2}$, $C_{4}$, $C_{6}$, $CP_{5}$, $CP_{3}$, $CP_{1}$, $CP_{z}$, $CP_{2}$, $CP_{4}$, $CP_{6}$. For testing, we utilize the packaged test set available within the OpenBMI interface, which offers 200 records for each test subject. Throughout the experiments, we cycle through each subject as the test case, from the first to the last. The final performance is determined by averaging the accuracy across these iterations. In other words, each model is trained and tested 54 times to obtain the final result, which reduces the uncertainty of the estimate.
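The preprocessing and the subject-independent split can be sketched as follows. Two points are assumptions, since the text does not specify them: downsampling is done by keeping every 10th sample without an anti-aliasing filter, and normalization is applied per channel within each record. The function names are illustrative, and random data stands in for real EEG.

```python
import numpy as np

def preprocess(record, factor=10):
    """record: (channels, time). Returns the decimated, z-scored record."""
    down = record[:, ::factor]                 # assumed: plain decimation, 4000 -> 400
    mean = down.mean(axis=1, keepdims=True)
    std = down.std(axis=1, keepdims=True)
    return (down - mean) / std                 # zero mean, unit standard deviation

def leave_one_subject_out(n_subjects=54):
    """Yield (train_subjects, test_subject) pairs, one per subject."""
    for test in range(n_subjects):
        train = [s for s in range(n_subjects) if s != test]
        yield train, test

rng = np.random.default_rng(0)
record = rng.standard_normal((20, 4000))       # 20 selected channels, 4 s at 1000 Hz
clean = preprocess(record)

splits = list(leave_one_subject_out())
n_train_records = len(splits[0][0]) * 400      # 53 subjects x 400 records each
```

Each of the 54 splits trains on 53 subjects, which gives the 21,200 training records quoted in the text.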

IV-B Network Setups

TABLE I: Architecture of networks during training. B is the batch size, C is channel size, S is the number of timepieces, and T is the number of time points in each timepiece.

Block	SNN layer	CNN layer	Filters	Size/padding	Output
Spike Encoding	Input	Input	-	-	(B, C, S, T)
	Clone residual	Clone residual	-	-	(B, C, S, T)
	Conv2d	Conv2d	C	(1, 5)/same	(B, C, S, T)
	-	BatchNorm2d	C	-	(B, C, S, T)
	-	ReLU	-	-	(B, C, S, T)
	LIF	-	-	-	(B, C, S, T)
	MaxPool2d	AvgPool2d	-	(2, 2)	(B, C, S/2, T/2)
Classifier	AC-Conv	MAC-Conv2d	C	(10, 10)/same	(B, C, S/2, T/2)
	Attention	Attention	-	-	(B, C, S/2, T/2)
	Add residual	Add residual	-	-	(B, C, S/2, T/2)
	-	BatchNorm2d	C	-	(B, C, S/2, T/2)
	-	ReLU	-	-	(B, C, S/2, T/2)
	LIF	-	-	-	(B, C, S/2, T/2)
	MaxPool2d	AvgPool2d	-	(2, 2)	(B, C, S/4, T/4)
	Flatten	Flatten	-	-	(B, C $\times$ S/4 $\times$ T/4)
	AC-Linear	MAC-Linear	-	-	(B, 20)
	Linear	Linear	-	-	(B, 2)

As described in Section IV-A, the input data processed has a dimension size of $\mathbb{R}^{B\times C\times S\times T}$. In this paper, $B$ denotes the batch size, $C$ is the channel size, $S$ indicates the number of timepieces, and $T$ is the number of time steps within each timepiece. In the experiment, we set a batch size of 64 and a channel size of 20. After the downsampling process, each record has a total of 400 time steps. We segment these records into 20 timepieces, so that both $S$ and $T$ equal 20. We found that involving $\hat{e}(\cdot)$ in (8) does not affect the results. Therefore, to simplify the computation, we uniformly assign a value of 1 to all $\hat{e}(\cdot)$ within (8). We also set the threshold $V_{th}$ in the Non-iterative LIF neuron to 1.0.

The network parameters are shown in Table I. The first convolutional layer acts as a spiking encoder, only considering the information within each timepiece. Therefore, its kernel size is set to $(1,5)$, with padding so that the output has the same size as the input and a stride of 1. The second convolutional layer plays the role of classifier, taking into account both intra-timepiece and inter-timepiece information. Therefore, this layer has a kernel size of $(10,10)$, again with padding to the same size as the input and a stride of 1. All pooling layers in the network employ a kernel size of $(2,2)$. The first linear layer reduces the flattened data dimension to 20, and the final linear layer reduces it further to 2, yielding the final classification result. Within the attention layer, the hyperparameters $d_{1}$ and $d_{2}$ are set to 6 and 20, respectively, for all linear attention models. For convolutional attention models, the hyperparameter $d$ is set to 8. The models are trained for 20 epochs. We employ a well-trained CNN model as a pre-trained network for its corresponding SNN model to accelerate the training procedure. During training, we use the Adam optimizer with a learning rate of 0.001. The cross-entropy loss is utilized as the loss function:

		$\displaystyle\text{CE\_Loss}(y,p)=-\sum_{n}y_{n}\log(p_{n}),$		(26)

where $y$ represents the label of the data and $p$ is the network’s output.
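As a small worked example of (26), the sketch below evaluates the loss for one two-class sample. It assumes that $y$ is a one-hot label and that $p$ holds class probabilities obtained by applying a Softmax to the two network outputs; the text does not state where this normalization takes place, so this is an assumption, and the function names are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, p, eps=1e-12):
    """CE loss for a one-hot label y and class probabilities p."""
    return -sum(y_n * math.log(max(p_n, eps)) for y_n, p_n in zip(y, p))

logits = [2.0, 0.0]                              # made-up outputs of the final linear layer
p = softmax(logits)
loss = cross_entropy([1.0, 0.0], p)              # true class is the first one
```

For a one-hot label only the term of the true class survives, so the loss is the negative log-probability assigned to the correct class.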

IV-C Energy analysis

To analyze the energy consumption of the CNN and SNN models, we adopt the same energy analysis method as in [24], which is based on the network’s floating point operations (FLOPs).

The main operations within neural networks in this context can be categorically divided into three primary types: the convolutional layer, the linear layer, and matrix multiplication. The FLOPs associated with each of these operations can be described as:

		$\displaystyle\text{FLOPs}_{\text{conv}}^{\rm n}=k^{\rm n}_{0}k^{\rm n}_{1}h^{\rm n}w^{\rm n}c^{\rm n}c^{\rm n-1},$		(27)
		$\displaystyle\text{FLOPs}_{\text{fc}}^{\rm n}=i^{\rm n}o^{\rm n},$
		$\displaystyle\text{FLOPs}_{\text{mm}}=m_{1}nm_{2},$

where for the $n^{\text{th}}$ convolutional layer, $k^{\rm n}_{0}$ and $k^{\rm n}_{1}$ denote the dimensions of the convolutional kernel, while $h^{\rm n}$ and $w^{\rm n}$ represent the dimensions of the output map. In addition, $c^{\rm n-1}$ and $c^{\rm n}$ specify the number of input and output channels, respectively. For the $n^{\text{th}}$ linear layer, $i^{\rm n}$ represents the input size, and $o^{\rm n}$ denotes the output size. Finally, for matrix multiplication involving two matrices of dimensions $[m_{1},n]$ and $[n,m_{2}]$, the FLOPs are determined as the product of these three dimensions.

In the CNN models, all network data consist of real-valued numbers, which makes all operations MAC. Table II analyzes the FLOPs associated with the CNN models. This analysis can be divided into the FLOPs corresponding to the standard vanilla CNN model, denoted by $x$, and the FLOPs for the additional attention model. Given our consistent network architecture for all models, the value of $x$ remains constant at 4,810,040.
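As a worked check of (27), the sketch below evaluates the formulas for the layer sizes in Table I, assuming $C=S=T=20$ as set in Section IV-B. The function names are illustrative. Under these assumptions the two convolutional layers and the two linear layers account for 800,000, 4,000,000, 10,000 and 40 MACs, which sum to 4,810,040.

```python
def flops_conv(k0, k1, h, w, c_out, c_in):
    """Kernel size (k0, k1), output map (h, w), output and input channel counts."""
    return k0 * k1 * h * w * c_out * c_in

def flops_fc(n_in, n_out):
    return n_in * n_out

def flops_mm(m1, n, m2):
    """Product of an [m1, n] matrix with an [n, m2] matrix."""
    return m1 * n * m2

C, S, T = 20, 20, 20                                   # values from Section IV-B

conv1 = flops_conv(1, 5, S, T, C, C)                   # encoder, output (S, T)
conv2 = flops_conv(10, 10, S // 2, T // 2, C, C)       # classifier, output (S/2, T/2)
fc1 = flops_fc(C * (S // 4) * (T // 4), 20)            # flattened features -> 20
fc2 = flops_fc(20, 2)                                  # 20 -> 2 classes

vanilla_total = conv1 + conv2 + fc1 + fc2
```

The same helper `flops_mm` covers the attention products, for example `flops_mm(S, d, S)` for an $[S,d]$ by $[d,S]$ multiplication.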

TABLE II: FLOPs of CNN model execution

Method	MAC
Autthasan et al.[34]	$x$
Huang et al.[32]	$x$
Liu et al.[12]	$x$ +1,644,200
Luo et al.[41]	$x$ +2,469,600
Fan et al.[46], Wang et al. [44]	$x$ +45,200
Zhang et al.[43]	$x$ +180,000
Linear-Seq-attention	$x$ +97,800
Conv-Seq-attention	$x$ +822,100
Linear-ChanSeq-attention	$x$ +496,800
Conv-ChanSeq-attention	$x$ +824,000
Global-attention	$x$ +806,000

In the SNN models, the FLOPs analysis is more complicated. As elaborated in Section III-C, the inputs to both the second convolutional layer and the first linear layer are binary. Consequently, these layers employ AC operations, which accumulate weights directly without involving multiplications. It is worth noting that the number of AC operations depends on the spike rate, given that accumulation happens only when the input is 1. Thus, the spike rates of these two layers are important parameters when quantifying the AC operations within SNN models. The rest of the SNN retains the use of MAC operations, mirroring the corresponding CNN models. Therefore, the FLOPs assessment for SNN models can be partitioned into three parts: the AC of the binary layers, the MAC of the rest of the SNN model, and the MAC of the additional attention model. The MAC of the rest of the SNN model can itself be divided into two parts: the MAC of the LIF neurons and the MAC of the remaining layers. We denote the MAC of the rest of the SNN model with Non-iterative LIF neurons by $y$, which is constant across all SNN models with Non-iterative LIF neurons and equals 1,170,040. The MAC of the SNN with Iterative LIF neurons is lower, namely $y-180,000$. It should be highlighted that while the FLOPs associated with Iterative LIF neurons are fewer than those of Non-iterative LIF neurons, the latter are implemented using matrix operations, whereas the former rely on loop structures, leading to increased execution time. The original AC for the second convolutional layer is 4,000,000, denoted by $a$, while for the first linear layer, it is 10,000, represented by $b$. The final amount of AC is the product of the original AC and the spike rate. Details of the FLOPs for SNN models are presented in Table III.

TABLE III: FLOPs of SNN model execution. Several SNNs with attention models were proposed in [24]: Channel Attention (CA), Spatial Attention (SA), Temporal Attention (TA), Channel-Temporal Attention (CTA), Channel-Spatial Attention(CSA), Spatial-Temporal Attention (STA), Channel-Spatial-Temporal Attention (CSTA)

Method	MAC	AC_Conv	AC_Linear
Wu et al. [56] (Iterative LIF)	$y$ -180,000	0.811a	0.404b
Wu et al. [56]	$y$	0.320a	0.297b
Hu et al. [69]	$y$	0.292a	0.317b
Yao et al. [24] (CA)	$y$ +3,600	0.307a	0.299b
Yao et al. [24] (SA)	$y$ +2,400	0.364a	0.240b
Yao et al. [24] (TA)	$y$ +2,400	0.376a	0.310b
Yao et al. [24] (CTA)	$y$ +6,000	0.344a	0.293b
Yao et al. [24] (CSA)	$y$ +6,000	0.362a	0.296b
Yao et al. [24] (STA)	$y$ +4,800	0.430a	0.293b
Yao et al. [24] (CSTA)	$y$ +8,400	0.398a	0.295b
Linear-Seq-attention	$y$ +97,800	0.297a	0.297b
Conv-Seq-attention	$y$ +822,100	0.308a	0.293b
Linear-ChanSeq-attention	$y$ +496,800	0.305a	0.268b
Conv-ChanSeq-attention	$y$ +824,000	0.308a	0.290b
Global-attention	$y$ +806,000	0.308a	0.295b

After the FLOPs analysis, we can calculate the total energy cost. We adopt the same assumption as in [24] that the various operations are implemented with 32-bit floating point data in 45 nm technology, where the MAC energy is 4.6 pJ and the AC energy is 0.9 pJ.
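The energy estimate then follows by weighting the operation counts with the per-operation energies. The sketch below is a simplified reading of this procedure with illustrative function names; it uses only the MAC and AC terms listed in Tables II and III, so its output need not coincide exactly with the energy figures reported in the result tables.

```python
E_MAC_PJ = 4.6   # energy per MAC in pJ (32-bit floating point, 45 nm)
E_AC_PJ = 0.9    # energy per AC in pJ

def cnn_energy_uj(mac):
    """Energy in microjoules for a model that uses MAC operations only."""
    return mac * E_MAC_PJ * 1e-6

def snn_energy_uj(mac, ac_conv, ac_linear, rate_conv, rate_linear):
    """Energy in microjoules; AC counts are scaled by the spike rates."""
    ac = rate_conv * ac_conv + rate_linear * ac_linear
    return (mac * E_MAC_PJ + ac * E_AC_PJ) * 1e-6

# Global-attention SNN row of Table III: MAC = y + 806,000, rates 0.308 and 0.295.
y, a, b = 1_170_040, 4_000_000, 10_000
snn_global = snn_energy_uj(y + 806_000, a, b, rate_conv=0.308, rate_linear=0.295)

# Vanilla CNN: MAC = x.
cnn_vanilla = cnn_energy_uj(4_810_040)
```

The AC term is small compared with the MAC term because each accumulation costs roughly a fifth of a MAC and only a fraction of the inputs carry a spike.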

V Result and Discussion

In this section, we describe the results of the proposed attention models. The results are delineated in Section V-A. Then, a discussion along with the result visualization is provided in Section V-B.

TABLE IV: Accuracy and energy performance of Left H. vs Right H. subject-independent classification using the OpenBMI dataset for SNNs with attention mechanisms.

Reference	Method	Type	Accuracy	Energy ( $\mu{J}$ )
Autthasan et al. [34]	vanilla CNN	CNN	0.73679 +/- 0.13201	23.569
Huang et al. [32]	vanilla CNN with residual blocks	CNN	0.73698 +/- 0.13137	23.569
Liu et al. [12]	Sequence + Temporal attention	CNN	0.73939 +/- 0.12799	31.626
Luo et al. [41]	Sequence + Temporal attention	CNN	0.71383 +/- 0.14112	35.670
Fan et al. [46], Wang et al. [44]	Channel + Sequence + Temporal attention	CNN	0.73862 +/- 0.12788	23.791
Zhang et al. [43]	Channel attention	CNN	0.74045 +/- 0.13622	24.451
Wu et al. [56]	vanilla SNN (with the Iterative LIF neuron)	SNN	0.50357 +/- 0.00529	7.774
Wu et al. [56]	vanilla SNN	SNN	0.70669 +/- 0.11510	6.888
Hu et al. [69]	vanilla SNN with residual blocks	SNN	0.70129 +/- 0.11720	6.787
Yao et al. [24]	Channel attention	SNN	0.68673 +/- 0.11113	6.859
Yao et al. [24]	Sequence attention	SNN	0.68798 +/- 0.10939	7.058
Yao et al. [24]	Temporal attention	SNN	0.68056 +/- 0.10473	7.101
Yao et al. [24]	Channel + Temporal attention	SNN	0.67805 +/- 0.11100	7.004
Yao et al. [24]	Channel + Sequence attention	SNN	0.67284 +/- 0.10830	7.068
Yao et al. [24]	Sequence + Temporal attention	SNN	0.66262 +/- 0.09970	7.307
Yao et al. [24]	Channel + Sequence + Temporal attention	SNN	0.64574 +/- 0.09904	7.210
Ours	Linear-Seq-attention	SNN	0.70689 +/- 0.12624	7.284
Ours	Conv-Seq-attention	SNN	0.72222 +/- 0.12551	10.873
Ours	Linear-ChanSeq-attention	SNN	0.71354 +/- 0.12620	9.268
Ours	Conv-ChanSeq-attention	SNN	0.72791 +/- 0.12791	10.882
Ours	Global-attention	SNN	0.72830 +/- 0.12700	10.794

* Only the first vanilla SNN uses the Iterative LIF neuron (as indicated in the table); all other SNN models use the proposed Non-iterative LIF neuron model.

TABLE V: Accuracy and energy performance of Left H. vs Right H. subject-independent classification using the OpenBMI dataset for CNNs with proposed attention mechanisms.

Method	Type	Accuracy	Energy ( $\mu{J}$ )
CNN with Linear-Seq-attention	CNN	0.73582 +/- 0.13443	24.048
CNN with Linear-ChanSeq-attention	CNN	0.73708 +/- 0.13239	27.597
CNN with Conv-Seq-attention	CNN	0.74267 +/- 0.12989	26.004
CNN with Conv-ChanSeq-attention	CNN	0.74055 +/- 0.13320	27.607
CNN with Global-attention	CNN	0.74122 +/- 0.13039	27.519

V-A Comparison with state-of-the-art methods

Table IV compares the performance of various attention mechanisms with CNNs and SNNs. We have chosen seven CNN models for EEG classification as our benchmarks. Autthasan et al. [34] proposed a vanilla CNN model, employing a subject-independent approach for the training and testing phases. Huang et al. [32] incorporated residual blocks into the CNN model. Of the seven baseline models, five employ attention mechanisms. It is worth noting that [46] and [44] employ a global attention mechanism that encompasses all three dimensions. However, their approach extracts attention scores individually for each dimension after using pooling to collapse the unrelated dimensions, which differs from our proposed Global-attention model. A total of ten models were chosen as the SNN baselines. In the context of EEG signal processing, the Iterative LIF model struggles with the long time steps, potentially leading to gradient issues. Therefore, we adopted the proposed Non-iterative LIF neuron in the other baseline models. Concerning the attention mechanism, the baseline models are taken from the attention SNN developed for computer vision in [24]. This approach composes attention components for the individual dimensions sequentially to obtain the attention score, in contrast to our proposed models, which use a single model.

The accuracy metric was determined by the ratio of correctly classified samples to the total number of samples, as given by:

		$\displaystyle\text{Accuracy}=\frac{\text{TP}+\text{TN}}{\text{TP}+\text{TN}+\text{FP}+\text{FN}},$		(28)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively. For energy analysis, we adopted the unit $\mu J$ to quantify the energy consumption of the networks as discussed in Section IV-C.

From our analysis, the three proposed convolution-based NiSNN-As all achieve an accuracy of over 0.72. Our Global NiSNN-A achieves the highest accuracy among the SNN models, 0.7283. This accuracy is comparable to that of the CNN models, with the additional advantage of consuming around 2 times less energy.

Notably, compared with the Iterative LIF neuron model, the proposed Non-iterative LIF neuron model improves the accuracy by 0.2 on the same network architecture while reducing the energy cost. In Table III, the firing rate of the Non-iterative LIF neurons is much lower than that of the Iterative LIF neurons, which illustrates the sparsity of our proposed method.

To verify that our attention mechanisms also work on CNN models, we present the comparison results of the attention CNNs in Table V. The results suggest that our Conv-ChanSeq-attention mechanism outperforms the baseline CNN models in Table IV, showing the feasibility of the proposed methods.

V-B Discussion

The comparative results of the CNN and SNN models underscore the potential of the attention mechanism to improve the accuracy of EEG signal classification, especially for SNN models. To provide a clearer understanding of how the attention mechanism works, we present a straightforward visualization that illustrates the attention score $A$ across the entire feature map. The Global-attention model employs an element-wise product to derive the enhanced feature map, which allows a more intuitive representation: the attention score at each position directly indicates the relative importance allocated to the corresponding position. Figure 6 visualizes the Global-attention model’s attention score $A$. The upper part of the figure shows the 20-channel input EEG signals as real values. The part below it presents the spikes after the spiking encoder in the network, represented as a black raster. The orange blocks represent the attention scores, with darker shades indicating higher attention scores and lighter shades lower ones. The figure illustrates the internal operation of the attention mechanism, highlighting the model’s capacity to assign variable attention across different regions of the EEG signals, thereby answering when, where, and which information is relevant. By employing such a dynamic weighting method, the model ensures that the most important regions of the input are prioritized, potentially contributing to the high accuracy in classification tasks.

From an energy consumption point of view, the results highlight the significance of employing SNNs, which achieve accuracy comparable to CNN models and offer a 2-fold reduction in energy use. This efficiency can be important in applications where power consumption is a concern, such as portable EEG devices or real-time EEG monitoring systems. Therefore, SNNs present promising potential for these energy-conscious edge devices.

VI Conclusion

This paper introduced an innovative NiSNN-A model, encompassing a novel Non-iterative LIF neuron and diverse attention mechanisms. The newly proposed Non-iterative LIF neuron retains the biological attributes of traditional LIF neurons while efficiently handling long temporal data. By leveraging matrix operations within neurons, this design avoids both long loops in execution and gradient issues. The attention mechanism then emphasizes the important parts of the feature map. Notably, all our proposed attention models integrate their computations within a single model instead of using sequential architectures. We employed the OpenBMI dataset for validation, adopting a subject-independent approach to demonstrate the model’s capability to extract common features for unseen participants. The results indicate that our approach surpasses other SNN models in accuracy. It achieves accuracy comparable to its CNN counterparts while improving energy efficiency. Furthermore, our attention visualization results suggest that our model not only improves the accuracy of the classification task but also offers deeper insights into EEG signal interpretation.

This research paves the way for novel methodologies in EEG classification that combine attention mechanisms with spiking neural network architectures. As the field of EEG signal processing continues to evolve, building on these findings will require continued innovation and adaptive strategies to address emerging challenges.

References

[1] A. Tsiamalou, E. Dardiotis, K. Paterakis, G. Fotakopoulos, I. Liampas, M. Sgantzos, V. Siokas, and A. G. Brotis, “EEG in neurorehabilitation: a bibliometric analysis and content review,” Neurology International, vol. 14, no. 4, pp. 1046–1061, 2022.
[2] C. Del Percio, S. Lopez, G. Noce, R. Lizio, F. Tucci, A. Soricelli, R. Ferri, F. Nobili, D. Arnaldi, F. Famà et al., “What a single electroencephalographic (EEG) channel can tell us about alzheimer’s disease patients with mild cognitive impairment,” Clinical EEG and Neuroscience, vol. 54, no. 1, pp. 21–35, 2023.
[3] A. Singh, A. A. Hussain, S. Lal, and H. W. Guesgen, “A comprehensive review on critical issues and possible solutions of motor imagery based electroencephalography brain-computer interface,” Sensors, vol. 21, no. 6, p. 2173, 2021.
[4] J. Zhang and M. Wang, “A survey on robots controlled by motor imagery brain-computer interfaces,” Cognitive Robotics, vol. 1, pp. 12–24, 2021.
[5] S. Rajwal and S. Aggarwal, “Convolutional neural network-based EEG signal analysis: A systematic review,” Archives of Computational Methods in Engineering, pp. 1–31, 2023.
[6] S. Alhagry, A. A. Fahmy, and R. A. El-Khoribi, “Emotion recognition based on eeg using lstm recurrent neural network,” International Journal of Advanced Computer Science and Applications, vol. 8, no. 10, 2017.
[7] Y. Song, Q. Zheng, B. Liu, and X. Gao, “Eeg conformer: Convolutional transformer for eeg decoding and visualization,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 710–719, 2022.
[8] O.-Y. Kwon, M.-H. Lee, C. Guan, and S.-W. Lee, “Subject-independent brain–computer interfaces based on deep convolutional neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 10, pp. 3839–3852, Oct. 2020.
[9] J.-S. Bang, M.-H. Lee, S. Fazli, C. Guan, and S.-W. Lee, “Spatio-spectral feature representation for motor imagery classification using convolutional neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, pp. 3038–3049, Jul. 2022.
[10] S. Zhang, L. Wu, S. Yu, E. Shi, N. Qiang, H. Gao, J. Zhao, and S. Zhao, “An explainable and generalizable recurrent neural network approach for differentiating human brain states on EEG dataset,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2022.
[11] C. Ju and C. Guan, “Tensor-cspnet: a novel geometric deep learning framework for motor imagery classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 10 955–10 969, Dec. 2023.
[12] X. Liu, Y. Shen, J. Liu, J. Yang, P. Xiong, and F. Lin, “Parallel spatial–temporal self-attention cnn-based motor imagery classification for bci,” Frontiers in neuroscience, vol. 14, p. 587520, 2020.
[13] E. Eldele, Z. Chen, C. Liu, M. Wu, C.-K. Kwoh, X. Li, and C. Guan, “An attention-based deep learning approach for sleep stage classification with single-channel EEG,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 29, pp. 809–818, 2021.
[14] X. Liao, Y. Wu, Z. Wang, D. Wang, and H. Zhang, “A convolutional spiking neural network with adaptive coding for motor imagery classification,” Neurocomputing, p. 126470, 2023.
[15] Y. Zhang, T. Zhou, W. Wu, H. Xie, H. Zhu, G. Zhou, and A. Cichocki, “Improving EEG decoding via clustering-based multitask feature learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 8, pp. 3587–3597, Aug. 2022.
[16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
[17] S. Li, Q. Yan, and P. Liu, “An efficient fire detection method based on multiscale feature extraction, implicit deep supervision and channel attention mechanism,” IEEE Transactions on Image Processing, vol. 29, pp. 8467–8475, 2020.
[18] S. Abirami and P. Chitra, “Energy-efficient edge based real-time healthcare support system,” in Advances in computers. Elsevier, 2020, vol. 117, no. 1, pp. 339–368.
[19] Z. Huang and M. Wang, “A review of electroencephalogram signal processing methods for brain-controlled robots,” Cognitive Robotics, vol. 1, pp. 111–124, 2021.
[20] S. Woźniak, A. Pantazi, T. Bohnstingl, and E. Eleftheriou, “Deep learning incorporating biologically inspired neural dynamics and in-memory computing,” Nature Machine Intelligence, vol. 2, no. 6, pp. 325–336, 2020.
[21] C. D. Schuman, S. R. Kulkarni, M. Parsa, J. P. Mitchell, P. Date, and B. Kay, “Opportunities for neuromorphic computing algorithms and applications,” Nature Computational Science, vol. 2, no. 1, pp. 10–19, 2022.
[22] Y. Liu and W. Pan, “Spiking neural-networks-based data-driven control,” Electronics, vol. 12, no. 2, p. 310, 2023.
[23] M. A. Siddiqi, D. Vrijenhoek, L. P. Landsmeer, J. van der Kleij, A. Gebregiorgis, V. Romano, R. Bishnoi, S. Hamdioui, and C. Strydis, “A lightweight architecture for real-time neuronal-spike classification,” arXiv preprint arXiv:2311.04808, 2023.
[24] M. Yao, G. Zhao, H. Zhang, Y. Hu, L. Deng, Y. Tian, B. Xu, and G. Li, “Attention spiking neural networks,” IEEE transactions on pattern analysis and machine intelligence, 2023.
[25] A. Al-Saegh, S. A. Dawwd, and J. M. Abdul-Jabbar, “Deep learning for motor imagery EEG-based classification: A review,” Biomedical Signal Processing and Control, vol. 63, p. 102172, 2021.
[26] J. Cui, Z. Lan, O. Sourina, and W. Müller-Wittig, “EEG-based cross-subject driver drowsiness recognition with an interpretable convolutional neural network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 10, pp. 7921–7933, Oct. 2023.
[27] H. Dong, A. Supratak, W. Pan, C. Wu, P. M. Matthews, and Y. Guo, “Mixed neural network approach for temporal sleep stage classification,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 26, no. 2, pp. 324–333, 2017.
[28] J. Wang, R. Gao, H. Zheng, H. Zhu, and C.-J. R. Shi, “Ssgcnet: a sparse spectra graph convolutional network for epileptic EEG signal classification,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–15, 2023.
[29] H. Dose, J. S. Møller, H. K. Iversen, and S. Puthusserypady, “An end-to-end deep learning approach to mi-EEG signal classification for bcis,” Expert Systems with Applications, vol. 114, pp. 532–542, 2018.
[30] Y. Li, L. Guo, Y. Liu, J. Liu, and F. Meng, “A temporal-spectral-based squeeze-and-excitation feature fusion network for motor imagery EEG decoding,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 29, pp. 1534–1545, 2021.
[31] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, “Eegnet: a compact convolutional neural network for EEG-based brain–computer interfaces,” Journal of neural engineering, vol. 15, no. 5, p. 056013, 2018.
[32] J.-S. Huang, W.-S. Liu, B. Yao, Z.-X. Wang, S.-F. Chen, and W.-F. Sun, “Electroencephalogram-based motor imagery classification using deep residual convolutional networks,” Frontiers in Neuroscience, vol. 15, p. 774857, 2021.
[33] A. I. Humayun, A. S. Sushmit, T. Hasan, and M. I. H. Bhuiyan, “End-to-end sleep staging with raw single channel eeg using deep residual convnets,” in 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). IEEE, 2019, pp. 1–5.
[34] P. Autthasan, R. Chaisaen, T. Sudhawiyangkul, P. Rangpong, S. Kiatthaveephong, N. Dilokthanakul, G. Bhakdisongkhram, H. Phan, C. Guan, and T. Wilaiprasitporn, “Min2net: End-to-end multi-task learning for subject-independent motor imagery EEG classification,” IEEE Transactions on Biomedical Engineering, vol. 69, no. 6, pp. 2105–2118, 2021.
[35] X. Ma, S. Qiu, C. Du, J. Xing, and H. He, “Improving EEG-based motor imagery classification via spatial and temporal recurrent neural networks,” in 2018 40th annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, 2018, pp. 1903–1906.
[36] L.-M. Zhao, X. Yan, and B.-L. Lu, “Plug-and-play domain adaptation for cross-subject eeg-based emotion recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 1, 2021, pp. 863–870.
[37] L. Xia, Y. Feng, Z. Guo, J. Ding, Y. Li, Y. Li, M. Ma, G. Gan, Y. Xu, J. Luo, Z. Shi, and Y. Guan, “Mulhita: a novel multiclass classification framework with multibranch lstm and hierarchical temporal attention for early detection of mental stress,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 9657–9670, Dec. 2023.
[38] L. Zhang, F. Xiao, and Z. Cao, “Multi-channel eeg signals classification via cnn and multi-head self-attention on evidence theory,” Information Sciences, vol. 642, p. 119107, 2023.
[39] X. Wang and Z. Wang, “Cnn with self-attention in EEG classification,” in International Conference on Human-Computer Interaction. Springer, 2022, pp. 512–526.
[40] Y. Wen, W. He, and Y. Zhang, “A new attention-based 3d densely connected cross-stage-partial network for motor imagery classification in bci,” Journal of Neural Engineering, vol. 19, no. 5, p. 056026, 2022.
[41] J. Luo, Y. Wang, S. Xia, N. Lu, X. Ren, Z. Shi, and X. Hei, “A shallow mirror transformer for subject-independent motor imagery bci,” Computers in Biology and Medicine, vol. 164, p. 107254, 2023.
[42] Y. Ma, Y. Song, and F. Gao, “A novel hybrid cnn-transformer model for EEG motor imagery classification,” in 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022, pp. 1–8.
[43] R. Zhang, G. Liu, Y. Wen, and W. Zhou, “Self-attention-based convolutional neural network and time-frequency common spatial pattern for enhanced motor imagery classification,” Journal of Neuroscience Methods, vol. 398, p. 109953, 2023.
[44] T. Wang, J. Mao, R. Xiao, W. Wang, G. Ding, and Z. Zhang, “Residual learning attention cnn for motion intention recognition based on EEG data,” in 2021 IEEE Biomedical Circuits and Systems Conference (BioCAS). IEEE, 2021, pp. 1–6.
[45] H. Li, H. Chen, Z. Jia, R. Zhang, and F. Yin, “A parallel multi-scale time-frequency block convolutional neural network based on channel attention module for motor imagery classification,” Biomedical Signal Processing and Control, vol. 79, p. 104066, 2023.
[46] C.-C. Fan, H. Yang, Z.-G. Hou, Z.-L. Ni, S. Chen, and Z. Fang, “Bilinear neural network with 3-d attention for brain decoding of motor imagery movements from the human EEG,” Cognitive Neurodynamics, vol. 15, pp. 181–189, 2021.
[47] A. Barton, E. Volna, M. Kotyrba, and R. Jarusek, “Proposal of a control algorithm for multiagent cooperation using spiking neural networks,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 4, pp. 2016–2027, Apr. 2023.
[48] Q. Liu, G. Pan, H. Ruan, D. Xing, Q. Xu, and H. Tang, “Unsupervised aer object recognition based on multiscale spatio-temporal features and spiking neurons,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 12, pp. 5300–5311, Dec. 2020.
[49] D. Liu, N. Bellotto, and S. Yue, “Deep spiking neural network for video-based disguise face recognition based on dynamic facial movements,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 6, pp. 1843–1855, Jun. 2020.
[50] A. Safa, F. Corradi, L. Keuninckx, I. Ocket, A. Bourdoux, F. Catthoor, and G. G. E. Gielen, “Improving the accuracy of spiking neural networks for radar gesture recognition through preprocessing,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 6, pp. 2869–2881, Jun. 2023.
[51] M. Dampfhoffer, T. Mesquida, A. Valentian, and L. Anghel, “Backpropagation-based learning techniques for deep spiking neural networks: A survey,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
[52] T. Zhang, S. Jia, X. Cheng, and B. Xu, “Tuning convolutional spiking neural network with biologically plausible reward propagation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12, pp. 7621–7631, Dec. 2022.
[53] S. Schliebs and N. Kasabov, “Evolving spiking neural network—a survey,” Evolving Systems, vol. 4, pp. 87–98, 2013.
[54] S. R. Kheradpisheh, M. Ganjtabesh, S. J. Thorpe, and T. Masquelier, “Stdp-based spiking deep convolutional neural networks for object recognition,” Neural Networks, vol. 99, pp. 56–67, 2018.
[55] B. Rueckauer, I.-A. Lungu, Y. Hu, M. Pfeiffer, and S.-C. Liu, “Conversion of continuous-valued deep networks to efficient event-driven networks for image classification,” Frontiers in neuroscience, vol. 11, p. 682, 2017.
[56] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, “Spatio-temporal backpropagation for training high-performance spiking neural networks,” Frontiers in neuroscience, vol. 12, p. 331, 2018.
[57] J. M. Antelis, L. E. Falcón et al., “Spiking neural networks applied to the classification of motor tasks in EEG signals,” Neural networks, vol. 122, pp. 130–143, 2020.
[58] Y. Luo, Q. Fu, J. Xie, Y. Qin, G. Wu, J. Liu, F. Jiang, Y. Cao, and X. Ding, “EEG-based emotion classification using spiking neural networks,” IEEE Access, vol. 8, pp. 46 007–46 016, 2020.
[59] N. Kasabov and E. Capecci, “Spiking neural network methodology for modelling, classification and understanding of EEG spatio-temporal data measuring cognitive processes,” Information Sciences, vol. 294, pp. 565–575, 2015.
[60] X. Wu, Y. Feng, S. Lou, H. Zheng, B. Hu, Z. Hong, and J. Tan, “Improving neucube spiking neural network for EEG-based pattern recognition using transfer learning,” Neurocomputing, vol. 529, pp. 222–235, 2023.
[61] Z. Yan, J. Zhou, and W.-F. Wong, “Energy efficient ecg classification with spiking neural network,” Biomedical Signal Processing and Control, vol. 63, p. 102170, 2021.
[62] ——, “EEG classification with spiking neural network: Smaller, better, more energy efficient,” Smart Health, vol. 24, p. 100261, 2022.
[63] S. Ghosh-Dastidar and H. Adeli, “Improved spiking neural networks for EEG classification and epilepsy and seizure detection,” Integrated Computer-Aided Engineering, vol. 14, no. 3, pp. 187–212, 2007.
[64] S. M. Bohte, J. N. Kok, and H. La Poutre, “Error-backpropagation in temporally encoded networks of spiking neurons,” Neurocomputing, vol. 48, no. 1-4, pp. 17–37, 2002.
[65] E. O. Neftci, H. Mostafa, and F. Zenke, “Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks,” IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 51–63, 2019.
[66] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber et al., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” 2001.
[67] M.-H. Lee, O.-Y. Kwon, Y.-J. Kim, H.-K. Kim, Y.-E. Lee, J. Williamson, S. Fazli, and S.-W. Lee, “EEG dataset and openbmi toolbox for three bci paradigms: An investigation into bci illiteracy,” GigaScience, vol. 8, no. 5, p. giz002, 2019.
[68] Y. Guo, Y. Zhang, Y. Chen, W. Peng, X. Liu, L. Zhang, X. Huang, and Z. Ma, “Membrane potential batch normalization for spiking neural networks,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 19 420–19 430.
[69] Y. Hu, Y. Wu, L. Deng, and G. Li, “Advancing residual learning towards powerful deep spiking neural networks,” arXiv e-prints, pp. arXiv–2112, 2021.