
License: arXiv.org perpetual non-exclusive license
arXiv:2302.11410v3 [eess.SP] 15 Dec 2023

Score-Based Data Generation for EEG Spatial Covariance Matrices: Towards Boosting BCI Performance

Ce Ju¹, Reinmar Josef Kobler² and Cuntai Guan¹

¹ Ce Ju and Cuntai Guan are with the S-Lab and School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore (emails: {juce0001,ctguan}@ntu.edu.sg).
² Reinmar Josef Kobler is with RIKEN Artificial Intelligence Project, Tokyo, and Advanced Telecommunications Research Institute International, Kyoto, Japan.
Abstract

The efficacy of electroencephalogram (EEG) classifiers can be improved by increasing the amount of available training data. For geometric deep learning classifiers, the input consists of spatial covariance matrices derived from EEG signals. To synthesize these spatial covariance matrices and thereby support future improvements of geometric deep learning classifiers, we propose a generative modeling technique based on state-of-the-art score-based models. The quality of the generated samples is evaluated visually and quantitatively on a left/right-hand motor imagery dataset. The high pixel-level fidelity of the generated samples illustrates the capacity of score-based generative modeling. Additionally, the center (Fréchet mean) of the generated samples agrees with the neurophysiological evidence that event-related desynchronization and synchronization occur at electrodes C3 and C4 within the Mu and Beta frequency bands during motor imagery. The quantitative evaluation showed that 84.3% of the generated samples were correctly classified by a pre-trained classifier, and that adding generated samples improved the average accuracy over ten runs by up to 8.7 percentage points for a specific test subject in a holdout experiment.

I INTRODUCTION

Recently, deep learning techniques have been widely adopted for classifying electroencephalography (EEG) data [1]. However, a significant obstacle remains: the limited availability of training data [2]. To circumvent this constraint, researchers have turned to generative modeling, a rapidly evolving field of machine learning, to synthesize EEG time series for data augmentation [3]. This technique creates plausible samples that were not present in the original dataset, thereby expanding the training data with "unseen" examples.

To tackle the non-Euclidean characteristics inherent in EEG signals, researchers have been exploring the use of geometric deep learning methods to classify EEG signals in brain-computer interfaces (BCIs) [4, 5, 6, 7, 8, 9]. These techniques involve the application of second-order neural networks on matrices, known as spatial covariance matrices (SCMs), which are derived from EEG signals. It is noteworthy that these SCMs possess a wealth of discriminative information, including the variance of signals recorded by individual channels and the coherence between signals recorded by neighboring channels [6]. As a result, the development of generative models for SCMs with neurocognitive relevance presents a promising approach to enhancing the efficacy of geometric deep learning classifiers, ultimately delivering tangible advantages to BCI research.

In this study, we generate SCMs using a state-of-the-art generative modeling technique known as score-based generative modeling [10, 11]. Score-based generative models gradually perturb the data with increasing noise and learn to reverse this perturbation by estimating the score function, i.e., the gradient of the log-density with respect to the data; new samples are then obtained by starting from noise and running the learned reverse process. The noise perturbation can be described as a forward diffusion process modeled by a stochastic differential equation (SDE) [12]. This approach has been shown to be successful in generating images, audio, and point clouds. Unlike three-channel RGB images, whose pixel intensities are integers ranging from 0 to 255, SCMs are typically preprocessed as multi-channel square matrices that are symmetric, positive semidefinite, and real-valued. We evaluate our approach on the Korea University (KU) dataset, which is the largest EEG-BCI dataset for two-class motor imagery classification.

Figure 1: Sampling procedure. Starting from a noise matrix $X_T$, score-based generative modeling synthesizes the spatial covariance matrix $X_0$ through intermediate states $X_t$. Because the model learns the data distribution only approximately, the generated SCM $X_0$ is only nearly symmetric positive definite (SPD). To correct these numerical deviations, we project $X_0$ onto the SPD matrices by thresholding all eigenvalues from below at $\epsilon = 10^{-4}$. The SCM channels are ordered from first to last as FC-5/3/1/2/4/6, C-5/3/1/z/2/4/6, and CP-5/3/1/z/2/4/6.

II METHODOLOGY

In this section, we propose a two-step process for generating SCMs with score-based generative modeling. The mathematical foundations of score-based generative modeling and of the projection step are presented in Sections IV-A, IV-B, and IV-D. The sampling process is depicted in Figure 1.

  • Score-based Generative Modeling: During training, raw EEG signals are filtered and segmented in the frequency and time domains using the methods described in [4, 6]. Specifically, a bank of bandpass filters (Chebyshev Type II) decomposes the EEG into multiple frequency bands. A temporal segmentation scheme then partitions the EEG into short segments, with or without overlap. For a segment $X \in \mathbb{R}^{n_C \times n_T}$ of duration $T$, the spatial covariance matrix is $S := X X^{\top} \in \mathbb{R}^{n_C \times n_C}$, where $n_C$ and $n_T$ denote the number of channels and time samples, respectively. Finally, each SCM is scaled by its $\ell_2$-norm, $\bar{S} := S / \|S\|_{\ell_2}$.
Using score-based generative modeling, the unknown data distribution $p_{data}(S)$ is approximated through score matching, and samples for specific frequency bands and time intervals of the EEG are generated using either Langevin dynamics or the reverse-time SDE. The model is fitted jointly on all frequency bands. The generated samples are $n_C \times n_C$ matrices, which in general are neither exactly symmetric nor positive definite.

  • Projection: In this step, we project the generated matrices so that they are symmetric and positive definite. Specifically, given a generated matrix $X \in \mathbb{R}^{n_C \times n_C}$, the projected SCM is defined as

    $$S^{\dagger} := \sum_{i=1}^{n_C} \max\{\lambda_i, \epsilon\}\, u_i u_i^{\top},$$

    where $\epsilon > 0$ is a preset threshold, and the eigenvalues $\{\lambda_i\}_{i=1}^{n_C}$ and corresponding orthonormal eigenvectors $\{u_i\}_{i=1}^{n_C}$ are computed from the symmetric matrix $\frac{1}{2}(X + X^{\top})$.
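The two steps above can be sketched in NumPy/SciPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the filter order, stopband attenuation, sampling rate, window length, and hop size are hypothetical choices, and $\|S\|_{\ell_2}$ is taken to be the Frobenius norm (the text does not say whether the Frobenius or the spectral norm is meant). The helper names `filter_bank`, `segment`, `scaled_scm`, and `project_to_spd` are illustrative.

```python
import numpy as np
from scipy.signal import cheby2, sosfiltfilt


def filter_bank(eeg, fs, bands, order=4, rs=30.0):
    """Band-pass (n_C, n_samples) EEG with Chebyshev Type II filters.

    order and rs (minimum stopband attenuation in dB) are assumed values; for
    cheby2, each [lo, hi] pair marks where the attenuation first reaches rs.
    Returns an array of shape (n_bands, n_C, n_samples).
    """
    out = []
    for lo, hi in bands:
        sos = cheby2(order, rs, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out.append(sosfiltfilt(sos, eeg, axis=-1))  # zero-phase filtering
    return np.stack(out)


def segment(x, win, step):
    """Cut the last axis into windows of length win with hop step (overlap if step < win)."""
    starts = range(0, x.shape[-1] - win + 1, step)
    return np.stack([x[..., s:s + win] for s in starts])


def scaled_scm(x):
    """S = X X^T for an (n_C, n_T) segment, divided by its norm (Frobenius assumed)."""
    S = x @ x.T
    return S / np.linalg.norm(S)


def project_to_spd(X, eps=1e-4):
    """S_dagger = sum_i max(lambda_i, eps) u_i u_i^T, eigenpairs of (X + X^T) / 2."""
    lam, U = np.linalg.eigh(0.5 * (X + X.T))
    return (U * np.maximum(lam, eps)) @ U.T  # scales column i of U by max(lambda_i, eps)


# Usage on synthetic data (a stand-in for one 20-channel trial).
rng = np.random.default_rng(0)
fs = 250                                             # hypothetical sampling rate in Hz
eeg = rng.standard_normal((20, 2 * fs))              # 2 s of white noise, 20 channels
bands = [(4 + 4 * k, 8 + 4 * k) for k in range(9)]   # 4-8 Hz, ..., 36-40 Hz
segs = segment(filter_bank(eeg, fs, bands), win=fs, step=fs // 2)  # (3, 9, 20, 250)
scms = np.array([[scaled_scm(x) for x in seg] for seg in segs])    # (3, 9, 20, 20)

noisy = scms[0, 0] + 0.05 * rng.standard_normal((20, 20))  # mimics a non-symmetric model output
S_dagger = project_to_spd(noisy)
```

Because `np.linalg.eigh` is applied to the symmetrized matrix, the eigenvalues are real and the eigenvectors orthonormal, so clamping at ε yields a symmetric matrix whose smallest eigenvalue is at least ε (up to rounding). A matrix that is already SPD with all eigenvalues above ε is returned unchanged.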

(a) Raw/generated distributions for all nine frequency bands.
(b) Raw/generated distributions for the Mu and Beta frequency bands.
(c) Fréchet mean of the raw dataset (triangle marker in Subfigure 2(a)).
(d) Fréchet mean of the generated dataset (cross marker in Subfigure 2(a)).
Figure 2: Subfigures 2(a) and 2(b): Raw and generated distributions of the 2-dimensional projections of covariance matrices. Each 2-dimensional point is projected from its associated 20×20 covariance matrix (i.e., a 400-dimensional vector) using t-SNE. There are 151,200 points (9 frequency bands × 8,400 trials × raw/generated) in Subfigure 2(a) and 84,000 points (5 frequency bands × 8,400 trials × raw/generated) in Subfigure 2(b).
Subfigures 2(c) and 2(d): Fréchet means of covariance matrices for the nine frequency bands: 4-8 Hz, 8-12 Hz, 12-16 Hz, 16-20 Hz, 20-24 Hz, 24-28 Hz, 28-32 Hz, 32-36 Hz, and 36-40 Hz.
(a) Fréchet mean of the left-hand-class trials in the raw dataset.
(b) Fréchet mean of the right-hand-class trials in the raw dataset.
(c) Fréchet mean of the left-hand-class trials in the generated dataset.
(d) Fréchet mean of the right-hand-class trials in the generated dataset.
Figure 3: Fréchet means of covariance matrices within the nine frequency bands for the left- and right-hand classes. The highlighted entries of the SCMs in Subfigures 3(a) and 3(c) (Mu and Beta bands) are located in the regions of FC4, C4, and CP4 on the scalp, while those in Subfigures 3(b) and 3(d) fall in the regions of FC3, C3, and CP3.
(a) Real SCMs, left hand (Trial No. 27 of Subject No. 1)
(b) Generated SCMs, left hand (Sample 1)
(c) Generated SCMs, left hand (Sample 2)
(d) Real SCMs, right hand (Trial No. 8 of Subject No. 1)
(e) Generated SCMs, right hand (Sample 1)
(f) Generated SCMs, right hand (Sample 2)
Figure 4: Conditional SCM generation. Each row shows an SCM derived from a real EEG segment of one class, followed by two generated samples of the same class.

III EXPERIMENTS

III-A The Korea University Dataset

The Korea University (KU) dataset (data: http://gigadb.org/dataset/100542; description article: https://academic.oup.com/gigascience/article/8/5/giz002/5304369), also known as the OpenBMI dataset, was collected from 54 subjects performing a two-class EEG motor imagery (MI) task (left-hand and right-hand movement imagery). The EEG signals were recorded at 1,000 Hz using 62 Ag/AgCl electrodes. The continuous EEG data were then segmented from 1,000 to 3,500 ms relative to stimulus onset. For evaluation, 20 electrodes over the motor cortex were selected (FC-5/3/1/2/4/6, C-5/3/1/z/2/4/6, and CP-5/3/1/z/2/4/6). The study comprised two sessions, $S_1$ and $S_2$, each divided into a training phase and a test phase with 100 trials each, balanced between right- and left-hand imagery, resulting in a total of 21,600 trials (54 subjects × 2 sessions × 200 trials) available for evaluation.

In accordance with the subject-specific study of the KU dataset in [6], the subjects were divided into two groups: a training subject group and a test subject group. The criterion for inclusion in the training subject group was that the 10-fold cross-validation accuracies on both $S_1$ and $S_2$ be higher than 70% (criterion level). A total of 21 subjects met this criterion and were included in the training subject group: Subjects No. 2, 3, 5, 6, 8, 12, 17, 18, 21, 22, 28, 29, 32, 33, 35, 36, 37, 39, 43, 44, and 45. The remaining 33 subjects were included in the test subject group: Subjects No. 1, 4, 7, 9, 10, 11, 13, 14, 15, 16, 19, 20, 23, 24, 25, 26, 27, 30, 31, 34, 38, 40, 41, 42, 46, 47, 48, 49, 50, 51, 52, 53, and 54.

III-B Parameters in Training Models

For score-based generative modeling, we selected the Variance Exploding (VE) SDE with the NCSN++ model architecture (PyTorch implementation: https://github.com/yang-song/score_sde_pytorch). We independently trained two generative models to produce left-hand and right-hand samples, each using 4,200 trials from the training subject group (21 subjects × 2 sessions × 100 trials per class). The signal of each trial was first transformed into a covariance matrix and scaled. The noise parameters were set to $\sigma_{max} = 10$ and $\sigma_{min} = 0.01$. Training ran for 100,000 iterations with a batch size of 128. Notably, the CNN filter size within NCSN++ was set to 20×20 for the reason discussed in Section III-D1. We set $\epsilon = 10^{-4}$ in the projection step.

III-C Results

To alleviate the discrepancy between the raw and generated distributions resulting from the generative methods, we normalize each covariance matrix by zero-centering the means and scaling the variances to unity before visualization and quantitative analysis.
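A minimal sketch of this normalization follows, assuming the mean and variance are computed over the entries of each individual matrix; the text does not specify whether the statistics are per matrix or per entry across the dataset, and `standardize_per_matrix` is a hypothetical name. Zero-centering the entries keeps each matrix symmetric but does not in general preserve positive definiteness.

```python
import numpy as np


def standardize_per_matrix(batch):
    """Standardize the entries of each matrix in a (N, n_C, n_C) batch to mean 0, variance 1."""
    mu = batch.mean(axis=(-2, -1), keepdims=True)
    sd = batch.std(axis=(-2, -1), keepdims=True)
    return (batch - mu) / sd


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 20, 50))
raw = A @ np.swapaxes(A, -1, -2)  # eight synthetic 20 x 20 covariance matrices
normed = standardize_per_matrix(raw)
```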

III-C1 Visualization

In Figure 2(a), we present 2-dimensional t-SNE [13] projections of the raw SCMs of the 8,400 trials from the training subject group and of the 8,400 generated covariance matrices (both left- and right-hand classes). The Fréchet means (introduced in Appendix IV-C) of the two distributions are marked with triangle and cross markers, respectively. The two distributions nearly coincide, and the Riemannian distance between their Fréchet means is relatively small (approximately 4.323), indicating that the center of the learned distribution closely matches that of the raw distribution. Figure 2(b) gives a more detailed view of the projections within the Mu and Beta frequency bands. Furthermore, the covariance matrices corresponding to the two marked Fréchet means in Figure 2(a) are plotted in Figures 2(c) and 2(d); the Fréchet mean was computed independently for each frequency band. The Fréchet means of the 8-12 Hz, 12-16 Hz, 16-20 Hz, and 20-24 Hz bands exhibit similar profiles, while the others differ. These frequency bands are associated with event-related desynchronization and synchronization during cognitive and motor processing.
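The Fréchet mean and Riemannian distance used above can be sketched as follows. We assume the affine-invariant Riemannian metric, which is common for SCMs in BCI work; this section does not name the metric (Appendix IV-C, where the Fréchet mean is introduced, is not reproduced here). The function names are hypothetical, the fixed-point (Karcher) iteration is one standard way to compute the mean rather than necessarily the one used in the paper, and all inputs must be SPD. Since the text computes the mean separately for each frequency band, an array of shape (N, n_bands, n_C, n_C) would be handled band by band, e.g. `frechet_mean(scms[:, band])`.

```python
import numpy as np
from scipy.linalg import eigvalsh


def _spd_fun(S, f):
    """Apply a scalar function f to the eigenvalues of a symmetric matrix S."""
    lam, U = np.linalg.eigh(S)
    return (U * f(lam)) @ U.T


def riemannian_distance(A, B):
    """Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F between SPD matrices."""
    lam = eigvalsh(B, A)  # generalized eigenvalues of B v = lambda A v, i.e. of A^{-1} B
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def frechet_mean(mats, tol=1e-10, max_iter=100):
    """Karcher fixed-point iteration for the affine-invariant Frechet mean of SPD matrices."""
    M = np.mean(mats, axis=0)  # start from the arithmetic mean
    for _ in range(max_iter):
        M_half = _spd_fun(M, np.sqrt)
        M_ihalf = _spd_fun(M, lambda lam: 1.0 / np.sqrt(lam))
        # Average of the data mapped to the tangent space at M (whitened coordinates).
        T = np.mean([_spd_fun(M_ihalf @ S @ M_ihalf, np.log) for S in mats], axis=0)
        M = M_half @ _spd_fun(T, np.exp) @ M_half
        if np.linalg.norm(T) < tol:
            break
    return M


A, B = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])
print(riemannian_distance(A, B))       # sqrt(2) * ln 4, about 1.961
print(frechet_mean(np.stack([A, B])))  # geometric mean, approximately 2 * identity
```

For commuting matrices such as the diagonal example, the affine-invariant mean reduces to the elementwise geometric mean, which makes the result easy to check by hand.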

Figure 3 shows the Fréchet means of covariance matrices for the left- and right-hand classes within nine frequency bands. Subfigures 3(a) and 3(b) display SCMs whose highlighted regions correspond to neurophysiological findings. Specifically, the highlighted entries in Subfigures 3(a) and 3(c) (Mu and Beta bands) are located in the regions of FC4, C4, and CP4 on the scalp, while those in Subfigures 3(b) and 3(d) are located in the regions of FC3, C3, and CP3. To give a fuller visual impression of the texture of the generated samples, Figure 4 compares SCMs derived from actual EEG with those generated by the proposed method for the two classes.

III-C2 Classification

To assess the generated samples, we classify them with a Tensor-CSPNet model (Python implementation: https://github.com/GeometricBCI/Tensor-CSPNet-and-Graph-CSPNet) pre-trained on all subjects (both sessions) in the training subject group, a total of 8,400 trials. The model architecture uses a simplified setting with a single 2,500 ms time window and two BiMap layers, which map the input dimension of 20 to 30 and back to an output dimension of 20. There are 8,400 balanced generated samples, 4,200 per class. The pre-trained classifier achieves an accuracy of 84.30% over all generated samples, and the confusion matrix is as follows:

TABLE I: Confusion matrix of the predicted labels for all 8,400 generated samples.
True \ Predicted Right-hand Left-hand
Right-hand 3730 (44.4%) 470 (5.6%)
Left-hand 849 (10.1%) 3351 (39.9%)
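As a quick consistency check, the accuracy quoted in the text and the percentages in Table I can be re-derived from the four counts. The row/column order (right-hand, left-hand) follows the table; nothing beyond the table's numbers is assumed.

```python
import numpy as np

# Table I counts; rows = true class, columns = predicted class, order (right-hand, left-hand).
cm = np.array([[3730, 470],
               [849, 3351]])

total = int(cm.sum())                  # 8400 generated samples
accuracy = np.trace(cm) / total        # (3730 + 3351) / 8400
recall = np.diag(cm) / cm.sum(axis=1)  # fraction of each true class predicted correctly
share = 100 * cm / total               # cell percentages as listed in Table I

print(f"accuracy = {accuracy:.2%}")    # accuracy = 84.30%
print("recall (right, left):", recall.round(3))
print("cell shares (%):", share.round(1).tolist())
```

The per-class recall also shows that generated right-hand samples are recognized more often (3730/4200, about 88.8%) than generated left-hand samples (3351/4200, about 79.8%).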
TABLE II: Cross-session classification with data augmentation. Each column gives the number of generated samples added to the training session; the samples are divided equally between the left-hand and right-hand classes. The cross-session scenario follows the training and evaluation sessions of the KU dataset: the first session (200 trials) plus the added samples serve as training data, the first half of the second session (100 trials) is used for validation, and the second half (100 trials) for testing. Results (%) are the mean (standard deviation) over 10 runs for each setting, together with the best run.
Augmentation samples None 20 40 60 80 100 120 140 160 180 200
Subject No.30
Avg. (Std.) 55.2(3.9) 52.1(3.9) 55.5(6.8) 56.2(2.9) 54.7(5.1) 53.8(4.4) 56.7(4.8) 57.2(4.3) 56.8(6.4) 58.3(5.3) 57.6(4.5)
Best 61.0 59.0 71.0 63.0 64.0 62.0 64.0 66.0 66.0 66.0 67.0
Subject No.42
Avg. (Std.) 59.2(3.5) 59.1(6.2) 63.2(4.6) 62.5(3.4) 65.6(4.6) 65.1(4.1) 64.8(3.6) 65.8(2.8) 65.4(3.6) 61.8(4.1) 67.9(2.5)
Best 63.0 66.0 69.0 67.0 72.0 73.0 71.0 70.0 72.0 69.0 72.0

In this study, we conducted an additional experiment in a cross-session setting, also known as the holdout scenario, in which one session of trials was used for training, the first half of the other session for validation, and its second half for testing. This task is challenging because of signal variability across sessions, and many state-of-the-art algorithms, including geometric methods [4, 6], perform poorly, with accuracies below 70%. The proposed generative method was used to generate SCMs from all subjects (both sessions) in the training subject group. The classifier, Tensor-CSPNet, was trained on the first session plus the generated samples, validated on the first half of the second session, and tested on the second half. Table II reports the cross-session classification accuracies, where the columns, ranging from "None" to "200" in steps of 20, give the number of generated samples added to the training set. The "None" column corresponds to the standard cross-session setting, but computed on normalized SCMs without segmenting the time interval, so its results differ slightly from those in [6]. We selected Subjects 30 and 42 as representatives of the test subject group. For Subject 30, the average accuracy over ten runs increased by 3.1 percentage points (from 55.2% to 58.3%) after adding 180 generated samples. For Subject 42, the average accuracy improved substantially, by 8.7 percentage points (from 59.2% to 67.9%), after adding 200 generated samples.
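The improvements quoted above are the differences between the best average accuracy in each row of Table II and the "None" column. The snippet below re-derives them from the tabulated averages; `best_gain` is an illustrative helper, and the gains are in percentage points.

```python
import numpy as np

# Average accuracies (%) from Table II; column 0 is "None", then 20, 40, ..., 200 added samples.
added = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
avg_acc = {
    30: [55.2, 52.1, 55.5, 56.2, 54.7, 53.8, 56.7, 57.2, 56.8, 58.3, 57.6],
    42: [59.2, 59.1, 63.2, 62.5, 65.6, 65.1, 64.8, 65.8, 65.4, 61.8, 67.9],
}


def best_gain(row):
    """Return (added samples at the best average, gain over "None" in percentage points)."""
    row = np.asarray(row)
    k = int(np.argmax(row))
    return added[k], round(float(row[k] - row[0]), 1)


for subject, row in avg_acc.items():
    n, gain = best_gain(row)
    print(f"Subject {subject}: +{gain} points with {n} added samples")
# Subject 30: +3.1 points with 180 added samples
# Subject 42: +8.7 points with 200 added samples
```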

III-D Discussions

This study explores a new method for generating SCMs for BCI applications using score-based generative modeling with the SDE approach. The generated samples are analyzed through both visual and quantitative evaluations. Visually, the samples produced by the proposed method resemble SCMs obtained from actual EEG recordings. Furthermore, the center (Fréchet mean) of the generated samples agrees with the neurophysiological finding that event-related desynchronization and synchronization occur at electrodes C3 and C4 within the Mu and Beta frequency bands during motor imagery. Quantitatively, 84.3% of the generated samples are correctly classified by a pre-trained Tensor-CSPNet, and holdout experiments on two subjects (Subject No.30 and No.42) show an improvement of up to 8.7 percentage points in average accuracy over 10 runs.

Although our findings are promising, EEG-BCI classification lacks evaluation metrics for generative models comparable to those commonly used in computer vision, such as the Inception Score [14] and the Fréchet Inception Distance [15]; this prevents our study from reporting more detailed outcomes, such as results for every individual participant. It is important to note that not all participants show noticeable improvements after generated samples are added, and at present no established criterion exists for determining which subjects may benefit from this technique. Furthermore, the current approach can be improved in several respects, which we discuss below:

III-D1 Non-Euclidean Nature

In the experiments, the SCM channels are ordered from first to last as FC-5/3/1/2/4/6, C-5/3/1/z/2/4/6, and CP-5/3/1/z/2/4/6. The score-based generative model uses a CNN-based architecture, which captures local information from channels that are adjacent in this sequence. However, this order does not reflect the correlations between EEG channels with respect to their spatial locations on the scalp, a property referred to as the non-Euclidean nature of EEG, which may limit performance. To address this problem, we adopt a heuristic: the filter size is set to 20×20, matching the full size of the SCMs. This heuristic may not transfer readily to more complex scenarios, since a large filter can make it difficult to capture fine-grained signal structure.
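To make the ordering issue concrete, the sketch below maps each channel to its row/column index in the 20×20 SCM and lists neighbours in the ordering that lie on opposite hemispheres. It assumes the last electrode of the C row is C6 (standard 10-20 naming, giving 20 distinct channels to match the 20×20 SCMs) and uses the 10-20 convention that odd numbers denote the left hemisphere, even numbers the right, and "z" the midline. The names `CHANNELS`, `IDX`, and `hemisphere` are hypothetical.

```python
# Row/column order of the 20 x 20 SCMs (last C-row electrode assumed to be C6).
CHANNELS = ["FC5", "FC3", "FC1", "FC2", "FC4", "FC6",
            "C5", "C3", "C1", "Cz", "C2", "C4", "C6",
            "CP5", "CP3", "CP1", "CPz", "CP2", "CP4", "CP6"]
IDX = {name: i for i, name in enumerate(CHANNELS)}


def hemisphere(name):
    """10-20 convention: odd number = left, even number = right, trailing 'z' = midline."""
    if name.endswith("z"):
        return "midline"
    return "left" if int(name[-1]) % 2 else "right"


# Pairs that are adjacent in the SCM ordering but lie on opposite hemispheres.
jumps = [(a, b) for a, b in zip(CHANNELS, CHANNELS[1:])
         if {hemisphere(a), hemisphere(b)} == {"left", "right"}]

print(len(CHANNELS), IDX["C3"], IDX["C4"])  # 20 7 11
print(jumps)  # [('FC1', 'FC2'), ('FC6', 'C5'), ('C6', 'CP5')]
```

A small kernel sliding over the SCM therefore mixes, for example, entries belonging to FC6 and C5, which are neighbours in the ordering but sit on opposite sides of the scalp; a 20×20 kernel sidesteps this by covering every entry at once, at the cost described above.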

III-D2 Randomness

Some generated samples may contain valuable discriminative information for classification, while others may not. The randomness introduced by the sampling process in score-based generative modeling may therefore compromise the performance of the classifier. Additionally, as noted above, this randomness undermines conventional evaluation procedures, because a different set of generated samples is assessed each time, and there is no common evaluation metric for generative models in EEG-BCI classification. Moreover, the texture of EEG spatial covariance matrices does not appear in general image recognition databases, so metrics built on such databases cannot simply be reused.

III-D3 Cross-frequency Coupling

Another possible explanation for the limited performance is the diversity of the generated samples. Since the SCM for each frequency band is generated independently from random noise, the composite SCM assembled from these independent SCMs may lack neurophysiological significance and may correspond to patterns never observed in real recordings. In other words, real SCMs are derived from EEG signals in which changes in brain activity during cognitive and motor processing give rise to event-related desynchronization and synchronization, whereas a generated SCM does not share this origin, even though it may look similar. For instance, the SCM in the 32-36 Hz band shown in Subfigure 4(e) is a novel instance: it displays the kind of high-intensity activity that typically occurs within the Mu and Beta bands.

III-D4 Distribution Shift

Even if the generated samples contain ample discriminative information, the limited performance may still stem from the mismatch between the raw data distribution and the learned distribution. This mismatch can cause differences in the numerical ranges of the pixels (entries) of the SCMs. To mitigate this, we apply a simple heuristic normalization to the covariance matrices, zero-centering their means and scaling their variances to unity. This yields well-overlapping raw and generated distributions, but it may not be reliable in more complex scenarios.

IV APPENDICES

In the appendices, we briefly introduce score-based generative modeling. Formally, suppose we have samples of spatial covariance matrices $\{S_i \in \mathbb{R}^{n_C \times n_C}\}_{i=1}^{N}$ drawn from an (unknown) distribution $p_{data}(S)$.

IV-A Score-based Generative Modeling

In score-based generative modeling, the score network $s_\theta: \mathbb{R}^{n_C \times n_C} \to \mathbb{R}^{n_C \times n_C}$ is a deep neural network parametrized by $\theta$ that learns the score of a probability density, $\nabla_S \log p(S)$. To train $s_\theta$, denoising score matching [16] first replaces $p_{data}(S)$ with a version perturbed by Gaussian noise of level $\sigma$, $p_{data}^{\sigma}(\tilde{S}) = \int p_{\mathcal{N}}^{\sigma}(\tilde{S} \mid S)\, p_{data}(S)\, dS$, and the denoising objective $\mathcal{J}_D^{\sigma}(\theta)$ with noise level $\sigma$ is then given as follows,

$$\mathcal{J}_D^{\sigma}(\theta) := \mathbb{E}_{p_{\mathcal{N}}^{\sigma}(\tilde{S} \mid S)\, p_{data}(S)} \left\| s_\theta(\tilde{S}) - \nabla_{\tilde{S}} \log p_{\mathcal{N}}^{\sigma}(\tilde{S} \mid S) \right\|_2^2.$$
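A minimal NumPy sketch of a Monte Carlo estimate of $\mathcal{J}_D^{\sigma}(\theta)$ at a single noise level is given below. It uses the Gaussian perturbation kernel $p_{\mathcal{N}}^{\sigma}(\tilde{S} \mid S) = \mathcal{N}(S, \sigma^2 I)$, whose log-density gradient is $-(\tilde{S} - S)/\sigma^2$, and the squared Frobenius norm. `score_fn` stands in for the network $s_\theta$; the toy "oracle" and "zero" score functions are hypothetical, and training over many noise levels, loss weighting, and gradient updates are omitted.

```python
import numpy as np


def dsm_loss(score_fn, clean, sigma, rng):
    """Monte Carlo estimate of J_D^sigma for a (N, n_C, n_C) batch of clean SCMs."""
    noisy = clean + sigma * rng.standard_normal(clean.shape)  # S~ ~ N(S, sigma^2 I)
    target = -(noisy - clean) / sigma**2                      # analytic grad log p_N^sigma(S~ | S)
    diff = score_fn(noisy) - target
    return float(np.mean(np.sum(diff**2, axis=(-2, -1))))     # batch mean of squared Frobenius norms


rng = np.random.default_rng(0)
sigma = 0.5
S0 = np.eye(20) / np.sqrt(20)                       # toy "dataset": a single fixed SCM
clean = np.broadcast_to(S0, (2000, 20, 20))
oracle = lambda x: -(x - S0) / sigma**2             # exact score of the perturbed toy data
zero = lambda x: np.zeros_like(x)                   # a score model that always outputs zero
loss_oracle = dsm_loss(oracle, clean, sigma, rng)   # exactly 0 for this degenerate dataset
loss_zero = dsm_loss(zero, clean, sigma, rng)       # about n_C**2 / sigma**2 = 1600
```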

Note that the noise model term $\nabla_{\tilde{S}} \log p_{\mathcal{N}}^{\sigma}(\tilde{S} \mid S)$ has a simple analytic form, $\nabla_{\tilde{S}} \log p_{\mathcal{N}}^{\sigma}(\tilde{S} \mid S) = -\frac{1}{\sigma^{2}}(\tilde{S} - S)$. In the sampling phase, Langevin dynamics is applied to generate samples recursively using the score network $s_\theta$ as follows,

$$\tilde{S}_{t} = \tilde{S}_{t-1} + \frac{\epsilon}{2}\, s_{\theta}(\tilde{S}_{t-1}) + \sqrt{\epsilon}\, Z_{t},$$

where the initial sample $\tilde{S}_{0} \sim \pi(S)$ is drawn from a prior distribution $\pi$, the step size $\epsilon > 0$ is fixed, and $Z_{t} \sim \mathcal{N}(0, I)$.
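The recursion above is unadjusted Langevin dynamics with a fixed step size. Below is a minimal NumPy sketch. It assumes `score_fn` is a hypothetical callable standing in for the trained $s_\theta$, and that `S0` already holds draws from the prior $\pi$.

```python
import numpy as np

def langevin_sample(score_fn, S0, step_size, n_steps, rng):
    """Unadjusted Langevin dynamics with a fixed step size eps.

    Implements S_t = S_{t-1} + (eps / 2) * s(S_{t-1}) + sqrt(eps) * Z_t with
    Z_t ~ N(0, I). `score_fn` is a hypothetical stand-in for s_theta and `S0`
    holds initial draws from the prior pi, e.g. shape (n_samples, n_C, n_C).
    """
    S = np.array(S0, dtype=float)
    for _ in range(n_steps):
        Z = rng.standard_normal(S.shape)          # fresh Gaussian noise each step
        S = S + 0.5 * step_size * score_fn(S) + np.sqrt(step_size) * Z
    return S
```

As a toy check, not part of the paper, consider the score of a Gaussian with mean 3 and unit variance, which is $-(S - 3)$. With that score and a small $\epsilon$, chains started from $\mathcal{N}(0, I)$ settle near this Gaussian. The annealed variant of [10] runs the same recursion over a decreasing sequence of noise levels and is not shown here.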

IV-B Diffusing Samples with an SDE

To model a continuum of distributions evolving over time $t$, score-based generative modeling has been unified with diffusion probabilistic modeling in a framework of stochastic differential equations (SDEs) [12]. Specifically, these SDEs take the form

$$dS = f(S, t)\, dt + g(t)\, dW,$$

where $f(\cdot, t): \mathbb{R}^{n_{C} \times n_{C}} \to \mathbb{R}^{n_{C} \times n_{C}}$ and $g(t) \in \mathbb{R}$ are the drift and diffusion coefficients, respectively, and $W \in \mathbb{R}^{n_{C} \times n_{C}}$ is a standard Wiener process. The solution of this SDE is a diffusion process $\{S(t)\}_{t \in [0, T]}$ over the finite time horizon $[0, T]$, and $p_{t}(S)$ denotes the marginal distribution of $S(t)$.
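To make the forward process concrete, the sketch below simulates the SDE with the Euler–Maruyama scheme. `drift` and `diffusion` are hypothetical callables for $f$ and $g$. The uniform time grid and the step count are illustrative assumptions.

```python
import numpy as np

def euler_maruyama_forward(S0, drift, diffusion, T=1.0, n_steps=1000, rng=None):
    """Simulate dS = f(S, t) dt + g(t) dW on [0, T] with the Euler-Maruyama scheme.

    `drift(S, t)` and `diffusion(t)` are hypothetical callables for f and g;
    S0 holds initial matrices of shape (..., n_C, n_C).
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    S = np.array(S0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dW = np.sqrt(dt) * rng.standard_normal(S.shape)  # Wiener increment, variance dt per entry
        S = S + drift(S, t) * dt + diffusion(t) * dW
    return S
```

For example, with zero drift and a constant $g(t) = 2$, each entry of $S(T) - S(0)$ is Gaussian with variance $g^2 T = 4$. This makes the scheme easy to sanity-check.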
For variance exploding (VE) SDEs [12], the drift vanishes, $f(S, t) := 0$, and the diffusion coefficient is $g(t) := \sqrt{d[\sigma^{2}(t)]/dt}$ for a non-decreasing noise scale $\sigma(t)$. The forward process therefore only adds Gaussian noise of increasing variance. Score-based generative modeling generates samples by simulating the following time-reversed diffusion process,

$$dS = \big(f(S, t) - g(t)^{2}\, \nabla_{S} \log p_{t}(S)\big)\, dt + g(t)\, d\bar{W},$$

where $\bar{W} \in \mathbb{R}^{n_{C} \times n_{C}}$ is a standard Wiener process when time flows backwards from $T$ to $0$, and $dt$ is a negative infinitesimal time step. A time-dependent neural network $s_{\theta}(S, t)$ estimates $\nabla_{S} \log p_{t}(S)$ by minimizing the loss $\mathcal{J}_{D}(\theta; \lambda)$, defined as

$$\int_{0}^{T} \lambda(t)\, \mathbb{E}_{p_{0t}(\tilde{S}|S)\, p_{data}(S)} \big\| s_{\theta}(\tilde{S}, t) - \nabla_{\tilde{S}} \log p_{0t}(\tilde{S}|S) \big\|_{\mathcal{F}}^{2}\, dt,$$

where $p_{0t}(\tilde{S}|S)$ is the transition distribution from $S(0)$ to $S(t)$, and $\lambda: [0, T] \to \mathbb{R}_{>0}$ is a positive weighting function.
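The sketch below is an illustration only. It instantiates the VE SDE of [12] with $T = 1$ and the geometric schedule $\sigma(t) = \sigma_{min}(\sigma_{max}/\sigma_{min})^{t}$. For this schedule, $g(t) = \sigma(t)\sqrt{2\log(\sigma_{max}/\sigma_{min})}$, and the transition kernel $p_{0t}(\tilde{S}|S)$ is Gaussian with mean $S$ and variance $\sigma^{2}(t) - \sigma^{2}(0)$. The schedule, the values $\sigma_{min} = 0.01$ and $\sigma_{max} = 10$, and the hypothetical `score_fn(S, t)` standing in for $s_\theta$ are assumptions, not settings reported in this paper. `ve_perturb` draws a perturbed sample together with its regression target for the weighted loss. `reverse_ve_sample` discretizes the reverse-time SDE with Euler–Maruyama, using a positive step size while time runs from 1 down to a small `t_eps`.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0   # assumed noise range, not taken from the paper

def ve_sigma(t):
    """Geometric noise scale sigma(t) = sigma_min * (sigma_max / sigma_min) ** t."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def ve_g(t):
    """Diffusion coefficient g(t) = sqrt(d sigma^2(t) / dt) for the geometric schedule."""
    return ve_sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))

def ve_perturb(S, t, rng):
    """Draw S_tilde ~ p_0t(. | S) and return (S_tilde, score target); requires t > 0."""
    std = np.sqrt(ve_sigma(t) ** 2 - SIGMA_MIN ** 2)
    noise = rng.standard_normal(S.shape)
    return S + std * noise, -noise / std          # target = -(S_tilde - S) / std**2

def reverse_ve_sample(score_fn, shape, n_steps=1000, t_eps=1e-5, rng=None):
    """Euler-Maruyama for the reverse-time SDE with f = 0, from t = 1 down to t_eps."""
    rng = np.random.default_rng() if rng is None else rng
    ts = np.linspace(1.0, t_eps, n_steps + 1)
    dt = ts[0] - ts[1]                             # positive step; time runs backwards
    S = SIGMA_MAX * rng.standard_normal(shape)     # prior N(0, sigma_max^2 I)
    for t in ts[:-1]:
        g = ve_g(t)
        S = S + g ** 2 * score_fn(S, t) * dt + g * np.sqrt(dt) * rng.standard_normal(shape)
    return S
```

Plugging in the exact score of a Gaussian toy distribution should approximately recover that distribution. This serves as a sanity check before substituting a trained network.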

IV-C Fréchet Mean

The generated samples in the current approach are symmetric positive definite (SPD) matrices. Traditional measures for evaluating generative models in computer vision, such as the Inception Score [14] and the Fréchet Inception Distance [15], therefore cannot be used. Instead, the Riemannian distance on the SPD manifold is used to evaluate the distance between the prior and generated distributions. Mathematically, the space of spatial covariance matrices is conventionally viewed as the SPD manifold $\mathcal{S}^{++}$ equipped with a Riemannian metric, the affine invariant Riemannian metric (AIRM) [17], written $(\mathcal{S}^{++}, \mathrm{AIRM})$. The Riemannian distance between two spatial covariance matrices is $d_{AIRM}(S_{1}, S_{2}) := \|\log(S_{1}^{-1/2} S_{2} S_{1}^{-1/2})\|_{\mathcal{F}}$, where $\|\cdot\|_{\mathcal{F}}$ is the Frobenius norm and $\log$ is the matrix logarithm. Given a set of SPD matrices $\{S^{1}, \cdots, S^{N}\}$, the Fréchet mean $\mu$ of that set is given as follows,

$$\mu := \arg\min_{P \in \mathcal{S}^{++}} \frac{1}{N} \sum_{i=1}^{N} d_{AIRM}^{2}(S^{i}, P).$$
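The following sketch assumes NumPy and SciPy. For the distance, the eigenvalues of $S_1^{-1/2} S_2 S_1^{-1/2}$ coincide with the generalized eigenvalues of the symmetric-definite pencil $S_2 v = \lambda S_1 v$, which `scipy.linalg.eigh(a, b)` computes. The distance is then the square root of the summed squared log-eigenvalues. For the mean, the minimization generally has no closed form when $N > 2$. For $N = 2$ it is the geodesic midpoint $S_1^{1/2}(S_1^{-1/2} S_2 S_1^{-1/2})^{1/2} S_1^{1/2}$. One common approach, shown here as a sketch rather than the paper's own procedure, is the Karcher fixed-point iteration. It maps all samples to the tangent space at the current estimate, averages them, and maps the average back with the exponential map. The Euclidean initialization, iteration count, and tolerance are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def airm_distance(S1, S2):
    """d_AIRM(S1, S2) = ||log(S1^{-1/2} S2 S1^{-1/2})||_F for SPD S1, S2."""
    lam = eigh(S2, S1, eigvals_only=True)   # generalized eigenvalues of S2 v = lam S1 v
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def spd_fn(S, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix S."""
    S = 0.5 * (S + S.T)                     # symmetrize against round-off
    w, V = np.linalg.eigh(S)
    return (V * fn(w)) @ V.T

def frechet_mean(mats, n_iter=100, tol=1e-12):
    """Karcher fixed-point iteration for the AIRM Frechet mean of SPD matrices.

    mats: array-like of shape (N, d, d) holding SPD matrices S^1, ..., S^N.
    """
    mats = np.asarray(mats, dtype=float)
    mu = mats.mean(axis=0)                  # Euclidean mean as the starting point
    for _ in range(n_iter):
        mu_half = spd_fn(mu, np.sqrt)
        mu_inv_half = spd_fn(mu, lambda w: 1.0 / np.sqrt(w))
        # Average of the Riemannian log-maps at mu, in whitened coordinates
        T = np.mean([spd_fn(mu_inv_half @ S @ mu_inv_half, np.log) for S in mats], axis=0)
        mu = mu_half @ spd_fn(T, np.exp) @ mu_half   # exponential map back onto S++
        if np.linalg.norm(T) < tol:         # mean log-map vanishes at the Frechet mean
            break
    return mu
```

The distance is symmetric in its arguments. It is also unchanged under $S \mapsto W S W^{T}$ for any invertible $W$, which is the affine invariance that gives the metric its name.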

IV-D Mathematical Fundamentals in the Projection Step

Consider a $d \times d$ real symmetric matrix $S$ with eigenvalues $\lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_{d}$ and corresponding orthonormal eigenvectors $\{u_{i}\}_{i=1}^{d}$. The spectral decomposition of $S$ is then $S = \sum_{i=1}^{d} \lambda_{i}\, u_{i} u_{i}^{T}$.
To handle negative eigenvalues in symmetric matrices, the following lemma from [6] provides a fundamental technique: the projection $S^{\dagger} := \sum_{i=1}^{d} \max\{\lambda_{i}, 0\}\, u_{i} u_{i}^{T}$ of $S$ onto the positive semidefinite cone solves the minimization problem $\min_{X} \|S - X\|_{\mathcal{F}}^{2}$ subject to $X \succeq 0$.
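The projection needs only one symmetric eigendecomposition. The sketch below assumes NumPy. The optional `floor` argument is an addition not described in the paper. It is useful when a strictly positive definite matrix is required, for example before evaluating $d_{AIRM}$, which needs invertible inputs.

```python
import numpy as np

def project_psd(S, floor=0.0):
    """Project a real symmetric matrix onto the PSD cone by clipping eigenvalues.

    With floor=0 this returns S_dagger = sum_i max(lambda_i, 0) u_i u_i^T, the
    nearest PSD matrix in Frobenius norm. A small floor > 0 (an assumption, not
    from the paper) yields a strictly positive definite matrix instead.
    """
    S = 0.5 * (S + S.T)                 # guard against round-off asymmetry
    lam, U = np.linalg.eigh(S)          # S = U diag(lam) U^T
    return (U * np.maximum(lam, floor)) @ U.T
```

For instance, `project_psd(np.diag([1.0, -2.0]))` returns `diag(1, 0)`: the negative eigenvalue is clipped and the eigenvectors are kept.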

V ACKNOWLEDGMENT

This work was supported under the RIE2020 Industry Alignment Fund–Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from industry partner(s). This work was supported by the RIE2020 AME Programmatic Fund, Singapore (No. A20G8b0102). This work was also supported by Innovative Science and Technology Initiative for Security Grant Number JPJ004596, ATLA, Japan.

References

  • [1] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for EEG decoding and visualization,” Human Brain Mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
  • [2] C. Ju, D. Gao, R. Mane, B. Tan, Y. Liu, and C. Guan, “Federated transfer learning for EEG signal classification,” in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2020, pp. 3040–3045.
  • [3] K. G. Hartmann, R. T. Schirrmeister, and T. Ball, “EEG-GAN: Generative adversarial networks for electroencephalograhic (EEG) brain signals,” arXiv preprint arXiv:1806.01875, 2018.
  • [4] C. Ju and C. Guan, “Tensor-CSPNet: A novel geometric deep learning framework for motor imagery classification,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • [5] ——, “Deep optimal transport on SPD manifolds for domain adaptation,” arXiv preprint arXiv:2201.05745, 2022.
  • [6] ——, “Graph neural networks on SPD manifolds for motor imagery classification: A perspective from the time-frequency analysis,” arXiv preprint arXiv:2211.02641, 2022.
  • [7] R. Kobler, J.-i. Hirayama, Q. Zhao, and M. Kawanabe, “SPD domain-specific batch normalization to crack interpretable unsupervised domain adaptation in EEG,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 6219–6235.
  • [8] Y.-T. Pan, J.-L. Chou, and C.-S. Wei, “MAtt: A manifold attention network for EEG decoding,” arXiv preprint arXiv:2210.01986, 2022.
  • [9] D. Wilson, L. A. W. Gemein, R. T. Schirrmeister, and T. Ball, “Deep Riemannian networks for EEG decoding,” arXiv preprint arXiv:2212.10426, 2022.
  • [10] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [11] ——, “Improved techniques for training score-based generative models,” Advances in Neural Information Processing Systems, vol. 33, pp. 12438–12448, 2020.
  • [12] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021.
  • [13] L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, no. 11, 2008.
  • [14] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training GANs,” Advances in Neural Information Processing Systems, vol. 29, 2016.
  • [15] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local Nash equilibrium,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  • [16] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural Computation, vol. 23, no. 7, pp. 1661–1674, 2011.
  • [17] X. Pennec, P. Fillard, and N. Ayache, “A Riemannian framework for tensor computing,” International Journal of Computer Vision, vol. 66, no. 1, pp. 41–66, 2006.