Score-Based Data Generation for EEG Spatial Covariance Matrices: Towards Boosting BCI Performance
Abstract
The efficacy of electroencephalogram (EEG) classifiers can be augmented by increasing the quantity of available data. In the case of geometric deep learning classifiers, the input consists of spatial covariance matrices derived from EEGs. In order to synthesize these spatial covariance matrices and facilitate future improvements of geometric deep learning classifiers, we propose a generative modeling technique based on state-of-the-art score-based models. The quality of generated samples is evaluated through visual and quantitative assessments using a left/right-hand-movement motor imagery dataset. The exceptional pixel-level resolution of these generated samples highlights the formidable capacity of score-based generative modeling. Additionally, the center (Fréchet mean) of the generated samples aligns with neurophysiological evidence that event-related desynchronization and synchronization occur on electrodes C3 and C4 within the Mu and Beta frequency bands during motor imagery processing. The quantitative evaluation revealed that 84.3% of the generated samples were correctly classified by a pre-trained classifier, and that adding generated samples to the training set improved the average accuracy over ten runs by up to 8.7% for a specific test subject in a holdout experiment.
I INTRODUCTION
Recently, deep learning techniques have been extensively adopted for classifying electroencephalography (EEG) data [1]. Despite this, a significant obstacle persists in the limited availability of training data [2]. To circumvent this constraint, researchers have turned to generative modeling, a rapidly evolving field in machine learning, to generate synthetic EEG time series through a process known as data augmentation [3]. This technique involves the creation of plausible samples that were not present in the original dataset, thereby expanding the training data with "unseen" examples.
To tackle the non-Euclidean characteristics inherent in EEG signals, researchers have been exploring the use of geometric deep learning methods to classify EEG signals in brain-computer interfaces (BCIs) [4, 5, 6, 7, 8, 9]. These techniques involve the application of second-order neural networks on matrices, known as spatial covariance matrices (SCMs), which are derived from EEG signals. It is noteworthy that these SCMs possess a wealth of discriminative information, including the variance of signals recorded by individual channels and the coherence between signals recorded by neighboring channels [6]. As a result, the development of generative models for SCMs with neurocognitive relevance presents a promising approach to enhancing the efficacy of geometric deep learning classifiers, ultimately delivering tangible advantages to BCI research.
In this study, we generate SCMs using a cutting-edge generative modeling technique known as score-based generative modeling [10, 11]. Score-based generative models generate samples from noise: noise is gradually added to the data, and this perturbation is then undone by estimating the score function, i.e., the gradient of the log-density function with respect to the data. The noise perturbation can be described as a forward diffusion process modeled by a stochastic differential equation (SDE) [12]. This approach has been demonstrated to be successful in generating images, audio, and point clouds. In contrast to three-channel RGB images, whose pixel intensities are integers ranging from 0 to 255, SCMs are generally preprocessed as multiple-channel square matrices that are symmetric, positive semidefinite, and real-valued. We evaluate our approach using the Korea University (KU) dataset, which is the largest EEG-BCI dataset for two-class motor imagery classification.
II METHODOLOGY
In this section, we propose a two-step process for generating SCMs using score-based generative modeling. The mathematical foundations of score-based generative modeling and projection are presented in Sections IV-A, IV-B, and IV-D. The sampling process is depicted in Figure 1.
• Score-based Generative Modeling: During training, raw EEG signals are filtered and segmented in both the frequency and time domains, following the methods in [4, 6]. Specifically, a bank of bandpass filters (i.e., Chebyshev Type II filters) decomposes the EEGs into multiple frequency passbands. A temporal segmentation scheme then partitions the EEGs into short segments, with or without overlap. For a segment $X \in \mathbb{R}^{n_C \times n_T}$ of duration $\Delta t$, the spatial covariance matrix is denoted as $S = XX^\top \in \mathbb{R}^{n_C \times n_C}$, where $n_C$ and $n_T$ represent the number of channels and timestamps, respectively. In the final step of this phase, SCMs are scaled by dividing them by their norms, i.e., $S / \|S\|$. Using score-based generative modeling, the unknown prior distribution $p(S)$ is approximated through score matching, and samples within specific frequency bands and temporal intervals of EEGs are generated using either Langevin dynamics or time-reversal SDEs. The model is fitted to all frequency bands simultaneously. During sampling, the generated samples are $n_C \times n_C$ matrices, but they are generally neither symmetric nor positive.
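• Projection: In this step, we project each generated matrix so that it is symmetric and positive. Specifically, given a generated matrix $\tilde{S}$, the projected SCM is as follows,
$$S_{\mathrm{proj}} = \sum_{i=1}^{n_C} \max(\lambda_i, \epsilon)\, v_i v_i^\top,$$
where $\epsilon$ is a preset threshold, and the eigenvalues $\lambda_i$ and corresponding orthonormal eigenvectors $v_i$ are computed from the symmetric matrix $(\tilde{S} + \tilde{S}^\top)/2$.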
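The two steps can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names `scaled_scm` and `project_to_spd`, the choice of the Frobenius norm for scaling, the symmetrization by averaging a matrix with its transpose, and the threshold value `eps=1e-4` are all assumptions made for this example. The "generated" matrix is simply a noisy copy of a real SCM standing in for a raw model output.

```python
import numpy as np

def scaled_scm(segment):
    """Spatial covariance of one EEG segment (channels x timestamps), divided by its norm.

    Assumption: the Frobenius norm is used for the scaling.
    """
    scm = segment @ segment.T
    return scm / np.linalg.norm(scm, ord="fro")

def project_to_spd(matrix, eps=1e-4):
    """Symmetrize a square matrix, then clip its eigenvalues from below at eps."""
    sym = 0.5 * (matrix + matrix.T)
    eigvals, eigvecs = np.linalg.eigh(sym)   # eigh expects a symmetric input
    clipped = np.maximum(eigvals, eps)
    return (eigvecs * clipped) @ eigvecs.T   # sum_i clipped_i * v_i v_i^T

rng = np.random.default_rng(0)
segment = rng.standard_normal((20, 2500))    # 20 channels, 2.5 s sampled at 1,000 Hz
scm = scaled_scm(segment)

# Stand-in for a raw generated sample: neither symmetric nor guaranteed positive.
generated = scm + 0.05 * rng.standard_normal(scm.shape)
projected = project_to_spd(generated)
```

After the projection, `projected` is symmetric and all of its eigenvalues are at least `eps`, so it can be passed to a classifier that expects SPD inputs.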
III EXPERIMENTS
III-A The Korea University Dataset
The Korea University (KU) dataset, also known as the OpenBMI dataset, was collected from 54 subjects performing a binary-class EEG-MI task (i.e., left-hand movement and right-hand movement). The dataset is available at http://gigadb.org/dataset/100542, and its description article at https://academic.oup.com/gigascience/article/8/5/giz002/5304369. The EEG signals were captured at a rate of 1,000 Hz using 62 Ag/AgCl electrodes. The continuous EEG data were then segmented from 1,000 to 3,500 ms with reference to the stimulus onset. For evaluation, 20 electrodes in the motor cortex region were selected (i.e., FC-5/3/1/2/4/6, C-5/3/1/z/2/4/6, and CP-5/3/1/z/2/4/6). The study comprised two sessions, each of which was divided into two phases, a training phase and a test phase, each with 100 trials balanced between right and left-hand imagery tasks, resulting in a total of 21,600 trials (i.e., 54 subjects × 2 sessions × 200 trials) available for evaluation.
In accordance with the subject-specific study of the KU dataset in [6], the subjects were divided into two groups: a training subject group and a test subject group. The criterion for inclusion in the training subject group was that the accuracies of 10-fold cross-validation on both sessions must be higher than 70% (criterion level). A total of 21 subjects met this criterion and were included in the training subject group: Subjects No. 2, 3, 5, 6, 8, 12, 17, 18, 21, 22, 28, 29, 32, 33, 35, 36, 37, 39, 43, 44, and 45. The remaining 33 subjects were included in the test subject group: Subjects No. 1, 4, 7, 9, 10, 11, 13, 14, 15, 16, 19, 20, 23, 24, 25, 26, 27, 30, 31, 34, 38, 40, 41, 42, 46, 47, 48, 49, 50, 51, 52, 53, and 54.
III-B Parameters in Training Models
For the score-based generative modeling, we selected the Variance Exploding (VE) SDE approach with the NCSN++ model architecture (PyTorch implementation: https://github.com/yang-song/score_sde_pytorch). We independently train two generative models, one for left-hand and one for right-hand samples, each using the 4,200 trials of its class in the training subject group (21 subjects × 2 sessions × 100 trials per class). The signal from each trial was first transformed into a covariance matrix and scaled. The noise parameters were set to and . Training ran for 100,000 iterations with a batch size of 128. Notably, the CNN filter size within NCSN++ was set to the full size of the SCMs, for the reason discussed in Section III-D1. We pick in the projection step.
III-C Results
To alleviate the discrepancy between the raw and generated distributions resulting from the generative methods, we normalize each covariance matrix by zero-centering the means and scaling the variances to unity before visualization and quantitative analysis.
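The normalization described above can be sketched in a few lines of NumPy. This is only an illustration under an assumption: the mean and variance are taken over all entries of each single matrix, which the text does not state explicitly. The function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(matrix):
    """Shift the entries of one matrix to zero mean and scale them to unit variance.

    Assumption: statistics are computed over all entries of the given matrix.
    """
    return (matrix - matrix.mean()) / matrix.std()

rng = np.random.default_rng(0)
signal = rng.standard_normal((20, 50))   # toy segment: 20 channels, 50 timestamps
scm = signal @ signal.T
z = standardize(scm)
```

Because the same shift and scale are applied to every entry, the result stays symmetric, although it is in general no longer positive definite.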
III-C1 Visualization
In Figure 1(a), we present 2-dimensional t-SNE [13] projections of the raw SCMs of the 8,400 trials from the training subject group and of the 8,400 generated covariance matrices (both left and right-hand classes). The Fréchet means of both distributions (the Fréchet mean is introduced in Appendix IV-C) are marked with triangle and cross signs, respectively. The two distributions are nearly coincident, and the Riemannian distance between their Fréchet means is relatively small (approximately 4.323), indicating that the center of the learned distribution closely matches that of the raw distribution. Figure 1(b) provides a more detailed view of the projections within the Mu and Beta frequency bands. Furthermore, the sets of covariance matrices corresponding to the two labeled Fréchet means in Figure 1(a) have been plotted. The Fréchet mean was computed independently for each frequency band. The Fréchet means of the frequency bands Hz, Hz, Hz, and Hz exhibit a similar profile, while the others differ. It is worth noting that these frequency bands are associated with event-related desynchronization and synchronization during cognitive and motor processing.
Figure 3 illustrates the Fréchet means of covariance matrices, separated into the left and right-hand classes, within nine frequency bands. Subfigures 2(a) and 2(b) display the SCMs with key regions highlighted, corresponding to neurophysiological findings. Specifically, the highlighted entries in subfigures 2(a) and 2(c) (Mu and Beta bands) are situated in the regions of FC4, C4, and CP4 across the scalp, while those in subfigures 2(b) and 2(d) are located in the regions of FC3, C3, and CP3 across the scalp. To give a more comprehensive view of the texture of the generated samples, Figure 4 offers a closer examination of the SCMs derived from actual EEGs and those generated by the proposed methodology for the two classes.
III-C2 Classification
To assess the quality of the generated samples, we classify them using a Tensor-CSPNet model (Python implementation: https://github.com/GeometricBCI/Tensor-CSPNet-and-Graph-CSPNet) pre-trained on all subjects (two sessions) in the training subject group, which comprises a total of 8,400 trials. The model architecture adopts a simplified 2,500 ms time window and incorporates two-level BiMap layers, transforming the input dimension of 20 to 30 and back to an output dimension of 20. There are 8,400 balanced generated samples, with 4,200 in each class. The pre-trained classifier achieves an accuracy of 84.30% over all generated samples, and the confusion matrix is as follows:
True \ Predicted | Right-hand | Left-hand |
---|---|---|
Right-hand | 3730 (44.4%) | 470 (5.6%) |
Left-hand | 849 (10.1%) | 3351 (39.9%) |
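The reported accuracy and cell percentages follow directly from the counts in the confusion matrix. The short check below assumes that rows are true classes and columns are predicted classes, in the order right-hand, left-hand; the variable names are illustrative.

```python
import numpy as np

# Rows: true class, columns: predicted class (right-hand, left-hand).
confusion = np.array([[3730, 470],
                      [849, 3351]])

total = confusion.sum()                                   # 8400 generated samples
accuracy = np.trace(confusion) / total                    # fraction on the diagonal
cell_percent = 100.0 * confusion / total                  # share of all samples per cell
per_class_recall = np.diag(confusion) / confusion.sum(axis=1)

print(f"accuracy = {100 * accuracy:.2f}%")
print("per-class recall =", np.round(100 * per_class_recall, 1))
```

Under the row/column assumption above, the per-class recall shows that generated right-hand samples are recognized more reliably (about 88.8%) than generated left-hand samples (about 79.8%).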
Augmentation (generated samples added) | None | 20 | 40 | 60 | 80 | 100 | 120 | 140 | 160 | 180 | 200 |
---|---|---|---|---|---|---|---|---|---|---|---|
Subject No.30: Avg.(Std.) | 55.2(3.9) | 52.1(3.9) | 55.5(6.8) | 56.2(2.9) | 54.7(5.1) | 53.8(4.4) | 56.7(4.8) | 57.2(4.3) | 56.8(6.4) | 58.3(5.3) | 57.6(4.5) |
Subject No.30: Best | 61.0 | 59.0 | 71.0 | 63.0 | 64.0 | 62.0 | 64.0 | 66.0 | 66.0 | 66.0 | 67.0 |
Subject No.42: Avg.(Std.) | 59.2(3.5) | 59.1(6.2) | 63.2(4.6) | 62.5(3.4) | 65.6(4.6) | 65.1(4.1) | 64.8(3.6) | 65.8(2.8) | 65.4(3.6) | 61.8(4.1) | 67.9(2.5) |
Subject No.42: Best | 63.0 | 66.0 | 69.0 | 67.0 | 72.0 | 73.0 | 71.0 | 70.0 | 72.0 | 69.0 | 72.0 |
We conducted an additional experiment in a cross-session setting, also known as the holdout scenario, in which one session of trials is used for training, the first half of the other session for validation, and the second half for testing. This task is challenging because of the signal variability across sessions, and many state-of-the-art algorithms, including geometric methods [4, 6], performed poorly, yielding accuracy rates below 70%. The proposed generative method was applied to generate SCMs using all subjects (two sessions) in the training subject group. The classifier, Tensor-CSPNet, was trained on the first session together with the generated samples, validated on the first half of the second session, and tested on the second half. Table II shows the cross-session classification accuracies, where the columns "None", "20", "40", "60", "80", "100", "120", "140", "160", "180", and "200" give the number of generated samples added to the training set. The "None" column corresponds to the typical cross-session setting, but applied to normalized SCMs and without segmenting the time interval, so the results differ slightly from those in [6]. We selected Subjects 30 and 42 as representatives of the test subject group. For Subject 30, the average accuracy over ten runs increased by 3.1 percentage points (from 55.2% to 58.3%) after adding 180 generated samples. Subject 42 saw a larger improvement of 8.7 percentage points (from 59.2% to 67.9%) after adding 200 generated samples in each run.
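The two reported gains can be read off the average accuracies in Table II. The snippet below recomputes them; the data are copied from the table, while the helper name `best_gain` is hypothetical.

```python
# Number of generated samples added to the training set (0 stands for "None").
counts = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200]

# Average accuracy (%) over ten runs, copied from Table II.
avg_acc = {
    "Subject 30": [55.2, 52.1, 55.5, 56.2, 54.7, 53.8, 56.7, 57.2, 56.8, 58.3, 57.6],
    "Subject 42": [59.2, 59.1, 63.2, 62.5, 65.6, 65.1, 64.8, 65.8, 65.4, 61.8, 67.9],
}

def best_gain(values):
    """Return (sample count, gain in percentage points) of the best augmented setting."""
    baseline = values[0]
    best_idx = max(range(1, len(values)), key=lambda i: values[i])
    return counts[best_idx], round(values[best_idx] - baseline, 1)

for subject, values in avg_acc.items():
    n_added, gain = best_gain(values)
    print(f"{subject}: +{gain} points with {n_added} generated samples")
```

The table also shows that the gain is not monotonic in the number of added samples: for Subject 30, adding only 20 samples lowers the average accuracy below the baseline.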
III-D Discussions
This study explores a new method for generating SCMs for BCI applications using score-based generative modeling with the SDE approach. The generated samples are analyzed through both visual and quantitative evaluations. Visually, the samples produced by the proposed method have a comparable appearance to the SCMs obtained from actual EEG recordings. Furthermore, the center (Fréchet mean) of the generated samples aligns with neurophysiological findings that event-related desynchronization and synchronization occur on electrodes C3 and C4 within the Mu and Beta frequency bands during motor imagery processing. From a quantitative standpoint, 84.3% of the samples are correctly classified by a pre-trained Tensor-CSPNet, and holdout experiments on two subjects (Subject No.30 and No.42) show an improvement of up to 8.7% in the average accuracy over ten runs.
Although our findings are promising, EEG-BCI classification lacks evaluation metrics for generative models akin to those commonly employed in the computer vision domain, such as the Inception Score [14] and the Fréchet Inception Distance [15]. This absence precludes our study from providing more detailed outcomes, such as individual participant results. It is crucial to recognize that not all participants exhibit noticeable improvements after incorporating generated samples. At present, no established criterion exists for determining which subjects may benefit from this technique. Furthermore, the current approach has room for enhancement in several aspects, which we address in the following:
III-D1 Non-Euclidean Nature
In the experiments, the SCM channels are ordered from start to end as FC-5/3/1/2/4/6, C-5/3/1/z/2/4/6, and CP-5/3/1/z/2/4/6. The score-based generative model employs a CNN-structured architecture to capture local information from adjacent channels in this sequence. However, this order fails to reflect the correlations between EEG channels with respect to their spatial locations, a phenomenon referred to as the non-Euclidean nature, which results in limited performance. To tackle this problem, we propose a heuristic approach that sets the filter size equal to the total size of the SCMs. This heuristic may not be readily applicable to complex scenarios, as it can be challenging to capture the signal granularity with a large filter size.
III-D2 Randomness
Some of the generated samples may contain discriminative information that is valuable for classification, while others may not. The randomness introduced by the sampling process in score-based generative modeling may therefore compromise the performance of the classifier. As mentioned above, this randomness also undermines conventional evaluation methods, because a different set of generated samples is used for assessment each time and no common evaluation metric exists for generative models in EEG-BCI classification. Moreover, the texture of EEG spatial covariance matrices is not present in general image recognition databases.
III-D3 Cross-frequency Coupling
A potential explanation for the limited performance is the diversity of the generative model. Since each SCM over a specific frequency band is independently generated from random noise, the composite SCM assembled from these independent SCMs may lack neurophysiological significance and may never have been observed in real data. In simpler terms, real SCMs are derived from EEG signals in which changes in brain activity occur during cognitive and motor processing, resulting in event-related desynchronization and synchronization. A generated SCM may not have this same origin, even though it may appear similar. For instance, the SCM within the frequency range of 32 to 36 Hz, as depicted in Subfigure 3(e), shows a novel instance of the high-intensity activity that typically occurs within the Mu and Beta bands.
III-D4 Distribution Shift
Although the generated samples may contain ample discriminative information, the limited performance observed may still stem from the disparity between the prior and learned distributions. This incongruence can result in variations in the numerical ranges of the pixels, or entries, of the SCMs. To mitigate this challenge, we apply a simple heuristic normalization to the covariance matrices by zero-centering the means and scaling the variances to unity. This approach results in well-overlapping raw and generated distributions, but it may not always be a reliable method in complex scenarios.
IV APPENDICES
In the appendices, we briefly introduce score-based generative modeling. As a formal convention, we suppose we have samples of spatial covariance matrices $\{S_i\}_{i=1}^{N}$ from an (unknown) distribution $p(S)$.
IV-A Score-based Generative Modeling
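In score-based generative modeling, the score network $s_\theta$ is a deep neural network parametrized by $\theta$ and used to learn the score $\nabla_S \log p(S)$ of a probability density $p(S)$. To train the score network $s_\theta$, a technique called denoising score matching [16] first replaces $p(S)$ with a Gaussian-noise-perturbed version $p_\sigma(\tilde{S}) = \int p_\sigma(\tilde{S} \mid S)\, p(S)\, dS$, where $p_\sigma(\tilde{S} \mid S) = \mathcal{N}(\tilde{S}; S, \sigma^2 I)$, and the denoising objective with noise level $\sigma$ is then given as follows,
$$\frac{1}{2}\, \mathbb{E}_{p(S)}\, \mathbb{E}_{p_\sigma(\tilde{S} \mid S)} \left[ \left\| s_\theta(\tilde{S}) - \nabla_{\tilde{S}} \log p_\sigma(\tilde{S} \mid S) \right\|_2^2 \right].$$
Keep in mind that the noise model term has a simple analytic form, written $\nabla_{\tilde{S}} \log p_\sigma(\tilde{S} \mid S) = -(\tilde{S} - S)/\sigma^2$. In the sampling phase, Langevin dynamics are applied to recursively generate samples using the score function as follows,
$$\tilde{S}_t = \tilde{S}_{t-1} + \frac{\alpha}{2}\, s_\theta(\tilde{S}_{t-1}) + \sqrt{\alpha}\, z_t,$$
where the initial $\tilde{S}_0 \sim \pi$ (prior distribution) and the fixed step size $\alpha > 0$ are given, and $z_t \sim \mathcal{N}(0, I)$.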
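The Langevin recursion can be tried on a toy problem where the score is known in closed form. In this sketch the target is an isotropic Gaussian with mean 2 and standard deviation 0.5, so its exact score stands in for the trained score network. The function name `langevin_sample`, the step size, the number of steps, and the toy target are all assumptions chosen for illustration and are not settings from this work.

```python
import numpy as np

def langevin_sample(score_fn, init, step_size, n_steps, rng):
    """Run the Langevin recursion x <- x + (step/2) * score(x) + sqrt(step) * z."""
    x = init.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + 0.5 * step_size * score_fn(x) + np.sqrt(step_size) * noise
    return x

# Toy target N(mu, sigma^2 I); its score is -(x - mu) / sigma^2.
mu, sigma = 2.0, 0.5

def gaussian_score(x):
    return -(x - mu) / sigma**2

rng = np.random.default_rng(0)
init = rng.standard_normal((5000, 4))      # 5000 chains started from a N(0, I) prior
samples = langevin_sample(gaussian_score, init, step_size=0.01, n_steps=500, rng=rng)

print(samples.mean(), samples.std())       # close to mu and sigma
```

With a finite step size the chain only approximates the target, which is why the empirical standard deviation is slightly larger than 0.5; smaller steps reduce this bias at the cost of more iterations.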
IV-B Diffusing Samples with an SDE
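For a continuum of distributions evolving over time $t$, score-based generative modeling has been further established within a unified framework of stochastic differential equations (SDEs) together with diffusion probabilistic modeling [12]. Technically, the SDEs are of the form as follows,
$$dS = f(S, t)\, dt + g(t)\, dW,$$
where $f$ and $g$ are the drift and diffusion coefficients, respectively, and $W$ is a standard Wiener process. The solution of the above SDE is a diffusion process $\{S(t)\}_{t \in [0, T]}$ over a finite time horizon $[0, T]$, and $p_t(S)$ is the marginal distribution of $S(t)$. For variance exploding (VE) SDEs, $f(S, t) = 0$ and $g(t) = \sqrt{d[\sigma^2(t)]/dt}$. Score-based generative modeling relies on the following time-reversal diffusion process for generating samples,
$$dS = \left[ f(S, t) - g(t)^2\, \nabla_S \log p_t(S) \right] dt + g(t)\, d\bar{W},$$
where $\bar{W}$ is a standard Wiener process in the reverse-time direction. A time-dependent neural network $s_\theta(S, t)$ is used to estimate $\nabla_S \log p_t(S)$ by minimizing the following loss,
$$\mathbb{E}_{t}\, \mathbb{E}_{S(0)}\, \mathbb{E}_{S(t) \mid S(0)} \left[ \lambda(t) \left\| s_\theta(S(t), t) - \nabla_{S(t)} \log p_{0t}(S(t) \mid S(0)) \right\|_2^2 \right],$$
where $t$ is drawn uniformly from $[0, T]$, $p_{0t}(S(t) \mid S(0))$ is the transition distribution from $S(0)$ to $S(t)$, and $\lambda(t)$ is a positive weighting function.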
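A reverse-time VE SDE sampler can be sketched with an Euler-Maruyama discretization on a toy problem. The example assumes the geometric noise schedule commonly used for VE SDEs, a time horizon of 1, and a Gaussian toy data distribution whose marginal score is known exactly, so that no network needs to be trained. The constants `SIGMA_MIN`, `SIGMA_MAX`, `DATA_MEAN`, `DATA_STD` and all function names are illustrative assumptions, not values or code from this work.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 10.0    # illustrative noise range, not taken from the paper
DATA_MEAN, DATA_STD = 2.0, 0.5       # toy Gaussian "data" distribution

def sigma(t):
    """Geometric noise schedule on t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def diffusion(t):
    """g(t) = sqrt(d sigma^2(t) / dt) for the schedule above."""
    return sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))

def marginal_var(t):
    """Variance of p_t when the data are N(DATA_MEAN, DATA_STD^2 I)."""
    return DATA_STD**2 + sigma(t) ** 2 - SIGMA_MIN**2

def true_score(x, t):
    """Exact score of p_t; a trained network would replace this in practice."""
    return -(x - DATA_MEAN) / marginal_var(t)

def reverse_sde_sample(n_samples, dim, n_steps, rng):
    """Integrate the reverse-time VE SDE from t = 1 down to t = 0."""
    dt = 1.0 / n_steps
    x = SIGMA_MAX * rng.standard_normal((n_samples, dim))   # approximate prior at t = 1
    for k in range(n_steps):
        t = 1.0 - k * dt
        g2 = diffusion(t) ** 2
        noise = rng.standard_normal(x.shape)
        x = x + g2 * true_score(x, t) * dt + np.sqrt(g2 * dt) * noise
    return x

rng = np.random.default_rng(0)
samples = reverse_sde_sample(n_samples=5000, dim=4, n_steps=1000, rng=rng)
print(samples.mean(), samples.std())   # close to DATA_MEAN and DATA_STD
```

Starting from pure noise with standard deviation `SIGMA_MAX`, the sampler recovers the mean and spread of the toy data distribution, which is the behavior the time-reversal process is meant to provide.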
IV-C Fréchet Mean
Each generated sample in the current approach is an SPD matrix, which means that traditional measures for generative modeling in computer vision, such as the Inception Score and the Fréchet Inception Distance, cannot be used. The Riemannian distance on SPD manifolds is used as an alternative to evaluate the distance between the prior and generated distributions. From a mathematical perspective, the conventional treatment is to view the space of spatial covariance matrices as a symmetric positive definite (SPD) manifold equipped with a Riemannian metric, i.e., the affine invariant Riemannian metric (AIRM) [17], written as $g_P(V, W) = \operatorname{tr}(P^{-1} V P^{-1} W)$ for tangent vectors $V$ and $W$ at $P$. The Riemannian distance between two spatial covariance matrices $S_1$ and $S_2$ is $d(S_1, S_2) = \|\log(S_1^{-1/2} S_2 S_1^{-1/2})\|_F$, where $\|\cdot\|_F$ is the Frobenius norm and $\log$ is the matrix logarithm. Given a set of SPD matrices $\{S_i\}_{i=1}^{N}$, the Fréchet mean of that set is given as follows,
$$\bar{S} = \arg\min_{\mu} \frac{1}{N} \sum_{i=1}^{N} d^2(S_i, \mu),$$
where the minimum is taken over SPD matrices $\mu$.
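Both quantities can be computed with eigendecompositions alone. The sketch below implements the AIRM distance and approximates the Fréchet mean with a standard fixed-point (Karcher flow) iteration; the iteration scheme, its stopping rule, and the function names `airm_distance` and `frechet_mean` are assumptions for illustration, since the solver used in this work is not specified.

```python
import numpy as np

def _sym(matrix):
    """Remove round-off asymmetry."""
    return 0.5 * (matrix + matrix.T)

def _apply_to_eigvals(matrix, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    return (eigvecs * fn(eigvals)) @ eigvecs.T

def airm_distance(a, b):
    """Riemannian distance || log(a^{-1/2} b a^{-1/2}) ||_F between SPD matrices."""
    a_inv_sqrt = _apply_to_eigvals(a, lambda w: w ** -0.5)
    middle = _sym(a_inv_sqrt @ b @ a_inv_sqrt)
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh(middle)) ** 2))

def frechet_mean(mats, n_iter=50, tol=1e-10):
    """Fixed-point iteration for the minimizer of the mean squared AIRM distance."""
    mean = np.mean(mats, axis=0)              # arithmetic mean as the initial guess
    for _ in range(n_iter):
        sqrt = _apply_to_eigvals(mean, np.sqrt)
        inv_sqrt = _apply_to_eigvals(mean, lambda w: w ** -0.5)
        # Average of the matrices mapped to the tangent space at the current estimate.
        tangent = np.mean(
            [_apply_to_eigvals(_sym(inv_sqrt @ m @ inv_sqrt), np.log) for m in mats],
            axis=0,
        )
        mean = _sym(sqrt @ _apply_to_eigvals(tangent, np.exp) @ sqrt)
        if np.linalg.norm(tangent) < tol:
            break
    return mean

mats = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
center = frechet_mean(mats)
print(center)
print(airm_distance(mats[0], center), airm_distance(mats[1], center))
```

For these two commuting matrices the Fréchet mean is their geometric mean, twice the identity, rather than the arithmetic mean, and it is equidistant from both inputs.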
IV-D Mathematical Fundamentals in the Projection Step
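Consider a real symmetric matrix $A \in \mathbb{R}^{n \times n}$ that possesses eigenvalues $\lambda_1, \ldots, \lambda_n$ and corresponding orthonormal eigenvectors $v_1, \ldots, v_n$. In this context, the spectral decomposition of $A$ is expressed as $A = \sum_{i=1}^{n} \lambda_i v_i v_i^\top$. To handle negative eigenvalues of symmetric matrices, the following lemma from [6] introduces a fundamental technique: the projection onto the positive semidefinite cone, $\sum_{i=1}^{n} \max(\lambda_i, 0)\, v_i v_i^\top$, is the minimizer of the problem $\min_{X} \|A - X\|_F^2$ subject to $X \succeq 0$.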
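A small worked example may help make the lemma concrete. The matrix below is chosen purely for illustration and uses a threshold of zero; the symbols $A$, $\lambda_i$, $v_i$, and $P$ are notation assumed for this example.

```latex
\begin{align*}
A &= \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},
  \qquad \lambda_1 = 3,\ v_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},
  \qquad \lambda_2 = -1,\ v_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \\
P &= \max(\lambda_1, 0)\, v_1 v_1^\top + \max(\lambda_2, 0)\, v_2 v_2^\top
   = 3 \cdot \tfrac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
   = \begin{pmatrix} 1.5 & 1.5 \\ 1.5 & 1.5 \end{pmatrix}, \\
\| A - P \|_F &= \left\| \begin{pmatrix} -0.5 & 0.5 \\ 0.5 & -0.5 \end{pmatrix} \right\|_F
   = \sqrt{4 \cdot 0.25} = 1 = |\lambda_2| .
\end{align*}
```

The distance from $A$ to its projection equals the magnitude of the discarded negative eigenvalue, which is the smallest Frobenius distance any positive semidefinite matrix can achieve.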
V ACKNOWLEDGMENT
This work was supported under the RIE2020 Industry Alignment Fund–Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contributions from industry partner(s); This work was supported by the RIE2020 AME Programmatic Fund, Singapore (No. A20G8b0102); This work was also supported by Innovative Science and Technology Initiative for Security Grant Number JPJ004596, ATLA, Japan.
References
- [1] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, “Deep learning with convolutional neural networks for eeg decoding and visualization,” Human brain mapping, vol. 38, no. 11, pp. 5391–5420, 2017.
- [2] C. Ju, D. Gao, R. Mane, B. Tan, Y. Liu, and C. Guan, “Federated transfer learning for eeg signal classification,” in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC). IEEE, 2020, pp. 3040–3045.
- [3] K. G. Hartmann, R. T. Schirrmeister, and T. Ball, “Eeg-gan: Generative adversarial networks for electroencephalograhic (eeg) brain signals,” arXiv preprint arXiv:1806.01875, 2018.
- [4] C. Ju and C. Guan, “Tensor-cspnet: A novel geometric deep learning framework for motor imagery classification,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- [5] ——, “Deep optimal transport on spd manifolds for domain adaptation,” arXiv preprint arXiv:2201.05745, 2022.
- [6] ——, “Graph neural networks on spd manifolds for motor imagery classification: A perspective from the time-frequency analysis,” arXiv preprint arXiv:2211.02641, 2022.
- [7] R. Kobler, J.-i. Hirayama, Q. Zhao, and M. Kawanabe, “Spd domain-specific batch normalization to crack interpretable unsupervised domain adaptation in eeg,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 6219–6235.
- [8] Y.-T. Pan, J.-L. Chou, and C.-S. Wei, “Matt: A manifold attention network for eeg decoding,” arXiv preprint arXiv:2210.01986, 2022.
- [9] D. Wilson, L. A. W. Gemein, R. T. Schirrmeister, and T. Ball, “Deep riemannian networks for eeg decoding,” arXiv preprint arXiv:2212.10426, 2022.
- [10] Y. Song and S. Ermon, “Generative modeling by estimating gradients of the data distribution,” Advances in Neural Information Processing Systems, vol. 32, 2019.
- [11] ——, “Improved techniques for training score-based generative models,” Advances in neural information processing systems, vol. 33, pp. 12 438–12 448, 2020.
- [12] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, “Score-based generative modeling through stochastic differential equations,” in International Conference on Learning Representations, 2021.
- [13] L. Van der Maaten and G. Hinton, “Visualizing data using t-sne.” Journal of machine learning research, vol. 9, no. 11, 2008.
- [14] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” Advances in neural information processing systems, vol. 29, 2016.
- [15] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.
- [16] P. Vincent, “A connection between score matching and denoising autoencoders,” Neural computation, vol. 23, no. 7, pp. 1661–1674, 2011.
- [17] X. Pennec, P. Fillard, and N. Ayache, “A riemannian framework for tensor computing,” International Journal of computer vision, vol. 66, no. 1, pp. 41–66, 2006.