1 Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy
2 Laboratory of Augmented Reality for Health Monitoring (ARHeMLab)

SincVAE: a New Approach to Improve Anomaly Detection on EEG Data Using SincNet and Variational Autoencoder

Andrea Pollastro 1,2    Francesco Isgrò 1,2    Roberto Prevete 1,2
Abstract

Over the past few decades, electroencephalography (EEG) monitoring has become a pivotal tool for diagnosing neurological disorders, particularly for detecting seizures. Epilepsy, one of the most prevalent neurological diseases worldwide, affects approximately 1% of the population. These patients face significant risks, underscoring the need for reliable, continuous seizure monitoring in daily life. Most of the techniques discussed in the literature rely on supervised Machine Learning (ML) methods. However, the challenge of accurately labeling variations in epileptic EEG waveforms complicates the use of these approaches. Additionally, the rarity of ictal events introduces a high imbalance within the data, which could lead to poor prediction performance in supervised learning approaches. A semi-supervised approach, instead, allows the model to be trained only on data not containing seizures, thus avoiding the issues related to data imbalance. This work proposes a semi-supervised approach for detecting epileptic seizures from EEG data, utilizing a novel Deep Learning-based method called SincVAE. This proposal incorporates the learning of an ad-hoc array of bandpass filters as the first layer of a Variational Autoencoder (VAE), potentially eliminating the preprocessing stage where informative frequency bands are identified and isolated. Results indicate that SincVAE improves seizure detection in EEG data and is capable of identifying early seizures during the preictal stage as well as monitoring patients throughout the postictal stage.

Keywords:
Variational Autoencoders SincNet Anomaly Detection Seizure Detection Brain Computer Interfaces

1 Introduction

In recent years, the growth of information technology has had a significant impact on daily life. Its impact is particularly evident in the quantity of data produced every day. These vast datasets provide a snapshot of the entities under observation, offering valuable insights for companies and organizations. These insights not only enhance understanding but also furnish competitive advantages. Consequently, it is crucial that these datasets are meticulously processed [1]. Anomalies (also known as outliers, deviants or rare events in some contexts [2]) represent unique behaviors within observed phenomena that can significantly influence the data generation process [2, 3]. The presence of anomalies in data during analysis can be dangerous, as they may lead to erroneous conclusions during data interpretation. Consequently, it is crucial for analysts that these anomalies are meticulously identified and properly addressed both before and during the analysis process. The increasing interest in identifying and analyzing anomalies has led scientists to isolate this problem into the active and dedicated research field of anomaly detection.

In most instances, phenomena are monitored over time using time series data. Within this framework, not every outlier is pertinent to an analysis. For instance, some outliers could be attributed to sensor transmission errors or other sources of noise, while others might represent unusual phenomena, such as those observed in fraud detection scenarios [2]. In the former scenario, outliers can be eliminated or corrected to enhance data quality. Conversely, in the latter case, these outliers become anomalies of interest that need special attention, as they often provide significant and crucial insights across diverse application domains [2]. For instance, anomalies in credit card transaction records may indicate potential fraud or identity theft [4]. Similarly, an unusual pattern in network traffic could suggest that a compromised computer is transmitting confidential information to an unauthorized destination [5]. Additionally, anomalies detected in sensor data from civil infrastructure might signal structural damage [6, 7].

Significantly, healthcare represents a domain where anomaly detection can have a profound impact [8, 9, 10, 11, 12]. For example, the ability to detect abnormal physiological data through these techniques can expedite emergency responses and provide new insights into the progression of medical conditions, greatly influencing everyday life. In [13], the authors presented a methodology for identifying anomalies in heart rate data, leveraging its value as a noninvasive indicator of health concerns and physical activity. Meanwhile, the study in [14] explores the potential of anomaly detection within healthcare analytics, specifically through IoT systems.

Numerous studies have explored detecting epileptic seizures as abnormal brain activity using electroencephalography (EEG) data, applicable in various settings, including Brain-Computer Interfaces (BCI) [15]. In such contexts, accurately identifying seizures through EEG can trigger hardware interventions designed to assist patients and enhance their quality of life [16, 17, 18]. Epilepsy is a chronic central nervous system disorder characterized by recurrent seizures, affecting approximately 1 % of the global population [19, 20]. Seizures manifest as temporary disruptions in brain electrical activity, leading to symptoms like attention lapses, memory gaps, sensory hallucinations, or full-body convulsions. Despite treatment with various anti-epileptic drugs, about one-third of affected individuals frequently experience seizures that are challenging to control. These seizures significantly increase the risk of injury, limit personal independence and mobility, and can result in social and economic difficulties [21, 22]. The brain activity of individuals with epilepsy can be categorized into four states: regular brain activity (interictal), brain activity preceding the seizure (preictal), brain activity during the seizure (ictal), and brain activity immediately following a seizure (postictal) [23].

Among various methodologies [24, 25], Machine Learning (ML) techniques provide systematic approaches for extracting insights from vast amounts of data. These techniques enable researchers to explore the anomaly detection landscape, proposing solutions to diverse scenarios. In particular, Deep Learning (DL) methods have consistently outperformed traditional Machine Learning approaches in the last decades. This success is largely due to their ability to extract intricate patterns within complex, high-dimensional datasets [26, 27]. As a result, DL has taken a leading role in various fields that leverage data-driven strategies, especially in the development of anomaly detection methodologies [28, 29, 30]. A significant portion of DL approaches for anomaly detection relies on Autoencoder (AE) architectures [31, 32, 33]. AEs are neural networks comprising two main components: an encoder, which compresses the input into a latent representation, and a decoder, which reconstructs that representation back into the input space. In anomaly detection, a well-established strategy involves training AEs specifically to minimize reconstruction errors for normal data instances. This technique results in higher reconstruction errors when processing anomalous data, making these errors a useful anomaly score. When combined with a user-defined decision rule, this score becomes an effective tool for classifying data as normal or anomalous [6]. AE architectures can be realized in various forms, including Variational Autoencoders (VAEs), which are generative models where the encoder and the decoder do not represent a functional mapping as in standard AEs [34]. Due to their promising performance, there is a growing interest in using generative models to identify anomalies [35, 36, 8, 37]; within the context of AEs, this interest has manifested through the use of VAEs [38, 39, 40, 41].

VAEs can be implemented using a variety of processing strategies documented in literature, including Multilayer Perceptrons (MLP) [42] and Recurrent Neural Networks (RNN) [43]. Convolutional Neural Networks (CNNs), widely utilized for processing time series data [44, 45, 46, 47, 48], operate through a series of trainable filters. These filters are inspired by the biological mechanisms of visual perception, enabling the recognition of informative patterns from the input data [49]. In the field of speaker recognition, Ravanelli and Bengio introduced SincNet [50], a CNN whose filters are structured as a learnable array of parametrized sinc functions that are designed to operate as bandpass filters.

Numerous ML methods have been applied to detect epileptic seizures, with the aim of not only mitigating risks for patients but also enhancing their ability to seek timely assistance and reducing the likelihood of injury [23, 51, 52, 53]. Although the physiological activity involved in epilepsy is inherently multiclass, many studies, such as [21, 22], approach seizure detection as a binary supervised classification problem. In this framework, the two classes to be identified are seizure activity (ictal) and non-seizure activity (interictal). This reduction to two classes is due to the difficulty and impracticality of having an expert identify and label transitional states between ictal and interictal states. Conversely, having an expert categorize brain electrical activity into seizure and non-seizure aligns with standard clinical protocols [22].

This work proposes SincVAE, a DL architecture that introduces SincNet in the VAE framework to process EEG data for seizure detection in a semi-supervised setting. Bandpass filters are crucial for isolating meaningful frequency bands from input signals, especially in BCI applications [54, 55, 56, 57]. However, the application of bandpass filters typically involves two stages: (i) selecting well-known informative frequency bands pertinent to the specific context, and (ii) conducting an analysis where the analyst identifies and isolates relevant frequency bands from the data. SincNet offers an efficient solution in this process, providing a compact and precise method to develop custom bandpass filters optimized for specific applications. Seizure detection applications could benefit from this improvement, since it could enhance (or possibly eliminate) the preliminary phases dedicated to the study and extraction of frequency bands from the data. This, in turn, could yield a more refined and precise extraction of frequency bands, accelerating and improving the development of the whole processing pipeline. A semi-supervised approach is particularly advantageous as it allows a model to be fitted on training sets that contain only normal instances [2, 58, 59, 6]. This choice is motivated by the flexibility this approach offers, especially in scenarios with imbalanced datasets, where anomalous data points are scarce or even absent. This is particularly relevant in healthcare, where the prevalence of normal data contrasts with the rarity of abnormal data.

2 Related Works

This section reviews existing research that employs AEs and VAEs for detecting seizures from EEG data.

Khan et al. in [60] presented a novel seizure detection method that integrates AEs with traditional classifiers in a hybrid model. Specifically, the AE is used to extract features from the input data through its encoder. These latent representations are then fed into a classifier, such as a Support Vector Machine (SVM) [61] or k-Nearest Neighbors (k-NN) [61], to perform a supervised classification. Yuan et al. in [62] proposed a novel approach that employs an AE model to extract multi-view features from multi-channel EEG data. The extracted features are then used for supervised seizure detection. Abdelhameed et al. in [63] proposed a methodology based on a convolutional VAE to extract features from EEG input data, with the goal of eliminating the need for an engineered feature extraction phase prior to model fitting. The extracted features are then fed into a supervised classifier to detect seizures. The same authors in [64] improved the feature extraction phase by using a two-dimensional Deep Convolutional Autoencoder (2D-DCAE). The extracted features are then used to train a neural network-based classifier for seizure detection in a supervised manner. In [65], instead, the same authors proposed a methodology based on a convolutional VAE trained in a supervised manner to simultaneously perform automatic feature learning and classification on the latent representations of the data. Daoud et al. in [66] compared two methodologies, both having automatic feature extraction as their main goal. The first method utilizes a deep convolutional AE, where features extracted from the encoder are classified using a Multilayer Perceptron (MLP) classifier. The second method consists of an unsupervised pipeline integrating a deep convolutional VAE with the K-Means [61] clustering algorithm fitted on the latent representations of the data. Wang et al. in [67] introduced the Residual Convolution VAE (RCVAE) method to extract features from EEG recordings. The extracted features are used to train a supervised neural network classifier for seizure detection. The same research group in [68] proposed an improved version called Residual Convolution VAE with Randomly Translation Strategy (RTS-RCVAE) to address issues related to introducing data augmentation strategies. Wen et al. in [69] proposed the AE-CDNN method to extract features prior to a supervised classification stage performed using MLPs. Similarly, Shoeibi et al. in [70] employed autoencoders for dimensionality reduction. The latent representations of the data were used to fit several methods for classifying seizures, including the adaptive neuro-fuzzy inference system (ANFIS) and its variants.

Most of the works on seizure detection leverage AEs or VAEs primarily for feature extraction, with the extracted features then used in conjunction with supervised methods for classifying EEG data. There exists a relatively small subset of methodologies that rely exclusively on AEs and VAEs for the entire process of seizure detection, using only reconstruction error metrics for the classification stage. Huang et al. in [71] used AEs for feature extraction in epilepsy detection, comparing their performance against traditional Principal Component Analysis (PCA) [61]. The authors then employed three metrics, i.e. original-to-reconstructed signal ratio (ORSR), MSE and Cosine Similarity (CS), to evaluate the signal reconstruction, and identified these metrics as sensitive indicators for epilepsy. The authors also used permutation importance and SHapley Additive exPlanations (SHAP) [72] for model interpretability, confirming the better efficacy and rationale of AE-based feature extraction compared to the PCA-based one. The authors in [73] leveraged a three-layer convolutional VAE trained exclusively on non-seizure EEG recordings to detect seizures. Reconstruction error was used to identify seizure activity. In particular, they adopted the median reconstruction error as a metric to distinguish between seizure and non-seizure events. You et al. in [12] used a VAE to model the latent representations of non-seizure EEG signals. They then used deviations from these baseline representations in conjunction with reconstruction loss to devise a personalized anomaly score for each patient. De Sousa et al. in [74] used AEs and VAEs to detect Interictal Epileptiform Discharges (IEDs) by treating these events as anomalies within EEG data. The comparative analysis in their study revealed that VAEs outperformed traditional AEs, likely due to their enhanced ability to model the distribution of EEG data and handle anomalies more effectively. Potter et al. in [75] proposed an architecture based on an AE with a transformer encoder to reconstruct EEG recordings of non-seizure activity. The reconstruction error then serves as the anomaly score to detect EEG recordings containing seizures.

3 Proposed Method

In this section, we will introduce the SincNet model to the reader, followed by an overview of the VAE framework. Finally, we will present SincVAE, the proposed method of this work.

3.1 SincNet

Ravanelli and Bengio in [50] introduced SincNet in the context of speaker recognition. The convolution operation in a standard CNN layer is represented as follows [50]:

$$y[n] = x[n] * h[n] = \sum_{l=0}^{L-1} x[l] \cdot h[n-l], \tag{1}$$

where $x[n]$ is the input signal segment, $h[n]$ is the filter of length $L$, $*$ denotes the convolution operation, and $y[n]$ is the output. Typically, each element of the filter $h[n]$ is learned during the training phase.

SincNet modifies this process by using a pre-defined function $g$ that depends on a limited number of learnable parameters $\theta$:

$$y[n] = x[n] * g[n, \theta] \tag{2}$$

This function $g$ is designed to implement a filterbank consisting of rectangular bandpass filters. The magnitude of a generic bandpass filter, in the frequency domain, can be expressed as the difference between two low-pass filters as follows [50]:

$$G[f, f_1, f_2] = \text{rect}\Bigl(\frac{f}{2 f_2}\Bigr) - \text{rect}\Bigl(\frac{f}{2 f_1}\Bigr), \tag{3}$$

with $f_1$ and $f_2$ representing the learned low and high cutoff frequencies, respectively. The $\text{rect}(\cdot)$ function denotes the rectangular function in the frequency domain. In the time domain, the function $g$ is defined as:

$$g[n, f_1, f_2] = 2 f_2\, \text{sinc}(2\pi f_2 n) - 2 f_1\, \text{sinc}(2\pi f_1 n) \tag{4}$$

where $\text{sinc}(x) = \sin(x)/x$.

The cutoff frequencies are initialized randomly within the range $[0, f_s/2]$, where $f_s$ is the sampling frequency of the input signal. To ensure that $f_1 \geq 0$ and $f_2 \geq f_1$, the parameters are adjusted as:

$$f_1^{\text{abs}} = |f_1| \tag{5}$$
$$f_2^{\text{abs}} = f_1 + |f_2 - f_1|. \tag{6}$$

The authors in [50] point out that the training process naturally keeps $f_2$ below the Nyquist frequency [76], eliminating the need for explicit constraints.
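The constraint in Eqs. (5) and (6) can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the function name `constrain_cutoffs` is hypothetical, and the sketch assumes that the constrained value $|f_1|$ is used as the base in Eq. (6), so that the ordering also holds when the raw parameter $f_1$ is negative (for $f_1 \geq 0$ the two forms coincide).

```python
import numpy as np

def constrain_cutoffs(f1, f2):
    """Map raw cutoff parameters to valid band edges, in the spirit of Eqs. (5)-(6).

    Assumption: |f1| (rather than the raw f1) is used as the base of the
    upper edge, so that f2_abs >= f1_abs holds for any real-valued inputs.
    """
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    f1_abs = np.abs(f1)                 # Eq. (5): non-negative low cutoff
    f2_abs = f1_abs + np.abs(f2 - f1)   # Eq. (6): high cutoff above the low one
    return f1_abs, f2_abs

# Hypothetical raw parameters: one negative low cutoff, one inverted band.
low, high = constrain_cutoffs([-3.0, 2.0], [1.0, 0.5])
```

Because both steps only use absolute values and sums, they remain differentiable almost everywhere, which is what allows the cutoffs to be trained by gradient descent.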

Additionally, SincNet applies a windowing function [77] to smooth the discontinuities at the ends of $g$. This is achieved by multiplying $g$ with a Hamming window [78], yielding the windowed filter $g_w$:

$$g_w[n, f_1, f_2] = g[n, f_1, f_2] \cdot w[n], \tag{7}$$

where $w[n]$ is defined as:

$$w[n] = 0.54 - 0.46 \cdot \cos\Bigl(\frac{2\pi n}{L}\Bigr). \tag{8}$$

All operations within SincNet are differentiable, allowing the optimization of cutoff frequencies alongside other neural network parameters through gradient-based methods.
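As a rough illustration of Eqs. (2), (4), (7) and (8), the sketch below builds a windowed sinc bandpass filter in NumPy and applies a small bank of such filters to a signal. It is a minimal sketch under stated assumptions, not the SincNet code: the names `sinc_bandpass` and `apply_filterbank`, the filter length of 129 taps, and the example bands are all hypothetical, the cutoffs are assumed to already satisfy $0 \leq f_1 < f_2 \leq f_s/2$, and the time index of $g$ is assumed to be centered on the middle tap. Here the cutoffs are fixed numbers, whereas in SincNet they are trainable parameters.

```python
import numpy as np

def sinc_bandpass(f1_hz, f2_hz, fs, length):
    """Windowed bandpass impulse response following Eqs. (4), (7) and (8)."""
    f1 = f1_hz / fs                      # cutoffs in cycles per sample
    f2 = f2_hz / fs
    k = np.arange(length)                # tap index 0..L-1 (used by the window)
    n = k - (length - 1) / 2.0           # centered time index (assumption)
    # np.sinc(x) = sin(pi*x)/(pi*x), hence sinc(2*pi*f*n) = np.sinc(2*f*n).
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)   # Eq. (4)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * k / length)                  # Eq. (8)
    return g * w                                                      # Eq. (7)

def apply_filterbank(x, bands_hz, fs, length=129):
    """Convolve x with one windowed sinc filter per (low, high) band, Eq. (2)."""
    return np.stack([
        np.convolve(x, sinc_bandpass(lo, hi, fs, length), mode="same")
        for lo, hi in bands_hz
    ])

# Usage on a synthetic signal: a 10 Hz tone plus a 60 Hz tone.
fs = 173.61
t = np.arange(int(10 * fs)) / fs
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 60 * t)
y = apply_filterbank(x, [(8.0, 13.0), (50.0, 70.0)], fs)   # shape (2, len(x))
```

Each row of `y` is one filtered view of the input, which is the kind of decomposition the first layer hands to the rest of the network.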

3.2 Variational Autoencoder

A Variational Autoencoder (VAE) is a generative model made of two principal components: a probabilistic decoder and a probabilistic encoder. The decoder, with parameters $\theta$, models the likelihood function $p_\theta(x|z)$ and generates new data $x$ given a latent variable $z$. The encoder, with parameters $\phi$, models the posterior distribution $q_\phi(z|x)$ to approximate the true intractable posterior $p_\theta(z|x)$ [34]. During the training of a VAE, both $\theta$ and $\phi$ are optimized through the following generative process [79]:

$$\max_{\phi,\theta} \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] \tag{9}$$

which can be re-written as:

$$\log p_\theta(x|z) = D_{KL}(q(z|x)\,\|\,p(z)) + \mathcal{L}(\theta,\phi;x,z) \tag{10}$$

where $D_{KL}(\cdot)$ is the non-negative Kullback–Leibler (KL) divergence and $p(z)$ represents the prior distribution over the latent variables $z$ [79]. Since the KL divergence is non-negative, the term $\mathcal{L}(\theta,\phi;x,z)$ is called Evidence Lower Bound (ELBO) of $\log p_\theta(x|z)$, and it can be rewritten as below:

$$\log p_\theta(x|z) \geq \mathcal{L}(\theta,\phi;x,z) = -D_{KL}(q_\phi(z|x)\,\|\,p(z)) + \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] \tag{11}$$

where the second term is an expected reconstruction error between the input and generated data.

The VAE optimization can be focused on maximizing the ELBO [79]. A key challenge in this process is the need to sample random latent variables $z$ from $q_\phi(z|x)$, which makes the training intractable. The reparametrization trick is employed to avoid this problem: by assuming both the prior $p(z)$ and the posterior $q_\phi(z|x)$ to be Gaussian distributions with a diagonal covariance matrix, with the prior $p(z)$ set to the isotropic unit Gaussian $\mathcal{N}(0, I)$, each random variable $z_i \sim q_\phi(z_i|x) = \mathcal{N}(\mu_i, \sigma_i)$ is reparametrized as a differentiable transformation of a noise variable $\epsilon_i \sim \mathcal{N}(0, 1)$ as follows [79]:

$$z_i = \mu_i + \sigma_i \epsilon_i \tag{12}$$

Under this framework, the ELBO can be differentiated and optimized with respect to the parameters θ𝜃\thetaitalic_θ and ϕitalic-ϕ\phiitalic_ϕ. Specifically, the ELBO can be maximized using gradient based methods. This approach allows for considerable flexibility in the design of both the probabilistic encoder and the probabilistic decoder.
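The reparametrization in Eq. (12) and the two terms on the right-hand side of Eq. (11) can be sketched numerically as follows. This is an illustrative sketch, not the training code of this work: the function names are hypothetical, $\sigma_i$ is treated as a standard deviation (consistent with Eq. (12)), the KL term uses the standard closed form for a diagonal Gaussian against $\mathcal{N}(0, I)$, and the decoder is assumed to be Gaussian with unit variance, so that the negative log-likelihood reduces to half the squared error up to an additive constant.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Eq. (12): z = mu + sigma * eps, with eps drawn from N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma), axis=-1)

def negative_elbo(x, x_hat, mu, sigma):
    """Single-sample estimate of the negative of the right-hand side of Eq. (11).

    Assumption: unit-variance Gaussian decoder, so -log p(x|z) equals
    0.5 * ||x - x_hat||^2 plus a constant that is dropped here.
    """
    reconstruction = 0.5 * np.sum((x - x_hat) ** 2, axis=-1)
    return reconstruction + kl_to_standard_normal(mu, sigma)

# Usage with hypothetical encoder outputs for one input.
rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0])
sigma = np.array([1.0, 1.0])
z = reparameterize(mu, sigma, rng)   # sample that stays differentiable in mu, sigma
```

Because the randomness enters only through `eps`, gradients with respect to `mu` and `sigma` can flow through `z` in an automatic differentiation framework, which is the point of the trick.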

3.3 SincVAE

This work proposes SincVAE, which integrates SincNet as the first layer in the VAE’s probabilistic encoder. This model is specifically designed to enhance anomaly detection in time series data. In particular, this design is aimed at learning an ad-hoc array of bandpass filters, facilitating the decomposition of the input time series and enhancing the feature extraction within the time series processing.

The reconstruction error between the input data and its reconstruction, computed as the Mean Squared Error (MSE) [27], is compared against a threshold $t$ to classify the input as anomalous or not. A graphical representation of the proposed pipeline is shown in Figure 1.
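The decision rule described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: the function names are hypothetical, the reconstructions are assumed to come from an already trained model, and the text here does not specify how the threshold $t$ is chosen, so it is left as a user-supplied value.

```python
import numpy as np

def mse_scores(x, x_hat):
    """Per-sample MSE between inputs and reconstructions of shape (n_samples, n_points)."""
    return np.mean((x - x_hat) ** 2, axis=-1)

def classify(scores, t):
    """Flag a sample as anomalous when its reconstruction error exceeds t."""
    return scores > t

# Usage with made-up arrays standing in for EEG frames and their reconstructions.
x = np.zeros((3, 4))
x_hat = np.array([[0.0, 0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0, 1.0],
                  [2.0, 0.0, 0.0, 0.0]])
scores = mse_scores(x, x_hat)       # array([0., 1., 1.])
flags = classify(scores, t=0.5)     # array([False,  True,  True])
```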

Figure 1: Graphical representation of the SincVAE architecture. An input time series $x$ is given as input to the VAE, which generates its reconstruction $\hat{x}$. The first layer of the VAE's probabilistic encoder is the SincNet layer, which filters the data using an ad-hoc array of bandpass filters learned during the training stage. Then, both the input time series and its reconstruction are used to compute the reconstruction error (in this picture, illustrated in terms of MSE). Finally, a threshold $t$ is used to classify the input time series as anomalous or not.

This work explores the impact of SincVAE on anomaly detection challenges within the healthcare sector, with a focus on seizure detection in EEG data. The integration of SincNet within the VAE framework is expected to enhance the learning of normal patterns within EEG data, thus leading to a more robust identification of anomalous patterns. SincVAE's ability to learn custom bandpass filters could not only boost the overall efficacy of the seizure detection process but also reduce, or even eliminate, the preprocessing currently required to identify the frequency bands where informative content is concentrated.

4 Experimental Assessment

In this work, the effectiveness of SincVAE for seizure detection was analyzed through experimental results obtained from two datasets, i.e. the Bonn dataset [80] and the CHB-MIT dataset [81]. These datasets have been widely utilized in numerous studies, including [82, 83, 84, 85, 86, 87, 88, 70, 89]. The seizure detection problem was tackled with a semi-supervised approach, thus models were trained exclusively on non-seizure data. The classification of data into seizure or non-seizure categories is determined by analyzing the reconstruction error generated by the model.

For a robust comparative analysis, the experiments employed a fixed base VAE architecture in two distinct configurations: (i) the VAE with a SincNet layer as the first layer of the encoder network, referred to as SincVAE, and (ii) the VAE without the SincNet layer, referred to as VAE. Specifically, the AE-CDNN architecture detailed in [69] was adopted, with modifications to meet the specific needs of VAE operations, particularly in the latent space and training configurations. This methodological choice facilitates a clear examination of the SincNet layer's contribution to the VAE's performance. In accordance with [50], the SincNet layer was followed by a Layer Normalization and an activation function.

The model selection stage, performed separately on each dataset, was aimed at optimizing the hyperparameters of the SincNet layer to enhance its effectiveness in seizure detection. Simultaneously, the latent space was tuned and consistently applied on both the VAE and SincVAE models. The best hyperparameter set was chosen to strike an optimal balance between model complexity and inference speed, i.e. by reducing the number of parameters while maintaining robust performance.

Grid search [90] was used as the method for automatic hyperparameter tuning, employing specific search spaces outlined in the respective dataset sections. The Adam optimizer was selected for weight optimization, with a learning rate of 0.0005. Data were processed in random batches of 128 samples each. The training stage was limited to a maximum of 1000 epochs, and early stopping [91] was used as the convergence criterion, with a patience of 20 epochs. These hyperparameters were established through manual preliminary assessment and kept fixed throughout all the experiments.
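The stopping rule described above (at most 1000 epochs, patience of 20) can be sketched as a small framework-agnostic loop. This is only an illustration under assumptions: the function name and the `step_fn` callback are hypothetical, and the monitored quantity is assumed to be a validation loss, which the text does not specify.

```python
def train_with_early_stopping(step_fn, max_epochs=1000, patience=20):
    """Run step_fn(epoch) -> monitored loss until it stops improving.

    step_fn is a hypothetical callback that trains for one epoch and returns
    the monitored loss (assumed here to be a validation loss).
    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_run = 0
    for epoch in range(max_epochs):
        epochs_run = epoch + 1
        loss = step_fn(epoch)
        if loss < best_loss:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop.
            break
    return best_epoch, best_loss, epochs_run

# Usage with a toy loss curve that reaches its minimum at epoch 5.
result = train_with_early_stopping(lambda epoch: abs(epoch - 5))
# result == (5, 0, 26): best epoch 5, then 20 epochs without improvement.
```

In practice the model weights from the best epoch would typically be restored after stopping; that bookkeeping is omitted here.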

4.1 The Bonn Dataset

Dataset description.

The Bonn dataset [80], acquired by the Bonn University in Germany, consists of five distinct collections of EEG signals. Each collection includes 100 single-channel EEG segments, each lasting 23.6 seconds, derived from continuous multi-channel EEG recordings. These segments were selected after a visual inspection to eliminate artifacts. Each EEG segment was captured using a 128-channel amplifier system paired with a 12-bit analog-to-digital converter at a sampling rate of 173.61 Hz. The dataset is organized into five groups:

  • Sets A and B contain EEG recordings from healthy volunteers. In particular, Set A contains data acquired during eyes-open condition, and Set B data acquired during eyes-closed conditions;

  • Sets C and D consist of interictal EEG signals from patients post-successful epilepsy surgery. Signals in Set C were recorded from the hippocampal formation opposite the epileptogenic zone, and Set D from within the epileptogenic zone;

  • Set E contains only ictal segments.

Since this work treats seizure detection as a binary anomaly detection problem, only Sets A, B, and E were used for the analysis, following the methodology outlined in Table 1 of [88]. Specifically, the efficacy of the proposed method was tested through the following comparisons:

  1. Set A vs Set E

  2. Set B vs Set E

It is important to remark that the proposed method exclusively utilizes non-seizure data (Sets A and B) during the training stages.

Data preprocessing.

The final 20 % of each training dataset is kept apart for testing the models on non-seizure data instances. Then, the data were segmented into one-second frames, resulting in 1840 samples each with a length of 173 data points. Subsequently, the dataset underwent filtering with a 40 Hz low-pass filter following [80]. Z-score normalization [92] was then applied. A summary of the Bonn dataset, including details on the preprocessing and data partitioning, is presented in Table 1.
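The sample counts reported above can be reproduced with a short sketch. It assumes that each Bonn segment holds 4097 samples (about 23.6 s at 173.61 Hz), that the incomplete tail of each segment is dropped, and that normalization uses the training-set mean and standard deviation; none of these details is stated explicitly in the text. The low-pass filtering step is omitted, and the function names are hypothetical.

```python
import numpy as np


def segment_into_frames(signal, frame_len):
    """Split a 1-D signal into non-overlapping frames, dropping the incomplete tail."""
    n_frames = len(signal) // frame_len
    return np.asarray(signal[: n_frames * frame_len]).reshape(n_frames, frame_len)


def prepare_bonn_set(segments, frame_len=173, holdout_fraction=0.2):
    """segments: array of shape (n_segments, n_samples) from one non-seizure set.

    Returns (train_frames, test_frames), both z-scored with training statistics.
    """
    n_holdout = int(round(len(segments) * holdout_fraction))
    split = len(segments) - n_holdout
    # The final 20 % of the segments is held out for testing.
    train = np.concatenate([segment_into_frames(s, frame_len) for s in segments[:split]])
    test = np.concatenate([segment_into_frames(s, frame_len) for s in segments[split:]])
    # Assumption: a single global mean/std estimated on the training frames.
    mean, std = train.mean(), train.std()
    return (train - mean) / std, (test - mean) / std
```

With 100 segments this yields 80 × 23 = 1840 training frames and 20 × 23 = 460 held-out non-seizure frames; adding the 100 × 23 = 2300 frames of Set E gives the 2760 test samples of Table 1.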

Case Training set Test set
Set A vs Set E 1840 2760
Set B vs Set E 1840 2760
Table 1: Details on the training and test sets sizes involved in the experimental session on the Bonn dataset.

Model selection.

The model selection stage was performed through the search space defined in Table 2.

Hyperparameter Search Space
Kernel Size $\{3, 5, 7\} \cup \{11, 21, \ldots, 131\}$
Filters $\{2^n\}$, $1 \leq n \leq 9$
Activation Function {ReLU, Tanh, Identity}
Latent Space Dimension $\{2^n\}$, $3 \leq n \leq 7$
Table 2: Search spaces for the grid search conducted during the model selection process on the Bonn dataset.

The total number of configurations explored was 2295. 10-fold Cross Validation [61] was employed as the resampling method, with the mean MSE across the 10 folds used as the criterion for selecting the best architecture. In this setup, for each fold of the Cross Validation, 20 % of the training data was sampled and designated as the validation set. This analysis was carried out separately on Set A and Set B. For the sake of clarity, the methodology in this paragraph is explained through the results obtained on Set A only.

From the Cross Validation analysis, it was observed that the configuration with the lowest mean MSE across the folds showed a distribution of results that overlapped with several other configurations. This indicated that while this configuration performed well in terms of MSE, it was not distinctly superior to the others. Thus, further analyses were carried out.

Initially, configurations with a mean MSE that did not fall within one standard deviation of the lowest mean MSE were excluded from further analysis. This filtering process narrowed down the selection to the top 244 configurations.
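The filtering step above can be illustrated with a small sketch. It assumes that "one standard deviation" refers to the sample standard deviation of the best configuration's per-fold MSE values, which the text does not state explicitly; `fold_mse` is a hypothetical mapping from a configuration ID to its per-fold results.

```python
import statistics


def within_one_std_of_best(fold_mse):
    """Keep configurations whose mean MSE is at most one std above the lowest mean MSE.

    fold_mse: dict mapping a configuration ID to its list of per-fold MSE values.
    Returns the retained IDs in ascending order of mean MSE.
    """
    means = {cfg: statistics.mean(values) for cfg, values in fold_mse.items()}
    best = min(means, key=means.get)
    # Assumption: the spread is measured on the best configuration's folds.
    cutoff = means[best] + statistics.stdev(fold_mse[best])
    kept = [cfg for cfg in means if means[cfg] <= cutoff]
    return sorted(kept, key=means.get)
```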

Then, the normality of the results for each configuration was evaluated using the Shapiro-Wilk test [93]. By setting the significance level at $\alpha = 0.05$, it was found that 42 out of the 244 configurations yielded a p-value lower than $\alpha$, indicating that their result distributions do not conform to a normal distribution.

Consequently, the Kruskal-Wallis test [94] was applied to compare all configurations, using the same significance level $\alpha = 0.05$. This test produced a p-value lower than $\alpha$, confirming significant differences among the distributions of the configurations.

Then, a pairwise comparison of the remaining configurations was conducted using the Mann-Whitney U-test [95]. The results are shown in Figure 2, where the 244 configurations are ordered in ascending order of mean MSE over the folds.

Figure 2: Graphical representation of the Mann-Whitney U-test of multiple configurations over the best 244 configurations on Set A. The left heatmap details pairwise comparisons between configurations, with color intensity reflecting the p-value magnitude. The right heatmap displays these p-values thresholded by the significance level $\alpha = 0.05$; red denotes p-values below $\alpha$, highlighting statistically significant differences, whereas gray indicates p-values above $\alpha$, indicating non-significant differences between configurations. In both plots, configurations are represented by their ID.

Assuming a significance level of $\alpha = 0.05$, this analysis was designed to identify configurations that do not significantly differ from the one exhibiting the lowest mean MSE across the 10 folds. Configurations yielding a p-value lower than $\alpha$ in these comparisons were excluded. This process effectively isolated the top 131 configurations that demonstrated no significant difference in MSE performance compared to the best-performing configuration.
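The sequence of tests described above (normality check, omnibus test, pairwise comparison against the best configuration) can be sketched with SciPy. This is an illustrative reading of the procedure, not the authors' code: `fold_mse` is a hypothetical mapping from configuration ID to per-fold MSE values, the Mann-Whitney test is assumed to be two-sided, and no multiple-comparison correction is applied because none is mentioned in the text.

```python
from scipy import stats


def select_equivalent_to_best(fold_mse, alpha=0.05):
    """Return the IDs of configurations not significantly different from the best one.

    fold_mse: dict mapping a configuration ID to its list of per-fold MSE values.
    """
    ids = sorted(fold_mse, key=lambda c: sum(fold_mse[c]) / len(fold_mse[c]))
    best = ids[0]

    # Step 1: Shapiro-Wilk normality test on each configuration's results.
    non_normal = [c for c in ids if stats.shapiro(fold_mse[c]).pvalue < alpha]

    # Step 2: omnibus test; non-parametric if any configuration looks non-normal.
    groups = [fold_mse[c] for c in ids]
    if non_normal:
        omnibus_p = stats.kruskal(*groups).pvalue
    else:
        omnibus_p = stats.f_oneway(*groups).pvalue
    if omnibus_p >= alpha:
        return ids  # no significant differences: all configurations are retained

    # Step 3: pairwise Mann-Whitney U-test against the lowest-mean configuration.
    kept = [best]
    for c in ids[1:]:
        p = stats.mannwhitneyu(fold_mse[best], fold_mse[c], alternative="two-sided").pvalue
        if p >= alpha:
            kept.append(c)
    return kept
```

The final choice among the retained configurations is then made on a separate criterion, here model complexity.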

As stated above, the best configuration was chosen by prioritizing the lowest model complexity among these 131 configurations. Thus, the latent space dimension was fixed to 32, the Identity function was chosen as the activation function of the SincNet layer, and 16 filters with a kernel size of 41 were selected.

As mentioned before, the same procedure was applied to Set B. The analyses yielded similar results and led to the same configuration.

4.2 The CHB-MIT Dataset

Dataset description.

The CHB-MIT dataset [81] was acquired by the Children’s Hospital Boston (CHB) and the Massachusetts Institute of Technology (MIT). It features EEG recordings from pediatric patients diagnosed with intractable seizures. These patients were observed over multiple days following the discontinuation of anti-seizure medications, to assess their seizure activity and evaluate their suitability for surgical treatment. The dataset includes recordings from 24 subjects (see Table 3 for subjects’ details).

Subject Gender Age # Tracks # Seizures Tracks
1 F 11 42 7
2 M 11 36 3
3 F 14 38 7
4 M 22 42 3
5 F 7 39 5
6 F 1.5 18 7
7 F 14.5 19 3
8 M 3.5 20 5
9 F 10 19 3
10 M 3 25 7
11 F 12 35 3
12 F 2 24 13
13 F 3 33 8
14 F 9 26 7
15 M 16 40 14
16 F 7 19 7
17 F 12 21 3
18 F 18 36 6
19 F 19 30 3
20 F 6 29 6
21 F 13 33 4
22 F 9 31 3
23 F 6 9 3
24 N/A N/A 22 12
Table 3: Summary of CHB-MIT patients data

All EEG signals in the CHB-MIT dataset were recorded at a sampling rate of 256 Hz with a 16-bit resolution. These recordings conform to the International 10-20 system for EEG electrode placement and naming conventions. The number of channels recorded varied among subjects, with a minimum of 23 channels used. In the cases of subjects 4 and 9, the dataset includes additional channels for ECG and vagal nerve stimulus. In this work, the experiments are based only on the data acquired from the 23 channels used by all the subjects. Subjects’ recordings are organized into tracks, which are labeled according to whether they contain seizure activity. For tracks with seizures, the specific time intervals of the seizure occurrences are documented (for further details on the CHB-MIT dataset, see https://physionet.org/content/chbmit/1.0.0/).

Data preprocessing.

Data were segmented into one-second frames of 256 samples across 23 channels. Then, the dataset was filtered using a bandpass filter of 0.5 Hz to 25 Hz, following [22]. Z-score normalization was then applied. Finally, for each subject, a random sample consisting of 1-second windows totaling 10 minutes was drawn from the non-seizure tracks to test the models’ ability to detect non-seizure cases. The remaining data were used to train the models. Details on the CHB-MIT dataset are provided in Table 4.
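The random hold-out of non-seizure windows can be sketched as below. Ten minutes of one-second windows corresponds to 600 windows; the function name, the `seed` argument and the array layout `(n_windows, n_channels, n_samples)` are assumptions made for illustration.

```python
import numpy as np


def holdout_non_seizure_windows(windows, n_holdout=600, seed=0):
    """Randomly set aside `n_holdout` one-second windows for testing.

    windows: array of shape (n_windows, n_channels, n_samples) with non-seizure frames.
    Returns (train_windows, test_windows), with no window shared between the two.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(windows))
    test_idx = np.sort(order[:n_holdout])
    train_idx = np.sort(order[n_holdout:])
    return windows[train_idx], windows[test_idx]
```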

Subject Training set Test set
1 114230 1042
2 110969 691
3 103777 1002
4 107686 760
5 114578 1158
6 156677 669
7 181777 925
8 46187 1519
9 208250 726
10 114604 1047
11 107368 1406
12 60628 664
13 133796 815
14 64182 727
15 89375 1944
16 38989 638
17 35390 893
18 96573 917
19 85776 836
20 78707 741
21 96572 799
22 92985 804
23 68109 713
24 38989 929
Table 4: Details on the training and test sets extracted from the CHB-MIT dataset for each subject.

Model selection.

The model selection procedure for the CHB-MIT dataset follows the methodology applied to the Bonn dataset, with a critical adjustment to accommodate the specific structure of the CHB-MIT dataset. Each subject in the CHB-MIT dataset is associated with multiple tracks, each approximately one hour in length. Thus, the 10-fold Cross Validation was replaced with a leave-one-out strategy tailored to this context, referred to as Leave-One-Track-Out. In this approach, five randomly selected non-seizure tracks are used; in each round, four tracks are employed for training the model, and the remaining track serves as the test set. To maintain clarity and focus in the analysis, detailed results are provided exclusively for Subject 1.
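A minimal sketch of the Leave-One-Track-Out splitting is given below. The function name, the `seed` argument and the track identifiers used in the example are hypothetical; only the "five tracks, four for training, one held out" scheme comes from the text.

```python
import random


def leave_one_track_out(non_seizure_tracks, n_tracks=5, seed=0):
    """Yield (train_tracks, held_out_track) pairs over randomly selected tracks."""
    rng = random.Random(seed)
    selected = rng.sample(list(non_seizure_tracks), n_tracks)
    for held_out in selected:
        # Each selected track is held out exactly once; the others train the model.
        train = [track for track in selected if track != held_out]
        yield train, held_out
```

For example, `list(leave_one_track_out([f"track_{i:02d}" for i in range(1, 36)]))` produces five folds, each with four training tracks and one held-out track.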

The search space involved in this stage is reported in Table 5.

Hyperparameter Search Space
Kernel Size {71, 81, 111, 131, 151}
Filters $\{2^n\}$, $2 \leq n \leq 8$
Activation Function {ReLU, Identity}
Latent Space Dimension $\{2^n\}$, $5 \leq n \leq 7$
Table 5: Search spaces for the grid search conducted during the model selection process on the CHB-MIT dataset.

To manage the increased computational effort of the model selection stage, due to the larger dataset size, the search space was restricted based on insights gained from the Bonn dataset analysis. Specifically, only larger kernel sizes were considered, and the Tanh activation function was excluded due to its observed negative impact on performance. The total number of configurations explored was 210.
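The size of this grid follows directly from the search space in Table 5 (5 kernel sizes × 7 filter counts × 2 activation functions × 3 latent dimensions = 210). A short enumeration sketch, with hypothetical key names and activation labels, is:

```python
from itertools import product

# Search space of Table 5; the dictionary keys are illustrative names.
search_space = {
    "kernel_size": [71, 81, 111, 131, 151],
    "filters": [2 ** n for n in range(2, 9)],      # 4, 8, ..., 256
    "activation": ["relu", "identity"],
    "latent_dim": [2 ** n for n in range(5, 8)],   # 32, 64, 128
}


def enumerate_grid(space):
    """Return every combination of hyperparameter values as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


configs = enumerate_grid(search_space)
```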

Similar to the results obtained on the Bonn dataset, the configuration with the lowest mean MSE across the tracks showed notable overlap with several other configurations. Thus, only those configurations whose mean MSE falls within one standard deviation of the lowest mean MSE were retained for further analysis. After this filtering, the top 67 configurations were selected for further detailed analysis.

Applying the Shapiro-Wilk test to the results of each configuration with a significance level of $\alpha = 0.05$, the null hypothesis was not rejected for any configuration, suggesting that the data from all configurations can be considered normally distributed. Consequently, an ANOVA test [96] was conducted using the same significance level. This test also did not reject the null hypothesis, indicating that there were no significant differences among the configurations. This result implies that the performance across different configurations is statistically comparable under the conditions tested.

Also for this dataset, the best configuration was identified among the 67 selected configurations by prioritizing low model complexity. Thus, the latent space dimension was fixed to 128, the Identity function was chosen as the activation function, and 4 filters with a kernel size of 71 were used.

Interestingly, the analyses conducted on the first five subjects, under identical conditions, led to the identification of the same optimal configuration using the same selection criterion. As a result, this configuration was adopted for subsequent experiments across all subjects.

5 Results

In this section, the results of the seizure detection experiments conducted on both the Bonn and CHB-MIT datasets are presented. For each dataset, the models were trained using the configurations identified during their respective model selection stages. During each training phase, 20 % of the training data was randomly sampled and used as the validation set.

As a reminder, in all results the model without the SincNet layer is referred to as VAE, whereas the model incorporating a SincNet layer on top of the encoder network, which is the proposal of this work, is referred to as SincVAE. The primary focus of these experiments is to integrate SincNet into the VAE framework and evaluate its effectiveness on the tackled problem, rather than to achieve new state-of-the-art results on the datasets used.

5.1 The Bonn Dataset

The results of the seizure detection for both declared comparisons, i.e. Set A vs Set E and Set B vs Set E, are discussed below. In both scenarios, the test set includes 2760 samples and is markedly imbalanced, with 460 samples labeled as non-seizure and 2300 labeled as seizure. This disparity was taken into account during the evaluations.

Figure 3: Graphical representation of seizure (top-left) and non-seizure (top-right) test data, and validation (bottom) MSE distribution for SincVAE and VAE. Non-seizure test data are drawn from Set A.

Figure 3 shows the MSE distributions for the seizure data (top-left) and the non-seizure data (top-right) in the test set, as well as for the validation data (bottom), for both the VAE and SincVAE models. The summary statistics for these MSE distributions are detailed in Table 6.

Method Seizure Test MSE Non-Seizure Test MSE Validation Set MSE
Mean Min Max Mean Min Max Mean Min Max
SincVAE $43.149 \pm 42.930$ 0.170 253.480 $0.208 \pm 0.121$ 0.060 1.100 $0.194 \pm 0.123$ 0.048 0.992
VAE $10.663 \pm 15.315$ 0.106 123.913 $0.137 \pm 0.064$ 0.036 0.406 $0.128 \pm 0.084$ 0.028 0.651
Table 6: Summary statistics of the distributions shown in Figure 3.

It is observed that, on non-seizure test data, the SincVAE model exhibits a mean MSE of $0.208 \pm 0.121$, while the VAE shows a slightly lower mean MSE of $0.137 \pm 0.064$. Both results align with the statistics from the validation set.

Conversely, on the seizure test data, SincVAE shows a mean MSE of $43.149 \pm 42.930$, indicating significant variability among cases, whereas VAE presents a lower mean MSE of $10.663 \pm 15.315$. SincVAE’s larger discrepancy between its seizure and non-seizure MSE values could indicate a better distinction between the seizure and non-seizure conditions, which could ease the threshold-based classification. In contrast, the narrower MSE range of VAE could make this differentiation more challenging. Similar conclusions can be drawn on Set B (see Figure 4 and Table 7).
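One simple way to put a number on this gap is the ratio between the mean seizure MSE and the mean non-seizure MSE. The ratio is an illustrative quantity introduced here, not a metric used in the paper, and it ignores the large standard deviations reported for the seizure data; the inputs are the means reported in Tables 6 and 7.

```python
def separation_ratio(seizure_mean_mse, non_seizure_mean_mse):
    """Ratio of mean reconstruction errors; larger values suggest an easier threshold choice."""
    return seizure_mean_mse / non_seizure_mean_mse


# (seizure mean MSE, non-seizure mean MSE) as reported in Tables 6 and 7.
reported_means = {
    ("Set A", "SincVAE"): (43.149, 0.208),
    ("Set A", "VAE"): (10.663, 0.137),
    ("Set B", "SincVAE"): (18.534, 0.175),
    ("Set B", "VAE"): (4.511, 0.121),
}

ratios = {key: separation_ratio(*means) for key, means in reported_means.items()}
```

Under this crude measure, the seizure error of SincVAE is roughly 207 times its non-seizure error on Set A (about 78 times for VAE), and roughly 106 times on Set B (about 37 times for VAE).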

Figure 4: Graphical representation of seizure (top-left) and non-seizure (top-right) test data, and validation (bottom) MSE distribution for SincVAE and VAE. Non-seizure test data are drawn from Set B.
Method Seizure Test MSE Non-Seizure Test MSE Validation Set MSE
Mean Min Max Mean Min Max Mean Min Max
SincVAE $18.534 \pm 19.078$ 0.092 114.236 $0.175 \pm 0.086$ 0.045 0.676 $0.152 \pm 0.086$ 0.045 0.937
VAE $4.511 \pm 6.197$ 0.070 56.848 $0.121 \pm 0.052$ 0.038 0.395 $0.098 \pm 0.056$ 0.023 0.342
Table 7: Summary statistics of the distributions shown in Figure 4.

To assess the classification performance of the models, a classification threshold must be determined. This threshold is typically based on the MSE values derived from the trained model on the validation or training set, as suggested by various studies [97, 98, 99, 100], and tailored to meet specific user requirements. For this study, aiming to reduce the number of false positives, the classification threshold was selected using the following criteria:

  • The maximum MSE from the validation set, defined as $t_1$;

  • The 95th percentile of the validation MSE, defined as $t_2$.
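The two thresholds listed above can be computed from the validation reconstruction errors as sketched below. The function names are hypothetical, the percentile uses NumPy's default linear interpolation, and treating a sample as a seizure when its MSE is strictly greater than the threshold is an assumption, since the text does not specify how ties are handled.

```python
import numpy as np


def thresholds_from_validation(val_mse):
    """Return (t1, t2): the maximum and the 95th percentile of the validation MSE."""
    val_mse = np.asarray(val_mse, dtype=float)
    t1 = float(val_mse.max())
    t2 = float(np.percentile(val_mse, 95))
    return t1, t2


def classify(mse, threshold):
    """Flag a sample as seizure (True) when its reconstruction MSE exceeds the threshold."""
    return np.asarray(mse, dtype=float) > threshold
```

By construction $t_2 \leq t_1$, so $t_2$ flags at least as many samples as $t_1$: it trades some false positives for fewer missed seizures.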

Figure 5 shows the confusion matrices for both SincVAE and VAE, evaluated under the two thresholds $t_1$ and $t_2$, on the test data for the Set A vs Set E case.

Figure 5: Classification results shown as confusion matrices of SincVAE (first row) and VAE (second row) under the two selected thresholds $t_1$ (first column) and $t_2$ (second column) on the test data of the case Set A vs Set E.

Both models correctly identify non-seizure instances, as reflected by the high number of true negatives under both thresholds. VAE slightly outperforms SincVAE in this respect, since it tends to generate fewer false positives. Notably, SincVAE produces fewer false negatives than VAE. This suggests that SincVAE may be more sensitive in detecting seizures, in line with the observations made on Table 6. Table 8 reports the F1, precision and recall metrics [61] for this set of experiments.

Method Threshold set to $t_1$ Threshold set to $t_2$
F1 Precision Recall F1 Precision Recall
SincVAE 0.998 1 0.996 0.994 0.989 1
VAE 0.956 1 0.917 0.992 0.992 0.992
Table 8: Classification metrics related to the results of SincVAE (first row) and VAE (second row) under the two selected thresholds $t_1$ (first column) and $t_2$ (second column) on the test data of the case Set A vs Set E.

SincVAE appears to outperform VAE in this experimental case: it achieves higher recall, which is crucial for ensuring that no seizure goes undetected, and higher F1 scores, indicating that it is the more reliable model for seizure detection under the chosen thresholds.
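For reference, the three metrics can be computed from confusion-matrix counts as sketched below, with seizure as the positive class. The function name is hypothetical and the zero-division convention (returning 0.0) is an assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Since F1 is the harmonic mean of precision and recall, a precision of 1 and a recall of 0.996 give $F1 = 2 \cdot 0.996 / 1.996 \approx 0.998$, which matches the SincVAE entry for $t_1$ in Table 8.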

Figure 6: Confusion matrices obtained from the classification results of SincVAE (first row) and VAE (second row) under the two selected thresholds $t_1$ (first column) and $t_2$ (second column) on the case Set B vs Set E.

Regarding the case Set B vs Set E, Table 9 shows the same classification metrics, while Figure 6 shows the confusion matrices. In this case, SincVAE consistently maintains high performance across both thresholds, demonstrating a lower number of both false positives and false negatives. This indicates a robust ability of SincVAE to accurately differentiate between seizure and non-seizure events.

Method Threshold set to $t_1$ Threshold set to $t_2$
F1 Precision Recall F1 Precision Recall
SincVAE 0.967 1 0.936 0.989 0.985 0.994
VAE 0.948 1 0.902 0.980 0.980 0.979
Table 9: Classification metrics related to the results of SincVAE (first row) and VAE (second row) under the two selected thresholds $t_1$ (first column) and $t_2$ (second column) on the case Set B vs Set E.

The SincVAE model consistently outperforms VAE in terms of F1 score and recall across both scenarios and thresholds, indicating robust predictive capabilities, particularly in minimizing missed seizures. This could be a critical aspect for clinical applications, given its impact on patient safety and treatment efficacy.

Figure 7 displays the EEG recording reconstructions for three randomly selected non-seizure samples from Set A, using both the SincVAE and VAE models. The reconstructions of these non-seizure samples appear similarly accurate across both networks, suggesting that each model is effectively capturing and replicating EEG patterns associated with non-seizure states.

Figure 7: Graphical representation of three randomly selected non-seizure samples drawn from Set A, reconstructed using SincVAE (first row) and VAE (second row). In each plot, the original input is represented by a solid blue line, while its reconstruction is represented by a dashed orange line.

Figure 8 shows the EEG signal reconstructions performed by both the VAE and SincVAE models on seizure samples from Set E. Notably, the VAE model reconstructs these seizure signals with higher fidelity than SincVAE; the resulting lower reconstruction error makes VAE more prone to false negatives. This difference in reconstruction quality is consistent with the classification results. SincVAE’s difficulty in reconstructing seizure signals reflects its ability to separate normal from seizure states, which makes it the more reliable of the two models for seizure detection.

Figure 8: Graphical representation of three randomly selected seizure samples drawn from Set E, reconstructed using SincVAE (first row) and VAE (second row). In each plot, the original input is represented by a solid blue line, while its reconstruction is represented by a dashed orange line.

5.2 The CHB-MIT Dataset

The analyses conducted for the CHB-MIT dataset followed the binary semi-supervised classification approach used for the Bonn dataset. Specifically, the training dataset included only non-seizure data, and to evaluate the seizure class detection, only windows that occurred during the ictal periods, as denoted by the dataset’s authors, were included in the test. Thus, preictal and postictal periods were excluded from the analyses.

Based on the findings from the experiments with the Bonn dataset, the analyses on the CHB-MIT dataset are presented in terms of the F1 score, precision, and recall of both the SincVAE and VAE models for each subject. A single decision threshold, equal to the 95th percentile of the validation set’s MSE, was used for all subjects. Results are shown in Table 10.

Subject F1 Precision Recall
SincVAE VAE SincVAE VAE SincVAE VAE
1 0.919 0.918 0.900 0.909 0.939 0.928
2 0.756 0.708 0.651 0.627 0.901 0.813
3 0.950 0.943 0.927 0.922 0.975 0.965
4 0.718 0.645 0.644 0.617 0.812 0.675
5 0.956 0.934 0.946 0.942 0.966 0.927
6 0.108 0.069 0.208 0.167 0.072 0.043
7 0.962 0.931 0.981 0.980 0.945 0.886
8 0.768 0.799 0.808 0.847 0.731 0.756
9 0.849 0.923 0.747 0.896 0.984 0.952
10 0.959 0.878 0.998 0.994 0.924 0.785
11 0.975 0.968 0.986 0.987 0.964 0.949
12 0.175 0.161 0.438 0.304 0.109 0.109
13 0.157 0.165 0.704 0.714 0.088 0.093
14 0.308 0.308 0.509 0.509 0.220 0.220
15 0.660 0.606 0.926 0.922 0.513 0.452
16 0.055 0.053 0.042 0.040 0.079 0.079
17 0.207 0.096 0.944 0.833 0.116 0.051
18 0.853 0.854 0.946 0.950 0.776 0.776
19 0.859 0.881 0.779 0.828 0.958 0.941
20 0.698 0.670 0.872 0.884 0.582 0.539
21 0.266 0.091 0.547 0.500 0.176 0.050
22 0.927 0.902 0.953 0.951 0.902 0.858
23 0.817 0.854 0.895 0.946 0.752 0.779
24 0.670 0.683 0.708 0.752 0.635 0.626
Table 10: The F1, precision and recall results for detecting seizure and non-seizure EEG recordings on the CHB-MIT dataset for each subject. Classification employed a fixed threshold at the 95th percentile of the validation set’s MSE.

Observing the F1 scores, SincVAE performs comparably to or slightly better than VAE for the majority of the subjects in the CHB-MIT dataset. Specifically, for subjects 1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 14, 15, 16, 17, 20, 21, and 22, SincVAE either matches or exceeds the performance of VAE. Conversely, VAE has a higher F1 score than SincVAE for subjects 8, 9, 13, 18, 19, 23, and 24. These variations suggest that VAE might be better suited for certain cases.

It is noteworthy that the F1 scores for some subjects, such as 6 and 13, are low, which may be due to factors such as the inherent quality of the EEG recordings, the need for specific data preprocessing, or the need for more sophisticated model architectures. However, the main aim of this study is to explore the effectiveness of integrating a SincNet layer within a VAE framework, rather than establishing new state-of-the-art results on the CHB-MIT dataset. Thus, discussions on specific enhancements to the architecture or to EEG data quality are outside the scope of this work, which focuses instead on assessing the added value of the SincNet layer within the VAE architecture.

As stated above, the seizure tracks in the CHB-MIT dataset comprise preictal, ictal, and postictal phases. Figure 9 shows the MSE values obtained for each second of track number 19 from subject 9, a track identified by the dataset’s authors as including a seizure event from seconds 5299 to 5361. These specific seconds are visually demarcated with two dashed vertical blue lines. The horizontal red line on the plot indicates the decision threshold, set at the 95th percentile of the MSE values obtained from the validation set. Blue dots on the plot represent EEG recordings that have been classified as non-seizure, while red dots indicate those classified as seizure.

Figure 9: Graphical representation of MSE values obtained using SincVAE and VAE for each second of track number 19 of subject 9. The two vertical dashed blue lines mark the start and the end of the ictal phase annotated by the dataset authors. In both plots, the horizontal red line represents the decision threshold used to classify the EEG recordings, set at the 95th percentile of the MSE values from the validation set. Blue dots indicate EEG recordings classified as non-seizure, while red dots indicate EEG recordings classified as seizure.

It can be noted that both models classify several recordings as anomalous in both the preictal and postictal phases. Figure 10 shows the percentage of anomalous points detected by SincVAE and VAE in the preictal, ictal and postictal phases.

Figure 10: Graphical representation of the detection rates of anomalies by SincVAE and VAE on the track number 19 of the subject 9. The detection is shown across Preictal, Ictal, Postictal, and Interictal phases.

Both models detect seizures during the ictal phase at a high rate and recognize the interictal phase with a low rate of false positives. Interestingly, both models are also able to detect anomalies during the preictal and postictal phases.

Specifically, SincVAE identifies 18.04 % of anomalies in the preictal phase compared to 7.49 % by VAE, and it detects 50 % of anomalies in the postictal phase, against the 27.81 % by VAE. This enhanced capability to detect anomalies during the preictal and postictal phases suggests that SincVAE could be particularly advantageous for applications that require an early warning system for seizures and consistent monitoring during the recovery phase following a seizure.
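The per-phase percentages discussed above can be obtained from a per-second MSE trace as in the following sketch. The length of the preictal and postictal windows is not specified in this section, so `margin` is a hypothetical parameter; intervals are treated as half-open `[start, end)`, which is also an assumption.

```python
import numpy as np


def detection_rate(mse_per_second, threshold, start, end):
    """Percentage of seconds in [start, end) whose MSE exceeds the threshold."""
    window = np.asarray(mse_per_second[start:end], dtype=float)
    if window.size == 0:
        return 0.0
    return 100.0 * float(np.mean(window > threshold))


def phase_rates(mse_per_second, threshold, ictal_start, ictal_end, margin):
    """Detection rates in the preictal, ictal and postictal phases of one track.

    margin: hypothetical number of seconds counted as preictal/postictal.
    """
    n = len(mse_per_second)
    phases = {
        "preictal": (max(0, ictal_start - margin), ictal_start),
        "ictal": (ictal_start, ictal_end),
        "postictal": (ictal_end, min(n, ictal_end + margin)),
    }
    return {
        name: detection_rate(mse_per_second, threshold, start, end)
        for name, (start, end) in phases.items()
    }
```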

| Subject | Preictal, SincVAE | Preictal, VAE | Postictal, SincVAE | Postictal, VAE |
|---|---|---|---|---|
| 1 | $1.91 \pm 1.48$ | $\mathbf{1.94 \pm 1.87}$ | $\mathbf{2.27 \pm 2.67}$ | $1.55 \pm 0.64$ |
| 2 | $\mathbf{6.39 \pm 0.23}$ | $5.1 \pm 1.25$ | $35.48 \pm 8.41$ | $\mathbf{35.99 \pm 9.71}$ |
| 3 | $3.41 \pm 3.77$ | $\mathbf{3.5 \pm 3.82}$ | $\mathbf{9.11 \pm 5.76}$ | $8.78 \pm 5.41$ |
| 4 | $54.07 \pm 40.03$ | $\mathbf{54.16 \pm 39.96}$ | $\mathbf{6.21 \pm 0.98}$ | $6.07 \pm 0.72$ |
| 5 | $5.14 \pm 4.64$ | $\mathbf{5.33 \pm 4.42}$ | $\mathbf{4.63 \pm 1.7}$ | $4.06 \pm 1.28$ |
| 6 | $\mathbf{3.89 \pm 4.38}$ | $3.38 \pm 4.35$ | $\mathbf{14.79 \pm 21.39}$ | $13.32 \pm 21.96$ |
| 7 | $\mathbf{4.34 \pm 1.16}$ | $3.36 \pm 1.11$ | $\mathbf{15.55 \pm 11.59}$ | $13.88 \pm 10.45$ |
| 8 | $\mathbf{21.62 \pm 9.76}$ | $17.71 \pm 9.01$ | $19.2 \pm 24.17$ | $\mathbf{19.94 \pm 24.28}$ |
| 9 | $\mathbf{18.18 \pm 0.14}$ | $9.96 \pm 2.46$ | $\mathbf{27.78 \pm 22.22}$ | $14.76 \pm 13.05$ |
| 10 | $\mathbf{10.82 \pm 14.71}$ | $9.72 \pm 13.76$ | $\mathbf{10.01 \pm 16.37}$ | $9.62 \pm 15.75$ |
| 11 | $\mathbf{21.09 \pm 10.45}$ | $18.96 \pm 9.55$ | $\mathbf{8.88 \pm 9.13}$ | $8.82 \pm 9.17$ |
| 12 | $\mathbf{2.38 \pm 1.77}$ | $2.21 \pm 1.75$ | $\mathbf{5.85 \pm 0.27}$ | $5.57 \pm 1.97$ |
| 13 | $\mathbf{2.09 \pm 1.65}$ | $1.95 \pm 1.82$ | $2.19 \pm 2.24$ | $\mathbf{2.53 \pm 2.5}$ |
| 14 | $\mathbf{6.14 \pm 2.81}$ | $6.25 \pm 3.19$ | $\mathbf{5.76 \pm 2.39}$ | $6.34 \pm 2.95$ |
| 15 | $\mathbf{14.14 \pm 19.81}$ | $12.7 \pm 17.86$ | $\mathbf{6.26 \pm 7.78}$ | $6.11 \pm 7.52$ |
| 16 | $7.25 \pm 4.53$ | $\mathbf{7.78 \pm 4.79}$ | $7.2 \pm 4.57$ | $\mathbf{7.53 \pm 4.81}$ |
| 17 | $\mathbf{2.72 \pm 1.41}$ | $2.47 \pm 1.45$ | $\mathbf{0.76 \pm 0.72}$ | $0.43 \pm 0.41$ |
| 18 | $\mathbf{6.61 \pm 5.14}$ | $6.36 \pm 5.16$ | $\mathbf{25.17 \pm 16.73}$ | $23.6 \pm 16.52$ |
| 19 | $\mathbf{6.58 \pm 2.77}$ | $4.01 \pm 2.37$ | $\mathbf{70.67 \pm 21.8}$ | $63.13 \pm 23.1$ |
| 20 | $\mathbf{0.97 \pm 1.37}$ | $0.77 \pm 1.07$ | $\mathbf{0.74 \pm 0.48}$ | $0.68 \pm 0.48$ |
| 21 | $6.44 \pm 2.9$ | $\mathbf{7.14 \pm 3.27}$ | $6.14 \pm 5.26$ | $\mathbf{6.77 \pm 5.5}$ |
| 22 | $4.63 \pm 0.48$ | $\mathbf{4.77 \pm 0.46}$ | $\mathbf{19.41 \pm 16.04}$ | $18.63 \pm 14.98$ |
| 23 | $2.95 \pm 0.0$ | $\mathbf{2.98 \pm 0.0}$ | $\mathbf{2.64 \pm 0.0}$ | $2.29 \pm 0.0$ |
| 24 | $\mathbf{3.75 \pm 8.83}$ | $3.35 \pm 7.71$ | $\mathbf{4.24 \pm 10.44}$ | $3.84 \pm 9.46$ |
Table 11: Average percentage of EEG readings identified as anomalous by SincVAE and VAE during the preictal and postictal phases across tracks containing seizures for each subject.

Table 11 extends the analysis shown in Figure 10 to all subjects, focusing specifically on the preictal and postictal phases. Because the number of seizure tracks varies from subject to subject, the anomalies detected during these phases are averaged across the seizure tracks of each subject. The findings from Figure 10 are consistent across the majority of subjects, indicating that SincVAE detects a higher rate of anomalies during the preictal and postictal phases.
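The per-subject averaging behind Table 11 can be illustrated with a minimal sketch. It assumes that each seizure track comes with a binary anomaly flag and a phase label per reading; the function names, the label strings and the toy data are hypothetical, and the use of the population standard deviation is an assumption rather than a detail taken from the experiments.

```python
import numpy as np

def phase_anomaly_percentage(flags, phases, phase):
    """Percentage of readings in `phase` flagged as anomalous for one track."""
    flags = np.asarray(flags, dtype=bool)
    mask = np.asarray(phases) == phase
    if not mask.any():
        return float("nan")  # the track has no reading in this phase
    return 100.0 * flags[mask].mean()

def subject_summary(tracks, phase):
    """Mean and standard deviation of the per-track percentages of one subject."""
    values = np.array([phase_anomaly_percentage(f, p, phase) for f, p in tracks])
    values = values[~np.isnan(values)]
    # Population standard deviation (ddof=0) is an assumption of this sketch.
    return values.mean(), values.std()

# Toy data (hypothetical): two seizure tracks for a single subject.
tracks = [
    ([0, 1, 0, 0, 1, 1, 1, 0],
     ["pre", "pre", "pre", "pre", "ictal", "ictal", "post", "post"]),
    ([0, 0, 1, 1, 1, 0],
     ["pre", "pre", "ictal", "ictal", "post", "post"]),
]
pre_mean, pre_std = subject_summary(tracks, "pre")
post_mean, post_std = subject_summary(tracks, "post")
print(f"preictal:  {pre_mean:.2f} +/- {pre_std:.2f}")
print(f"postictal: {post_mean:.2f} +/- {post_std:.2f}")
```

Averaging per track first, and only then across tracks, gives every seizure track the same weight regardless of its length.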

6 Conclusions

This work proposed the SincVAE architecture, which integrates SincNet within the VAE framework to perform semi-supervised anomaly detection on time series data. In particular, the framework was explored through the seizure detection problem on EEG data. In the experiments, SincVAE showed considerable potential for improving the reliability and accuracy of seizure detection in EEG data compared to a standard VAE.

This method not only simplifies preprocessing by exploiting the learnable bandpass filters of SincNet, but also improves detection by focusing on the EEG frequency bands that the network identifies as informative during training. The experiments conducted on multiple datasets supported the effectiveness of SincVAE across the considered scenarios, making it a valuable approach for real-world seizure detection.
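To make the role of the bandpass filters concrete, the following sketch builds a single windowed-sinc bandpass kernel of the kind parametrized by SincNet, where each filter is described only by its low and high cutoff frequencies. In SincNet these cutoffs are learned during training; here they are fixed numbers, and the sampling rate, band edges, kernel length and function names are illustrative assumptions.

```python
import numpy as np

def sinc_bandpass_kernel(f_low, f_high, kernel_size, fs):
    """Windowed-sinc bandpass impulse response with cutoffs given in Hz."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    f1, f2 = f_low / fs, f_high / fs  # normalized cutoffs (cycles per sample)
    # np.sinc(x) = sin(pi x) / (pi x), so 2 f np.sinc(2 f n) is an ideal low-pass.
    # The difference of two low-pass responses is a bandpass response.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    return kernel * np.hamming(kernel_size)  # Hamming window limits ripples

def gain_at(kernel, freq_hz, fs):
    """Magnitude of the kernel's frequency response at a single frequency."""
    n = np.arange(len(kernel))
    return abs(np.sum(kernel * np.exp(-2j * np.pi * freq_hz * n / fs)))

fs = 256.0  # assumed sampling rate in Hz
kernel = sinc_bandpass_kernel(8.0, 13.0, kernel_size=513, fs=fs)
pass_gain = gain_at(kernel, 10.5, fs)  # inside the assumed 8-13 Hz band
stop_gain = gain_at(kernel, 40.0, fs)  # well outside the band
print(f"gain at 10.5 Hz: {pass_gain:.3f}, gain at 40 Hz: {stop_gain:.5f}")
```

Because each filter depends on two parameters only, the first layer stays compact and its learned cutoffs can be read directly as frequency bands.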

In addition, the ability of SincVAE to discern subtle anomalies in EEG data suggests that it could be used to detect early signs of seizures in the preictal stage and to monitor the patient's status during the postictal stage. This capability could substantially improve patient monitoring.

Furthermore, the semi-supervised nature of SincVAE, requiring only non-seizure data for training, addresses challenges associated with the scarcity of labeled anomaly data in medical datasets, making it an efficient solution in real-world applications where anomalies are rare but critical to detect.
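The idea of training on seizure-free data only can be sketched as follows: an anomaly threshold is derived from the reconstruction errors measured on normal data, and new readings are flagged when their error exceeds it. The quantile rule, the synthetic error values and the function names are assumptions made for illustration; they are not necessarily the thresholding strategy adopted in the experiments.

```python
import numpy as np

def fit_threshold(normal_errors, quantile=0.99):
    """Threshold taken as a high quantile of errors on seizure-free data."""
    return float(np.quantile(normal_errors, quantile))

def flag_anomalies(errors, threshold):
    """Boolean mask marking readings whose error exceeds the threshold."""
    return np.asarray(errors) > threshold

rng = np.random.default_rng(0)
# Stand-in for reconstruction errors of a trained model on normal data.
normal_errors = rng.normal(loc=1.0, scale=0.1, size=1000)
# Hypothetical errors on unseen readings: two typical, two much larger.
test_errors = np.array([0.95, 1.05, 2.5, 3.0])

threshold = fit_threshold(normal_errors)
flags = flag_anomalies(test_errors, threshold)
print(threshold, flags)
```

No seizure label enters the procedure: both the model and the threshold depend on normal recordings alone, which is what makes the approach robust to the rarity of ictal events.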

Future work could explore the application of SincVAE in other types of time-series anomaly detection tasks, potentially broadening its utility in healthcare and other fields.

Acknowledgement

This work was supported by the PRIN research project PE0000013 Future Artificial Intelligence Research – FAIR, E63C22002150007.

References

  • [1] Seref Sagiroglu and Duygu Sinanc. Big data: A review. In 2013 international conference on collaboration technologies and systems (CTS), pages 42–47. IEEE, 2013.
  • [2] Charu C Aggarwal and Charu C Aggarwal. An introduction to outlier analysis. Springer, 2017.
  • [3] Raghavendra Chalapathy and Sanjay Chawla. Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407, 2019.
  • [4] Emin Aleskerov, Bernd Freisleben, and Bharat Rao. Cardwatch: A neural network based database mining system for credit card fraud detection. In Proceedings of the IEEE/IAFE 1997 computational intelligence for financial engineering (CIFEr), pages 220–226. IEEE, 1997.
  • [5] Vipin Kumar. Parallel and distributed computing for cybersecurity. IEEE Distributed Systems Online, 6(10), 2005.
  • [6] Andrea Pollastro, Giusiana Testa, Antonio Bilotta, and Roberto Prevete. Semi-supervised detection of structural damage using variational autoencoder and a one-class support vector machine. IEEE Access, 2023.
  • [7] Alessandra De Angelis, Antonio Bilotta, Maria Rosaria Pecce, Andrea Pollastro, and Roberto Prevete. Dynamic identification methods and artificial intelligence algorithms for damage detection of masonry infills. Journal of Civil Structural Health Monitoring, pages 1–20, 2024.
  • [8] Thomas Schlegl, Philipp Seeböck, Sebastian M Waldstein, Ursula Schmidt-Erfurth, and Georg Langs. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In International conference on information processing in medical imaging, pages 146–157. Springer, 2017.
  • [9] Osman Salem, Yaning Liu, Ahmed Mehaoua, and Raouf Boutaba. Online anomaly detection in wireless body area networks for reliable healthcare monitoring. IEEE journal of biomedical and health informatics, 18(5):1541–1551, 2014.
  • [10] M Kavitha, PVVS Srinivas, PS Latha Kalyampudi, Singaraju Srinivasulu, et al. Machine learning techniques for anomaly detection in smart healthcare. In 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), pages 1350–1356. IEEE, 2021.
  • [11] Krishnan Naidoo and Vukosi Marivate. Unsupervised anomaly detection of healthcare providers using generative adversarial networks. In Responsible Design, Implementation and Use of Information and Communication Technology: 19th IFIP WG 6.11 Conference on e-Business, e-Services, and e-Society, I3E 2020, Skukuza, South Africa, April 6–8, 2020, Proceedings, Part I 19, pages 419–430. Springer, 2020.
  • [12] Sungmin You, Baek Hwan Cho, Young-Min Shon, Dae-Won Seo, and In Young Kim. Semi-supervised automatic seizure detection using personalized anomaly detecting variational autoencoder with behind-the-ear eeg. Computer Methods and Programs in Biomedicine, 213:106542, 2022.
  • [13] Edin Šabić, David Keeley, Bailey Henderson, and Sara Nannemann. Healthcare and anomaly detection: using machine learning to predict anomalies in heart rate data. AI & SOCIETY, 36(1):149–158, 2021.
  • [14] Arijit Ukil, Soma Bandyoapdhyay, Chetanya Puri, and Arpan Pal. Iot healthcare analytics: The importance of anomaly detection. In 2016 IEEE 30th international conference on advanced information networking and applications (AINA), pages 994–997. IEEE, 2016.
  • [15] Jerry J Shih, Dean J Krusienski, and Jonathan R Wolpaw. Brain-computer interfaces in medicine. In Mayo clinic proceedings, volume 87, pages 268–279. Elsevier, 2012.
  • [16] Mohammad-Parsa Hosseini, Dario Pompili, Kost Elisevich, and Hamid Soltanian-Zadeh. Optimized deep learning for eeg big data and seizure prediction bci via internet of things. IEEE Transactions on Big Data, 3(4):392–404, 2017.
  • [17] Sheng-Fu Liang, Fu-Zen Shaw, Chung-** Young, Da-Wei Chang, and Yi-Cheng Liao. A closed-loop brain computer interface for real-time seizure detection and control. In 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pages 4950–4953. IEEE, 2010.
  • [18] Swati Vaid, Preeti Singh, and Chamandeep Kaur. Eeg signal analysis for bci interface: A review. In 2015 fifth international conference on advanced computing & communication technologies, pages 143–147. IEEE, 2015.
  • [19] Mohamad Shahbazi and Hamid Aghajan. A generalizable model for seizure prediction based on deep learning using cnn-lstm architecture. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 469–473. IEEE, 2018.
  • [20] Brian Litt and Javier Echauz. Prediction of epileptic seizures. The Lancet Neurology, 1(1):22–30, 2002.
  • [21] Ali Hossam Shoeb. Application of machine learning to epileptic seizure onset detection and treatment. PhD thesis, Massachusetts Institute of Technology, 2009.
  • [22] Ali H Shoeb and John V Guttag. Application of machine learning to epileptic seizure detection. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 975–982, 2010.
  • [23] Zakary Georgis-Yap, Milos R Popovic, and Shehroz S Khan. Supervised and unsupervised deep learning approaches for eeg seizure prediction. arXiv preprint arXiv:2304.14922, 2023.
  • [24] Mohiuddin Ahmed, Abdun Naser Mahmood, and Jiankun Hu. A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60:19–31, 2016.
  • [25] Monowar H Bhuyan, Dhruba Kumar Bhattacharyya, and Jugal K Kalita. Network anomaly detection: methods, systems and tools. Ieee communications surveys & tutorials, 16(1):303–336, 2013.
  • [26] Christopher M Bishop and Nasser M Nasrabadi. Pattern recognition and machine learning, volume 4. Springer, 2006.
  • [27] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
  • [28] Salima Omar, Asri Ngadi, and Hamid H Jebur. Machine learning techniques for anomaly detection: an overview. International Journal of Computer Applications, 79(2), 2013.
  • [29] Guansong Pang, Chunhua Shen, Longbing Cao, and Anton Van Den Hengel. Deep learning for anomaly detection: A review. ACM computing surveys (CSUR), 54(2):1–38, 2021.
  • [30] Ruoying Wang, Kexin Nie, Tie Wang, Yang Yang, and Bo Long. Deep learning for anomaly detection. In Proceedings of the 13th international conference on web search and data mining, pages 894–896, 2020.
  • [31] Nanjun Li and Faliang Chang. Video anomaly detection and localization via multivariate gaussian fully convolution adversarial autoencoder. Neurocomputing, 369:92–105, 2019.
  • [32] **an Fan, Qianru Zhang, Jialei Zhu, Meng Zhang, Zhou Yang, and Hanxiang Cao. Robust deep auto-encoding gaussian process regression for unsupervised anomaly detection. Neurocomputing, 376:180–190, 2020.
  • [33] Chong Zhou and Randy C Paffenroth. Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 665–674, 2017.
  • [34] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  • [35] Cristian I Challu, Peihong Jiang, Ying Nian Wu, and Laurent Callot. Deep generative model with hierarchical latent factors for time series anomaly detection. In International Conference on Artificial Intelligence and Statistics, pages 1643–1654. PMLR, 2022.
  • [36] Shelly Sheynin, Sagie Benaim, and Lior Wolf. A hierarchical transformation-discriminating generative model for few shot anomaly detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8495–8504, 2021.
  • [37] Hyunsun Choi and Eric Jang. Generative ensembles for robust anomaly detection. 2018.
  • [38] **won An and Sungzoon Cho. Variational autoencoder based anomaly detection using reconstruction probability. Special lecture on IE, 2(1):1–18, 2015.
  • [39] Ilia A Luchnikov, Alexander Ryzhov, Pieter-Jan Stas, Sergey N Filippov, and Henni Ouerdane. Variational autoencoder reconstruction of complex many-body physics. Entropy, 21(11):1091, 2019.
  • [40] David Zimmerer, Simon AA Kohl, Jens Petersen, Fabian Isensee, and Klaus H Maier-Hein. Context-encoding variational autoencoder for unsupervised anomaly detection. arXiv preprint arXiv:1812.05941, 2018.
  • [41] Yifu Ren, **hai Liu, Jianan Zhang, Lin Jiang, and Yanhong Luo. A data reconstruction method based on adversarial conditional variational autoencoder. In 2020 IEEE 9th Data Driven Control and Learning Systems Conference (DDCLS), pages 622–626. IEEE, 2020.
  • [42] Leonardo Noriega. Multilayer perceptron tutorial. School of Computing. Staffordshire University, 4(5):444, 2005.
  • [43] Larry R Medsker and LC Jain. Recurrent neural networks. Design and Applications, 5(64-67):2, 2001.
  • [44] Bendong Zhao, Huanzhang Lu, Shangfeng Chen, Junliang Liu, and Dongya Wu. Convolutional neural networks for time series classification. Journal of Systems Engineering and Electronics, 28(1):162–169, 2017.
  • [45] Zhicheng Cui, Wenlin Chen, and Yixin Chen. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995, 2016.
  • [46] Anastasia Borovykh, Sander Bohte, and Cornelis W Oosterlee. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.
  • [47] Irena Koprinska, Dengsong Wu, and Zheng Wang. Convolutional neural networks for energy time series forecasting. In 2018 international joint conference on neural networks (IJCNN), pages 1–8. IEEE, 2018.
  • [48] Yi Zheng, Qi Liu, Enhong Chen, Yong Ge, and J Leon Zhao. Time series classification using multi-channels deep convolutional neural networks. In International conference on web-age information management, pages 298–310. Springer, 2014.
  • [49] Grace W Lindsay. Convolutional neural networks as a model of the visual system: Past, present, and future. Journal of cognitive neuroscience, 33(10):2017–2031, 2021.
  • [50] Mirco Ravanelli and Yoshua Bengio. Speaker recognition from raw waveform with sincnet. In 2018 IEEE spoken language technology workshop (SLT), pages 1021–1028. IEEE, 2018.
  • [51] Annisa Humairani, Achmad Rizal, Inung Wijayanto, Sugondo Hadiyoso, and Yunendah Nur Fuadah. Wavelet-based entropy analysis on eeg signal for detecting seizures. In 2022 10th International Conference on Information and Communication Technology (ICoICT), pages 93–98. IEEE, 2022.
  • [52] Muhammad Zubair Ahmad, Awais Mehmood Kamboh, Sajid Saleem, and Amir Ali Khan. Mallat’s scattering transform based anomaly sensing for detection of seizures in scalp eeg. IEEE Access, 5:16919–16929, 2017.
  • [53] Poomipat Boonyakitanont, Apiwat Lek-Uthai, Krisnachai Chomtho, and Jitkomut Songsiri. A review of feature extraction and performance evaluation in epileptic seizure detection using eeg. Biomedical Signal Processing and Control, 57:101702, 2020.
  • [54] Kai Keng Ang, Zheng Yang Chin, Chuanchu Wang, Cuntai Guan, and Haihong Zhang. Filter bank common spatial pattern algorithm on bci competition iv datasets 2a and 2b. Frontiers in neuroscience, 6:39, 2012.
  • [55] Nijisha Shajil, Sasikala Mohan, Poonguzhali Srinivasan, Janani Arivudaiyanambi, and Arunnagiri Arasappan Murrugesan. Multiclass classification of spatially filtered motor imagery eeg signals using convolutional neural network for bci based applications. Journal of Medical and Biological Engineering, 40:663–672, 2020.
  • [56] Andrea Apicella, Pasquale Arpaia, Mirco Frosolone, Giovanni Improta, Nicola Moccaldi, and Andrea Pollastro. Eeg-based measurement system for monitoring student engagement in learning 4.0. Scientific Reports, 12(1):5857, 2022.
  • [57] Pasquale Arpaia, Elisa Bertone, Antonio Esposito, Angela Natalizio, Marco Parvis, Alessandra Laura Giulia Pedrocchi, and Andrea Pollastro. Sinc-eegnet for improving performance while reducing calibration of a motor imagery-based bci. In 2023 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), pages 1063–1068. IEEE, 2023.
  • [58] Lukas Ruff, Robert A Vandermeulen, Nico Görnitz, Alexander Binder, Emmanuel Müller, Klaus-Robert Müller, and Marius Kloft. Deep semi-supervised anomaly detection. arXiv preprint arXiv:1906.02694, 2019.
  • [59] Samet Akcay, Amir Atapour-Abarghouei, and Toby P Breckon. Ganomaly: Semi-supervised anomaly detection via adversarial training. In Computer Vision–ACCV 2018: 14th Asian Conference on Computer Vision, Perth, Australia, December 2–6, 2018, Revised Selected Papers, Part III 14, pages 622–637. Springer, 2019.
  • [60] Gul Hameed Khan, Nadeem Ahmad Khan, Muhammad Awais Bin Altaf, and Qammer Abbasi. A shallow autoencoder framework for epileptic seizure detection in eeg signals. Sensors, 23(8):4112, 2023.
  • [61] Trevor Hastie, Robert Tibshirani, Jerome H Friedman, and Jerome H Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
  • [62] Ye Yuan, Guangxu Xun, Kebin Jia, and Aidong Zhang. A multi-view deep learning framework for eeg seizure detection. IEEE journal of biomedical and health informatics, 23(1):83–94, 2018.
  • [63] Ahmed M Abdelhameed and Magdy Bayoumi. Semi-supervised eeg signals classification system for epileptic seizure detection. IEEE Signal Processing Letters, 26(12):1922–1926, 2019.
  • [64] Ahmed Abdelhameed and Magdy Bayoumi. A deep learning approach for automatic seizure detection in children with epilepsy. Frontiers in Computational Neuroscience, 15:650050, 2021.
  • [65] Ahmed M Abdelhameed and Magdy Bayoumi. An efficient deep learning system for epileptic seizure prediction. In 2021 IEEE International Symposium on Circuits and Systems (ISCAS), pages 1–5. IEEE, 2021.
  • [66] Hisham Daoud and Magdy Bayoumi. Deep learning approach for epileptic focus localization. IEEE transactions on biomedical circuits and systems, 14(2):209–220, 2019.
  • [67] Ruyan Wang, Linhai Wang, Peng He, Ya** Cui, and Dapeng Wu. Epileptic seizures prediction based on unsupervised learning for feature extraction. In ICC 2022-IEEE International Conference on Communications, pages 4643–4648. IEEE, 2022.
  • [68] Peng He, Linhai Wang, Ya** Cui, Ruyan Wang, and Dapeng Wu. Unsupervised feature learning based on autoencoder for epileptic seizures prediction. Applied Intelligence, 53(18):20766–20784, 2023.
  • [69] Tingxi Wen and Zhongnan Zhang. Deep convolution neural network and autoencoders-based unsupervised feature learning of eeg signals. IEEE Access, 6:25399–25410, 2018.
  • [70] Afshin Shoeibi, Navid Ghassemi, Marjane Khodatars, Parisa Moridian, Roohallah Alizadehsani, Assef Zare, Abbas Khosravi, Abdulhamit Subasi, U Rajendra Acharya, and Juan M Gorriz. Detection of epileptic seizures on eeg signals using anfis classifier, autoencoders and fuzzy entropies. Biomedical Signal Processing and Control, 73:103417, 2022.
  • [71] Xiaojie Huang, Xiangtao Sun, Lijun Zhang, Tong Zhu, Hao Yang, Qingsong Xiong, and Lijie Feng. A novel epilepsy detection method based on feature extraction by deep autoencoder on eeg signal. International Journal of Environmental Research and Public Health, 19(22):15110, 2022.
  • [72] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017.
  • [73] İlkay Yıldız, Rachael Garner, Matthew Lai, and Dominique Duncan. Unsupervised seizure identification on eeg. Computer methods and programs in biomedicine, 215:106604, 2022.
  • [74] Ana Maria Amaro de Sousa, Michel JAM van Putten, Stéphanie van den Berg, and Maryam Amir Haeri. Detection of interictal epileptiform discharges with semi-supervised deep learning. Biomedical Signal Processing and Control, 88:105610, 2024.
  • [75] Ilkay Yıldız Potter, George Zerveas, Carsten Eickhoff, and Dominique Duncan. Unsupervised multivariate time-series transformers for seizure identification on eeg. In 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), pages 1304–1311. IEEE, 2022.
  • [76] Ulf Grenander. The nyquist frequency is that frequency whose period is two sampling intervals. Probability and Statistics: The Harald Cramér Volume, 434, 1959.
  • [77] Lawrence Rabiner and Ronald Schafer. Theory and applications of digital speech processing. Prentice Hall Press, 2010.
  • [78] Sanjit K Mitra. Digital signal processing: a computer-based approach. McGraw-Hill Higher Education, 2001.
  • [79] Christopher P Burgess, Irina Higgins, Arka Pal, Loic Matthey, Nick Watters, Guillaume Desjardins, and Alexander Lerchner. Understanding disentangling in β𝛽\betaitalic_β-vae. arXiv preprint arXiv:1804.03599, 2018.
  • [80] Ralph G Andrzejak, Klaus Lehnertz, Florian Mormann, Christoph Rieke, Peter David, and Christian E Elger. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Physical Review E, 64(6):061907, 2001.
  • [81] PhysioToolkit PhysioBank. Physionet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.
  • [82] J Prasanna, MSP Subathra, Mazin Abed Mohammed, Robertas Damaševičius, Nanjappan Jothiraj Sairamya, and S Thomas George. Automated epileptic seizure detection in pediatric subjects of chb-mit eeg database—a survey. Journal of Personalized Medicine, 11(10):1028, 2021.
  • [83] Gwangho Choi, Chulkyun Park, Junkyung Kim, Kyoungin Cho, Tae-Joon Kim, HwangSik Bae, Kyeongyuk Min, Ki-Young Jung, and Jongwha Chong. A novel multi-scale 3d cnn with deep neural network for epileptic seizure detection. In 2019 IEEE International Conference on Consumer Electronics (ICCE), pages 1–2. IEEE, 2019.
  • [84] Mengni Zhou, Cheng Tian, Rui Cao, Bin Wang, Yan Niu, Ting Hu, Hao Guo, and Jie Xiang. Epileptic seizure detection based on eeg signals and cnn. Frontiers in neuroinformatics, 12:95, 2018.
  • [85] Irem Tasci, Burak Tasci, Prabal D Barua, Sengul Dogan, Turker Tuncer, Elizabeth Emma Palmer, Hamido Fujita, and U Rajendra Acharya. Epilepsy detection in 121 patient populations using hypercube pattern from eeg signals. Information Fusion, 96:252–268, 2023.
  • [86] Chulkyun Park, Gwangho Choi, Junkyung Kim, Sangdeok Kim, Tae-Joon Kim, Kyeongyuk Min, Ki-Young Jung, and Jongwha Chong. Epileptic seizure detection for multi-channel eeg with deep convolutional neural network. In 2018 International Conference on Electronics, Information, and Communication (ICEIC), pages 1–5. IEEE, 2018.
  • [87] Omar Kaziha and Talal Bonny. A convolutional neural network for seizure detection. In 2020 Advances in Science and Engineering Technology International Conferences (ASET), pages 1–5. IEEE, 2020.
  • [88] Fatima Hassan, Syed Fawad Hussain, and Saeed Mian Qaisar. Epileptic seizure detection using a hybrid 1d cnn-machine learning approach from eeg data. Journal of Healthcare Engineering, 2022, 2022.
  • [89] Yang Qiu, Weidong Zhou, Nana Yu, and Peidong Du. Denoising sparse autoencoder-based ictal eeg classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(9):1717–1726, 2018.
  • [90] Petro Liashchynskyi and Pavlo Liashchynskyi. Grid search, random search, genetic algorithm: a big comparison for nas. arXiv preprint arXiv:1912.06059, 2019.
  • [91] Yuan Yao, Lorenzo Rosasco, and Andrea Caponnetto. On early stop** in gradient descent learning. Constructive Approximation, 26:289–315, 2007.
  • [92] Andrea Apicella, Francesco Isgrò, Andrea Pollastro, and Roberto Prevete. On the effects of data normalization for domain adaptation on eeg data. Engineering Applications of Artificial Intelligence, 123:106205, 2023.
  • [93] Samuel Sanford Shapiro and Martin B Wilk. An analysis of variance test for normality (complete samples). Biometrika, 52(3-4):591–611, 1965.
  • [94] Eva Ostertagova, Oskar Ostertag, and Jozef Kováč. Methodology and application of the kruskal-wallis test. Applied mechanics and materials, 611:115–120, 2014.
  • [95] Patrick E McKnight and Julius Najab. Mann-whitney u test. The Corsini encyclopedia of psychology, pages 1–1, 2010.
  • [96] Tae Kyun Kim. Understanding one-way anova using conceptual figures. Korean journal of anesthesiology, 70(1):22, 2017.
  • [97] Sean Givnan, Carl Chalmers, Paul Fergus, Sandra Ortega-Martorell, and Tom Whalley. Anomaly detection using autoencoder reconstruction upon industrial motors. Sensors, 22(9):3166, 2022.
  • [98] May Thet Tun, Dim En Nyaung, and Myat Pwint Phyu. Network anomaly detection using threshold-based sparse. In Proceedings of the 11th International Conference on Advances in Information Technology, pages 1–8, 2020.
  • [99] Mahmoud Said Elsayed, Nhien-An Le-Khac, Soumyabrata Dev, and Anca Delia Jurcut. Detecting abnormal traffic in large-scale networks. In 2020 International Symposium on Networks, Computers and Communications (ISNCC), pages 1–7. IEEE, 2020.
  • [100] Ihar Lobach, M Borland, K Harkay, N Kuklev, A Sannibale, Y Sun, et al. Machine learning for anomaly detection and classification in particle accelerators. 2022.