
Acoustical Features as Knee Health Biomarkers: A Critical Analysis

Authors: Christodoulos Kechris, Jerome Thevenot, Tomas Teijeiro, Vincent A. Stadelmann, Nicola A. Maffiuletti, David Atienza

Abstract: Acoustical knee health assessment has long promised an alternative to clinically available medical imaging tools, but this modality has yet to be adopted in medical practice. The field is currently led by machine learning models processing acoustical features, which have presented promising diagnostic performances. However, these methods overlook the intricate multi-source nature of audio signals and the underlying mechanisms at play. By addressing this critical gap, the present paper introduces a novel causal framework for validating knee acoustical features. We argue that current machine learning methodologies for acoustical knee diagnosis lack the required assurances and thus cannot be used to classify acoustic features as biomarkers. Our framework establishes a set of essential theoretical guarantees necessary to validate this claim. We apply our methodology to three real-world experiments investigating the effect of researchers' expectations, the experimental protocol and the wearable sensor employed. This investigation reveals latent issues such as underlying shortcut learning and performance inflation. This study is the first independent result reproduction study in the field of acoustical knee health evaluation. We conclude with actionable insights from our findings, offering valuable guidance to navigate these crucial limitations in future research.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2209.04360

doi 10.1016/j.cmpb.2023.107743

A Semi-Supervised Algorithm for Improving the Consistency of Crowdsourced Datasets: The COVID-19 Case Study on Respiratory Disorder Classification

Authors: Lara Orlandic, Tomas Teijeiro, David Atienza

Abstract: Cough audio signal classification is a potentially useful tool in screening for respiratory disorders, such as COVID-19. Since it is dangerous to collect data from patients with such contagious diseases, many research teams have turned to crowdsourcing to quickly gather cough sound data, as was done to generate the COUGHVID dataset. The COUGHVID dataset enlisted expert physicians to diagnose the underlying diseases present in a limited number of uploaded recordings. However, this approach suffers from potential mislabeling of the coughs, as well as notable disagreement between experts. In this work, we use a semi-supervised learning (SSL) approach to improve the labeling consistency of the COUGHVID dataset and the robustness of COVID-19 versus healthy cough sound classification. First, we leverage existing SSL expert knowledge aggregation techniques to overcome the labeling inconsistencies and sparsity in the dataset. Next, our SSL approach is used to identify a subsample of re-labeled COUGHVID audio samples that can be used to train or augment future cough classification models. The consistency of the re-labeled data is demonstrated in that it exhibits a high degree of class separability, 3x higher than that of the user-labeled data, despite the expert label inconsistency present in the original dataset. Furthermore, the spectral differences in the user-labeled audio segments are amplified in the re-labeled data, resulting in significantly different power spectral densities between healthy and COVID-19 coughs, which demonstrates both the increased consistency of the new dataset and its explainability from an acoustic perspective. Finally, we demonstrate how the re-labeled dataset can be used to train a cough classifier. This SSL approach can be used to combine the medical knowledge of several experts to improve the database consistency for any diagnostic classification task.

Submitted 9 September, 2022; originally announced September 2022.

arXiv:2208.00885

Many-to-One Knowledge Distillation of Real-Time Epileptic Seizure Detection for Low-Power Wearable Internet of Things Systems

Authors: Saleh Baghersalimi, Alireza Amirshahi, Farnaz Forooghifar, Tomas Teijeiro, Amir Aminifar, David Atienza

Abstract: Integrating low-power wearable Internet of Things (IoT) systems into routine health monitoring is an ongoing challenge. Recent advances in the computation capabilities of wearables make it possible to target complex scenarios by exploiting multiple biosignals and using high-performance algorithms, such as Deep Neural Networks (DNNs). There is, however, a trade-off between the performance of the algorithms and the low-power requirements of IoT platforms with limited resources. Moreover, physically larger and multi-biosignal-based wearables bring significant discomfort to the patients. Consequently, reducing power consumption and discomfort is necessary for patients to use IoT devices continuously during everyday life. To overcome these challenges, in the context of epileptic seizure detection, we propose a many-to-one signals knowledge distillation approach targeting single-biosignal processing in IoT wearable systems. The starting point is to obtain a highly accurate multi-biosignal DNN, then apply our approach to develop a single-biosignal DNN solution for IoT systems that achieves an accuracy comparable to the original multi-biosignal DNN. To assess the practicality of our approach in real-life scenarios, we perform a comprehensive simulation experiment analysis on several state-of-the-art edge computing platforms, such as Kendryte K210 and Raspberry Pi Zero.

Submitted 20 July, 2022; originally announced August 2022.

arXiv:2207.01856

Event-based sampled ECG morphology reconstruction through self-similarity

Authors: Silvio Zanoli, Tomas Teijeiro, Giovanni Ansaloni, David Atienza

Abstract: Background and Objective: Event-based analog-to-digital converters allow for sparse bio-signal acquisition, enabling local sub-Nyquist sampling frequency. However, aggressive event selection can cause the loss of important bio-markers, not recoverable with standard interpolation techniques. In this work, we leverage the self-similarity of the electrocardiogram (ECG) signal to recover missing features in event-based sampled ECG signals, dynamically selecting patient-representative templates together with a novel dynamic time warping algorithm to infer the morphology of event-based sampled heartbeats. Methods: We acquire a set of uniformly sampled heartbeats and use a graph-based clustering algorithm to define representative templates for the patient. Then, for each event-based sampled heartbeat, we select the morphologically nearest template, and we then reconstruct the heartbeat with piece-wise linear deformations of the selected template, according to a novel dynamic time warping algorithm that matches events to template segments. Results: Synthetic tests on a standard normal sinus rhythm dataset, composed of approximately 1.8 million normal heartbeats, show a big leap in performance with respect to standard resampling techniques. In particular (when compared to classic linear resampling), we show an improvement in P-wave detection of up to 10 times, an improvement in T-wave detection of up to three times, and a 30% improvement in the dynamic time warping morphological distance. Conclusion: In this work, we have developed an event-based processing pipeline that leverages signal self-similarity to reconstruct event-based sampled ECG signals. Synthetic tests show clear advantages over classical resampling techniques.

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2205.07654

Hyperdimensional computing encoding for feature selection on the use case of epileptic seizure detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: The healthcare landscape is moving from reactive interventions focused on symptom treatment to more proactive prevention, from one-size-fits-all to personalized medicine, and from centralized to distributed paradigms. Wearable IoT devices and novel algorithms for continuous monitoring are essential components of this transition. Hyperdimensional (HD) computing is an emerging ML paradigm inspired by neuroscience research, with various aspects of interest for IoT devices and biomedical applications. Here we explore the not yet addressed topic of optimal encoding of spatio-temporal data, such as electroencephalogram (EEG) signals, and all the information it entails, into HD vectors. Further, we demonstrate how the HD computing framework can be used to perform feature selection by choosing an adequate encoding. To the best of our knowledge, this is the first approach to performing feature selection using HD computing in the literature. As a result, we believe it can help the ML community to further foster research in multiple directions related to feature and channel selection, as well as model interpretability.

Submitted 16 May, 2022; originally announced May 2022.

arXiv:2201.09759

Exploration of Hyperdimensional Computing Strategies for Enhanced Learning on Epileptic Seizure Detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Wearable and unobtrusive monitoring and prediction of epileptic seizures has the potential to significantly increase the life quality of patients, but is still an unreached goal due to the challenges of real-time detection and wearable device design. Hyperdimensional (HD) computing has evolved in recent years as a promising new machine learning approach, especially for wearable applications. But in the case of epilepsy detection, standard HD computing does not perform at the level of other state-of-the-art algorithms. This could be due to the inherent complexity of the seizures and their signatures in different biosignals, such as the electroencephalogram (EEG), their highly personalized nature, and the imbalance of seizure and non-seizure instances. In the literature, different strategies for improved learning of HD computing have been proposed, such as iterative (multi-pass) learning, multi-centroid learning and learning with sample weight ("OnlineHD"). Yet, most of them have not been tested on the challenging task of epileptic seizure detection, and it remains unclear whether they can increase the HD computing performance to the level of the current state-of-the-art algorithms, such as random forests. Thus, in this paper, we implement different learning strategies and assess their performance individually or in combination, regarding detection performance and memory and computational requirements. Results show that the best-performing algorithm, which is a combination of multi-centroid and multi-pass, can indeed reach the performance of the random forest model on a highly unbalanced dataset imitating a real-life epileptic seizure detection application.

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2112.04369

doi 10.1109/TBME.2022.3205304

Adaptive R-Peak Detection on Wearable ECG Sensors for High-Intensity Exercise

Authors: Elisabetta De Giovanni, Tomas Teijeiro, Grégoire P. Millet, David Atienza

Abstract: Objective: Continuous monitoring of biosignals via wearable sensors has quickly expanded in the medical and wellness fields. At rest, automatic detection of vital parameters is generally accurate. However, in conditions such as high-intensity exercise, sudden physiological changes occur in the signals, compromising the robustness of standard algorithms. Methods: Our method, called BayeSlope, is based on unsupervised learning, Bayesian filtering, and non-linear normalization to enhance and correctly detect the R peaks according to their expected positions in the ECG. Furthermore, as BayeSlope is computationally heavy and can drain the device battery quickly, we propose an online design that adapts its robustness to sudden physiological changes, and its complexity to the heterogeneous resources of modern embedded platforms. This method combines BayeSlope with a lightweight algorithm, executed in cores with different capabilities, to reduce the energy consumption while preserving the accuracy. Results: BayeSlope achieves an F1 score of 99.3% in experiments during intense cycling exercise with 20 subjects. Additionally, the online adaptive process achieves an F1 score of 99% across five different exercise intensities, with a total energy consumption of 1.55 ± 0.54 mJ. Conclusion: We propose a highly accurate and robust method, and a complete energy-efficient implementation in a modern ultra-low-power embedded platform to improve R peak detection in challenging conditions, such as during high-intensity exercise. Significance: The experiments show that BayeSlope outperforms a state-of-the-art algorithm by up to 8.4% in F1 score, while our online adaptive method can reach energy savings of up to 38.7% on modern heterogeneous wearable platforms.

Submitted 8 December, 2021; originally announced December 2021.

Comments: 12 pages, 14 figures, 2 tables

MSC Class: 68U35

arXiv:2111.08463

doi 10.3389/fneur.2022.816294

Multi-Centroid Hyperdimensional Computing Approach for Epileptic Seizure Detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Long-term monitoring of patients with epilepsy presents a challenging problem from the engineering perspective of real-time detection and wearable device design. It requires new solutions that allow continuous unobstructed monitoring and reliable detection and prediction of seizures. A high variability in the electroencephalogram (EEG) patterns exists among people, brain states, and time instances during seizures, but also during non-seizure periods. This makes epileptic seizure detection very challenging, especially if data is grouped under only seizure and non-seizure labels. Hyperdimensional (HD) computing, a novel machine learning approach, comes in as a promising tool. However, it has certain limitations when the data shows a high intra-class variability. Therefore, in this work, we propose a novel semi-supervised learning approach based on multi-centroid HD computing. The multi-centroid approach allows several prototype vectors to represent the seizure and non-seizure states, which leads to significantly improved performance when compared to a simple 2-class HD model. Further, real-life data imbalance poses an additional challenge, and the performance reported on balanced subsets of data is likely to be overestimated. Thus, we test our multi-centroid approach with three different dataset balancing scenarios, showing that the performance improvement is higher for the less balanced dataset. More specifically, up to 14% improvement is achieved on an unbalanced test set with 10 times more non-seizure than seizure data. At the same time, the total number of sub-classes is not significantly increased compared to the balanced dataset. Thus, the proposed multi-centroid approach can be an important element in achieving a high performance of epilepsy detection with real-life data balance or during online learning, where seizures are infrequent.

Submitted 16 November, 2021; originally announced November 2021.
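
Illustrative note (not part of the arXiv listing): the idea of keeping several prototype vectors per class, as described in the abstract above, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not the authors' implementation: the bipolar encoding, majority-vote bundling, dot-product similarity, and all function names (`random_hv`, `bundle`, `similarity`, `flip`, `predict`) are hypothetical choices made only for illustration. In particular, how the paper creates its sub-class centroids is not reproduced here.

```python
import random


def random_hv(dim, rng):
    """Draw a random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bundle(vectors):
    """Combine hypervectors by element-wise majority vote (ties go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: 1.0 for identical vectors, near 0 for unrelated ones."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def flip(hv, fraction, rng):
    """Return a noisy copy of hv with the given fraction of entries sign-flipped."""
    noisy = list(hv)
    for i in rng.sample(range(len(noisy)), int(fraction * len(noisy))):
        noisy[i] = -noisy[i]
    return noisy


def predict(query, prototypes):
    """Return the label of the most similar prototype.

    `prototypes` is a list of (label, hypervector) pairs; a class may
    contribute several pairs, which is the multi-centroid idea.
    """
    return max(prototypes, key=lambda item: similarity(query, item[1]))[0]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 2000
    # Hypothetical data: two unrelated seizure patterns and one non-seizure pattern.
    seizure_a, seizure_b, rest = (random_hv(dim, rng) for _ in range(3))
    prototypes = [
        ("seizure", bundle([flip(seizure_a, 0.1, rng) for _ in range(5)])),
        ("seizure", bundle([flip(seizure_b, 0.1, rng) for _ in range(5)])),
        ("non-seizure", bundle([flip(rest, 0.1, rng) for _ in range(5)])),
    ]
    print(predict(flip(seizure_b, 0.2, rng), prototypes))
```

With a single prototype per class, the two unrelated seizure patterns would be bundled into one vector that resembles each of them less closely; keeping them as separate prototypes preserves the similarity to each pattern.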

arXiv:2106.13545

doi 10.1109/JETCAS.2023.3269623

An Error-Based Approximation Sensing Circuit for Event-Triggered, Low Power Wearable Sensors

Authors: Silvio Zanoli, Flavio Ponzina, Tomás Teijeiro, Alexandre Levisse, David Atienza

Abstract: Event-based sensors have the potential to optimize energy consumption at every stage in the signal processing pipeline, including data acquisition, transmission, processing and storage. However, almost all state-of-the-art systems are still built upon the classical Nyquist-based periodic signal acquisition. In this work, we design and validate the Polygonal Approximation Sampler (PAS), a novel circuit to implement a general-purpose event-based sampler using a polygonal approximation algorithm as the underlying sampling trigger. The circuit can be dynamically reconfigured to produce a coarse or a detailed reconstruction of the analog input, by adjusting the error threshold of the approximation. The proposed circuit is designed at the Register Transfer Level and processes each input sample received from the ADC in a single clock cycle. The PAS has been tested with three different types of archetypal signals captured by wearable devices (electrocardiogram, accelerometer and respiration data) and compared with a standard periodic ADC. These tests show that single-channel signals with slow variations and constant segments (such as the single-lead ECG and respiration signals used here) benefit greatly from this sampling technique, reducing the amount of data used by up to 99% without significant performance degradation. At the same time, multi-channel signals (such as the six-dimensional accelerometer signal) can still benefit from the designed circuit, achieving a reduction of up to 80% with minor performance degradation. These results open the door to new types of wearable sensors with reduced size and longer battery lifetime.

Submitted 11 May, 2023; v1 submitted 25 June, 2021; originally announced June 2021.
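
Illustrative note (not part of the arXiv listing): the role of the error threshold in an event-based sampler driven by a polygonal approximation can be sketched in software. The following is a minimal sketch of a generic greedy polygonal approximation, assuming uniformly spaced input samples. It is not the PAS circuit from the paper, whose single-cycle hardware algorithm is not described in the abstract, and the function names `polygonal_sample` and `reconstruct` are hypothetical.

```python
def polygonal_sample(signal, threshold):
    """Return the indices of the samples kept by a greedy polygonal approximation.

    A sample is kept (an "event") when the straight line from the last kept
    sample to the next candidate would miss some intermediate sample by more
    than `threshold`.
    """
    n = len(signal)
    if n == 0:
        return []
    kept = [0]
    anchor = 0
    for end in range(2, n):
        span = end - anchor
        slope = (signal[end] - signal[anchor]) / span
        # Check every sample strictly between anchor and end against the line.
        fits = all(
            abs(signal[k] - (signal[anchor] + slope * (k - anchor))) <= threshold
            for k in range(anchor + 1, end)
        )
        if not fits:
            # The previous candidate was the last one that fit: keep it.
            anchor = end - 1
            kept.append(anchor)
    if n > 1:
        kept.append(n - 1)
    return kept


def reconstruct(signal, kept):
    """Rebuild a full-length signal by linear interpolation between kept samples."""
    if not kept:
        return []
    out = []
    for a, b in zip(kept, kept[1:]):
        for k in range(a, b):
            out.append(signal[a] + (signal[b] - signal[a]) * (k - a) / (b - a))
    out.append(signal[kept[-1]])
    return out


if __name__ == "__main__":
    samples = [0, 1, 2, 1, 0]
    events = polygonal_sample(samples, threshold=0.1)
    print(events, reconstruct(samples, events))
```

Raising `threshold` keeps fewer samples and gives a coarser reconstruction, while lowering it keeps more. In this sketch every reconstructed sample stays within `threshold` of the original, because each kept segment was checked before it was accepted.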

arXiv:2105.00934

doi 10.1109/EMBC46164.2021.9629648

Systematic Assessment of Hyperdimensional Computing for Epileptic Seizure Detection

Authors: Una Pale, Tomas Teijeiro, David Atienza

Abstract: Hyperdimensional computing is a promising novel paradigm for low-power embedded machine learning. It has been applied to different biomedical applications, and particularly to epileptic seizure detection. Unfortunately, due to differences in data preparation, segmentation, encoding strategies, and performance metrics, results are hard to compare, which makes building upon that knowledge difficult. Thus, the main goal of this work is to perform a systematic assessment of the HD computing framework for the detection of epileptic seizures, comparing different feature approaches mapped to HD vectors. More precisely, we test two previously implemented features as well as several novel approaches with HD computing on epileptic seizure detection. We evaluate them in a comparable way, i.e., with the same preprocessing setup and with identical performance measures. We use two different datasets in order to assess the generalizability of our conclusions. The systematic assessment involved three primary aspects relevant for potential wearable implementations: 1) detection performance, 2) memory requirements, and 3) computational complexity. Our analysis shows a significant difference in detection performance between approaches, but also that the ones with the highest performance might not be ideal for wearable applications due to their high memory or computational requirements. Furthermore, we evaluate a post-processing strategy to adjust the predictions to the dynamics of epileptic seizures, showing that performance is significantly improved in all the approaches and also that, after post-processing, differences in performance between approaches are much smaller.

Submitted 3 May, 2021; originally announced May 2021.

arXiv:2009.11644

doi 10.1038/s41597-021-00937-4

The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

Authors: Lara Orlandic, Tomas Teijeiro, David Atienza

Abstract: Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates, and that their expert labels are consistent. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.

Submitted 24 September, 2020; originally announced September 2020.

Comments: 11 pages, 3 figures

Showing 1–11 of 11 results for author: Teijeiro, T