-
Using Self-supervised Learning Can Improve Model Fairness
Authors:
Sofia Yfantidou,
Dimitris Spathis,
Marios Constantinides,
Athena Vakali,
Daniele Quercia,
Fahim Kawsar
Abstract:
Self-supervised learning (SSL) has become the de facto training paradigm of large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Despite demonstrating comparable performance with supervised methods, comprehensive efforts to assess SSL's impact on machine learning fairness (i.e., performing equally on different demographic breakdowns) are lac…
▽ More
Self-supervised learning (SSL) has become the de facto training paradigm of large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Despite demonstrating comparable performance with supervised methods, comprehensive efforts to assess SSL's impact on machine learning fairness (i.e., performing equally on different demographic breakdowns) are lacking. Hypothesizing that SSL models would learn more generic, hence less biased representations, this study explores the impact of pre-training and fine-tuning strategies on fairness. We introduce a fairness assessment framework for SSL, comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes. We evaluate our method's generalizability on three real-world human-centric datasets (i.e., MIMIC, MESA, and GLOBEM) by systematically comparing hundreds of SSL and fine-tuned models on various dimensions spanning from the intermediate representations to appropriate evaluation metrics. Our findings demonstrate that SSL can significantly improve model fairness, while maintaining performance on par with supervised methods-exhibiting up to a 30% increase in fairness with minimal loss in performance through self-supervision. We posit that such differences can be attributed to representation dissimilarities found between the best- and the worst-performing demographics across models-up to x13 greater for protected attributes with larger performance discrepancies between segments.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
A collection of the accepted papers for the Human-Centric Representation Learning workshop at AAAI 2024
Authors:
Dimitris Spathis,
Aaqib Saeed,
Ali Etemad,
Sana Tonekaboni,
Stefanos Laskaridis,
Shohreh Deldari,
Chi Ian Tang,
Patrick Schwab,
Shyam Tailor
Abstract:
This non-archival index is not complete, as some accepted papers chose to opt-out of inclusion. The list of all accepted papers is available on the workshop website.
This non-archival index is not complete, as some accepted papers chose to opt-out of inclusion. The list of all accepted papers is available on the workshop website.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Learning under Label Noise through Few-Shot Human-in-the-Loop Refinement
Authors:
Aaqib Saeed,
Dimitris Spathis,
Jungwoo Oh,
Edward Choi,
Ali Etemad
Abstract:
Wearable technologies enable continuous monitoring of various health metrics, such as physical activity, heart rate, sleep, and stress levels. A key challenge with wearable data is obtaining quality labels. Unlike modalities like video where the videos themselves can be effectively used to label objects or events, wearable data do not contain obvious cues about the physical manifestation of the us…
▽ More
Wearable technologies enable continuous monitoring of various health metrics, such as physical activity, heart rate, sleep, and stress levels. A key challenge with wearable data is obtaining quality labels. Unlike modalities like video where the videos themselves can be effectively used to label objects or events, wearable data do not contain obvious cues about the physical manifestation of the users and usually require rich metadata. As a result, label noise can become an increasingly thorny issue when labeling such data. In this paper, we propose a novel solution to address noisy label learning, entitled Few-Shot Human-in-the-Loop Refinement (FHLR). Our method initially learns a seed model using weak labels. Next, it fine-tunes the seed model using a handful of expert corrections. Finally, it achieves better generalizability and robustness by merging the seed and fine-tuned models via weighted parameter averaging. We evaluate our approach on four challenging tasks and datasets, and compare it against eight competitive baselines designed to deal with noisy labels. We show that FHLR achieves significantly better performance when learning from noisy labels and achieves state-of-the-art by a large margin, with up to 19% accuracy improvement under symmetric and asymmetric noise. Notably, we find that FHLR is particularly robust to increased label noise, unlike prior works that suffer from severe performance degradation. Our work not only achieves better generalization in high-stakes health sensing benchmarks but also sheds light on how noise affects commonly-used models.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
Balancing Continual Learning and Fine-tuning for Human Activity Recognition
Authors:
Chi Ian Tang,
Lorena Qendro,
Dimitris Spathis,
Fahim Kawsar,
Akhil Mathur,
Cecilia Mascolo
Abstract:
Wearable-based Human Activity Recognition (HAR) is a key task in human-centric machine learning due to its fundamental understanding of human behaviours. Due to the dynamic nature of human behaviours, continual learning promises HAR systems that are tailored to users' needs. However, because of the difficulty in collecting labelled data with wearable sensors, existing approaches that focus on supe…
▽ More
Wearable-based Human Activity Recognition (HAR) is a key task in human-centric machine learning due to its fundamental understanding of human behaviours. Due to the dynamic nature of human behaviours, continual learning promises HAR systems that are tailored to users' needs. However, because of the difficulty in collecting labelled data with wearable sensors, existing approaches that focus on supervised continual learning have limited applicability, while unsupervised continual learning methods only handle representation learning while delaying classifier training to a later stage. This work explores the adoption and adaptation of CaSSLe, a continual self-supervised learning model, and Kaizen, a semi-supervised continual learning model that balances representation learning and down-stream classification, for the task of wearable-based HAR. These schemes re-purpose contrastive learning for knowledge retention and, Kaizen combines that with self-training in a unified scheme that can leverage unlabelled and labelled data for continual learning. In addition to comparing state-of-the-art self-supervised continual learning schemes, we further investigated the importance of different loss terms and explored the trade-off between knowledge retention and learning from new tasks. In particular, our extensive evaluation demonstrated that the use of a weighting factor that reflects the ratio between learned and new classes achieves the best overall trade-off in continual learning.
△ Less
Submitted 4 January, 2024;
originally announced January 2024.
-
Evaluating Fairness in Self-supervised and Supervised Models for Sequential Data
Authors:
Sofia Yfantidou,
Dimitris Spathis,
Marios Constantinides,
Athena Vakali,
Daniele Quercia,
Fahim Kawsar
Abstract:
Self-supervised learning (SSL) has become the de facto training paradigm of large models where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Hypothesizing that SSL models would learn more generic, hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness (i.e., performing equally on differen…
▽ More
Self-supervised learning (SSL) has become the de facto training paradigm of large models where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Hypothesizing that SSL models would learn more generic, hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness (i.e., performing equally on different demographic breakdowns). Motivated by human-centric applications on real-world timeseries data, we interpret inductive biases on the model, layer, and metric levels by systematically comparing SSL models to their supervised counterparts. Our findings demonstrate that SSL has the capacity to achieve performance on par with supervised methods while significantly enhancing fairness--exhibiting up to a 27% increase in fairness with a mere 1% loss in performance through self-supervision. Ultimately, this work underscores SSL's potential in human-centric computing, particularly high-stakes, data-scarce application domains like healthcare.
△ Less
Submitted 3 January, 2024;
originally announced January 2024.
-
FairComp: Workshop on Fairness and Robustness in Machine Learning for Ubiquitous Computing
Authors:
Sofia Yfantidou,
Dimitris Spathis,
Marios Constantinides,
Tong Xia,
Niels van Berkel
Abstract:
How can we ensure that Ubiquitous Computing (UbiComp) research outcomes are both ethical and fair? While fairness in machine learning (ML) has gained traction in recent years, fairness in UbiComp remains unexplored. This workshop aims to discuss fairness in UbiComp research and its social, technical, and legal implications. From a social perspective, we will examine the relationship between fairne…
▽ More
How can we ensure that Ubiquitous Computing (UbiComp) research outcomes are both ethical and fair? While fairness in machine learning (ML) has gained traction in recent years, fairness in UbiComp remains unexplored. This workshop aims to discuss fairness in UbiComp research and its social, technical, and legal implications. From a social perspective, we will examine the relationship between fairness and UbiComp research and identify pathways to ensure that ubiquitous technologies do not cause harm or infringe on individual rights. From a technical perspective, we will initiate a discussion on data practices to develop bias mitigation approaches tailored to UbiComp research. From a legal perspective, we will examine how new policies shape our community's work and future research. We aim to foster a vibrant community centered around the topic of responsible UbiComp, while also charting a clear path for future research endeavours in this field.
△ Less
Submitted 22 September, 2023;
originally announced September 2023.
-
The first step is the hardest: Pitfalls of Representing and Tokenizing Temporal Data for Large Language Models
Authors:
Dimitris Spathis,
Fahim Kawsar
Abstract:
Large Language Models (LLMs) have demonstrated remarkable generalization across diverse tasks, leading individuals to increasingly use them as personal assistants and universal computing engines. Nevertheless, a notable obstacle emerges when feeding numerical/temporal data into these models, such as data sourced from wearables or electronic health records. LLMs employ tokenizers in their input tha…
▽ More
Large Language Models (LLMs) have demonstrated remarkable generalization across diverse tasks, leading individuals to increasingly use them as personal assistants and universal computing engines. Nevertheless, a notable obstacle emerges when feeding numerical/temporal data into these models, such as data sourced from wearables or electronic health records. LLMs employ tokenizers in their input that break down text into smaller units. However, tokenizers are not designed to represent numerical values and might struggle to understand repetitive patterns and context, treating consecutive values as separate tokens and disregarding their temporal relationships. Here, we discuss recent works that employ LLMs for human-centric tasks such as in mobile health sensing and present a case study showing that popular LLMs tokenize temporal data incorrectly. To address that, we highlight potential solutions such as prompt tuning with lightweight embedding layers as well as multimodal adapters, that can help bridge this "modality gap". While the capability of language models to generalize to other modalities with minimal or no finetuning is exciting, this paper underscores the fact that their outputs cannot be meaningful if they stumble over input nuances.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking
Authors:
Shohreh Deldari,
Dimitris Spathis,
Mohammad Malekzadeh,
Fahim Kawsar,
Flora Salim,
Akhil Mathur
Abstract:
Limited availability of labeled data for machine learning on multimodal time-series extensively hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without relying on labels. However, existing SSL methods require expensive computations of negative pairs and are typically designed for single modalities, which limits their versatilit…
▽ More
Limited availability of labeled data for machine learning on multimodal time-series extensively hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without relying on labels. However, existing SSL methods require expensive computations of negative pairs and are typically designed for single modalities, which limits their versatility. We introduce CroSSL (Cross-modal SSL), which puts forward two novel concepts: masking intermediate embeddings produced by modality-specific encoders, and their aggregation into a global embedding through a cross-modal aggregator that can be fed to down-stream classifiers. CroSSL allows for handling missing modalities and end-to-end cross-modal learning without requiring prior data preprocessing for handling missing inputs or negative-pair sampling for contrastive learning. We evaluate our method on a wide range of data, including motion sensors such as accelerometers or gyroscopes and biosignals (heart rate, electroencephalograms, electromyograms, electrooculograms, and electrodermal) to investigate the impact of masking ratios and masking strategies for various data types and the robustness of the learned representations to missing data. Overall, CroSSL outperforms previous SSL and supervised benchmarks using minimal labeled data, and also sheds light on how latent masking can improve cross-modal learning. Our code is open-sourced at https://github.com/dr-bell/CroSSL.
△ Less
Submitted 19 February, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
UDAMA: Unsupervised Domain Adaptation through Multi-discriminator Adversarial Training with Noisy Labels Improves Cardio-fitness Prediction
Authors:
Yu Wu,
Dimitris Spathis,
Hong Jia,
Ignacio Perez-Pozuelo,
Tomas Gonzales,
Soren Brage,
Nicholas Wareham,
Cecilia Mascolo
Abstract:
Deep learning models have shown great promise in various healthcare monitoring applications. However, most healthcare datasets with high-quality (gold-standard) labels are small-scale, as directly collecting ground truth is often costly and time-consuming. As a result, models developed and validated on small-scale datasets often suffer from overfitting and do not generalize well to unseen scenario…
▽ More
Deep learning models have shown great promise in various healthcare monitoring applications. However, most healthcare datasets with high-quality (gold-standard) labels are small-scale, as directly collecting ground truth is often costly and time-consuming. As a result, models developed and validated on small-scale datasets often suffer from overfitting and do not generalize well to unseen scenarios. At the same time, large amounts of imprecise (silver-standard) labeled data, annotated by approximate methods with the help of modern wearables and in the absence of ground truth validation, are starting to emerge. However, due to measurement differences, this data displays significant label distribution shifts, which motivates the use of domain adaptation. To this end, we introduce UDAMA, a method with two key components: Unsupervised Domain Adaptation and Multidiscriminator Adversarial Training, where we pre-train on the silver-standard data and employ adversarial adaptation with the gold-standard data along with two domain discriminators. In particular, we showcase the practical potential of UDAMA by applying it to Cardio-respiratory fitness (CRF) prediction. CRF is a crucial determinant of metabolic disease and mortality, and it presents labels with various levels of noise (goldand silver-standard), making it challenging to establish an accurate prediction model. Our results show promising performance by alleviating distribution shifts in various label shift settings. Additionally, by using data from two free-living cohort studies (Fenland and BBVS), we show that UDAMA consistently outperforms up to 12% compared to competitive transfer learning and state-of-the-art domain adaptation models, paving the way for leveraging noisy labeled data to improve fitness estimation at scale.
△ Less
Submitted 31 July, 2023;
originally announced July 2023.
-
The State of Algorithmic Fairness in Mobile Human-Computer Interaction
Authors:
Sofia Yfantidou,
Marios Constantinides,
Dimitris Spathis,
Athena Vakali,
Daniele Quercia,
Fahim Kawsar
Abstract:
This paper explores the intersection of Artificial Intelligence and Machine Learning (AI/ML) fairness and mobile human-computer interaction (MobileHCI). Through a comprehensive analysis of MobileHCI proceedings published between 2017 and 2022, we first aim to understand the current state of algorithmic fairness in the community. By manually analyzing 90 papers, we found that only a small portion (…
▽ More
This paper explores the intersection of Artificial Intelligence and Machine Learning (AI/ML) fairness and mobile human-computer interaction (MobileHCI). Through a comprehensive analysis of MobileHCI proceedings published between 2017 and 2022, we first aim to understand the current state of algorithmic fairness in the community. By manually analyzing 90 papers, we found that only a small portion (5%) thereof adheres to modern fairness reporting, such as analyses conditioned on demographic breakdowns. At the same time, the overwhelming majority draws its findings from highly-educated, employed, and Western populations. We situate these findings within recent efforts to capture the current state of algorithmic fairness in mobile and wearable computing, and envision that our results will serve as an open invitation to the design and development of fairer ubiquitous technologies.
△ Less
Submitted 22 July, 2023;
originally announced July 2023.
-
Kaizen: Practical Self-supervised Continual Learning with Continual Fine-tuning
Authors:
Chi Ian Tang,
Lorena Qendro,
Dimitris Spathis,
Fahim Kawsar,
Cecilia Mascolo,
Akhil Mathur
Abstract:
Self-supervised learning (SSL) has shown remarkable performance in computer vision tasks when trained offline. However, in a Continual Learning (CL) scenario where new data is introduced progressively, models still suffer from catastrophic forgetting. Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. Previous approaches suggested re-purposing self-…
▽ More
Self-supervised learning (SSL) has shown remarkable performance in computer vision tasks when trained offline. However, in a Continual Learning (CL) scenario where new data is introduced progressively, models still suffer from catastrophic forgetting. Retraining a model from scratch to adapt to newly generated data is time-consuming and inefficient. Previous approaches suggested re-purposing self-supervised objectives with knowledge distillation to mitigate forgetting across tasks, assuming that labels from all tasks are available during fine-tuning. In this paper, we generalize self-supervised continual learning in a practical setting where available labels can be leveraged in any step of the SSL process. With an increasing number of continual tasks, this offers more flexibility in the pre-training and fine-tuning phases. With Kaizen, we introduce a training architecture that is able to mitigate catastrophic forgetting for both the feature extractor and classifier with a carefully designed loss function. By using a set of comprehensive evaluation metrics reflecting different aspects of continual learning, we demonstrated that Kaizen significantly outperforms previous SSL models in competitive vision benchmarks, with up to 16.5% accuracy improvement on split CIFAR-100. Kaizen is able to balance the trade-off between knowledge retention and learning from new data with an end-to-end model, paving the way for practical deployment of continual learning systems.
△ Less
Submitted 7 February, 2024; v1 submitted 30 March, 2023;
originally announced March 2023.
-
Beyond Accuracy: A Critical Review of Fairness in Machine Learning for Mobile and Wearable Computing
Authors:
Sofia Yfantidou,
Marios Constantinides,
Dimitris Spathis,
Athena Vakali,
Daniele Quercia,
Fahim Kawsar
Abstract:
The field of mobile and wearable computing is undergoing a revolutionary integration of machine learning. Devices can now diagnose diseases, predict heart irregularities, and unlock the full potential of human cognition. However, the underlying algorithms powering these predictions are not immune to biases with respect to sensitive attributes (e.g., gender, race), leading to discriminatory outcome…
▽ More
The field of mobile and wearable computing is undergoing a revolutionary integration of machine learning. Devices can now diagnose diseases, predict heart irregularities, and unlock the full potential of human cognition. However, the underlying algorithms powering these predictions are not immune to biases with respect to sensitive attributes (e.g., gender, race), leading to discriminatory outcomes. The goal of this work is to explore the extent to which the mobile and wearable computing community has adopted ways of reporting information about datasets and models to surface and, eventually, counter biases. Our systematic review of papers published in the Proceedings of the ACM Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) journal from 2018-2022 indicates that, while there has been progress made on algorithmic fairness, there is still ample room for growth. Our findings show that only a small portion (5%) of published papers adheres to modern fairness reporting, while the overwhelming majority thereof focuses on accuracy or error metrics. To generalize these results across venues of similar scope, we analyzed recent proceedings of ACM MobiCom, MobiSys, and SenSys, IEEE Pervasive, and IEEE Transactions on Mobile Computing Computing, and found no deviation from our primary result. In light of these findings, our work provides practical guidelines for the design and development of mobile and wearable technologies that not only strive for accuracy but also fairness.
△ Less
Submitted 22 September, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Turning Silver into Gold: Domain Adaptation with Noisy Labels for Wearable Cardio-Respiratory Fitness Prediction
Authors:
Yu Wu,
Dimitris Spathis,
Hong Jia,
Ignacio Perez-Pozuelo,
Tomas I. Gonzales,
Soren Brage,
Nicholas Wareham,
Cecilia Mascolo
Abstract:
Deep learning models have shown great promise in various healthcare applications. However, most models are developed and validated on small-scale datasets, as collecting high-quality (gold-standard) labels for health applications is often costly and time-consuming. As a result, these models may suffer from overfitting and not generalize well to unseen data. At the same time, an extensive amount of…
▽ More
Deep learning models have shown great promise in various healthcare applications. However, most models are developed and validated on small-scale datasets, as collecting high-quality (gold-standard) labels for health applications is often costly and time-consuming. As a result, these models may suffer from overfitting and not generalize well to unseen data. At the same time, an extensive amount of data with imprecise labels (silver-standard) is starting to be generally available, as collected from inexpensive wearables like accelerometers and electrocardiography sensors. These currently underutilized datasets and labels can be leveraged to produce more accurate clinical models. In this work, we propose UDAMA, a novel model with two key components: Unsupervised Domain Adaptation and Multi-discriminator Adversarial training, which leverage noisy data from source domain (the silver-standard dataset) to improve gold-standard modeling. We validate our framework on the challenging task of predicting lab-measured maximal oxygen consumption (VO$_{2}$max), the benchmark metric of cardio-respiratory fitness, using free-living wearable sensor data from two cohort studies as inputs. Our experiments show that the proposed framework achieves the best performance of corr = 0.665 $\pm$ 0.04, paving the way for accurate fitness estimation at scale.
△ Less
Submitted 20 November, 2022;
originally announced November 2022.
-
Looking for Out-of-Distribution Environments in Multi-center Critical Care Data
Authors:
Dimitris Spathis,
Stephanie L. Hyland
Abstract:
Clinical machine learning models show a significant performance drop when tested in settings not seen during training. Domain generalisation models promise to alleviate this problem, however, there is still scepticism about whether they improve over traditional training. In this work, we take a principled approach to identifying Out of Distribution (OoD) environments, motivated by the problem of c…
▽ More
Clinical machine learning models show a significant performance drop when tested in settings not seen during training. Domain generalisation models promise to alleviate this problem, however, there is still scepticism about whether they improve over traditional training. In this work, we take a principled approach to identifying Out of Distribution (OoD) environments, motivated by the problem of cross-hospital generalization in critical care. We propose model-based and heuristic approaches to identify OoD environments and systematically compare models with different levels of held-out information. We find that access to OoD data does not translate to increased performance, pointing to inherent limitations in defining potential OoD environments potentially due to data harmonisation and sampling. Echoing similar results with other popular clinical benchmarks in the literature, new approaches are required to evaluate robust models on health records.
△ Less
Submitted 11 November, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
Longitudinal cardio-respiratory fitness prediction through wearables in free-living environments
Authors:
Dimitris Spathis,
Ignacio Perez-Pozuelo,
Tomas I. Gonzales,
Yu Wu,
Soren Brage,
Nicholas Wareham,
Cecilia Mascolo
Abstract:
Cardiorespiratory fitness is an established predictor of metabolic disease and mortality. Fitness is directly measured as maximal oxygen consumption (VO$_{2}max$), or indirectly assessed using heart rate responses to standard exercise tests. However, such testing is costly and burdensome because it requires specialized equipment such as treadmills and oxygen masks, limiting its utility. Modern wea…
▽ More
Cardiorespiratory fitness is an established predictor of metabolic disease and mortality. Fitness is directly measured as maximal oxygen consumption (VO$_{2}max$), or indirectly assessed using heart rate responses to standard exercise tests. However, such testing is costly and burdensome because it requires specialized equipment such as treadmills and oxygen masks, limiting its utility. Modern wearables capture dynamic real-world data which could improve fitness prediction. In this work, we design algorithms and models that convert raw wearable sensor data into cardiorespiratory fitness estimates. We validate these estimates' ability to capture fitness profiles in free-living conditions using the Fenland Study (N=11,059), along with its longitudinal cohort (N=2,675), and a third external cohort using the UK Biobank Validation Study (N=181) who underwent maximal VO$_{2}max$ testing, the gold standard measurement of fitness. Our results show that the combination of wearables and other biomarkers as inputs to neural networks yields a strong correlation to ground truth in a holdout sample (r = 0.82, 95CI 0.80-0.83), outperforming other approaches and models and detects fitness change over time (e.g., after 7 years). We also show how the model's latent space can be used for fitness-aware patient subty** paving the way to scalable interventions and personalized trial recruitment. These results demonstrate the value of wearables for fitness estimation that today can be measured only with laboratory tests.
△ Less
Submitted 24 October, 2022; v1 submitted 6 May, 2022;
originally announced May 2022.
-
A Summary of the ComParE COVID-19 Challenges
Authors:
Harry Coppock,
Alican Akman,
Christian Bergler,
Maurice Gerczuk,
Chloë Brown,
Jagmohan Chauhan,
Andreas Grammenos,
Apinan Hasthanasombat,
Dimitris Spathis,
Tong Xia,
Pietro Cicuta,
**g Han,
Shahin Amiriparian,
Alice Baird,
Lukas Stappen,
Sandra Ottl,
Panagiotis Tzirakis,
Anton Batliner,
Cecilia Mascolo,
Björn W. Schuller
Abstract:
The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from infected individuals' respiratory sounds. We present…
▽ More
The COVID-19 pandemic has caused massive humanitarian and economic damage. Teams of scientists from a broad range of disciplines have searched for methods to help governments and communities combat the disease. One avenue from the machine learning field which has been explored is the prospect of a digital mass test which can detect COVID-19 from infected individuals' respiratory sounds. We present a summary of the results from the INTERSPEECH 2021 Computational Paralinguistics Challenges: COVID-19 Cough, (CCS) and COVID-19 Speech, (CSS).
△ Less
Submitted 17 February, 2022;
originally announced February 2022.
-
Exploring Longitudinal Cough, Breath, and Voice Data for COVID-19 Progression Prediction via Sequential Deep Learning: Model Development and Validation
Authors:
Ting Dang,
**g Han,
Tong Xia,
Dimitris Spathis,
Erika Bondareva,
Chloë Siegele-Brown,
Jagmohan Chauhan,
Andreas Grammenos,
Apinan Hasthanasombat,
Andres Floto,
Pietro Cicuta,
Cecilia Mascolo
Abstract:
Recent work has shown the potential of using audio data (eg, cough, breathing, and voice) in the screening for COVID-19. However, these approaches only focus on one-off detection and detect the infection given the current audio sample, but do not monitor disease progression in COVID-19. Limited exploration has been put forward to continuously monitor COVID-19 progression, especially recovery, thro…
▽ More
Recent work has shown the potential of using audio data (eg, cough, breathing, and voice) in the screening for COVID-19. However, these approaches only focus on one-off detection and detect the infection given the current audio sample, but do not monitor disease progression in COVID-19. Limited exploration has been put forward to continuously monitor COVID-19 progression, especially recovery, through longitudinal audio data. Tracking disease progression characteristics could lead to more timely treatment.
The primary objective of this study is to explore the potential of longitudinal audio samples over time for COVID-19 progression prediction and, especially, recovery trend prediction using sequential deep learning techniques.
Crowdsourced respiratory audio data, including breathing, cough, and voice samples, from 212 individuals over 5-385 days were analyzed. We developed a deep learning-enabled tracking tool using gated recurrent units (GRUs) to detect COVID-19 progression by exploring the audio dynamics of the individuals' historical audio biomarkers. The investigation comprised 2 parts: (1) COVID-19 detection in terms of positive and negative (healthy) tests, and (2) longitudinal disease progression prediction over time in terms of probability of positive tests.
The strong performance for COVID-19 detection, yielding an AUROC of 0.79, a sensitivity of 0.75, and a specificity of 0.71 supported the effectiveness of the approach compared to methods that do not leverage longitudinal dynamics. We further examined the predicted disease progression trajectory, displaying high consistency with test results with a correlation of 0.75 in the test cohort and 0.86 in a subset of the test cohort who reported recovery. Our findings suggest that monitoring COVID-19 evolution via longitudinal audio data has potential in the tracking of individuals' disease progression and recovery.
△ Less
Submitted 22 June, 2022; v1 submitted 4 January, 2022;
originally announced January 2022.
-
Evaluating Contrastive Learning on Wearable Timeseries for Downstream Clinical Outcomes
Authors:
Kevalee Shah,
Dimitris Spathis,
Chi Ian Tang,
Cecilia Mascolo
Abstract:
Vast quantities of person-generated health data (wearables) are collected but the process of annotating to feed to machine learning models is impractical. This paper discusses ways in which self-supervised approaches that use contrastive losses, such as SimCLR and BYOL, previously applied to the vision domain, can be applied to high-dimensional health signals for downstream classification tasks of…
▽ More
Vast quantities of person-generated health data (wearables) are collected but the process of annotating to feed to machine learning models is impractical. This paper discusses ways in which self-supervised approaches that use contrastive losses, such as SimCLR and BYOL, previously applied to the vision domain, can be applied to high-dimensional health signals for downstream classification tasks of various diseases spanning sleep, heart, and metabolic conditions. To this end, we adapt the data augmentation step and the overall architecture to suit the temporal nature of the data (wearable traces) and evaluate on 5 downstream tasks by comparing other state-of-the-art methods including supervised learning and an adversarial unsupervised representation learning method. We show that SimCLR outperforms the adversarial method and a fully-supervised method in the majority of the downstream evaluation tasks, and that all self-supervised methods outperform the fully-supervised methods. This work provides a comprehensive benchmark for contrastive methods applied to the wearable time-series domain, showing the promise of task-agnostic representations for downstream clinical outcomes.
△ Less
Submitted 13 November, 2021;
originally announced November 2021.
-
Sounds of COVID-19: exploring realistic performance of audio-based digital testing
Authors:
**g Han,
Tong Xia,
Dimitris Spathis,
Erika Bondareva,
Chloë Brown,
Jagmohan Chauhan,
Ting Dang,
Andreas Grammenos,
Apinan Hasthanasombat,
Andres Floto,
Pietro Cicuta,
Cecilia Mascolo
Abstract:
Researchers have been battling with the question of how we can identify Coronavirus disease (COVID-19) cases efficiently, affordably and at scale. Recent work has shown how audio based approaches, which collect respiratory audio data (cough, breathing and voice) can be used for testing, however there is a lack of exploration of how biases and methodological decisions impact these tools' performanc…
▽ More
Researchers have been battling with the question of how we can identify Coronavirus disease (COVID-19) cases efficiently, affordably and at scale. Recent work has shown how audio based approaches, which collect respiratory audio data (cough, breathing and voice) can be used for testing, however there is a lack of exploration of how biases and methodological decisions impact these tools' performance in practice. In this paper, we explore the realistic performance of audio-based digital testing of COVID-19. To investigate this, we collected a large crowdsourced respiratory audio dataset through a mobile app, alongside recent COVID-19 test result and symptoms intended as a ground truth. Within the collected dataset, we selected 5,240 samples from 2,478 participants and split them into different participant-independent sets for model development and validation. Among these, we controlled for potential confounding factors (such as demographics and language). The unbiased model takes features extracted from breathing, coughs, and voice signals as predictors and yields an AUC-ROC of 0.71 (95\% CI: 0.65$-$0.77). We further explore different unbalanced distributions to show how biases and participant splits affect performance. Finally, we discuss how the realistic model presented could be integrated in clinical practice to realize continuous, ubiquitous, sustainable and affordable testing at population scale.
△ Less
Submitted 29 June, 2021;
originally announced June 2021.
-
Anticipatory Detection of Compulsive Body-focused Repetitive Behaviors with Wearables
Authors:
Benjamin Lucas Searle,
Dimitris Spathis,
Marios Constantinides,
Daniele Quercia,
Cecilia Mascolo
Abstract:
Body-focused repetitive behaviors (BFRBs), like face-touching or skin-picking, are hand-driven behaviors which can damage one's appearance, if not identified early and treated. Technology for automatic detection is still under-explored, with few previous works being limited to wearables with single modalities (e.g., motion). Here, we propose a multi-sensory approach combining motion, orientation,…
▽ More
Body-focused repetitive behaviors (BFRBs), like face-touching or skin-picking, are hand-driven behaviors which can damage one's appearance, if not identified early and treated. Technology for automatic detection is still under-explored, with few previous works being limited to wearables with single modalities (e.g., motion). Here, we propose a multi-sensory approach combining motion, orientation, and heart rate sensors to detect BFRBs. We conducted a feasibility study in which participants (N=10) were exposed to BFRBs-inducing tasks, and analyzed 380 mins of signals under an extensive evaluation of sensing modalities, cross-validation methods, and observation windows. Our models achieved an AUC > 0.90 in distinguishing BFRBs, which were more evident in observation windows 5 mins prior to the behavior as opposed to 1-min ones. In a follow-up qualitative survey, we found that not only the timing of detection matters but also models need to be context-aware, when designing just-in-time interventions to prevent BFRBs.
△ Less
Submitted 21 June, 2021;
originally announced June 2021.
-
The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates
Authors:
Björn W. Schuller,
Anton Batliner,
Christian Bergler,
Cecilia Mascolo,
**g Han,
Iulia Lefter,
Heysem Kaya,
Shahin Amiriparian,
Alice Baird,
Lukas Stappen,
Sandra Ottl,
Maurice Gerczuk,
Panagiotis Tzirakis,
Chloë Brown,
Jagmohan Chauhan,
Andreas Grammenos,
Apinan Hasthanasombat,
Dimitris Spathis,
Tong Xia,
Pietro Cicuta,
Leon J. M. Rothkrantz,
Joeri Zwerts,
Jelle Treep,
Casper Kaandorp
Abstract:
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation SubChallenge, a three-way assessment of the level of es…
▽ More
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation SubChallenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs background need to be classified. We describe the Sub-Challenges, baseline feature extraction, and classifiers based on the 'usual' COMPARE and BoAW features as well as deep unsupervised representation learning using the AuDeep toolkit, and deep feature extraction from pre-trained CNNs using the Deep Spectrum toolkit; in addition, we add deep end-to-end sequential modelling, and partially linguistic analysis.
△ Less
Submitted 24 February, 2021;
originally announced February 2021.
-
SelfHAR: Improving Human Activity Recognition through Self-training with Unlabeled Data
Authors:
Chi Ian Tang,
Ignacio Perez-Pozuelo,
Dimitris Spathis,
Soren Brage,
Nick Wareham,
Cecilia Mascolo
Abstract:
Machine learning and deep learning have shown great promise in mobile sensing applications, including Human Activity Recognition. However, the performance of such models in real-world settings largely depends on the availability of large datasets that captures diverse behaviors. Recently, studies in computer vision and natural language processing have shown that leveraging massive amounts of unlab…
▽ More
Machine learning and deep learning have shown great promise in mobile sensing applications, including Human Activity Recognition. However, the performance of such models in real-world settings largely depends on the availability of large datasets that captures diverse behaviors. Recently, studies in computer vision and natural language processing have shown that leveraging massive amounts of unlabeled data enables performance on par with state-of-the-art supervised models.
In this work, we present SelfHAR, a semi-supervised model that effectively learns to leverage unlabeled mobile sensing datasets to complement small labeled datasets. Our approach combines teacher-student self-training, which distills the knowledge of unlabeled and labeled datasets while allowing for data augmentation, and multi-task self-supervision, which learns robust signal-level representations by predicting distorted versions of the input.
We evaluated SelfHAR on various HAR datasets and showed state-of-the-art performance over supervised and previous semi-supervised approaches, with up to 12% increase in F1 score using the same number of model parameters at inference. Furthermore, SelfHAR is data-efficient, reaching similar performance using up to 10 times less labeled data compared to supervised approaches. Our work not only achieves state-of-the-art performance in a diverse set of HAR datasets, but also sheds light on how pre-training tasks may affect downstream performance.
△ Less
Submitted 11 February, 2021;
originally announced February 2021.
-
Exploring Automatic COVID-19 Diagnosis via voice and symptoms from Crowdsourced Data
Authors:
**g Han,
Chloë Brown,
Jagmohan Chauhan,
Andreas Grammenos,
Apinan Hasthanasombat,
Dimitris Spathis,
Tong Xia,
Pietro Cicuta,
Cecilia Mascolo
Abstract:
The development of fast and accurate screening tools, which could facilitate testing and prevent more costly clinical tests, is key to the current pandemic of COVID-19. In this context, some initial work shows promise in detecting diagnostic signals of COVID-19 from audio sounds. In this paper, we propose a voice-based framework to automatically detect individuals who have tested positive for COVI…
▽ More
The development of fast and accurate screening tools, which could facilitate testing and prevent more costly clinical tests, is key to the current pandemic of COVID-19. In this context, some initial work shows promise in detecting diagnostic signals of COVID-19 from audio sounds. In this paper, we propose a voice-based framework to automatically detect individuals who have tested positive for COVID-19. We evaluate the performance of the proposed framework on a subset of data crowdsourced from our app, containing 828 samples from 343 participants. By combining voice signals and reported symptoms, an AUC of $0.79$ has been attained, with a sensitivity of $0.68$ and a specificity of $0.82$. We hope that this study opens the door to rapid, low-cost, and convenient pre-screening tools to automatically detect the disease.
△ Less
Submitted 9 February, 2021;
originally announced February 2021.
-
Self-supervised transfer learning of physiological representations from free-living wearable data
Authors:
Dimitris Spathis,
Ignacio Perez-Pozuelo,
Soren Brage,
Nicholas J. Wareham,
Cecilia Mascolo
Abstract:
Wearable devices such as smartwatches are becoming increasingly popular tools for objectively monitoring physical activity in free-living conditions. To date, research has primarily focused on the purely supervised task of human activity recognition, demonstrating limited success in inferring high-level health outcomes from low-level signals. Here, we present a novel self-supervised representation…
▽ More
Wearable devices such as smartwatches are becoming increasingly popular tools for objectively monitoring physical activity in free-living conditions. To date, research has primarily focused on the purely supervised task of human activity recognition, demonstrating limited success in inferring high-level health outcomes from low-level signals. Here, we present a novel self-supervised representation learning method using activity and heart rate (HR) signals without semantic labels. With a deep neural network, we set HR responses as the supervisory signal for the activity data, leveraging their underlying physiological relationship. In addition, we propose a custom quantile loss function that accounts for the long-tailed HR distribution present in the general population.
We evaluate our model in the largest free-living combined-sensing dataset (comprising >280k hours of wrist accelerometer & wearable ECG data). Our contributions are two-fold: i) the pre-training task creates a model that can accurately forecast HR based only on cheap activity sensors, and ii) we leverage the information captured through this task by proposing a simple method to aggregate the learnt latent representations (embeddings) from the window-level to user-level. Notably, we show that the embeddings can generalize in various downstream tasks through transfer learning with linear classifiers, capturing physiologically meaningful, personalized information. For instance, they can be used to predict variables associated with individuals' health, fitness and demographic characteristics, outperforming unsupervised autoencoders and common bio-markers. Overall, we propose the first multimodal self-supervised method for behavioral and physiological data with implications for large-scale health and lifestyle monitoring.
△ Less
Submitted 18 November, 2020;
originally announced November 2020.
-
Exploring Contrastive Learning in Human Activity Recognition for Healthcare
Authors:
Chi Ian Tang,
Ignacio Perez-Pozuelo,
Dimitris Spathis,
Cecilia Mascolo
Abstract:
Human Activity Recognition (HAR) constitutes one of the most important tasks for wearable and mobile sensing given its implications in human well-being and health monitoring. Motivated by the limitations of labeled datasets in HAR, particularly when employed in healthcare-related applications, this work explores the adoption and adaptation of SimCLR, a contrastive learning technique for visual rep…
▽ More
Human Activity Recognition (HAR) constitutes one of the most important tasks for wearable and mobile sensing given its implications in human well-being and health monitoring. Motivated by the limitations of labeled datasets in HAR, particularly when employed in healthcare-related applications, this work explores the adoption and adaptation of SimCLR, a contrastive learning technique for visual representations, to HAR. The use of contrastive learning objectives causes the representations of corresponding views to be more similar, and those of non-corresponding views to be more different. After an extensive evaluation exploring 64 combinations of different signal transformations for augmenting the data, we observed significant performance differences owing to the order and the function thereof. In particular, preliminary results indicated an improvement over supervised and unsupervised learning methods when using fine-tuning and random rotation for augmentation, however, future work should explore under which conditions SimCLR is beneficial for HAR systems and other healthcare-related applications.
△ Less
Submitted 11 February, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
Learning Generalizable Physiological Representations from Large-scale Wearable Data
Authors:
Dimitris Spathis,
Ignacio Perez-Pozuelo,
Soren Brage,
Nicholas J. Wareham,
Cecilia Mascolo
Abstract:
To date, research on sensor-equipped mobile devices has primarily focused on the purely supervised task of human activity recognition (walking, running, etc), demonstrating limited success in inferring high-level health outcomes from low-level signals, such as acceleration. Here, we present a novel self-supervised representation learning method using activity and heart rate (HR) signals without se…
▽ More
To date, research on sensor-equipped mobile devices has primarily focused on the purely supervised task of human activity recognition (walking, running, etc), demonstrating limited success in inferring high-level health outcomes from low-level signals, such as acceleration. Here, we present a novel self-supervised representation learning method using activity and heart rate (HR) signals without semantic labels. With a deep neural network, we set HR responses as the supervisory signal for the activity data, leveraging their underlying physiological relationship.
We evaluate our model in the largest free-living combined-sensing dataset (comprising more than 280,000 hours of wrist accelerometer & wearable ECG data) and show that the resulting embeddings can generalize in various downstream tasks through transfer learning with linear classifiers, capturing physiologically meaningful, personalized information. For instance, they can be used to predict (higher than 70 AUC) variables associated with individuals' health, fitness and demographic characteristics, outperforming unsupervised autoencoders and common bio-markers. Overall, we propose the first multimodal self-supervised method for behavioral and physiological data with implications for large-scale health and lifestyle monitoring.
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data
Authors:
Chloë Brown,
Jagmohan Chauhan,
Andreas Grammenos,
**g Han,
Apinan Hasthanasombat,
Dimitris Spathis,
Tong Xia,
Pietro Cicuta,
Cecilia Mascolo
Abstract:
Audio signals generated by the human body (e.g., sighs, breathing, heart, digestion, vibration sounds) have routinely been used by clinicians as indicators to diagnose disease or assess disease progression. Until recently, such signals were usually collected through manual auscultation at scheduled visits. Research has now started to use digital technology to gather bodily sounds (e.g., from digit…
▽ More
Audio signals generated by the human body (e.g., sighs, breathing, heart, digestion, vibration sounds) have routinely been used by clinicians as indicators to diagnose disease or assess disease progression. Until recently, such signals were usually collected through manual auscultation at scheduled visits. Research has now started to use digital technology to gather bodily sounds (e.g., from digital stethoscopes) for cardiovascular or respiratory examination, which could then be used for automatic analysis. Some initial work shows promise in detecting diagnostic signals of COVID-19 from voice and coughs. In this paper we describe our data analysis over a large-scale crowdsourced dataset of respiratory sounds collected to aid diagnosis of COVID-19. We use coughs and breathing to understand how discernible COVID-19 sounds are from those in asthma or healthy controls. Our results show that even a simple binary machine learning classifier is able to classify correctly healthy and COVID-19 sounds. We also show how we distinguish a user who tested positive for COVID-19 and has a cough from a healthy user with a cough, and users who tested positive for COVID-19 and have a cough from users with asthma and a cough. Our models achieve an AUC of above 80% across all tasks. These results are preliminary and only scratch the surface of the potential of this type of data and audio-based machine learning. This work opens the door to further investigation of how automatically analysed respiratory patterns could be used as pre-screening signals to aid COVID-19 diagnosis.
△ Less
Submitted 18 January, 2021; v1 submitted 10 June, 2020;
originally announced June 2020.
-
Interactive dimensionality reduction using similarity projections
Authors:
Dimitris Spathis,
Nikolaos Passalis,
Anastasios Tefas
Abstract:
Recent advances in machine learning allow us to analyze and describe the content of high-dimensional data like text, audio, images or other signals. In order to visualize that data in 2D or 3D, usually Dimensionality Reduction (DR) techniques are employed. Most of these techniques, e.g., PCA or t-SNE, produce static projections without taking into account corrections from humans or other data expl…
▽ More
Recent advances in machine learning allow us to analyze and describe the content of high-dimensional data like text, audio, images or other signals. In order to visualize that data in 2D or 3D, usually Dimensionality Reduction (DR) techniques are employed. Most of these techniques, e.g., PCA or t-SNE, produce static projections without taking into account corrections from humans or other data exploration scenarios. In this work, we propose the interactive Similarity Projection (iSP), a novel interactive DR framework based on similarity embeddings, where we form a differentiable objective based on the user interactions and perform learning using gradient descent, with an end-to-end trainable architecture. Two interaction scenarios are evaluated. First, a common methodology in multidimensional projection is to project a subset of data, arrange them in classes or clusters, and project the rest unseen dataset based on that manipulation, in a kind of semi-supervised interpolation. We report results that outperform competitive baselines in a wide range of metrics and datasets. Second, we explore the scenario of manipulating some classes, while enriching the optimization with high-dimensional neighbor information. Apart from improving classification precision and clustering on images and text documents, the new emerging structure of the projection unveils semantic manifolds. For example, on the Head Pose dataset, by just dragging the faces looking far left to the left and those looking far right to the right, all faces are re-arranged on a continuum even on the vertical axis (face up and down). This end-to-end framework can be used for fast, visual semi-supervised learning, manifold exploration, interactive domain adaptation of neural embeddings and transfer learning.
△ Less
Submitted 13 November, 2018;
originally announced November 2018.
-
Photo-Quality Evaluation based on Computational Aesthetics: Review of Feature Extraction Techniques
Authors:
Dimitris Spathis
Abstract:
Researchers try to model the aesthetic quality of photographs into low and high- level features, drawing inspiration from art theory, psychology and marketing. We attempt to describe every feature extraction measure employed in the above process. The contribution of this literature review is the taxonomy of each feature by its implementation complexity, considering real-world applications and inte…
▽ More
Researchers try to model the aesthetic quality of photographs into low and high- level features, drawing inspiration from art theory, psychology and marketing. We attempt to describe every feature extraction measure employed in the above process. The contribution of this literature review is the taxonomy of each feature by its implementation complexity, considering real-world applications and integration in mobile apps and digital cameras. Also, we discuss the machine learning results along with some unexplored research areas as future work.
△ Less
Submitted 19 December, 2016;
originally announced December 2016.