-
Enhancing Dimension-Reduced Scatter Plots with Class and Feature Centroids
Authors:
Daniel B. Hier,
Tayo Obafemi-Ajayi,
Gayla R. Olbricht,
Devin M. Burns,
Sasha Petrenko,
Donald C. Wunsch II
Abstract:
Dimension reduction is increasingly applied to high-dimensional biomedical data to improve its interpretability. When datasets are reduced to two dimensions, each observation is assigned x and y coordinates and is represented as a point on a scatter plot. A significant challenge lies in interpreting the meaning of the x and y axes due to the complexities inherent in dimension reduction. This study addresses this challenge by using the x and y coordinates derived from dimension reduction to calculate class and feature centroids, which can be overlaid onto the scatter plots. This method connects the low-dimensional space to the original high-dimensional space. We illustrate the utility of this approach with data derived from the phenotypes of three neurogenetic diseases and demonstrate how the addition of class and feature centroids increases the interpretability of scatter plots.
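The centroid idea in this abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the class centroid is taken as the plain mean of the 2-D coordinates in each class, and the feature centroid is assumed to be a feature-weighted mean (for 0/1 features, the mean position of the observations that carry the feature).

```python
import numpy as np

def class_centroids(coords, labels):
    """Mean 2-D position of the points belonging to each class."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    return {c: coords[labels == c].mean(axis=0) for c in np.unique(labels)}

def feature_centroids(coords, features):
    """Feature-weighted mean 2-D position, one row per feature column.

    Assumption: feature values are non-negative (e.g. 0/1 presence) and
    every feature occurs at least once, so no weight sum is zero.
    """
    coords = np.asarray(coords, dtype=float)      # shape (n_obs, 2)
    features = np.asarray(features, dtype=float)  # shape (n_obs, n_features)
    weights = features.sum(axis=0)                # total weight per feature
    return (features.T @ coords) / weights[:, None]

# Toy data: four observations already reduced to two dimensions.
coords = [[0, 0], [2, 0], [0, 2], [4, 4]]
labels = [0, 0, 1, 1]
features = [[1, 0], [1, 0], [0, 1], [0, 1]]

cls = class_centroids(coords, labels)
feat = feature_centroids(coords, features)
```

Both sets of centroids live in the same x-y plane as the reduced points, so they can be drawn as extra markers on the existing scatter plot.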
Submitted 29 March, 2024;
originally announced March 2024.
-
Social Behavior and Mental Health: A Snapshot Survey under COVID-19 Pandemic
Authors:
Sahraoui Dhelim,
Liming Luke Chen,
Huansheng Ning,
Sajal K Das,
Chris Nugent,
Devin Burns,
Gerard Leavey,
Dirk Pesch,
Eleanor Bantry-White
Abstract:
Online social media provides a channel for monitoring people's social behaviors and mental distress. Due to the restrictions imposed by COVID-19, people are increasingly using online social networks to express their feelings. Consequently, there is a significant amount of diverse user-generated social media content. The COVID-19 pandemic has changed the way we live, study, socialize, and recreate, and this has affected our well-being and mental health. A growing body of research leverages online social media analysis to detect and assess users' mental status. In this paper, we survey the literature on social media analysis for mental disorder detection, with a special focus on studies conducted in the context of COVID-19 during 2020-2021. First, we classify the surveyed studies by feature extraction type, ranging from language usage patterns to aesthetic preferences and online behaviors. Second, we explore the methods used for mental disorder detection, including machine learning and deep learning methods. Finally, we discuss the challenges of mental disorder detection using social media data, including privacy and ethical concerns as well as the technical challenges of scaling and deploying such systems, and discuss the lessons learned over the last few years.
Submitted 17 May, 2021;
originally announced May 2021.
-
How Domain Terminology Affects Meeting Summarization Performance
Authors:
Jia ** Koay,
Alexander Roustai,
Xiao** Dai,
Dillon Burns,
Alec Kerrigan,
Fei Liu
Abstract:
Meetings are essential to modern organizations. Numerous meetings are held and recorded daily, more than can ever be comprehended. A meeting summarization system that identifies salient utterances from the transcripts to automatically generate meeting minutes can help. It empowers users to rapidly search and sift through large meeting collections. To date, the impact of domain terminology on the performance of meeting summarization remains understudied, even though meetings are rich with domain knowledge. In this paper, we create gold-standard annotations for domain terminology, known as jargon terms, on a sizable meeting corpus. We then analyze the performance of a meeting summarization system with and without jargon terms. Our findings reveal that domain terminology can have a substantial impact on summarization performance. We publicly release all domain terminology to advance research in meeting summarization.
Submitted 8 November, 2020; v1 submitted 1 November, 2020;
originally announced November 2020.
-
Personalized Activity Recognition with Deep Triplet Embeddings
Authors:
David M. Burns,
Cari M. Whyne
Abstract:
A significant challenge for a supervised learning approach to inertial human activity recognition is the heterogeneity of data between individual users, resulting in very poor performance of impersonal algorithms for some subjects. We present an approach to personalized activity recognition based on deep embeddings derived from a fully convolutional neural network. We experiment with both categorical cross entropy loss and triplet loss for training the embedding, and describe a novel triplet loss function based on subject triplets. We evaluate these methods on three publicly available inertial human activity recognition data sets (MHEALTH, WISDM, and SPAR) comparing classification accuracy, out-of-distribution activity detection, and embedding generalization to new activities. The novel subject triplet loss provides the best performance overall, and all personalized deep embeddings out-perform our baseline personalized engineered feature embedding and an impersonal fully convolutional neural network classifier.
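For readers unfamiliar with triplet loss, the sketch below shows the standard margin-based form on precomputed embeddings. It is an illustrative assumption, not the paper's code: the function name and the `margin` default are hypothetical, squared Euclidean distance is one common choice, and the novel subject-triplet variant described in the abstract is not reproduced here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss averaged over a batch of embeddings.

    Each argument has shape (batch, dim). The loss is zero once the
    negative is farther from the anchor than the positive by `margin`.
    """
    anchor = np.asarray(anchor, dtype=float)
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # squared distance to negative
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# A well-separated triplet incurs no loss; a tied one incurs the margin.
easy = triplet_loss([[0.0, 0.0]], [[1.0, 0.0]], [[3.0, 0.0]])
tied = triplet_loss([[0.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]])
```

How anchors, positives, and negatives are selected is the part that differs between triplet schemes; the loss itself stays the same.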
Submitted 15 January, 2020;
originally announced January 2020.
-
Seglearn: A Python Package for Learning Sequences and Time Series
Authors:
David M. Burns,
Cari M. Whyne
Abstract:
Seglearn is an open-source Python package for machine learning on time series or sequences using a sliding window segmentation approach. The implementation provides a flexible pipeline for tackling classification, regression, and forecasting problems with multivariate sequence and contextual data. This package is compatible with scikit-learn and is listed under scikit-learn Related Projects. The package depends on numpy, scipy, and scikit-learn. Seglearn is distributed under the BSD 3-Clause License. Documentation includes a detailed API description, user guide, and examples. Unit tests provide a high degree of code coverage.
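Sliding window segmentation itself is simple to illustrate. The sketch below uses plain NumPy and deliberately does not use the Seglearn API; the function name `sliding_windows` and its `width` and `step` parameters are hypothetical, and the series is assumed to be at least `width` samples long.

```python
import numpy as np

def sliding_windows(series, width, step):
    """Cut a (time, ...) array into overlapping windows of `width` samples.

    Windows start every `step` samples; a trailing remainder shorter than
    `width` is dropped. Assumes len(series) >= width.
    """
    series = np.asarray(series)
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

# Univariate series of 10 samples, windows of 4 with 50% overlap.
uni = sliding_windows(np.arange(10), width=4, step=2)

# Multivariate series: 10 time steps with 3 channels each.
multi = sliding_windows(np.zeros((10, 3)), width=4, step=2)
```

Each window can then be treated as one sample by a downstream estimator, which is the general idea behind segment-based time series learning.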
Submitted 18 October, 2018; v1 submitted 21 March, 2018;
originally announced March 2018.
-
Shoulder Physiotherapy Exercise Recognition: Machine Learning the Inertial Signals from a Smartwatch
Authors:
David Burns,
Nathan Leung,
Michael Hardisty,
Cari Whyne,
Patrick Henry,
Stewart McLachlin
Abstract:
Objective: Participation in a physical therapy program is considered one of the greatest predictors of successful conservative management of common shoulder disorders. However, adherence to these protocols is often poor and typically worse for unsupervised home exercise programs. Currently, there are limited tools available for objective measurement of adherence in the home setting. The goal of this study was to develop and evaluate the potential for performing home shoulder physiotherapy monitoring using a commercial smartwatch.
Approach: Twenty healthy adult subjects with no prior shoulder disorders performed seven exercises from an evidence-based rotator cuff physiotherapy protocol, while 6-axis inertial sensor data was collected from the active extremity. Within an activity recognition chain (ARC) framework, four supervised learning algorithms were trained and optimized to classify the exercises: k-nearest neighbor (k-NN), random forest (RF), support vector machine classifier (SVC), and a convolutional recurrent neural network (CRNN). Algorithm performance was evaluated using 5-fold cross-validation stratified first temporally and then by subject.
Main Results: Categorical classification accuracy was above 94% for all algorithms on the temporally stratified cross-validation, with the best performance achieved by the CRNN algorithm (99.4%). The subject-stratified cross-validation, which evaluated classifier performance on unseen subjects, yielded lower accuracy scores, again with CRNN performing best (88.9%).
Significance: This proof-of-concept study demonstrates the technical feasibility of a smartwatch device and supervised machine learning approach to more easily monitor and assess the at-home adherence of shoulder physiotherapy exercise protocols.
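The gap between the two accuracy figures comes from how the folds are built. A rough sketch of subject-wise splitting, in which every sample from a held-out subject lands in the test fold, might look like the following. The function name, the seed, and the toy layout of 20 subjects with 7 samples each are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np

def subject_folds(subject_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs with whole subjects held out."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)                       # random assignment of subjects to folds
    for held_out in np.array_split(subjects, n_splits):
        test_mask = np.isin(subject_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy layout: 20 subjects, 7 recorded samples per subject.
subject_ids = np.repeat(np.arange(20), 7)
folds = list(subject_folds(subject_ids, n_splits=5))
```

Because no subject appears on both sides of a split, the resulting score estimates performance on people the classifier has never seen, which is typically lower than a split that mixes each subject's samples across folds.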
Submitted 28 February, 2018; v1 submitted 5 February, 2018;
originally announced February 2018.