-
Efficient Unbiased Sparsification
Authors:
Leighton Barnes,
Timothy Chow,
Emma Cohen,
Keith Frankston,
Benjamin Howard,
Fred Kochman,
Daniel Scheinerman,
Jeffrey VanderKam
Abstract:
An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as in federated learning and sampling sparse probability distributions. Ideally, unbiased sparsification should also minimi…
▽ More
An unbiased $m$-sparsification of a vector $p\in \mathbb{R}^n$ is a random vector $Q\in \mathbb{R}^n$ with mean $p$ that has at most $m<n$ nonzero coordinates. Unbiased sparsification compresses the original vector without introducing bias; it arises in various contexts, such as in federated learning and sampling sparse probability distributions. Ideally, unbiased sparsification should also minimize the expected value of a divergence function $\mathsf{Div}(Q,p)$ that measures how far away $Q$ is from the original $p$. If $Q$ is optimal in this sense, then we call it efficient. Our main results describe efficient unbiased sparsifications for divergences that are either permutation-invariant or additively separable. Surprisingly, the characterization for permutation-invariant divergences is robust to the choice of divergence function, in the sense that our class of optimal $Q$ for squared Euclidean distance coincides with our class of optimal $Q$ for Kullback-Leibler divergence, or indeed any of a wide variety of divergences.
△ Less
Submitted 22 February, 2024;
originally announced February 2024.
-
Multivariate Priors and the Linearity of Optimal Bayesian Estimators under Gaussian Noise
Authors:
Leighton P. Barnes,
Alex Dytso,
**gbo Liu,
H. Vincent Poor
Abstract:
Consider the task of estimating a random vector $X$ from noisy observations $Y = X + Z$, where $Z$ is a standard normal vector, under the $L^p$ fidelity criterion. This work establishes that, for $1 \leq p \leq 2$, the optimal Bayesian estimator is linear and positive definite if and only if the prior distribution on $X$ is a (non-degenerate) multivariate Gaussian. Furthermore, for $p > 2$, it is…
▽ More
Consider the task of estimating a random vector $X$ from noisy observations $Y = X + Z$, where $Z$ is a standard normal vector, under the $L^p$ fidelity criterion. This work establishes that, for $1 \leq p \leq 2$, the optimal Bayesian estimator is linear and positive definite if and only if the prior distribution on $X$ is a (non-degenerate) multivariate Gaussian. Furthermore, for $p > 2$, it is demonstrated that there are infinitely many priors that can induce such an estimator.
△ Less
Submitted 29 January, 2024;
originally announced January 2024.
-
Incremental Semi-supervised Federated Learning for Health Inference via Mobile Sensing
Authors:
Guimin Dong,
Lihua Cai,
Mingyue Tang,
Laura E. Barnes,
Mehdi Boukhechba
Abstract:
Mobile sensing appears as a promising solution for health inference problem (e.g., influenza-like symptom recognition) by leveraging diverse smart sensors to capture fine-grained information about human behaviors and ambient contexts. Centralized training of machine learning models can place mobile users' sensitive information under privacy risks due to data breach and misexploitation. Federated L…
▽ More
Mobile sensing appears as a promising solution for health inference problem (e.g., influenza-like symptom recognition) by leveraging diverse smart sensors to capture fine-grained information about human behaviors and ambient contexts. Centralized training of machine learning models can place mobile users' sensitive information under privacy risks due to data breach and misexploitation. Federated Learning (FL) enables mobile devices to collaboratively learn global models without the exposure of local private data. However, there are challenges of on-device FL deployment using mobile sensing: 1) long-term and continuously collected mobile sensing data may exhibit domain shifts as sensing objects (e.g. humans) have varying behaviors as a result of internal and/or external stimulus; 2) model retraining using all available data may increase computation and memory burden; and 3) the sparsity of annotated crowd-sourced data causes supervised FL to lack robustness. In this work, we propose FedMobile, an incremental semi-supervised federated learning algorithm, to train models semi-supervisedly and incrementally in a decentralized online fashion. We evaluate FedMobile using a real-world mobile sensing dataset for influenza-like symptom recognition. Our empirical results show that FedMobile-trained models achieve the best results in comparison to the selected baseline methods.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
$L^1$ Estimation: On the Optimality of Linear Estimators
Authors:
Leighton P. Barnes,
Alex Dytso,
**gbo Liu,
H. Vincent Poor
Abstract:
Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
Along the way, several ot…
▽ More
Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian.
Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
△ Less
Submitted 9 January, 2024; v1 submitted 16 September, 2023;
originally announced September 2023.
-
Personalized State Anxiety Detection: An Empirical Study with Linguistic Biomarkers and A Machine Learning Pipeline
Authors:
Zhiyuan Wang,
Mingyue Tang,
Maria A. Larrazabal,
Emma R. Toner,
Mark Rucker,
Congyu Wu,
Bethany A. Teachman,
Mehdi Boukhechba,
Laura E. Barnes
Abstract:
Individuals high in social anxiety symptoms often exhibit elevated state anxiety in social situations. Research has shown it is possible to detect state anxiety by leveraging digital biomarkers and machine learning techniques. However, most existing work trains models on an entire group of participants, failing to capture individual differences in their psychological and behavioral responses to so…
▽ More
Individuals high in social anxiety symptoms often exhibit elevated state anxiety in social situations. Research has shown it is possible to detect state anxiety by leveraging digital biomarkers and machine learning techniques. However, most existing work trains models on an entire group of participants, failing to capture individual differences in their psychological and behavioral responses to social contexts. To address this concern, in Study 1, we collected linguistic data from N=35 high socially anxious participants in a variety of social contexts, finding that digital linguistic biomarkers significantly differ between evaluative vs. non-evaluative social contexts and between individuals having different trait psychological symptoms, suggesting the likely importance of personalized approaches to detect state anxiety. In Study 2, we used the same data and results from Study 1 to model a multilayer personalized machine learning pipeline to detect state anxiety that considers contextual and individual differences. This personalized model outperformed the baseline F1-score by 28.0%. Results suggest that state anxiety can be more accurately detected with personalized machine learning approaches, and that linguistic biomarkers hold promise for identifying periods of state anxiety in an unobtrusive way.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Wearable Sensor-based Multimodal Physiological Responses of Socially Anxious Individuals across Social Contexts
Authors:
Emma R. Toner,
Mark Rucker,
Zhiyuan Wang,
Maria A. Larrazabal,
Lihua Cai,
Debajyoti Datta,
Elizabeth Thompson,
Haroon Lone,
Mehdi Boukhechba,
Bethany A. Teachman,
Laura E. Barnes
Abstract:
Correctly identifying an individual's social context from passively worn sensors holds promise for delivering just-in-time adaptive interventions (JITAIs) to treat social anxiety disorder. In this study, we present results using passively collected data from a within-subject experiment that assessed physiological response across different social contexts (i.e, alone vs. with others), social phases…
▽ More
Correctly identifying an individual's social context from passively worn sensors holds promise for delivering just-in-time adaptive interventions (JITAIs) to treat social anxiety disorder. In this study, we present results using passively collected data from a within-subject experiment that assessed physiological response across different social contexts (i.e, alone vs. with others), social phases (i.e., pre- and post-interaction vs. during an interaction), social interaction sizes (i.e., dyadic vs. group interactions), and levels of social threat (i.e., implicit vs. explicit social evaluation). Participants in the study ($N=46$) reported moderate to severe social anxiety symptoms as assessed by the Social Interaction Anxiety Scale ($\geq$34 out of 80). Univariate paired difference tests, multivariate random forest models, and follow-up cluster analyses were used to explore physiological response patterns across different social and non-social contexts. Our results suggest that social context is more reliably distinguishable than social phase, group size, or level of social threat, but that there is considerable variability in physiological response patterns even among these distinguishable contexts. Implications for real-world context detection and deployment of JITAIs are discussed.
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
Shape Analysis for Pediatric Upper Body Motor Function Assessment
Authors:
Shashwat Kumar,
Robert Gutierez,
Debajyoti Datta,
Sarah Tolman,
Allison McCrady,
Silvia Blemker,
Rebecca J. Scharf,
Laura Barnes
Abstract:
Neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), cause progressive muscular degeneration and loss of motor function for 1 in 6,000 children. Traditional upper limb motor function assessments do not quantitatively measure patient-performed motions, which makes it difficult to track progress for incremental changes. Assessing motor function in chi…
▽ More
Neuromuscular disorders, such as Spinal Muscular Atrophy (SMA) and Duchenne Muscular Dystrophy (DMD), cause progressive muscular degeneration and loss of motor function for 1 in 6,000 children. Traditional upper limb motor function assessments do not quantitatively measure patient-performed motions, which makes it difficult to track progress for incremental changes. Assessing motor function in children with neuromuscular disorders is particularly challenging because they can be nervous or excited during experiments, or simply be too young to follow precise instructions. These challenges translate to confounding factors such as performing different parts of the arm curl slower or faster (phase variability) which affects the assessed motion quality. This paper uses curve registration and shape analysis to temporally align trajectories while simultaneously extracting a mean reference shape. Distances from this mean shape are used to assess the quality of motion. The proposed metric is invariant to confounding factors, such as phase variability, while suggesting several clinically relevant insights. First, there are statistically significant differences between functional scores for the control and patient populations (p$=$0.0213$\le$0.05). Next, several patients in the patient cohort are able to perform motion on par with the healthy cohort and vice versa. Our metric, which is computed based on wearables, is related to the Brooke's score ((p$=$0.00063$\le$0.05)), as well as motor function assessments based on dynamometry ((p$=$0.0006$\le$0.05)). These results show promise towards ubiquitous motion quality assessment in daily life.
△ Less
Submitted 10 September, 2022;
originally announced September 2022.
-
Graph Neural Networks in IoT: A Survey
Authors:
Guimin Dong,
Mingyue Tang,
Zhiyuan Wang,
Jiechao Gao,
Sikun Guo,
Lihua Cai,
Robert Gutierrez,
Bradford Campbell,
Laura E. Barnes,
Mehdi Boukhechba
Abstract:
The Internet of Things (IoT) boom has revolutionized almost every corner of people's daily lives: healthcare, home, transportation, manufacturing, supply chain, and so on. With the recent development of sensor and communication technologies, IoT devices including smart wearables, cameras, smartwatches, and autonomous vehicles can accurately measure and perceive their surrounding environment. Conti…
▽ More
The Internet of Things (IoT) boom has revolutionized almost every corner of people's daily lives: healthcare, home, transportation, manufacturing, supply chain, and so on. With the recent development of sensor and communication technologies, IoT devices including smart wearables, cameras, smartwatches, and autonomous vehicles can accurately measure and perceive their surrounding environment. Continuous sensing generates massive amounts of data and presents challenges for machine learning. Deep learning models (e.g., convolution neural networks and recurrent neural networks) have been extensively employed in solving IoT tasks by learning patterns from multi-modal sensory data. Graph Neural Networks (GNNs), an emerging and fast-growing family of neural network models, can capture complex interactions within sensor topology and have been demonstrated to achieve state-of-the-art results in numerous IoT learning tasks. In this survey, we present a comprehensive review of recent advances in the application of GNNs to the IoT field, including a deep dive analysis of GNN design in various IoT sensing environments, an overarching list of public data and source code from the collected publications, and future research directions. To keep track of newly published works, we collect representative papers and their open-source implementations and create a Github repository at https://github.com/GuiminDong/GNN4IoT.
△ Less
Submitted 31 March, 2022; v1 submitted 29 March, 2022;
originally announced March 2022.
-
Improved Information Theoretic Generalization Bounds for Distributed and Federated Learning
Authors:
L. P. Barnes,
Alex Dytso,
H. V. Poor
Abstract:
We consider information-theoretic bounds on expected generalization error for statistical learning problems in a networked setting. In this setting, there are $K$ nodes, each with its own independent dataset, and the models from each node have to be aggregated into a final centralized model. We consider both simple averaging of the models as well as more complicated multi-round algorithms. We give…
▽ More
We consider information-theoretic bounds on expected generalization error for statistical learning problems in a networked setting. In this setting, there are $K$ nodes, each with its own independent dataset, and the models from each node have to be aggregated into a final centralized model. We consider both simple averaging of the models as well as more complicated multi-round algorithms. We give upper bounds on the expected generalization error for a variety of problems, such as those with Bregman divergence or Lipschitz continuous losses, that demonstrate an improved dependence of $1/K$ on the number of nodes. These "per node" bounds are in terms of the mutual information between the training dataset and the trained weights at each node, and are therefore useful in describing the generalization properties inherent to having communication or privacy constraints at each node.
△ Less
Submitted 15 January, 2024; v1 submitted 4 February, 2022;
originally announced February 2022.
-
Improving mathematical questioning in teacher training
Authors:
Debajyoti Datta,
Maria Phillips,
James P Bywater,
Jennifer Chiu,
Ginger S. Watson,
Laura E. Barnes,
Donald E Brown
Abstract:
High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factors can be difficult to model. This paper builds a text-based interactive conversational agent to help teachers practice mathematical questioning skills based on the well-known Instructional Qua…
▽ More
High-fidelity, AI-based simulated classroom systems enable teachers to rehearse effective teaching strategies. However, dialogue-oriented open-ended conversations such as teaching a student about scale factors can be difficult to model. This paper builds a text-based interactive conversational agent to help teachers practice mathematical questioning skills based on the well-known Instructional Quality Assessment. We take a human-centered approach to designing our system, relying on advances in deep learning, uncertainty quantification, and natural language processing while acknowledging the limitations of conversational agents for specific pedagogical needs. Using experts' input directly during the simulation, we demonstrate how conversation success rate and high user satisfaction can be achieved.
△ Less
Submitted 6 December, 2021; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Evaluation of mathematical questioning strategies using data collected through weak supervision
Authors:
Debajyoti Datta,
Maria Phillips,
James P Bywater,
Jennifer Chiu,
Ginger S. Watson,
Laura E. Barnes,
Donald E Brown
Abstract:
A large body of research demonstrates how teachers' questioning strategies can improve student learning outcomes. However, develo** new scenarios is challenging because of the lack of training data for a specific scenario and the costs associated with labeling. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skil…
▽ More
A large body of research demonstrates how teachers' questioning strategies can improve student learning outcomes. However, develo** new scenarios is challenging because of the lack of training data for a specific scenario and the costs associated with labeling. This paper presents a high-fidelity, AI-based classroom simulator to help teachers rehearse research-based mathematical questioning skills. Using a human-in-the-loop approach, we collected a high-quality training dataset for a mathematical questioning scenario. Using recent advances in uncertainty quantification, we evaluated our conversational agent for usability and analyzed the practicality of incorporating a human-in-the-loop approach for data collection and system evaluation for a mathematical questioning scenario.
△ Less
Submitted 2 December, 2021;
originally announced December 2021.
-
From Personalized Medicine to Population Health: A Survey of mHealth Sensing Techniques
Authors:
Zhiyuan Wang,
Haoyi Xiong,
Jie Zhang,
Sijia Yang,
Mehdi Boukhechba,
Laura E. Barnes,
Daqing Zhang,
De**g Dou
Abstract:
Mobile Sensing Apps have been widely used as a practical approach to collect behavioral and health-related information from individuals and provide timely intervention to promote health and well-beings, such as mental health and chronic cares. As the objectives of mobile sensing could be either \emph{(a) personalized medicine for individuals} or \emph{(b) public health for populations}, in this wo…
▽ More
Mobile Sensing Apps have been widely used as a practical approach to collect behavioral and health-related information from individuals and provide timely intervention to promote health and well-beings, such as mental health and chronic cares. As the objectives of mobile sensing could be either \emph{(a) personalized medicine for individuals} or \emph{(b) public health for populations}, in this work we review the design of these mobile sensing apps, and propose to categorize the design of these apps/systems in two paradigms -- \emph{(i) Personal Sensing} and \emph{(ii) Crowd Sensing} paradigms. While both sensing paradigms might incorporate with common ubiquitous sensing technologies, such as wearable sensors, mobility monitoring, mobile data offloading, and/or cloud-based data analytics to collect and process sensing data from individuals, we present a novel taxonomy system with two major components that can specify and classify apps/systems from aspects of the life-cycle of mHealth Sensing: \emph{(1) Sensing Task Creation \& Participation}, \emph{(2) Health Surveillance \& Data Collection}, and \emph{(3) Data Analysis \& Knowledge Discovery}. With respect to different goals of the two paradigms, this work systematically reviews this field, and summarizes the design of typical apps/systems in the view of the configurations and interactions between these two components. In addition to summarization, the proposed taxonomy system also helps figure out the potential directions of mobile sensing for health from both personalized medicines and population health perspectives.
△ Less
Submitted 16 May, 2022; v1 submitted 2 July, 2021;
originally announced July 2021.
-
Over-the-Air Statistical Estimation
Authors:
Chuan-Zheng Lee,
Leighton Pate Barnes,
Ayfer Ozgur
Abstract:
We study schemes and lower bounds for distributed minimax statistical estimation over a Gaussian multiple-access channel (MAC) under squared error loss, in a framework combining statistical estimation and wireless communication. First, we develop "analog" joint estimation-communication schemes that exploit the superposition property of the Gaussian MAC and we characterize their risk in terms of th…
▽ More
We study schemes and lower bounds for distributed minimax statistical estimation over a Gaussian multiple-access channel (MAC) under squared error loss, in a framework combining statistical estimation and wireless communication. First, we develop "analog" joint estimation-communication schemes that exploit the superposition property of the Gaussian MAC and we characterize their risk in terms of the number of nodes and dimension of the parameter space. Then, we derive information-theoretic lower bounds on the minimax risk of any estimation scheme restricted to communicate the samples over a given number of uses of the channel and show that the risk achieved by our proposed schemes is within a logarithmic factor of these lower bounds. We compare both achievability and lower bound results to previous "digital" lower bounds, where nodes transmit errorless bits at the Shannon capacity of the MAC, showing that estimation schemes that leverage the physical layer offer a drastic reduction in estimation error over digital schemes relying on a physical-layer abstraction.
△ Less
Submitted 5 March, 2021;
originally announced March 2021.
-
Fisher Information and Mutual Information Constraints
Authors:
Leighton Pate Barnes,
Ayfer Ozgur
Abstract:
We consider the processing of statistical samples $X\sim P_θ$ by a channel $p(y|x)$, and characterize how the statistical information from the samples for estimating the parameter $θ\in\mathbb{R}^d$ can scale with the mutual information or capacity of the channel. We show that if the statistical model has a sub-Gaussian score function, then the trace of the Fisher information matrix for estimating…
▽ More
We consider the processing of statistical samples $X\sim P_θ$ by a channel $p(y|x)$, and characterize how the statistical information from the samples for estimating the parameter $θ\in\mathbb{R}^d$ can scale with the mutual information or capacity of the channel. We show that if the statistical model has a sub-Gaussian score function, then the trace of the Fisher information matrix for estimating $θ$ from $Y$ can scale at most linearly with the mutual information between $X$ and $Y$. We apply this result to obtain minimax lower bounds in distributed statistical estimation problems, and obtain a tight preconstant for Gaussian mean estimation. We then show how our Fisher information bound can also imply mutual information or Jensen-Shannon divergence based distributed strong data processing inequalities.
△ Less
Submitted 8 July, 2021; v1 submitted 10 February, 2021;
originally announced February 2021.
-
LonelyText: A Short Messaging Based Classification of Loneliness
Authors:
Mawulolo K. Ameko,
Sonia Baee,
Laura E. Barnes
Abstract:
Loneliness does not only have emotional implications on a person but also on his/her well-being. The study of loneliness has been challenging and largely inconclusive in findings because of the several factors that might correlate to the phenomenon. We present one approach to predicting this event by discovering patterns of language associated with loneliness. Our results show insights and promisi…
▽ More
Loneliness does not only have emotional implications on a person but also on his/her well-being. The study of loneliness has been challenging and largely inconclusive in findings because of the several factors that might correlate to the phenomenon. We present one approach to predicting this event by discovering patterns of language associated with loneliness. Our results show insights and promising directions for mining text from instant messaging to predict loneliness.
△ Less
Submitted 22 January, 2021;
originally announced January 2021.
-
Sparse Longitudinal Representations of Electronic Health Record Data for the Early Detection of Chronic Kidney Disease in Diabetic Patients
Authors:
**ghe Zhang,
Kamran Kowsari,
Mehdi Boukhechba,
James Harrison,
Jennifer Lobo,
Laura Barnes
Abstract:
Chronic kidney disease (CKD) is a gradual loss of renal function over time, and it increases the risk of mortality, decreased quality of life, as well as serious complications. The prevalence of CKD has been increasing in the last couple of decades, which is partly due to the increased prevalence of diabetes and hypertension. To accurately detect CKD in diabetic patients, we propose a novel framew…
▽ More
Chronic kidney disease (CKD) is a gradual loss of renal function over time, and it increases the risk of mortality, decreased quality of life, as well as serious complications. The prevalence of CKD has been increasing in the last couple of decades, which is partly due to the increased prevalence of diabetes and hypertension. To accurately detect CKD in diabetic patients, we propose a novel framework to learn sparse longitudinal representations of patients' medical records. The proposed method is also compared with widely used baselines such as Aggregated Frequency Vector and Bag-of-Pattern in Sequences on real EHR data, and the experimental results indicate that the proposed model achieves higher predictive performance. Additionally, the learned representations are interpreted and visualized to bring clinical insights.
△ Less
Submitted 17 November, 2020; v1 submitted 9 November, 2020;
originally announced November 2020.
-
HHAR-net: Hierarchical Human Activity Recognition using Neural Networks
Authors:
Mehrdad Fazli,
Kamran Kowsari,
Erfaneh Gharavi,
Laura Barnes,
Afsaneh Doryab
Abstract:
Activity recognition using built-in sensors in smart and wearable devices provides great opportunities to understand and detect human behavior in the wild and gives a more holistic view of individuals' health and well being. Numerous computational methods have been applied to sensor streams to recognize different daily activities. However, most methods are unable to capture different layers of act…
▽ More
Activity recognition using built-in sensors in smart and wearable devices provides great opportunities to understand and detect human behavior in the wild and gives a more holistic view of individuals' health and well being. Numerous computational methods have been applied to sensor streams to recognize different daily activities. However, most methods are unable to capture different layers of activities concealed in human behavior. Also, the performance of the models starts to decrease with increasing the number of activities. This research aims at building a hierarchical classification with Neural Networks to recognize human activities based on different levels of abstraction. We evaluate our model on the Extrasensory dataset; a dataset collected in the wild and containing data from smartphones and smartwatches. We use a two-level hierarchy with a total of six mutually exclusive labels namely, "lying down", "sitting", "standing in place", "walking", "running", and "bicycling" divided into "stationary" and "non-stationary". The results show that our model can recognize low-level activities (stationary/non-stationary) with 95.8% accuracy and overall accuracy of 92.8% over six labels. This is 3% above our best performing baseline.
△ Less
Submitted 10 November, 2020; v1 submitted 28 October, 2020;
originally announced October 2020.
-
Improving Classification through Weak Supervision in Context-specific Conversational Agent Development for Teacher Education
Authors:
Debajyoti Datta,
Maria Phillips,
Jennifer Chiu,
Ginger S. Watson,
James P. Bywater,
Laura Barnes,
Donald Brown
Abstract:
Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improved accuracy and quality of feedback that a conversational agent can provide. The effort required to develop an educational scenario specific conversational agent is time consuming as it requires domain experts to label and annotate noisy data s…
▽ More
Machine learning techniques applied to the Natural Language Processing (NLP) component of conversational agent development show promising results for improved accuracy and quality of feedback that a conversational agent can provide. The effort required to develop an educational scenario specific conversational agent is time consuming as it requires domain experts to label and annotate noisy data sources such as classroom videos. Previous approaches to modeling annotations have relied on labeling thousands of examples and calculating inter-annotator agreement and majority votes in order to model the necessary scenarios. This method, while proven successful, ignores individual annotator strengths in labeling a data point and under-utilizes examples that do not have a majority vote for labeling. We propose using a multi-task weak supervision method combined with active learning to address these concerns. This approach requires less labeling than traditional methods and shows significant improvements in precision, efficiency, and time-requirements than the majority vote method (Ratner 2019). We demonstrate the validity of this method on the Google Jigsaw data set and then propose a scenario to apply this method using the Instructional Quality Assessment(IQA) to define the categories for labeling. We propose using probabilistic modeling of annotator labeling to generate active learning examples to further label the data. Active learning is able to iteratively improve the training performance and accuracy of the original classification model. This approach combines state-of-the art labeling techniques of weak supervision and active learning to optimize results in the educational domain and could be further used to lessen the data requirements for expanded scenarios within the education domain through transfer learning.
△ Less
Submitted 23 October, 2020;
originally announced October 2020.
-
Geometry matters: Exploring language examples at the decision boundary
Authors:
Debajyoti Datta,
Shashwat Kumar,
Laura Barnes,
Tom Fletcher
Abstract:
A growing body of recent evidence has highlighted the limitations of natural language processing (NLP) datasets and classifiers. These include the presence of annotation artifacts in datasets, classifiers relying on shallow features like a single word (e.g., if a movie review has the word "romantic", the review tends to be positive), or unnecessary words (e.g., learning a proper noun to classify a…
▽ More
A growing body of recent evidence has highlighted the limitations of natural language processing (NLP) datasets and classifiers. These include the presence of annotation artifacts in datasets, classifiers relying on shallow features like a single word (e.g., if a movie review has the word "romantic", the review tends to be positive), or unnecessary words (e.g., learning a proper noun to classify a movie as positive or negative). The presence of such artifacts has subsequently led to the development of challenging datasets to force the model to generalize better. While a variety of heuristic strategies, such as counterfactual examples and contrast sets, have been proposed, the theoretical justification about what makes these examples difficult for the classifier is often lacking or unclear. In this paper, using tools from information geometry, we propose a theoretical way to quantify the difficulty of an example in NLP. Using our approach, we explore difficult examples for several deep learning architectures. We discover that both BERT, CNN and fasttext are susceptible to word substitutions in high difficulty examples. These classifiers tend to perform poorly on the FIM test set. (generated by sampling and perturbing difficult examples, with accuracy drop** below 50%). We replicate our experiments on 5 NLP datasets (YelpReviewPolarity, AGNEWS, SogouNews, YelpReviewFull and Yahoo Answers). On YelpReviewPolarity we observe a correlation coefficient of -0.4 between resilience to perturbations and the difficulty score. Similarly we observe a correlation of 0.35 between the difficulty score and the empirical success probability of random substitutions. Our approach is simple, architecture agnostic and can be used to study the fragilities of text classification models. All the code used will be made publicly available, including a tool to explore the difficult examples for other datasets.
△ Less
Submitted 28 October, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
A Framework for Addressing the Risks and Opportunities In AI-Supported Virtual Health Coaches
Authors:
Sonia Baee,
Mark Rucker,
Anna Baglione,
Mawulolo K. Ameko,
Laura Barnes
Abstract:
Virtual coaching has rapidly evolved into a foundational component of modern clinical practice. At a time when healthcare professionals are in short supply and the demand for low-cost treatments is ever-increasing, virtual health coaches (VHCs) offer intervention-on-demand for those limited by finances or geographic access to care. More recently, AI-powered virtual coaches have become a viable com…
▽ More
Virtual coaching has rapidly evolved into a foundational component of modern clinical practice. At a time when healthcare professionals are in short supply and the demand for low-cost treatments is ever-increasing, virtual health coaches (VHCs) offer intervention-on-demand for those limited by finances or geographic access to care. More recently, AI-powered virtual coaches have become a viable complement to human coaches. However, the push for AI-powered coaching systems raises several important issues for researchers, designers, clinicians, and patients. In this paper, we present a novel framework to guide the design and development of virtual coaching systems. This framework augments a traditional data science pipeline with four key guiding goals: reliability, fairness, engagement, and ethics.
△ Less
Submitted 12 October, 2020;
originally announced October 2020.
-
Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
Authors:
Mawulolo K. Ameko,
Miranda L. Beltzer,
Lihua Cai,
Mehdi Boukhechba,
Bethany A. Teachman,
Laura E. Barnes
Abstract:
Delivering treatment recommendations via pervasive electronic devices such as mobile phones has the potential to be a viable and scalable treatment medium for long-term health behavior management. But active experimentation of treatment options can be time-consuming, expensive and altogether unethical in some cases. There is a growing interest in methodological approaches that allow an experimente…
▽ More
Delivering treatment recommendations via pervasive electronic devices such as mobile phones has the potential to be a viable and scalable treatment medium for long-term health behavior management. But active experimentation of treatment options can be time-consuming, expensive and altogether unethical in some cases. There is a growing interest in methodological approaches that allow an experimenter to learn and evaluate the usefulness of a new treatment strategy before deployment. We present the first development of a treatment recommender system for emotion regulation using real-world historical mobile digital data from n = 114 high socially anxious participants to test the usefulness of new emotion regulation strategies. We explore a number of offline contextual bandits estimators for learning and propose a general framework for learning algorithms. Our experimentation shows that the proposed doubly robust offline learning algorithms performed significantly better than baseline approaches, suggesting that this type of recommender algorithm could improve emotion regulation. Given that emotion regulation is impaired across many mental illnesses and such a recommender algorithm could be scaled up easily, this approach holds potential to increase access to treatment for many people. We also share some insights that allow us to translate contextual bandit models to this complex real-world data, including which contextual features appear to be most important for predicting emotion regulation strategy effectiveness.
△ Less
Submitted 21 August, 2020;
originally announced August 2020.
-
Fisher information under local differential privacy
Authors:
Leighton Pate Barnes,
Wei-Ning Chen,
Ayfer Ozgur
Abstract:
We develop data processing inequalities that describe how Fisher information from statistical samples can scale with the privacy parameter $\varepsilon$ under local differential privacy constraints. These bounds are valid under general conditions on the distribution of the score of the statistical model, and they elucidate under which conditions the dependence on $\varepsilon$ is linear, quadratic…
▽ More
We develop data processing inequalities that describe how Fisher information from statistical samples can scale with the privacy parameter $\varepsilon$ under local differential privacy constraints. These bounds are valid under general conditions on the distribution of the score of the statistical model, and they elucidate under which conditions the dependence on $\varepsilon$ is linear, quadratic, or exponential. We show how these inequalities imply order optimal lower bounds for private estimation for both the Gaussian location model and discrete distribution estimation for all levels of privacy $\varepsilon>0$. We further apply these inequalities to sparse Bernoulli models and demonstrate privacy mechanisms and estimators with order-matching squared $\ell^2$ error.
△ Less
Submitted 21 May, 2020;
originally announced May 2020.
-
rTop-k: A Statistical Estimation Approach to Distributed SGD
Authors:
Leighton Pate Barnes,
Huseyin A. Inan,
Berivan Isik,
Ayfer Ozgur
Abstract:
The large communication cost for exchanging gradients between different nodes significantly limits the scalability of distributed training for large-scale learning models. Motivated by this observation, there has been significant recent interest in techniques that reduce the communication cost of distributed Stochastic Gradient Descent (SGD), with gradient sparsification techniques such as top-k a…
▽ More
The large communication cost for exchanging gradients between different nodes significantly limits the scalability of distributed training for large-scale learning models. Motivated by this observation, there has been significant recent interest in techniques that reduce the communication cost of distributed Stochastic Gradient Descent (SGD), with gradient sparsification techniques such as top-k and random-k shown to be particularly effective. The same observation has also motivated a separate line of work in distributed statistical estimation theory focusing on the impact of communication constraints on the estimation efficiency of different statistical models. The primary goal of this paper is to connect these two research lines and demonstrate how statistical estimation models and their analysis can lead to new insights in the design of communication-efficient training techniques. We propose a simple statistical estimation model for the stochastic gradients which captures the sparsity and skewness of their distribution. The statistically optimal communication scheme arising from the analysis of this model leads to a new sparsification technique for SGD, which concatenates random-k and top-k, considered separately in the prior literature. We show through extensive experiments on both image and language domains with CIFAR-10, ImageNet, and Penn Treebank datasets that the concatenated application of these two sparsification methods consistently and significantly outperforms either method applied alone.
△ Less
Submitted 2 December, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Gender Detection on Social Networks using Ensemble Deep Learning
Authors:
Kamran Kowsari,
Mojtaba Heidarysafa,
Tolu Odukoya,
Philip Potter,
Laura E. Barnes,
Donald E. Brown
Abstract:
Analyzing the ever-increasing volume of posts on social media sites such as Facebook and Twitter requires improved information processing methods for profiling authorship. Document classification is central to this task, but the performance of traditional supervised classifiers has degraded as the volume of social media has increased. This paper addresses this problem in the context of gender dete…
▽ More
Analyzing the ever-increasing volume of posts on social media sites such as Facebook and Twitter requires improved information processing methods for profiling authorship. Document classification is central to this task, but the performance of traditional supervised classifiers has degraded as the volume of social media has increased. This paper addresses this problem in the context of gender detection through ensemble classification that employs multi-model deep learning architectures to generate specialized understanding from different feature spaces.
△ Less
Submitted 9 September, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
The Courtade-Kumar Most Informative Boolean Function Conjecture and a Symmetrized Li-Médard Conjecture are Equivalent
Authors:
Leighton Pate Barnes,
Ayfer Özgür
Abstract:
We consider the Courtade-Kumar most informative Boolean function conjecture for balanced functions, as well as a conjecture by Li and Médard that dictatorship functions also maximize the $L^α$ norm of $T_pf$ for $1\leqα\leq2$ where $T_p$ is the noise operator and $f$ is a balanced Boolean function. By using a result due to Laguerre from the 1880's, we are able to bound how many times an $L^α$-norm…
▽ More
We consider the Courtade-Kumar most informative Boolean function conjecture for balanced functions, as well as a conjecture by Li and Médard that dictatorship functions also maximize the $L^α$ norm of $T_pf$ for $1\leqα\leq2$ where $T_p$ is the noise operator and $f$ is a balanced Boolean function. By using a result due to Laguerre from the 1880's, we are able to bound how many times an $L^α$-norm related quantity can cross zero as a function of $α$, and show that these two conjectures are essentially equivalent.
△ Less
Submitted 2 April, 2020;
originally announced April 2020.
-
Reward Sha** for Human Learning via Inverse Reinforcement Learning
Authors:
Mark A. Rucker,
Layne T. Watson,
Matthew S. Gerber,
Laura E. Barnes
Abstract:
Humans are spectacular reinforcement learners, constantly learning from and adjusting to experience and feedback. Unfortunately, this doesn't necessarily mean humans are fast learners. When tasks are challenging, learning can become unacceptably slow. Fortunately, humans do not have to learn tabula rasa, and learning speed can be greatly increased with learning aids. In this work we validate a new…
▽ More
Humans are spectacular reinforcement learners, constantly learning from and adjusting to experience and feedback. Unfortunately, this doesn't necessarily mean humans are fast learners. When tasks are challenging, learning can become unacceptably slow. Fortunately, humans do not have to learn tabula rasa, and learning speed can be greatly increased with learning aids. In this work we validate a new type of learning aid -- reward sha** for humans via inverse reinforcement learning (IRL). The goal of this aid is to increase the speed with which humans can learn good policies for specific tasks. Furthermore this approach compliments alternative machine learning techniques such as safety features that try to prevent individuals from making poor decisions. To achieve our results we first extend a well known IRL algorithm via kernel methods. Afterwards we conduct two human subjects experiments using an online game where players have limited time to learn a good policy. We show with statistical significance that players who receive our learning aid are able to approach desired policies more quickly than the control group.
△ Less
Submitted 15 December, 2022; v1 submitted 25 February, 2020;
originally announced February 2020.
-
Enabling Smartphone-based Estimation of Heart Rate
Authors:
Nutta Homdee,
Mehdi Boukhechba,
Yixue W. Feng,
Natalie Kramer,
John Lach,
Laura E. Barnes
Abstract:
Continuous, ubiquitous monitoring through wearable sensors has the potential to collect useful information about users' context. Heart rate is an important physiologic measure used in a wide variety of applications, such as fitness tracking and health monitoring. However, wearable sensors that monitor heart rate, such as smartwatches and electrocardiogram (ECG) patches, can have gaps in their data…
▽ More
Continuous, ubiquitous monitoring through wearable sensors has the potential to collect useful information about users' context. Heart rate is an important physiologic measure used in a wide variety of applications, such as fitness tracking and health monitoring. However, wearable sensors that monitor heart rate, such as smartwatches and electrocardiogram (ECG) patches, can have gaps in their data streams because of technical issues (e.g., bad wireless channels, battery depletion, etc.) or user-related reasons (e.g. motion artifacts, user compliance, etc.). The ability to use other available sensor data (e.g., smartphone data) to estimate missing heart rate readings is useful to cope with any such gaps, thus improving data quality and continuity. In this paper, we test the feasibility of estimating raw heart rate using smartphone sensor data. Using data generated by 12 participants in a one-week study period, we were able to build both personalized and generalized models using regression, SVM, and random forest algorithms. All three algorithms outperformed the baseline moving-average interpolation method for both personalized and generalized settings. Moreover, our findings suggest that personalized models outperformed the generalized models, which speaks to the importance of considering personal physiology, behavior, and life style in the estimation of heart rate. The promising results provide preliminary evidence of the feasibility of combining smartphone sensor data with wearable sensor data for continuous heart rate monitoring.
△ Less
Submitted 18 December, 2019;
originally announced December 2019.
-
MEDIRL: Predicting the Visual Attention of Drivers via Maximum Entropy Deep Inverse Reinforcement Learning
Authors:
Sonia Baee,
Erfan Pakdamanian,
Inki Kim,
Lu Feng,
Vicente Ordonez,
Laura Barnes
Abstract:
Inspired by human visual attention, we propose a novel inverse reinforcement learning formulation using Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) for predicting the visual attention of drivers in accident-prone situations. MEDIRL predicts fixation locations that lead to maximal rewards by learning a task-sensitive reward function from eye fixation patterns recorded from attentiv…
▽ More
Inspired by human visual attention, we propose a novel inverse reinforcement learning formulation using Maximum Entropy Deep Inverse Reinforcement Learning (MEDIRL) for predicting the visual attention of drivers in accident-prone situations. MEDIRL predicts fixation locations that lead to maximal rewards by learning a task-sensitive reward function from eye fixation patterns recorded from attentive drivers. Additionally, we introduce EyeCar, a new driver attention dataset in accident-prone situations. We conduct comprehensive experiments to evaluate our proposed model on three common benchmarks: (DR(eye)VE, BDD-A, DADA-2000), and our EyeCar dataset. Results indicate that MEDIRL outperforms existing models for predicting attention and achieves state-of-the-art performance. We present extensive ablation studies to provide more insights into different features of our proposed model.
△ Less
Submitted 6 October, 2021; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Women in ISIS Propaganda: A Natural Language Processing Analysis of Topics and Emotions in a Comparison with Mainstream Religious Group
Authors:
Mojtaba Heidarysafa,
Kamran Kowsari,
Tolu Odukoya,
Philip Potter,
Laura E. Barnes,
Donald E. Brown
Abstract:
Online propaganda is central to the recruitment strategies of extremist groups and in recent years these efforts have increasingly extended to women. To investigate ISIS' approach to targeting women in their online propaganda and uncover implications for counterterrorism, we rely on text mining and natural language processing (NLP). Specifically, we extract articles published in Dabiq and Rumiyah…
▽ More
Online propaganda is central to the recruitment strategies of extremist groups and in recent years these efforts have increasingly extended to women. To investigate ISIS' approach to targeting women in their online propaganda and uncover implications for counterterrorism, we rely on text mining and natural language processing (NLP). Specifically, we extract articles published in Dabiq and Rumiyah (ISIS's online English language publications) to identify prominent topics. To identify similarities or differences between these texts and those produced by non-violent religious groups, we extend the analysis to articles from a Catholic forum dedicated to women. We also perform an emotional analysis of both of these resources to better understand the emotional components of propaganda. We rely on Depechemood (a lexical-base emotion analysis method) to detect emotions most likely to be evoked in readers of these materials. The findings indicate that the emotional appeal of ISIS and Catholic materials are similar
△ Less
Submitted 8 December, 2019;
originally announced December 2019.
-
Minimax Bounds for Distributed Logistic Regression
Authors:
Leighton Pate Barnes,
Ayfer Ozgur
Abstract:
We consider a distributed logistic regression problem where labeled data pairs $(X_i,Y_i)\in \mathbb{R}^d\times\{-1,1\}$ for $i=1,\ldots,n$ are distributed across multiple machines in a network and must be communicated to a centralized estimator using at most $k$ bits per labeled pair. We assume that the data $X_i$ come independently from some distribution $P_X$, and that the distribution of…
▽ More
We consider a distributed logistic regression problem where labeled data pairs $(X_i,Y_i)\in \mathbb{R}^d\times\{-1,1\}$ for $i=1,\ldots,n$ are distributed across multiple machines in a network and must be communicated to a centralized estimator using at most $k$ bits per labeled pair. We assume that the data $X_i$ come independently from some distribution $P_X$, and that the distribution of $Y_i$ conditioned on $X_i$ follows a logistic model with some parameter $θ\in\mathbb{R}^d$. By using a Fisher information argument, we give minimax lower bounds for estimating $θ$ under different assumptions on the tail of the distribution $P_X$. We consider both $\ell^2$ and logistic losses, and show that for the logistic loss our sub-Gaussian lower bound is order-optimal and cannot be improved.
△ Less
Submitted 3 October, 2019;
originally announced October 2019.
-
Text Classification Algorithms: A Survey
Authors:
Kamran Kowsari,
Kiana Jafari Meimandi,
Mojtaba Heidarysafa,
Sanjana Mendu,
Laura E. Barnes,
Donald E. Brown
Abstract:
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understa…
▽ More
In recent years, there has been an exponential growth in the number of complex documents and texts that require a deeper understanding of machine learning methods to be able to accurately classify texts in many applications. Many machine learning approaches have achieved surpassing results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification is a challenge for researchers. In this paper, a brief overview of text classification algorithms is discussed. This overview covers different text feature extractions, dimensionality reduction methods, existing algorithms and techniques, and evaluations methods. Finally, the limitations of each technique and their application in the real-world problem are discussed.
△ Less
Submitted 20 May, 2020; v1 submitted 16 April, 2019;
originally announced April 2019.
-
Lower Bounds for Learning Distributions under Communication Constraints via Fisher Information
Authors:
Leighton Pate Barnes,
Yanjun Han,
Ayfer Ozgur
Abstract:
We consider the problem of learning high-dimensional, nonparametric and structured (e.g. Gaussian) distributions in distributed networks, where each node in the network observes an independent sample from the underlying distribution and can use $k$ bits to communicate its sample to a central processor. We consider three different models for communication. Under the independent model, each node com…
▽ More
We consider the problem of learning high-dimensional, nonparametric and structured (e.g. Gaussian) distributions in distributed networks, where each node in the network observes an independent sample from the underlying distribution and can use $k$ bits to communicate its sample to a central processor. We consider three different models for communication. Under the independent model, each node communicates its sample to a central processor by independently encoding it into $k$ bits. Under the more general sequential or blackboard communication models, nodes can share information interactively but each node is restricted to write at most $k$ bits on the final transcript. We characterize the impact of the communication constraint $k$ on the minimax risk of estimating the underlying distribution under $\ell^2$ loss. We develop minimax lower bounds that apply in a unified way to many common statistical models and reveal that the impact of the communication constraint can be qualitatively different depending on the tail behavior of the score function associated with each model. A key ingredient in our proofs is a geometric characterization of Fisher information from quantized samples.
△ Less
Submitted 31 May, 2019; v1 submitted 7 February, 2019;
originally announced February 2019.
-
An Isoperimetric Result on High-Dimensional Spheres
Authors:
Leighton Pate Barnes,
Ayfer Ozgur,
Xiugang Wu
Abstract:
We consider an extremal problem for subsets of high-dimensional spheres that can be thought of as an extension of the classical isoperimetric problem on the sphere. Let $A$ be a subset of the $(m-1)$-dimensional sphere $\mathbb{S}^{m-1}$, and let $\mathbf{y}\in \mathbb{S}^{m-1}$ be a randomly chosen point on the sphere. What is the measure of the intersection of the $t$-neighborhood of the point…
▽ More
We consider an extremal problem for subsets of high-dimensional spheres that can be thought of as an extension of the classical isoperimetric problem on the sphere. Let $A$ be a subset of the $(m-1)$-dimensional sphere $\mathbb{S}^{m-1}$, and let $\mathbf{y}\in \mathbb{S}^{m-1}$ be a randomly chosen point on the sphere. What is the measure of the intersection of the $t$-neighborhood of the point $\mathbf{y}$ with the subset $A$? We show that with high probability this intersection is approximately as large as the intersection that would occur with high probability if $A$ were a spherical cap of the same measure.
△ Less
Submitted 20 November, 2018;
originally announced November 2018.
-
Analysis of Railway Accidents' Narratives Using Deep Learning
Authors:
Mojtaba Heidarysafa,
Kamran Kowsari,
Laura E. Barnes,
Donald E. Brown
Abstract:
Automatic understanding of domain specific texts in order to extract useful relationships for later use is a non-trivial task. One such relationship would be between railroad accidents' causes and their correspondent descriptions in reports. From 2001 to 2016 rail accidents in the U.S. cost more than $4.6B. Railroads involved in accidents are required to submit an accident report to the Federal Ra…
▽ More
Automatic understanding of domain specific texts in order to extract useful relationships for later use is a non-trivial task. One such relationship would be between railroad accidents' causes and their correspondent descriptions in reports. From 2001 to 2016 rail accidents in the U.S. cost more than $4.6B. Railroads involved in accidents are required to submit an accident report to the Federal Railroad Administration (FRA). These reports contain a variety of fixed field entries including primary cause of the accidents (a coded variable with 389 values) as well as a narrative field which is a short text description of the accident. Although these narratives provide more information than a fixed field entry, the terminologies used in these reports are not easy to understand by a non-expert reader. Therefore, providing an assisting method to fill in the primary cause from such domain specific texts(narratives) would help to label the accidents with more accuracy. Another important question for transportation safety is whether the reported accident cause is consistent with narrative description. To address these questions, we applied deep learning methods together with powerful word embeddings such as Word2Vec and GloVe to classify accident cause values for the primary cause field using the text in the narratives. The results show that such approaches can both accurately classify accident causes based on report narratives and find important inconsistencies in accident reporting.
△ Less
Submitted 20 May, 2020; v1 submitted 17 October, 2018;
originally announced October 2018.
-
Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record
Authors:
**ghe Zhang,
Kamran Kowsari,
James H. Harrison,
Jennifer M. Lobo,
Laura E. Barnes
Abstract:
The wide implementation of electronic health record (EHR) systems facilitates the collection of large-scale health data from real clinical settings. Despite the significant increase in adoption of EHR systems, this data remains largely unexplored, but presents a rich data source for knowledge discovery from patient health histories in tasks such as understanding disease correlations and predicting…
▽ More
The wide implementation of electronic health record (EHR) systems facilitates the collection of large-scale health data from real clinical settings. Despite the significant increase in adoption of EHR systems, this data remains largely unexplored, but presents a rich data source for knowledge discovery from patient health histories in tasks such as understanding disease correlations and predicting health outcomes. However, the heterogeneity, sparsity, noise, and bias in this data present many complex challenges. This complexity makes it difficult to translate potentially relevant information into machine learning algorithms. In this paper, we propose a computational framework, Patient2Vec, to learn an interpretable deep representation of longitudinal EHR data which is personalized for each patient. To evaluate this approach, we apply it to the prediction of future hospitalizations using real EHR data and compare its predictive performance with baseline methods. Patient2Vec produces a vector space with meaningful structure and it achieves an AUC around 0.799 outperforming baseline methods. In the end, the learned feature importance can be visualized and interpreted at both the individual and population levels to bring clinical insights.
△ Less
Submitted 25 October, 2018; v1 submitted 10 October, 2018;
originally announced October 2018.
-
An Improvement of Data Classification Using Random Multimodel Deep Learning (RMDL)
Authors:
Mojtaba Heidarysafa,
Kamran Kowsari,
Donald E. Brown,
Kiana Jafari Meimandi,
Laura E. Barnes
Abstract:
The exponential growth in the number of complex datasets every year requires more enhancement in machine learning methods to provide robust and accurate data classification. Lately, deep learning approaches have achieved surpassing results in comparison to previous machine learning algorithms. However, finding the suitable structure for these models has been a challenge for researchers. This paper…
▽ More
The exponential growth in the number of complex datasets every year requires more enhancement in machine learning methods to provide robust and accurate data classification. Lately, deep learning approaches have achieved surpassing results in comparison to previous machine learning algorithms. However, finding the suitable structure for these models has been a challenge for researchers. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble, deep learning approach for classification. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures. In short, RMDL trains multiple randomly generated models of Deep Neural Network (DNN), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) in parallel and combines their results to produce better result of any of those models individually. In this paper, we describe RMDL model and compare the results for image and text classification as well as face recognition. We used MNIST and CIFAR-10 datasets as ground truth datasets for image classification and WOS, Reuters, IMDB, and 20newsgroup datasets for text classification. Lastly, we used ORL dataset to compare the model performance on face recognition task.
△ Less
Submitted 22 August, 2018;
originally announced August 2018.
-
RMDL: Random Multimodel Deep Learning for Classification
Authors:
Kamran Kowsari,
Mojtaba Heidarysafa,
Donald E. Brown,
Kiana Jafari Meimandi,
Laura E. Barnes
Abstract:
The continually increasing number of complex datasets each year necessitates ever improving machine learning methods for robust and accurate categorization of these data. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble, deep learning approach for classification. Deep learning models have achieved state-of-the-art results across many domains. RMDL solves the problem of…
▽ More
The continually increasing number of complex datasets each year necessitates ever improving machine learning methods for robust and accurate categorization of these data. This paper introduces Random Multimodel Deep Learning (RMDL): a new ensemble, deep learning approach for classification. Deep learning models have achieved state-of-the-art results across many domains. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of deep learning architectures. RDML can accept as input a variety data to include text, video, images, and symbolic. This paper describes RMDL and shows test results for image and text data including MNIST, CIFAR-10, WOS, Reuters, IMDB, and 20newsgroup. These test results show that RDML produces consistently better performance than standard methods over a broad range of data types and classification problems.
△ Less
Submitted 31 May, 2018; v1 submitted 3 May, 2018;
originally announced May 2018.
-
Cluster-based Approach to Improve Affect Recognition from Passively Sensed Data
Authors:
Mawulolo K. Ameko,
Lihua Cai,
Mehdi Boukhechba,
Alexander Daros,
Philip I. Chow,
Bethany A. Teachman,
Matthew S. Gerber,
Laura E. Barnes
Abstract:
Negative affect is a proxy for mental health in adults. By being able to predict participants' negative affect states unobtrusively, researchers and clinicians will be better positioned to deliver targeted, just-in-time mental health interventions via mobile applications. This work attempts to personalize the passive recognition of negative affect states via group-based modeling of user behavior p…
▽ More
Negative affect is a proxy for mental health in adults. By being able to predict participants' negative affect states unobtrusively, researchers and clinicians will be better positioned to deliver targeted, just-in-time mental health interventions via mobile applications. This work attempts to personalize the passive recognition of negative affect states via group-based modeling of user behavior patterns captured from mobility, communication, and activity patterns. Results show that group models outperform generalized models in a dataset based on two weeks of users' daily lives.
△ Less
Submitted 31 January, 2018;
originally announced February 2018.
-
HDLTex: Hierarchical Deep Learning for Text Classification
Authors:
Kamran Kowsari,
Donald E. Brown,
Mojtaba Heidarysafa,
Kiana Jafari Meimandi,
Matthew S. Gerber,
Laura E. Barnes
Abstract:
The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number…
▽ More
The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.
△ Less
Submitted 6 October, 2017; v1 submitted 24 September, 2017;
originally announced September 2017.
-
"The Capacity of the Relay Channel": Solution to Cover's Problem in the Gaussian Case
Authors:
Xiugang Wu,
Leighton Pate Barnes,
Ayfer Ozgur
Abstract:
Consider a memoryless relay channel, where the relay is connected to the destination with an isolated bit pipe of capacity $C_0$. Let $C(C_0)$ denote the capacity of this channel as a function of $C_0$. What is the critical value of $C_0$ such that $C(C_0)$ first equals $C(\infty)$? This is a long-standing open problem posed by Cover and named "The Capacity of the Relay Channel," in…
▽ More
Consider a memoryless relay channel, where the relay is connected to the destination with an isolated bit pipe of capacity $C_0$. Let $C(C_0)$ denote the capacity of this channel as a function of $C_0$. What is the critical value of $C_0$ such that $C(C_0)$ first equals $C(\infty)$? This is a long-standing open problem posed by Cover and named "The Capacity of the Relay Channel," in $Open \ Problems \ in \ Communication \ and \ Computation$, Springer-Verlag, 1987. In this paper, we answer this question in the Gaussian case and show that $C(C_0)$ can not equal to $C(\infty)$ unless $C_0=\infty$, regardless of the SNR of the Gaussian channels. This result follows as a corollary to a new upper bound we develop on the capacity of this channel. Instead of "single-letterizing" expressions involving information measures in a high-dimensional space as is typically done in converse results in information theory, our proof directly quantifies the tension between the pertinent $n$-letter forms. This is done by translating the information tension problem to a problem in high-dimensional geometry. As an intermediate result, we develop an extension of the classical isoperimetric inequality on a high-dimensional sphere, which can be of interest in its own right.
△ Less
Submitted 7 October, 2018; v1 submitted 8 January, 2017;
originally announced January 2017.
-
Efficient Resource Matching in Heterogeneous Grid Using Resource Vector
Authors:
Srirangam V Addepallil,
Per Andersen,
George L Barnes
Abstract:
In this paper, a method for efficient scheduling to obtain optimum job throughput in a distributed campus grid environment is presented; Traditional job schedulers determine job scheduling using user and job resource attributes. User attributes are related to current usage, historical usage, user priority and project access. Job resource attributes mainly comprise of soft requirements (compilers,…
▽ More
In this paper, a method for efficient scheduling to obtain optimum job throughput in a distributed campus grid environment is presented; Traditional job schedulers determine job scheduling using user and job resource attributes. User attributes are related to current usage, historical usage, user priority and project access. Job resource attributes mainly comprise of soft requirements (compilers, libraries) and hard requirements like memory, storage and interconnect. A job scheduler dispatches jobs to a resource if a job's hard and soft requirements are met by a resource. In current scenario during execution of a job, if a resource becomes unavailable, schedulers are presented with limited options, namely re-queuing job or migrating job to a different resource. Both options are expensive in terms of data and compute time. These situations can be avoided, if the often ignored factor, availability time of a resource in a grid environment is considered. We propose resource rank approach, in which jobs are dispatched to a resource which has the highest rank among all resources that match the job's requirement. The results show that our approach can increase throughput of many serial / monolithic jobs.
△ Less
Submitted 7 June, 2010;
originally announced June 2010.