Clustering Interval-Censored Time-Series for Disease Phenotyping

Authors: Irene Y. Chen, Rahul G. Krishnan, David Sontag

Abstract: Unsupervised learning is often used to uncover clusters in data. However, different kinds of noise may impede the discovery of useful patterns from real-world time-series data. In this work, we focus on mitigating the interference of interval censoring in the task of clustering for disease phenotyping. We develop a deep generative, continuous-time model of time-series data that clusters time-series while correcting for censorship time. We provide conditions under which clusters and the amount of delayed entry may be identified from data under a noiseless model. On synthetic data, we demonstrate accurate, stable, and interpretable results that outperform several benchmarks. On real-world clinical datasets of heart failure and Parkinson's disease patients, we study how interval censoring can adversely affect the task of disease phenotyping. Our model corrects for this source of error and recovers known clinical subtypes.

Submitted 5 December, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

Comments: AAAI 2022

arXiv:2009.11087 [pdf, other]

Probabilistic Machine Learning for Healthcare

Authors: Irene Y. Chen, Shalmali Joshi, Marzyeh Ghassemi, Rajesh Ranganath

Abstract: Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning.

Submitted 23 September, 2020; originally announced September 2020.

Comments: Annual Reviews of Biomedical Data Science 2021

arXiv:2003.00827 [pdf, other]

CheXclusion: Fairness gaps in deep chest X-ray classifiers

Authors: Laleh Seyyed-Kalantari, Guanxiong Liu, Matthew McDermott, Irene Y. Chen, Marzyeh Ghassemi

Abstract: Machine learning systems have received much attention recently for their ability to achieve expert-level performance on clinical tasks, particularly in medical imaging. Here, we examine the extent to which state-of-the-art deep learning classifiers trained to yield diagnostic labels from X-ray images are biased with respect to protected attributes. We train convolution neural networks to predict 14 diagnostic labels in 3 prominent public chest X-ray datasets: MIMIC-CXR, Chest-Xray8, CheXpert, as well as a multi-site aggregation of all those datasets. We evaluate the TPR disparity -- the difference in true positive rates (TPR) -- among different protected attributes such as patient sex, age, race, and insurance type as a proxy for socioeconomic status. We demonstrate that TPR disparities exist in the state-of-the-art classifiers in all datasets, for all clinical tasks, and all subgroups. A multi-source dataset corresponds to the smallest disparities, suggesting one way to reduce bias. We find that TPR disparities are not significantly correlated with a subgroup's proportional disease burden. As clinical models move from papers to products, we encourage clinical decision makers to carefully audit for algorithmic disparities prior to deployment. Our code can be found at, https://github.com/LalehSeyyed/CheXclusion

Submitted 15 October, 2020; v1 submitted 14 February, 2020; originally announced March 2020.

Comments: Paper is accepted in Pacific Symposium on Biocomputing 2021 (PSB2021). Code can be found at, https://github.com/LalehSeyyed/CheXclusion

arXiv:1910.01116 [pdf, other]

Robustly Extracting Medical Knowledge from EHRs: A Case Study of Learning a Health Knowledge Graph

Authors: Irene Y. Chen, Monica Agrawal, Steven Horng, David Sontag

Abstract: Increasingly large electronic health records (EHRs) provide an opportunity to algorithmically learn medical knowledge. In one prominent example, a causal health knowledge graph could learn relationships between diseases and symptoms and then serve as a diagnostic tool to be refined with additional clinical input. Prior research has demonstrated the ability to construct such a graph from over 270,000 emergency department patient visits. In this work, we describe methods to evaluate a health knowledge graph for robustness. Moving beyond precision and recall, we analyze for which diseases and for which patients the graph is most accurate. We identify sample size and unmeasured confounders as major sources of error in the health knowledge graph. We introduce a method to leverage non-linear functions in building the causal graph to better understand existing model assumptions. Finally, to assess model generalizability, we extend to a larger set of complete patient visits within a hospital system. We conclude with a discussion on how to robustly extract medical knowledge from EHRs.

Submitted 1 October, 2019; originally announced October 2019.

Comments: 12 pages, presented at PSB 2020

arXiv:1806.00388 [pdf]

A Review of Challenges and Opportunities in Machine Learning for Health

Authors: Marzyeh Ghassemi, Tristan Naumann, Peter Schulam, Andrew L. Beam, Irene Y. Chen, Rajesh Ranganath

Abstract: Modern electronic health records (EHRs) provide data to answer clinically meaningful questions. The growing data in EHRs makes healthcare ripe for the use of machine learning. However, learning in a clinical setting presents unique challenges that complicate the use of common machine learning methodologies. For example, diseases in EHRs are poorly labeled, conditions can encompass multiple underlying endotypes, and healthy individuals are underrepresented. This article serves as a primer to illuminate these challenges and highlights opportunities for members of the machine learning community to contribute to healthcare.

Submitted 5 December, 2019; v1 submitted 1 June, 2018; originally announced June 2018.

Comments: Updated version

Showing 1–5 of 5 results for author: Chen, I Y