-
Analysis of time-to-event for observational studies: Guidance to the use of intensity models
Authors:
Per Kragh Andersen,
Maja Pohar Perme,
Hans C van Houwelingen,
Richard J Cook,
Pierre Joly,
Torben Martinussen,
Jeremy MG Taylor,
Michal Abrahamowicz,
Terry M Therneau
Abstract:
This paper provides guidance for researchers with some mathematical background on the conduct of time-to-event analysis in observational studies based on intensity (hazard) models. Basic concepts such as the time axis, event definition and censoring are discussed. Hazard models are introduced, with special emphasis on the Cox proportional hazards regression model. We provide checklists that may be useful both when fitting the model and assessing its goodness of fit, and when interpreting the results. Special attention is paid to avoiding immortal time bias by introducing time-dependent covariates. We discuss prediction based on hazard models and the difficulties of drawing proper causal conclusions from such models. Finally, we present a series of examples that illustrate the methods and checklists. Computational details and implementation using the freely available R software are documented in Supplementary Material. The paper was prepared as part of the STRATOS initiative.
Submitted 28 May, 2020; v1 submitted 27 May, 2020;
originally announced May 2020.
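The abstract above names time-dependent covariates as the remedy for immortal time bias. As a minimal sketch (not code from the paper or its Supplementary Material), the Python below splits one subject's follow-up at a treatment start time into counting-process (start, stop] rows, so that the waiting time before treatment is counted as untreated instead of being credited to the treated group. The function and field names are hypothetical. In R, the survival package's `tmerge()` function builds data in this format, which can then be passed to `coxph()` through `Surv(start, stop, event)`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    """One row of counting-process (start, stop] data (hypothetical layout)."""
    subject: str
    start: float
    stop: float
    event: int    # 1 if the event occurs at `stop`, else 0
    treated: int  # time-dependent treatment indicator on (start, stop]


def split_at_treatment(subject: str, exit_time: float, event: int,
                       treat_time: Optional[float]) -> List[Interval]:
    """Split follow-up (0, exit_time] at the treatment start time.

    Time before treatment is labelled untreated, so it is not credited
    to the treated group (the source of immortal time bias).
    """
    if treat_time is None or treat_time >= exit_time:
        # Never treated, or treatment would start at/after exit: one untreated row.
        return [Interval(subject, 0.0, exit_time, event, 0)]
    if treat_time <= 0:
        # Treated from time zero: one treated row.
        return [Interval(subject, 0.0, exit_time, event, 1)]
    return [
        Interval(subject, 0.0, treat_time, 0, 0),           # untreated, no event yet
        Interval(subject, treat_time, exit_time, event, 1),  # treated until exit
    ]
```

For example, `split_at_treatment("A", 10.0, 1, 3.0)` yields an untreated row on (0, 3] with no event and a treated row on (3, 10] that ends in the event. Classifying this subject as "treated" over the whole interval (0, 10] would instead give the treated group three years in which it could not have had the event while still being counted as treated.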
-
Unsupervised Machine Learning for the Discovery of Latent Disease Clusters and Patient Subgroups Using Electronic Health Records
Authors:
Yanshan Wang,
Yiqing Zhao,
Terry M. Therneau,
Elizabeth J. Atkinson,
Ahmad P. Tafti,
Nan Zhang,
Shreyasee Amin,
Andrew H. Limper,
Hongfang Liu
Abstract:
Machine learning has become ubiquitous and a key technology for mining electronic health records (EHRs) to facilitate clinical research and practice. Unsupervised machine learning, as opposed to supervised learning, has shown promise in identifying novel patterns and relations in EHRs without using human-created labels. In this paper, we investigate the application of unsupervised machine learning models to discovering latent disease clusters and patient subgroups based on EHRs. We utilized Latent Dirichlet Allocation (LDA), a generative probabilistic model, and proposed a novel model named the Poisson Dirichlet Model (PDM), which extends the LDA approach by using a Poisson distribution to model patients' disease diagnoses and to alleviate the effects of age and sex by considering both observed and expected observations. In the empirical experiments, we evaluated LDA and PDM on three patient cohorts with EHR data retrieved from the Rochester Epidemiology Project (REP), for the discovery of latent disease clusters and patient subgroups. We compared the effectiveness of LDA and PDM in identifying latent disease clusters through the visualization of disease representations learned by the two approaches. We also tested the performance of LDA and PDM in differentiating patient subgroups through survival analysis, as well as statistical analysis. The experimental results show that the proposed PDM could effectively identify distinct disease clusters by alleviating the impact of age and sex, and that LDA could stratify patients into more differentiable subgroups than PDM in terms of p-values. However, the subgroups discovered by PDM might reflect underlying disease patterns of greater interest in epidemiology research, because PDM alleviates the effects of age and sex. Both unsupervised machine learning approaches could be leveraged to discover patient subgroups using EHRs, but with different foci.
Submitted 17 May, 2019;
originally announced May 2019.
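The PDM in the preceding abstract contrasts observed with expected observations to reduce the influence of age and sex. The sketch below is not the PDM; it only illustrates the underlying observed-versus-expected idea, assuming (hypothetically) that a patient's expected count for a diagnosis code is the mean count per patient in that patient's age/sex stratum. The ratio O/E is the maximum-likelihood estimate of the multiplier λ in a Poisson model with mean λE, so values above 1 flag diagnoses that occur more often than age and sex alone would predict. All names and data layouts here are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Stratum = Tuple[str, str]  # hypothetical (age band, sex) label


def stratum_rates(strata: Dict[str, Stratum],
                  diagnoses: Dict[str, List[str]]) -> Dict[Stratum, Dict[str, float]]:
    """Mean count of each diagnosis code per patient within each age/sex stratum."""
    sizes: Dict[Stratum, int] = defaultdict(int)
    totals: Dict[Stratum, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for pid, stratum in strata.items():
        sizes[stratum] += 1
        for code in diagnoses.get(pid, []):
            totals[stratum][code] += 1.0
    return {s: {code: n / sizes[s] for code, n in codes.items()}
            for s, codes in totals.items()}


def observed_over_expected(pid: str, strata: Dict[str, Stratum],
                           diagnoses: Dict[str, List[str]],
                           rates: Dict[Stratum, Dict[str, float]]) -> Dict[str, float]:
    """O/E ratio per code; every expected value in `rates` is > 0 by construction."""
    observed = Counter(diagnoses.get(pid, []))
    expected = rates.get(strata[pid], {})
    return {code: observed[code] / e for code, e in expected.items()}
```

For instance, if two women aged 70+ carry the codes HTN, HTN, DM and HTN respectively, the stratum rates are 1.5 for HTN and 0.5 for DM, so the first woman's O/E is about 1.33 for HTN and 2.0 for DM, while the second woman's O/E for DM is 0.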
-
A Bayesian Approach to Multi-State Hidden Markov Models: Application to Dementia Progression
Authors:
Jonathan P Williams,
Curtis B Storlie,
Terry M Therneau,
Clifford R Jack Jr,
Jan Hannig
Abstract:
People are living longer than ever before, and with this arise new complications and challenges for humanity. Among the most pressing of these challenges is understanding the role of aging in the development of dementia. This paper is motivated by the Mayo Clinic Study of Aging data for 4742 subjects since 2004, and how it can be used to draw inference on the role of aging in the development of dementia. We construct a hidden Markov model (HMM) to represent progression of dementia from states associated with the buildup of amyloid plaque in the brain and the loss of cortical thickness. A hierarchical Bayesian approach is taken to estimate the parameters of the HMM with a truly time-inhomogeneous infinitesimal generator matrix, and the response functions of the continuous-valued biomarker measurements are cut-point agnostic. A Bayesian approach with these features could be useful in many disease progression models. Additionally, an approach is illustrated for correcting a common bias in delayed enrollment studies, in which some or all subjects are not observed at baseline. Standard software is incapable of accounting for this critical feature, so code to perform the estimation of the model described below is made available online.
Submitted 6 August, 2018; v1 submitted 7 February, 2018;
originally announced February 2018.
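The preceding abstract combines a time-inhomogeneous generator matrix with a correction for delayed enrollment. As a rough sketch (not the authors' published code), the Python below approximates the transition matrix P(s, t) of a continuous-time Markov chain with age-dependent generator Q(u) by the product integral of (I + Q(u) du) over a fine grid, then forms the state distribution at a delayed entry age conditional on the subject being alive at enrollment, which is the standard left-truncation argument. The three states (healthy, impaired, dead), the rate functions, and the assumption that everyone is healthy at age 60 are all hypothetical and unrelated to the amyloid and cortical-thickness states of the paper.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def generator(age: float) -> Matrix:
    """Hypothetical generator: 0=healthy, 1=impaired, 2=dead; rates rise with age."""
    a01 = 0.01 * math.exp(0.08 * (age - 60))   # healthy -> impaired
    a02 = 0.005 * math.exp(0.09 * (age - 60))  # healthy -> dead
    a12 = 0.03 * math.exp(0.09 * (age - 60))   # impaired -> dead
    return [
        [-(a01 + a02), a01, a02],
        [0.0, -a12, a12],
        [0.0, 0.0, 0.0],  # death is absorbing
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transition_matrix(q: Callable[[float], Matrix], s: float, t: float,
                      steps: int = 1000) -> Matrix:
    """Approximate P(s, t) by the ordered product of (I + Q(u_k) du) over a grid."""
    n = len(q(s))
    p = [[float(i == j) for j in range(n)] for i in range(n)]
    du = (t - s) / steps
    for k in range(steps):
        qk = q(s + (k + 0.5) * du)  # generator at the midpoint of sub-interval k
        step = [[float(i == j) + qk[i][j] * du for j in range(n)] for i in range(n)]
        p = matmul(p, step)  # later sub-intervals multiply on the right
    return p


def entry_distribution(q: Callable[[float], Matrix], start_age: float,
                       entry_age: float, alive: Sequence[int] = (0, 1)) -> List[float]:
    """State distribution at delayed entry, conditional on being alive then."""
    row = transition_matrix(q, start_age, entry_age)[0]  # start healthy at start_age
    mass = sum(row[i] for i in alive)
    return [row[i] / mass if i in alive else 0.0 for i in range(len(row))]
```

Using the unconditional row of P(60, entry age) as the starting distribution would let subjects who died before they could enroll count toward the estimates; renormalising over the alive states removes them, which is the essence of the delayed-entry correction in this simplified setting.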
-
Prediction and Inference with Missing Data in Patient Alert Systems
Authors:
Curtis B. Storlie,
Terry M. Therneau,
Rickey E. Carter,
Nicholas Chia,
John R. Bergquist,
Jeanne M. Huddleston,
Santiago Romero-Brufau
Abstract:
We describe the Bedside Patient Rescue (BPR) project, the goal of which is risk prediction of adverse events for non-ICU patients using ~200 variables (vitals, lab results, assessments, ...). Most patients have several missing predictor values, which in the health sciences is the norm rather than the exception. A Bayesian approach is presented that addresses many of the shortcomings of standard approaches to missing predictors: (i) treatment of the uncertainty due to imputation is straightforward in the Bayesian paradigm, (ii) the predictor distribution is flexibly modeled as an infinite normal mixture with latent variables to explicitly account for discrete predictors (i.e., as in multivariate probit regression models), and (iii) certain missing not at random situations can be handled effectively by allowing the indicator of missingness into the predictor distribution only to inform the distribution of the missing variables. The proposed approach also has the benefit of providing a distribution for the prediction, including the uncertainty inherent in the imputation. Therefore, we can ask questions such as: is it possible this individual is at high risk but we are missing too much information to know for sure? How much would we reduce the uncertainty in our risk prediction by obtaining a particular missing value? This approach is applied to the BPR problem, resulting in excellent predictive capability to identify deteriorating patients.
Submitted 25 April, 2017;
originally announced April 2017.
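The preceding abstract stresses that a Bayesian treatment of missing predictors yields a distribution for the predicted risk rather than a single number. The sketch below illustrates only that idea: a hypothetical logistic risk model with one observed predictor (heart rate) and one possibly missing predictor (lactate), where a missing lactate is drawn repeatedly from an assumed normal conditional distribution. This swaps the paper's infinite normal mixture for a single normal, and all coefficients, variable choices and the 0.2 risk threshold are invented for illustration.

```python
import math
import random
import statistics
from typing import Dict, List, Optional


def risk(heart_rate: float, lactate: float) -> float:
    """Hypothetical logistic risk model; coefficients are invented for illustration."""
    z = -9.0 + 0.05 * heart_rate + 0.8 * lactate
    return 1.0 / (1.0 + math.exp(-z))


def predictive_risks(heart_rate: float, lactate: Optional[float],
                     draws: int = 5000, seed: int = 1) -> List[float]:
    """Draws from the predictive distribution of risk.

    If lactate is observed, the risk is a single number; if it is missing,
    it is imputed repeatedly from an assumed normal conditional distribution
    (a single normal, not the infinite mixture used in the paper).
    """
    if lactate is not None:
        return [risk(heart_rate, lactate)]
    rng = random.Random(seed)
    mean = 0.5 + 0.015 * heart_rate  # hypothetical E[lactate | heart rate]
    sd = 1.0                         # hypothetical conditional SD
    # Clamp imputed values at 0 because lactate cannot be negative.
    return [risk(heart_rate, max(rng.gauss(mean, sd), 0.0)) for _ in range(draws)]


def summarize(risks: List[float], threshold: float = 0.2) -> Dict[str, float]:
    """Mean, approximate 5th/95th percentiles, and P(risk > threshold)."""
    ordered = sorted(risks)
    n = len(ordered)
    return {
        "mean": statistics.fmean(ordered),
        "p05": ordered[int(0.05 * (n - 1))],
        "p95": ordered[int(0.95 * (n - 1))],
        "prob_high": sum(r > threshold for r in ordered) / n,
    }
```

Comparing `summarize(predictive_risks(110, None))` with `summarize(predictive_risks(110, 2.0))` mirrors the questions posed in the abstract: with lactate missing, `prob_high` shows how plausible a high-risk state is, and the 5th to 95th percentile spread, which collapses to a single point once the value is observed, indicates how much obtaining that value would reduce uncertainty.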