-
Tree-based Subgroup Discovery In Electronic Health Records: Heterogeneity of Treatment Effects for DTG-containing Therapies
Authors:
Jiabei Yang,
Ann W. Mwangi,
Rami Kantor,
Issa J. Dahabreh,
Monicah Nyambura,
Allison Delong,
Joseph W. Hogan,
Jon A. Steingrimsson
Abstract:
The rich longitudinal individual-level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding; repeated and temporally non-aligned measurements of covariates, treatment assignments, and outcomes; and loss to follow-up due to dropout. Here, we develop the Subgroup Discovery for Longitudinal Data (SDLD) algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data. SDLD combines the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus (HIV) who are at higher risk of weight gain when receiving dolutegravir-containing antiretroviral therapies (ARTs) versus when receiving non-dolutegravir-containing ARTs.
Submitted 30 August, 2022;
originally announced August 2022.
-
Inference for BART with Multinomial Outcomes
Authors:
Yizhen Xu,
Joseph W. Hogan,
Michael J. Daniels,
Rami Kantor,
Ann Mwangi
Abstract:
The multinomial probit Bayesian additive regression trees (MPBART) framework was proposed by Kindo et al. (KD), approximating the latent utilities in the multinomial probit (MNP) model with BART (Chipman et al. 2010). Compared to multinomial logistic models, MNP does not assume independent alternatives, and the correlation structure among alternatives can be specified through multivariate Gaussian distributed latent utilities. We introduce two new algorithms for fitting the MPBART and show that the theoretical mixing rates of our proposals are equal or superior to those of the existing algorithm in KD. Through simulations, we explore the robustness of the methods to the choice of reference level, imbalance in outcome frequencies, and the specifications of prior hyperparameters for the utility error term. The work is motivated by the application of generating posterior predictive distributions for mortality and engagement in care among HIV-positive patients based on electronic health records (EHRs) from the Academic Model Providing Access to Healthcare (AMPATH) in Kenya. In both the application and simulations, we observe better performance using our proposals as compared to KD in terms of MCMC convergence rate and posterior predictive accuracy.
Submitted 12 August, 2022; v1 submitted 17 January, 2021;
originally announced January 2021.
-
Modeling the Causal Effect of Treatment Initiation Time on Survival: Application to HIV/TB Co-infection
Authors:
Liangyuan Hu,
Joseph W. Hogan,
Ann W. Mwangi,
Abraham Siika
Abstract:
The timing of antiretroviral therapy (ART) initiation for HIV and tuberculosis (TB) co-infected patients needs to be considered carefully. CD4 cell count can be used to guide decision making about when to initiate ART. Evidence from recent randomized trials and observational studies generally supports early initiation but does not provide information about effects of initiation time on a continuous scale. In this paper, we develop and apply a highly flexible structural proportional hazards model for characterizing the effect of treatment initiation time on a survival distribution. The model can be fitted using a weighted partial likelihood score function. Construction of both the score function and the weights must accommodate censoring of the treatment initiation time, the outcome, or both. The methods are applied to data on 4903 individuals with HIV/TB co-infection, derived from electronic health records in a large HIV care program in Kenya. We use a model formulation that flexibly captures the joint effects of ART initiation time and ART duration using natural cubic splines. The model is used to generate survival curves corresponding to specific treatment initiation times and to identify optimal times for ART initiation for subgroups defined by CD4 count at time of TB diagnosis. Our findings potentially provide "higher resolution" information about the relationship between ART timing and mortality, and about the differential effect of ART timing within CD4 subgroups.
Submitted 2 April, 2019;
originally announced April 2019.
-
Classification using Ensemble Learning under Weighted Misclassification Loss
Authors:
Yizhen Xu,
Tao Liu,
Michael J. Daniels,
Rami Kantor,
Ann Mwangi,
Joseph W. Hogan
Abstract:
Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy (ART) requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource-limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on the scenario, a higher premium may be placed on avoiding false positives, which bring greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk.
We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics compared with methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.
Submitted 10 May, 2019; v1 submitted 16 December, 2018;
originally announced December 2018.
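
As a rough illustration of the weighted misclassification loss mentioned in the abstract above, the sketch below picks a cutoff for an already-computed score by minimizing an empirical weighted loss. This is the simpler "score first, cutoff second" baseline that the abstract compares against, not the authors' joint method. The function names, the `fp_weight` parameter, and the convention that a false positive costs `fp_weight` and a false negative costs `1 - fp_weight` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def weighted_loss(y_true: Sequence[int], y_pred: Sequence[int], fp_weight: float) -> float:
    """Average weighted misclassification loss (hypothetical convention):
    each false positive costs fp_weight, each false negative costs 1 - fp_weight."""
    n = len(y_true)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return (fp_weight * fp + (1.0 - fp_weight) * fn) / n


def best_threshold(scores: Sequence[float], y_true: Sequence[int], fp_weight: float) -> Tuple[float, float]:
    """Grid-search the cutoff c (predict 1 when score >= c) that minimizes
    the empirical weighted loss. Candidates are the observed scores plus
    +inf, which corresponds to predicting 0 for everyone."""
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best_c, best_loss = candidates[0], float("inf")
    for c in candidates:
        preds = [1 if s >= c else 0 for s in scores]
        loss = weighted_loss(y_true, preds, fp_weight)
        if loss < best_loss:  # ties keep the smallest cutoff
            best_c, best_loss = c, loss
    return best_c, best_loss


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(best_threshold(scores, labels, fp_weight=0.5))
    print(best_threshold(scores, labels, fp_weight=0.9))
```

Raising `fp_weight` penalizes false positives more heavily and so pushes the chosen cutoff upward. Note that the loss here is evaluated on the same data used to pick the cutoff, so it is optimistic; the abstract's emphasis on cross-validation addresses exactly that concern.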