Search | arXiv e-print repository

Adjusting for Selection Bias Due to Missing Eligibility Criteria in Emulated Target Trials

Authors: Luke Benz, Rajarshi Mukherjee, Issa Dahabreh, Rui Wang, David Arterburn, Catherine Lee, Heidi Fischer, Susan Shortreed, Sebastien Haneuse

Abstract: Target trial emulation (TTE) is a popular framework for observational studies based on electronic health records (EHR). A key component of this framework is determining the patient population eligible for inclusion in both a target trial of interest and its observational emulation. Missingness in variables that define eligibility criteria, however, presents a major challenge to determining the eligible population when emulating a target trial with an observational study. In practice, patients with incomplete data are almost always excluded from analysis despite the possibility of selection bias, which can arise when subjects with observed eligibility data are fundamentally different from excluded subjects. Despite this, to the best of our knowledge, very little work has been done to mitigate this concern. In this paper, we propose a novel conceptual framework to address selection bias in TTE studies, tailored towards time-to-event endpoints, and describe estimation and inferential procedures via inverse probability weighting (IPW). Under an EHR-based simulation infrastructure, developed to reflect the complexity of EHR data, we characterize common settings under which missing eligibility data poses the threat of selection bias and investigate the ability of the proposed methods to address it. Finally, using EHR databases from Kaiser Permanente, we demonstrate the use of our method to evaluate the effect of bariatric surgery on microvascular outcomes among a cohort of severely obese patients with Type II diabetes mellitus (T2DM).

Submitted 24 June, 2024; originally announced June 2024.
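A minimal sketch of the inverse probability weighting idea described in the abstract above, assuming each record carries a known (or previously estimated) probability that its eligibility data were observed. The function name `ipw_mean` and its arguments are hypothetical, and the sketch reweights a simple mean rather than the time-to-event estimands the paper targets. Complete cases from groups that are often incomplete receive larger weights, so they stand in for the excluded subjects who resemble them.

```python
def ipw_mean(outcomes, observed, p_observed):
    """IPW (Hajek-type) mean using only subjects with complete eligibility data.

    outcomes:   outcome value for every subject (ignored when not observed)
    observed:   1 if the subject's eligibility data are complete, else 0
    p_observed: assumed known probability that the eligibility data are complete
    """
    num = 0.0
    den = 0.0
    for y, r, p in zip(outcomes, observed, p_observed):
        if r:                 # only complete cases enter the analysis
            w = 1.0 / p       # up-weight groups that are often incomplete
            num += w * y
            den += w
    return num / den


# Toy cohort: group A (outcome 1) is complete half the time, group B (outcome 0) always.
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
observed = [1, 1, 0, 0, 1, 1, 1, 1]
p_obs = [0.5] * 4 + [1.0] * 4

complete_case = ipw_mean(outcomes, observed, [1.0] * 8)  # ignores selection: 1/3
weighted = ipw_mean(outcomes, observed, p_obs)           # full-cohort mean: 0.5
```

In the toy data the true full-cohort mean is 0.5; dropping incomplete records without weighting gives 1/3, while weighting by 1/p restores 0.5.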

arXiv:2311.01638 [pdf, other]

Inference on summaries of a model-agnostic longitudinal variable importance trajectory

Authors: Brian D. Williamson, Erica E. M. Moodie, Susan M. Shortreed

Abstract: In prediction settings where data are collected over time, it is often of interest to understand both the importance of variables for predicting the response at each time point and the importance summarized over the time series. Building on recent advances in estimation and inference for variable importance measures, we define summaries of variable importance trajectories. These measures can be estimated and the same approaches for inference can be applied regardless of the choice of the algorithm(s) used to estimate the prediction function. We propose a nonparametric efficient estimation and inference procedure as well as a null hypothesis testing procedure that are valid even when complex machine learning tools are used for prediction. Through simulations, we demonstrate that our proposed procedures have good operating characteristics, and we illustrate their use by investigating the longitudinal importance of risk factors for suicide attempt.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 65 pages (29 main, 36 supplementary), 5 figures (3 main, 2 supplementary), 19 tables (2 main, 17 supplementary)
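The trajectory-and-summary idea above can be illustrated with a small plug-in sketch, assuming predictiveness is measured by R-squared and that fitted predictions from the full and reduced prediction functions (from any algorithm) are already available at each time point. The helper names are hypothetical; the paper's efficient estimators, sample splitting, and hypothesis tests are not reproduced, and the time-averaged importance shown is only one possible summary of the trajectory.

```python
def r_squared(y, y_hat):
    """Predictiveness measure: 1 - residual SS / total SS."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def vim_trajectory(y_by_time, full_pred_by_time, reduced_pred_by_time):
    """Importance at each time point: R^2 with the variable minus R^2 without it."""
    return [
        r_squared(y, full) - r_squared(y, reduced)
        for y, full, reduced in zip(y_by_time, full_pred_by_time, reduced_pred_by_time)
    ]


def mean_summary(trajectory):
    """One possible summary: average importance over the time series."""
    return sum(trajectory) / len(trajectory)


# Toy example with two time points: the variable matters at t=1 but not at t=2.
y_t = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
full = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
reduced = [[2.5, 2.5, 2.5, 2.5], [1.0, 2.0, 3.0, 4.0]]
traj = vim_trajectory(y_t, full, reduced)   # [1.0, 0.0]
summary = mean_summary(traj)                # 0.5
```

Because the reduced predictions only enter through `r_squared`, the same code applies whatever algorithm produced them, which mirrors the model-agnostic framing of the abstract.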

arXiv:2310.09239 [pdf, other]

Causal Quantile Treatment Effects with missing data by double-sampling

Authors: Shuo Sun, Sebastien Haneuse, Alexander W. Levis, Catherine Lee, David E Arterburn, Heidi Fischer, Susan Shortreed, Rajarshi Mukherjee

Abstract: Causal weighted quantile treatment effects (WQTE) are a useful complement to standard causal contrasts that focus on the mean when interest lies at the tails of the counterfactual distribution. To date, however, methods for estimation and inference regarding causal WQTEs have assumed complete data on all relevant factors. In most practical settings, however, data will be missing or incomplete, particularly when the data are not collected for research purposes, as is the case for electronic health records and disease registries. Furthermore, such data sources may be particularly susceptible to the outcome data being missing-not-at-random (MNAR). In this paper, we consider the use of double-sampling, through which the otherwise missing data are ascertained on a sub-sample of study units, as a strategy to mitigate bias due to MNAR data in the estimation of causal WQTEs. With the additional data in hand, we present identifying conditions that do not require assumptions regarding missingness in the original data. We then propose a novel inverse-probability weighted estimator and derive its asymptotic properties, both pointwise at specific quantiles and uniformly across a range of quantiles over some compact subset of (0,1), allowing the propensity score and double-sampling probabilities to be estimated. For practical inference, we develop a bootstrap method that can be used for both pointwise and uniform inference. A simulation study is conducted to examine the finite sample performance of the proposed estimators. The proposed method is illustrated with data from an EHR-based study examining the relative effects of two bariatric surgery procedures on BMI loss at 3 years post-surgery.

Submitted 13 May, 2024; v1 submitted 13 October, 2023; originally announced October 2023.
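A minimal sketch of an inverse-probability-weighted quantile contrast with double-sampled outcomes, under stated assumptions: propensity scores P(A = 1 | X) are given, units whose outcome was originally observed get observation weight 1, units whose outcome was recovered by double-sampling get weight 1/pi (pi being their double-sampling probability), and units still missing are dropped. The function names are hypothetical, and the plain inverse-propensity weights used here give a population-average contrast rather than the general weighting the paper allows.

```python
def weighted_quantile(values, weights, tau):
    """Smallest value q whose weighted empirical CDF reaches tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum / total >= tau:
            return v
    return pairs[-1][0]  # guard against floating-point shortfall


def ipw_qte(y, a, ps, obs_weight, tau):
    """IPW tau-quantile of treated outcomes minus that of control outcomes.

    y:          outcome, or None if still missing after double-sampling
    a:          treatment indicator (0 or 1)
    ps:         assumed propensity score P(A = 1 | X)
    obs_weight: 1 for originally observed outcomes, 1/pi for double-sampled ones
    """
    arms = {0: ([], []), 1: ([], [])}
    for yi, ai, ps_i, ow in zip(y, a, ps, obs_weight):
        if yi is None:
            continue  # never ascertained; represented via the 1/pi weights
        treat_w = 1.0 / ps_i if ai == 1 else 1.0 / (1.0 - ps_i)
        values, weights = arms[ai]
        values.append(yi)
        weights.append(treat_w * ow)
    return weighted_quantile(*arms[1], tau) - weighted_quantile(*arms[0], tau)
```

The observation weight and the treatment weight simply multiply, so each arm's weighted empirical distribution targets the counterfactual outcome distribution under the stated assumptions before the quantile is read off.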

arXiv:2205.13609 [pdf, ps, other]

Variable Selection for Individualized Treatment Rules with Discrete Outcomes

Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sylvie D Lambert, Sahir Bhatnagar

Abstract: An individualized treatment rule (ITR) is a decision rule that aims to improve individual patients' health outcomes by recommending optimal treatments according to patients' specific information. In observational studies, collected data may contain many variables that are irrelevant for making treatment decisions. Including all available variables in the statistical model for the ITR could yield a loss of efficiency and an unnecessarily complicated treatment rule, which is difficult for physicians to interpret or implement. Thus, a data-driven approach to select important tailoring variables with the aim of improving the estimated decision rules is crucial. While there is a growing body of literature on selecting variables in ITRs with continuous outcomes, relatively few methods exist for discrete outcomes, which pose additional computational challenges even in the absence of variable selection. In this paper, we propose a variable selection method for ITRs with discrete outcomes. We show theoretically and empirically that our approach has the double robustness property, and that it compares favorably with other competing approaches. We illustrate the proposed method on data from a study of an adaptive web-based stress management tool to identify which variables are relevant for tailoring treatment.

Submitted 29 September, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2202.09611 [pdf, other]

Estimating Individualized Treatment Rules in Longitudinal Studies with Covariate-Driven Observation Times

Authors: Janie Coulombe, Erica E. M. Moodie, Susan M. Shortreed, Christel Renoux

Abstract: The sequential treatment decisions made by physicians to treat chronic diseases are formalized in the statistical literature as dynamic treatment regimes. To date, methods for dynamic treatment regimes have been developed under the assumption that observation times, i.e., treatment and outcome monitoring times, are determined by study investigators. That assumption is often not satisfied in electronic health records data in which the outcome, the observation times, and the treatment mechanism are associated with patients' characteristics. The treatment and observation processes can lead to spurious associations between the treatment of interest and the outcome to be optimized under the dynamic treatment regime if not adequately considered in the analysis. We address these associations by incorporating two inverse weights that are functions of a patient's covariates into dynamic weighted ordinary least squares to develop optimal single stage dynamic treatment regimes, known as individualized treatment rules. We show empirically that our methodology yields consistent, multiply robust estimators. In a cohort of new users of antidepressant drugs from the United Kingdom's Clinical Practice Research Datalink, the proposed method is used to develop an optimal treatment rule that chooses between two antidepressants to optimize a utility function related to the change in body mass index.

Submitted 19 February, 2022; originally announced February 2022.
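For reference, a single-stage dynamic weighted ordinary least squares (dWOLS) fit can be sketched as follows, assuming one tailoring covariate `x`, a binary treatment `a`, and known propensity scores. The balancing weights |a - ps| are a standard dWOLS choice; the paper's two inverse weights for the treatment and observation processes are represented only by a hypothetical multiplicative `extra_weights` argument, not by the authors' actual weight models. The estimated rule recommends treatment when psi0 + psi1 * x > 0.

```python
import numpy as np


def dwols_single_stage(x, a, y, ps, extra_weights=None):
    """Single-stage dWOLS with one tailoring covariate.

    Model: y ~ b0 + b1*x + a*(psi0 + psi1*x), fitted by weighted least squares
    with balancing weights |a - ps|, optionally multiplied by extra_weights
    (a hypothetical stand-in for additional inverse weights).
    Returns (psi0, psi1); the estimated rule treats when psi0 + psi1*x > 0.
    """
    x, a, y, ps = (np.asarray(v, dtype=float) for v in (x, a, y, ps))
    w = np.abs(a - ps)
    if extra_weights is not None:
        w = w * np.asarray(extra_weights, dtype=float)
    design = np.column_stack([np.ones_like(x), x, a, a * x])
    sw = np.sqrt(w)
    # Weighted least squares via rescaling rows by sqrt(weight).
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[2], coef[3]


def recommend(psi0, psi1, x):
    """Apply the estimated rule: 1 = treat, 0 = do not treat."""
    return (psi0 + psi1 * np.asarray(x, dtype=float) > 0).astype(int)
```

On simulated data with a correctly specified treatment-free model, the returned (psi0, psi1) should be close to the blip parameters used to generate the outcome.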

arXiv:2202.09451 [pdf, ps, other]

Using Pilot Data to Size Observational Studies for the Estimation of Dynamic Treatment Regimes

Authors: Eric J. Rose, Erica E. M. Moodie, Susan Shortreed

Abstract: There has been significant attention given to developing data-driven methods for tailoring patient care based on individual patient characteristics. Dynamic treatment regimes formalize this through a sequence of decision rules that map patient information to a suggested treatment. The data for estimating and evaluating treatment regimes are ideally gathered through the use of Sequential Multiple Assignment Randomized Trials (SMARTs), though longitudinal observational studies are commonly used due to the potentially prohibitive costs of conducting a SMART. These studies are typically sized for simple comparisons of fixed treatment sequences or, in the case of observational studies, a priori sample size calculations are often not performed. We develop sample size procedures for the estimation of dynamic treatment regimes from observational studies. Our approach uses pilot data to ensure a study will have sufficient power for comparing the value of the optimal regime, i.e., the expected outcome if all patients in the population were treated by following the optimal regime, with a known comparison mean. Our approach also ensures the value of the estimated optimal treatment regime is within an a priori set range of the value of the true optimal regime with a high probability. We examine the performance of the proposed procedure with a simulation study and use it to size a study for reducing depressive symptoms using data from electronic health records.

Submitted 18 February, 2022; originally announced February 2022.
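The value of a regime, defined in the abstract above as the expected outcome if everyone in the population followed it, has a simple inverse-probability-weighted estimator in the single-stage case. The sketch assumes a binary treatment and known propensity scores; `ipw_value` and `treat_if_positive` are hypothetical names, and the paper's pilot-data sample size procedure is not implemented here.

```python
def ipw_value(y, a, x, ps, rule):
    """IPW estimate of the value E[Y(d)] of a single-stage regime d.

    Only subjects whose received treatment matches rule(x) contribute,
    weighted by 1 / P(A = a | X = x); ps is the assumed P(A = 1 | X).
    """
    total = 0.0
    for yi, ai, xi, ps_i in zip(y, a, x, ps):
        if ai == rule(xi):
            p_received = ps_i if ai == 1 else 1.0 - ps_i
            total += yi / p_received
    return total / len(y)


def treat_if_positive(xi):
    """Hypothetical regime: treat when the covariate is positive."""
    return 1 if xi > 0 else 0
```

Dividing by the full sample size, rather than by the number of regime-concordant subjects, is what makes this the standard Horvitz-Thompson form of the value estimator.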

arXiv:2202.09448 [pdf, other]

Monte Carlo Sensitivity Analysis for Unmeasured Confounding in Dynamic Treatment Regimes

Authors: Eric J. Rose, Erica E. M. Moodie, Susan Shortreed

Abstract: Data-driven methods for personalizing treatment assignment have garnered much attention from clinicians and researchers. Dynamic treatment regimes formalize this through a sequence of decision rules that map individual patient characteristics to a recommended treatment. Observational studies are commonly used for estimating dynamic treatment regimes due to the potentially prohibitive costs of conducting sequential multiple assignment randomized trials. However, estimating a dynamic treatment regime from observational data can lead to bias in the estimated regime due to unmeasured confounding. Sensitivity analyses are useful for assessing how robust the conclusions of the study are to a potential unmeasured confounder. A Monte Carlo sensitivity analysis is a probabilistic approach that involves positing and sampling from distributions for the parameters governing the bias. We propose a method for performing a Monte Carlo sensitivity analysis of the bias due to unmeasured confounding in the estimation of dynamic treatment regimes. We demonstrate the performance of the proposed procedure with a simulation study and apply it to an observational study examining tailoring the use of antidepressants for reducing symptoms of depression using data from Kaiser Permanente Washington (KPWA).

Submitted 18 February, 2022; originally announced February 2022.
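A Monte Carlo sensitivity analysis of the kind described above can be sketched under a deliberately simple bias model, assumed here and not taken from the paper: the bias in a naive estimate equals gamma * delta, where gamma is the effect of the unmeasured confounder on the outcome and delta is the difference in its mean between treatment groups. The posited distributions for gamma and delta are supplied as hypothetical sampler functions; each draw yields one bias-adjusted estimate, and the spread of those draws summarizes sensitivity.

```python
import random


def mc_sensitivity(naive_estimate, sample_gamma, sample_delta, n_draws=10_000, seed=0):
    """Monte Carlo sensitivity analysis under an assumed bias = gamma * delta model.

    sample_gamma / sample_delta: functions drawing one value from the analyst's
    posited distribution, given a random.Random instance.
    Returns the sorted bias-adjusted estimates.
    """
    rng = random.Random(seed)
    adjusted = [
        naive_estimate - sample_gamma(rng) * sample_delta(rng)
        for _ in range(n_draws)
    ]
    return sorted(adjusted)


def interval(sorted_draws, level=0.95):
    """Simple percentile interval of the bias-adjusted draws."""
    n = len(sorted_draws)
    lo = sorted_draws[int((1 - level) / 2 * n)]
    hi = sorted_draws[min(int((1 + level) / 2 * n), n - 1)]
    return lo, hi


# Example priors (hypothetical): gamma ~ Uniform(0, 1), delta ~ Normal(0.2, 0.05).
draws = mc_sensitivity(
    1.5,
    lambda r: r.uniform(0.0, 1.0),
    lambda r: r.gauss(0.2, 0.05),
)
```

Sampling the bias parameters, rather than fixing them at a few values, is what distinguishes the Monte Carlo approach from a deterministic sensitivity grid.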

arXiv:2202.07003 [pdf, other]

Privacy-preserving estimation of an optimal individualized treatment rule: A case study in maximizing time to severe depression-related outcomes

Authors: Erica EM Moodie, Janie Coulombe, Coraline Danieli, Christel Renoux, Susan M Shortreed

Abstract: Estimating individualized treatment rules - particularly in the context of right-censored outcomes - is challenging because the treatment effect heterogeneity of interest is often small and thus difficult to detect. While this motivates the use of very large datasets such as those from multiple health systems or centres, data privacy may be of concern with participating data centres reluctant to share individual-level data. In this case study on the treatment of depression, we demonstrate an application of distributed regression for privacy protection used in combination with dynamic weighted survival modelling (DWSurv) to estimate an optimal individualized treatment rule whilst obscuring individual-level data. In simulations, we demonstrate the flexibility of this approach to address local treatment practices that may affect confounding, and show that DWSurv retains its double robustness even when performed through a (weighted) distributed regression approach. The work is motivated by, and illustrated with, an analysis of treatment for unipolar depression using the United Kingdom's Clinical Practice Research Datalink.

Submitted 14 February, 2022; originally announced February 2022.
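The core of (weighted) distributed regression can be sketched with weighted least squares: each site shares only the aggregate matrices X'WX and X'Wy, and the coordinating centre sums them and solves the normal equations, which reproduces the pooled weighted fit exactly. The function names are hypothetical, and the sketch assumes the regression weights have already been computed locally at each site; it does not reproduce the DWSurv model itself.

```python
import numpy as np


def site_summaries(X, y, w):
    """Computed inside each site; only these aggregates are shared."""
    Xw = X * w[:, None]
    return Xw.T @ X, Xw.T @ y


def combine_sites(summaries):
    """Central step: sum the site aggregates and solve the normal equations."""
    xtwx = sum(s[0] for s in summaries)
    xtwy = sum(s[1] for s in summaries)
    return np.linalg.solve(xtwx, xtwy)
```

Because X'WX and X'Wy are sums over individuals, adding the per-site matrices gives the same totals as the pooled data, so no individual-level rows need to leave a site.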

arXiv:2101.07359 [pdf, other]

Variable Selection in Regression-based Estimation of Dynamic Treatment Regimes

Authors: Zeyu Bian, Erica EM Moodie, Susan M Shortreed, Sahir Bhatnagar

Abstract: Dynamic treatment regimes (DTRs) consist of a sequence of decision rules, one per stage of intervention, that finds effective treatments for individual patients according to patient information history. DTRs can be estimated from models which include the interaction between treatment and a small number of covariates which are often chosen a priori. However, with increasingly large and complex data being collected, it is difficult to know which prognostic factors might be relevant in the treatment rule. Therefore, a more data-driven approach of selecting these covariates might improve the estimated decision rules and simplify models to make them easier to interpret. We propose a variable selection method for DTR estimation using penalized dynamic weighted least squares. Our method has the strong heredity property, that is, an interaction term can be included in the model only if the corresponding main terms have also been selected. Through simulations, we show our method has both the double robustness property and the oracle property, and the newly proposed methods compare favorably with other variable selection approaches.

Submitted 3 December, 2021; v1 submitted 18 January, 2021; originally announced January 2021.
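The strong heredity property stated above can be expressed as a simple check on a selected set of terms. This is a hypothetical helper for illustration only; the paper builds the constraint into its penalized dynamic weighted least squares fit rather than filtering terms after selection. Main terms are represented as strings and interactions as pairs of main-term names.

```python
def satisfies_strong_heredity(selected):
    """True if every selected interaction (u, v) has both u and v selected."""
    mains = {term for term in selected if isinstance(term, str)}
    for term in selected:
        if isinstance(term, tuple):
            u, v = term
            if u not in mains or v not in mains:
                return False
    return True
```

For example, {"A", "x1", ("A", "x1")} satisfies the property, while {"A", ("A", "x1")} does not because the main term x1 is missing.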

arXiv:1207.1358 [pdf]

Unsupervised spectral learning

Authors: Susan Shortreed, Marina Meila

Abstract: In spectral clustering and spectral image segmentation, the data is partitioned starting from a given matrix of pairwise similarities S. The matrix S is constructed by hand, or learned on a separate training set. In this paper we show how to achieve spectral clustering in unsupervised mode. Our algorithm starts with a set of observed pairwise features, which are possible components of an unknown, parametric similarity function. This function is learned iteratively, at the same time as the clustering of the data. The algorithm shows promising results on synthetic and real data.

Submitted 4 July, 2012; originally announced July 2012.

Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

Report number: UAI-P-2005-PG-534-541
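For context, the step that spectral clustering performs once a similarity matrix S is available can be sketched as a two-way split using the second eigenvector of the normalized Laplacian I - D^{-1/2} S D^{-1/2}. The sketch assumes S is given, symmetric, and has positive row sums; `spectral_bipartition` is a hypothetical name, and the paper's contribution, learning the parameters of S jointly with the clustering, is not reproduced here.

```python
import numpy as np


def spectral_bipartition(S):
    """Split items into two clusters from a symmetric similarity matrix S."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                      # degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    # Map the second eigenvector back by D^{-1/2} (random-walk form), split by sign.
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)
```

With two blocks of strongly similar items and weak cross-block similarity, the sign of the second eigenvector separates the blocks; which block receives label 1 is arbitrary.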

Showing 1–10 of 10 results for author: Shortreed, S