-
Robust integration of external control data in randomized trials
Authors:
Rickard Karlsson,
Guanbo Wang,
Jesse H. Krijthe,
Issa J. Dahabreh
Abstract:
One approach for increasing the efficiency of randomized trials is the use of "external controls" -- individuals who received the control treatment in the trial during routine practice or in prior experimental studies. Existing external control methods, however, can have substantial bias if the populations underlying the trial and the external control data are not exchangeable. Here, we characterize a randomization-aware class of treatment effect estimators in the population underlying the trial that remain consistent and asymptotically normal when using external control data, even when exchangeability does not hold. We consider two members of this class of estimators: the well-known augmented inverse probability weighting trial-only estimator, which is the efficient estimator when only trial data are used; and a more efficient member of the class when exchangeability holds and external control data are available, which we refer to as the optimized randomization-aware estimator. To achieve robust integration of external control data in trial analyses, we then propose a combined estimator based on the efficient trial-only estimator and the optimized randomization-aware estimator. We show that the combined estimator is consistent and no less efficient than the most efficient of the two component estimators, whether the exchangeability assumption holds or not. We examine the estimators' performance in simulations and we illustrate their use with data from two trials of paliperidone extended-release for schizophrenia.
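The abstract says the combined estimator is consistent and no less efficient than the better of its two components. The sketch below shows one generic way to get that property: take the minimum-variance linear combination of two estimators. It assumes both component estimators are consistent for the same estimand and that their variances and covariance can be estimated, for example from influence functions. The function name and inputs are hypothetical, and the paper's combined estimator may be built differently.

```python
def combine_estimates(est1, est2, var1, var2, cov12):
    """Minimum-variance combination w * est1 + (1 - w) * est2 (hypothetical helper).

    Assumes both estimators are consistent for the same estimand, so any
    weights that sum to one preserve consistency.
    """
    denom = var1 + var2 - 2.0 * cov12  # variance of (est1 - est2)
    if denom <= 0.0:
        # Degenerate case: the estimators are perfectly correlated, so keep
        # whichever one has the smaller variance.
        w = 1.0 if var1 <= var2 else 0.0
        return w, w * est1 + (1.0 - w) * est2, min(var1, var2)
    w = (var2 - cov12) / denom
    combined = w * est1 + (1.0 - w) * est2
    combined_var = (var1 * var2 - cov12 ** 2) / denom  # never exceeds min(var1, var2)
    return w, combined, combined_var


# Hypothetical numbers: a trial-only estimate (variance 0.02) and a more
# precise estimate that also uses external controls (variance 0.01).
w, est, var = combine_estimates(1.0, 2.0, var1=0.02, var2=0.01, cov12=0.0)
```

The combined variance equals (var1 * var2 - cov12^2) / (var1 + var2 - 2 * cov12). This is at most min(var1, var2) because the difference reduces to a squared term. The weight can fall outside [0, 1] when the two estimators are strongly correlated.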
Submitted 25 June, 2024;
originally announced June 2024.
-
Adjusting for Selection Bias Due to Missing Eligibility Criteria in Emulated Target Trials
Authors:
Luke Benz,
Rajarshi Mukherjee,
Issa Dahabreh,
Rui Wang,
David Arterburn,
Catherine Lee,
Heidi Fischer,
Susan Shortreed,
Sebastien Haneuse
Abstract:
Target trial emulation (TTE) is a popular framework for observational studies based on electronic health records (EHR). A key component of this framework is determining the patient population eligible for inclusion in both a target trial of interest and its observational emulation. Missingness in variables that define eligibility criteria, however, presents a major challenge to determining the eligible population when emulating a target trial with an observational study. In practice, patients with incomplete data are almost always excluded from analysis despite the possibility of selection bias, which can arise when subjects with observed eligibility data are fundamentally different from excluded subjects. Despite this, to the best of our knowledge, very little work has been done to mitigate this concern. In this paper, we propose a novel conceptual framework to address selection bias in TTE studies, tailored towards time-to-event endpoints, and describe estimation and inferential procedures via inverse probability weighting (IPW). Under an EHR-based simulation infrastructure, developed to reflect the complexity of EHR data, we characterize common settings under which missing eligibility data poses the threat of selection bias and investigate the ability of the proposed methods to address it. Finally, using EHR databases from Kaiser Permanente, we demonstrate the use of our method to evaluate the effect of bariatric surgery on microvascular outcomes among a cohort of severely obese patients with Type II diabetes mellitus (T2DM).
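The abstract describes IPW estimation for time-to-event endpoints. A generic version is an inverse-probability-weighted Kaplan-Meier curve computed among complete cases. Each complete case is weighted by 1 / p_i, where p_i is the probability that its eligibility data are observed. This is a minimal sketch under two assumptions: p_i has already been estimated (for example by logistic regression, which the abstract does not specify), and the paper's own procedure may differ. The function name is hypothetical.

```python
def weighted_kaplan_meier(times, events, weights):
    """Kaplan-Meier survival curve with per-subject weights (hypothetical helper).

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    weights: e.g. 1 / p_i, where p_i is an assumed, previously estimated
    probability that subject i's eligibility data are observed.
    Returns a list of (event_time, survival_probability) pairs.
    """
    data = list(zip(times, events, weights))
    event_times = sorted({t for t, e, _ in data if e == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        # weighted number still at risk just before t
        at_risk = sum(w for ti, _, w in data if ti >= t)
        # weighted number of events at exactly t
        n_events = sum(w for ti, e, w in data if ti == t and e == 1)
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical complete cases with estimated observation probabilities.
obs_prob = [0.5, 1.0, 1.0, 1.0]
curve = weighted_kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1], [1.0 / p for p in obs_prob])
```

A subject observed with probability 0.5 counts twice. In effect, it stands in for similar subjects who were excluded because their eligibility data were missing.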
Submitted 24 June, 2024;
originally announced June 2024.
-
Combining an experimental study with external data: study designs and identification strategies
Authors:
Lawson Ung,
Guanbo Wang,
Sebastien Haneuse,
Miguel A. Hernan,
Issa J. Dahabreh
Abstract:
There is increasing interest in combining information from experimental studies, including randomized and single-group trials, with information from external experimental or observational data sources. Such efforts are usually motivated by the desire to compare treatments evaluated in different studies -- for instance, through the introduction of external treatment groups -- or to estimate treatment effects with greater precision. Proposals to combine experimental studies with external data were made at least as early as the 1970s, but in recent years have come under increasing consideration by regulatory agencies involved in drug and device evaluation, particularly with the increasing availability of rich observational data. In this paper, we describe basic templates of study designs and data structures for combining information from experimental studies with external data, and use the potential (counterfactual) outcomes framework to elaborate identification strategies for potential outcome means and average treatment effects in these designs. In formalizing designs and identification strategies for combining information from experimental studies with external data, we hope to provide a conceptual foundation to support the systematic use and evaluation of such efforts.
Submitted 5 June, 2024;
originally announced June 2024.
-
Causal inference under transportability assumptions for conditional relative effect measures
Authors:
Guanbo Wang,
Alexander Levis,
Jon Steingrimsson,
Issa Dahabreh
Abstract:
When extending inferences from a randomized trial to a new target population, an assumption of transportability of difference effect measures (e.g., conditional average treatment effects) -- or even stronger assumptions of transportability in expectation or distribution of potential outcomes -- is invoked to identify the marginal causal mean difference in the target population. However, many clinical investigators believe that relative effect measures conditional on covariates, such as conditional risk ratios and mean ratios, are more likely to be "transportable" across populations compared with difference effect measures. Here, we examine the identification and estimation of the marginal counterfactual mean difference and ratio under a transportability assumption for conditional relative effect measures. We obtain identification results for two scenarios that often arise in practice when individuals in the target population (1) only have access to the control treatment, or (2) have access to the control and other treatments but not necessarily the experimental treatment evaluated in the trial. We then propose multiply robust and nonparametric efficient estimators that allow for the use of data-adaptive methods (e.g., machine learning techniques) to model the nuisance parameters. We examine the performance of the methods in simulation studies and illustrate their use with data from two trials of paliperidone for patients with schizophrenia. We conclude that the proposed methods are attractive when background knowledge suggests that the transportability assumption for conditional relative effect measures is more plausible than alternative assumptions.
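In scenario (1), everyone in the target population receives only the control treatment. The sketch below assumes one plug-in (g-formula) reading of the identification argument:

E[Y^1 | target] = E[ E(Y | X, target) * mu_1(X) / mu_0(X) | target ],

where mu_a(X) is the trial's conditional outcome mean in arm a. The formula takes the target population's observed conditional control mean, multiplies it by the trial's conditional mean ratio (assumed transportable), and averages over the target covariate distribution. The sketch uses stratum means for a discrete covariate. The function names are hypothetical, and the paper's multiply robust estimators are more elaborate.

```python
from collections import defaultdict


def _stratum_means(rows):
    """Mean of y within each value of a discrete covariate x."""
    total, count = defaultdict(float), defaultdict(int)
    for x, y in rows:
        total[x] += y
        count[x] += 1
    return {x: total[x] / count[x] for x in total}


def transported_means(trial, target):
    """Plug-in estimates of E[Y^1] and E[Y^0] in a control-only target population.

    trial: list of (x, a, y) with randomized a in {0, 1} and discrete x.
    target: list of (x, y) where everyone received the control treatment.
    Assumes every target x appears in both trial arms with nonzero control mean.
    """
    mu1 = _stratum_means([(x, y) for x, a, y in trial if a == 1])
    mu0 = _stratum_means([(x, y) for x, a, y in trial if a == 0])
    m0_target = _stratum_means(target)
    n = len(target)
    # transport the trial's conditional mean ratio onto the target's control means
    mean_y1 = sum(m0_target[x] * mu1[x] / mu0[x] for x, _ in target) / n
    # under control-only access, E[Y^0] in the target is the observed mean
    mean_y0 = sum(y for _, y in target) / n
    return mean_y1, mean_y0


trial = [(0, 1, 0.4), (0, 0, 0.2), (1, 1, 0.3), (1, 0, 0.6)]
target = [(0, 0.1), (0, 0.3), (1, 0.6)]
m1, m0 = transported_means(trial, target)
```

The marginal mean difference is then m1 - m0 and the marginal mean ratio is m1 / m0.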
Submitted 4 February, 2024;
originally announced February 2024.
-
Efficient estimation of subgroup treatment effects using multi-source data
Authors:
Guanbo Wang,
Alexander Levis,
Jon Steingrimsson,
Issa Dahabreh
Abstract:
Investigators often use multi-source data (e.g., multi-center trials, meta-analyses of randomized trials, pooled analyses of observational cohorts) to learn about the effects of interventions in subgroups of some well-defined target population. Such a target population can correspond to one of the data sources of the multi-source data or an external population in which the treatment and outcome information may not be available. We develop and evaluate methods for using multi-source data to estimate subgroup potential outcome means and treatment effects in a target population. We consider identifiability conditions and propose doubly robust estimators that, under mild conditions, are non-parametrically efficient and allow for nuisance functions to be estimated using flexible data-adaptive methods (e.g., machine learning techniques). We also show how to construct confidence intervals and simultaneous confidence bands for the estimated subgroup treatment effects. We examine the properties of the proposed estimators in simulation studies and compare performance against alternative estimators. We also find that our methods work well when the sample size of the target population is much larger than the sample size of the multi-source data. We illustrate the proposed methods in a meta-analysis of randomized trials for schizophrenia.
Submitted 4 February, 2024;
originally announced February 2024.
-
Assessing model performance for counterfactual predictions
Authors:
Christopher B. Boyer,
Issa J. Dahabreh,
Jon A. Steingrimsson
Abstract:
Counterfactual prediction methods are required when a model will be deployed in a setting where treatment policies differ from the setting where the model was developed, or when the prediction question is explicitly counterfactual. However, estimating and evaluating counterfactual prediction models is challenging because one does not observe the full set of potential outcomes for all individuals. Here, we discuss how to tailor a model to a counterfactual estimand, how to assess the model's performance, and how to perform model and tuning parameter selection. We also provide identifiability results for measures of performance for a potentially misspecified counterfactual prediction model based on training and test data from the same (factual) source population. Last, we illustrate the methods using simulation and apply them to the task of developing a statin-naïve risk prediction model for cardiovascular disease.
Submitted 6 September, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Sensitivity analysis for studies transporting prediction models
Authors:
Jon A. Steingrimsson,
Sarah E. Robertson,
Issa J. Dahabreh
Abstract:
We consider the estimation of measures of model performance in a target population when covariate and outcome data are available on a sample from some source population and covariate data, but not outcome data, are available on a simple random sample from the target population. When outcome data are not available from the target population, identification of measures of model performance is possible under an untestable assumption that the outcome and population (source or target population) are independent conditional on covariates. In practice, this assumption is uncertain and, in some cases, controversial. Therefore, sensitivity analysis may be useful for examining the impact of assumption violations on inferences about model performance. Here, we propose an exponential tilt sensitivity analysis model and develop statistical methods to determine how sensitive measures of model performance are to violations of the assumption of conditional independence between outcome and population. We provide identification results and estimators for the risk in the target population, examine the large-sample properties of the estimators, and apply the estimators to data on individuals with stable ischemic heart disease.
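The abstract does not give the form of the exponential tilt model. The sketch below assumes a common formulation that tilts the source-population outcome distribution within covariate strata by exp(gamma * y). Under that assumption, the target conditional risk is

E_source[loss * exp(gamma * Y) | X] / E_source[exp(gamma * Y) | X],

and gamma = 0 recovers the assumption that outcome and population are conditionally independent. The sketch plugs in stratum-wise sample averages for a discrete covariate and then averages over the target covariate sample. The function name and data layout are hypothetical.

```python
import math
from collections import defaultdict


def tilted_target_risk(source, target_x, gamma):
    """Target-population risk under an assumed exponential tilt (hypothetical helper).

    source: list of (x, y, loss) from the source population, with discrete x.
    target_x: covariate values from the simple random sample of the target.
    gamma: sensitivity parameter; gamma = 0 means no violation of the
           conditional independence of outcome and population.
    Assumes every target x value also appears in the source data.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for x, y, loss in source:
        w = math.exp(gamma * y)  # exponential tilt on the outcome
        num[x] += w * loss
        den[x] += w
    # average the tilted conditional risk over the target covariate sample
    return sum(num[x] / den[x] for x in target_x) / len(target_x)


# Hypothetical binary outcome y with a loss already computed for each source row.
source = [(0, 0, 0.0), (0, 1, 1.0), (1, 0, 0.2), (1, 1, 0.8)]
target_x = [0, 0, 1]
for gamma in (0.0, math.log(3.0)):
    print(gamma, tilted_target_risk(source, target_x, gamma))
```

Running the function over a grid of gamma values shows how far the estimated risk moves as the assumed violation grows. With gamma > 0, target outcomes are assumed to be shifted toward larger values of y than source outcomes with the same covariates.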
Submitted 13 June, 2023;
originally announced June 2023.
-
Generalizability analyses with a partially nested trial design: the Necrotizing Enterocolitis Surgery Trial
Authors:
Sarah E. Robertson,
Matthew A. Rysavy,
Martin L. Blakely,
Jon A. Steingrimsson,
Issa J. Dahabreh
Abstract:
We discuss generalizability analyses under a partially nested trial design, where part of the trial is nested within a cohort of trial-eligible individuals, while the rest of the trial is not nested. This design arises, for example, when only some centers participating in a trial are able to collect data on non-randomized individuals, or when data on non-randomized individuals cannot be collected for the full duration of the trial. Our work is motivated by the Necrotizing Enterocolitis Surgery Trial (NEST) that compared initial laparotomy versus peritoneal drain for infants with necrotizing enterocolitis or spontaneous intestinal perforation. During the first phase of the study, data were collected from randomized individuals as well as consenting non-randomized individuals; during the second phase of the study, however, data were only collected from randomized individuals, resulting in a partially nested trial design. We propose methods for generalizability analyses with partially nested trial designs. We describe identification conditions and propose estimators for causal estimands in the target population of all trial-eligible individuals, both randomized and non-randomized, in the part of the data where the trial is nested, while using trial information spanning both parts. We evaluate the estimators in a simulation study.
Submitted 1 June, 2023;
originally announced June 2023.
-
A Causal Roadmap for Generating High-Quality Real-World Evidence
Authors:
Lauren E Dang,
Susan Gruber,
Hana Lee,
Issa Dahabreh,
Elizabeth A Stuart,
Brian D Williamson,
Richard Wyss,
Iván Díaz,
Debashis Ghosh,
Emre Kıcıman,
Demissie Alemayehu,
Katherine L Hoffman,
Carla Y Vossen,
Raymond A Huml,
Henrik Ravn,
Kajsa Kvist,
Richard Pratley,
Mei-Chiung Shih,
Gene Pennello,
David Martin,
Salina P Waddy,
Charles E Barr,
Mouna Akacha,
John B Buse,
Mark van der Laan,
et al. (1 additional author not shown)
Abstract:
Increasing emphasis on the use of real-world evidence (RWE) to support clinical policy and regulatory decision-making has led to a proliferation of guidance, advice, and frameworks from regulatory agencies, academia, professional societies, and industry. A broad spectrum of studies use real-world data (RWD) to produce RWE, ranging from randomized controlled trials with outcomes assessed using RWD to fully observational studies. Yet many RWE study proposals lack sufficient detail to evaluate adequacy, and many analyses of RWD suffer from implausible assumptions, other methodological flaws, or inappropriate interpretations. The Causal Roadmap is an explicit, itemized, iterative process that guides investigators to pre-specify analytic study designs; it addresses a wide range of guidance within a single framework. By requiring transparent evaluation of causal assumptions and facilitating objective comparisons of design and analysis choices based on pre-specified criteria, the Roadmap can help investigators to evaluate the quality of evidence that a given study is likely to produce, specify a study to generate high-quality RWE, and communicate effectively with regulatory agencies and other stakeholders. This paper aims to disseminate and extend the Causal Roadmap framework for use by clinical and translational researchers, with companion papers demonstrating application of the Causal Roadmap for specific use cases.
Submitted 11 May, 2023;
originally announced May 2023.
-
Generalizing and transporting inferences about the effects of treatment assignment subject to non-adherence
Authors:
Issa J. Dahabreh,
Sarah E. Robertson,
Miguel A. Hernán
Abstract:
We discuss the identifiability of causal estimands for generalizability and transportability analyses, both under perfect and imperfect adherence to treatment assignment. We consider a setting where the trial data contain information on baseline covariates, assignment at baseline, intervention at baseline (point treatment), and outcomes; and where the data from non-randomized individuals only contain information on baseline covariates. In this setting, we review identification results under perfect adherence and study two examples in which non-adherence severely limits the ability to transport inferences about the effects of treatment assignment to the target population. In the first example, trial participation has a direct effect on treatment receipt and, through treatment receipt, on the outcome (a "trial engagement effect" via adherence). In the second example, participation in the trial has unmeasured common causes with treatment receipt. In both examples, the effect of assignment on the outcome in the target population is not identifiable. In the first example, however, the effect of joint interventions to scale-up trial activities that affect adherence and assign treatment is identifiable. We conclude that generalizability and transportability analyses should consider trial engagement effects via adherence and selection for participation on the basis of unmeasured factors that influence adherence.
Submitted 9 November, 2022;
originally announced November 2022.
-
Robust Estimation of Loss-Based Measures of Model Performance under Covariate Shift
Authors:
Samantha Morrison,
Constantine Gatsonis,
Issa J. Dahabreh,
Bing Li,
Jon A. Steingrimsson
Abstract:
We present methods for estimating loss-based measures of the performance of a prediction model in a target population that differs from the source population in which the model was developed, in settings where outcome and covariate data are available from the source population but only covariate data are available on a simple random sample from the target population. Prior work adjusting for differences between the two populations has used various weighting estimators with inverse odds or density ratio weights. Here, we develop more robust estimators for the target population risk (expected loss) that can be used with data-adaptive (e.g., machine learning-based) estimation of nuisance parameters. We examine the large-sample properties of the estimators and evaluate finite-sample performance in simulations. Last, we apply the methods to data from lung cancer screening using nationally representative data from the National Health and Nutrition Examination Survey (NHANES) and extend our methods to account for the complex survey design of the NHANES.
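The sketch below shows a doubly robust estimator of target-population risk in the spirit of the abstract. It assumes two user-supplied nuisance functions:

- `h(x)`, a model for the expected loss given covariates in the source population;
- `p(x)`, the probability that an observation in the combined data comes from the source sample rather than the target sample.

The estimate is the average of h over the target covariates, plus a source-sample correction weighted by the inverse odds (1 - p) / p and divided by the target sample size. Setting h to zero recovers the inverse-odds weighting estimator mentioned above. The function and argument names are hypothetical, and the paper's estimator may differ in detail.

```python
def dr_target_risk(source, target_x, h, p):
    """Doubly robust estimate of target-population risk (hypothetical helper).

    source: list of (x, loss) pairs from the source population.
    target_x: covariate values from the target-population sample.
    h: callable, assumed model for E[loss | X = x, source].
    p: callable, assumed model for P(source | X = x) in the combined data.
    """
    n_target = len(target_x)
    # outcome-model term averaged over the target covariate sample
    model_term = sum(h(x) for x in target_x)
    # inverse-odds-weighted residual correction from the source sample
    correction = sum((1.0 - p(x)) / p(x) * (loss - h(x)) for x, loss in source)
    return (model_term + correction) / n_target


source = [(0, 1.0), (0, 3.0), (1, 5.0)]
target_x = [0, 1, 1]
p_correct = lambda x: 2.0 / 3.0 if x == 0 else 1.0 / 3.0  # matches sample counts
h_correct = lambda x: 2.0 if x == 0 else 5.0              # source stratum means
h_wrong = lambda x: 0.0                                    # misspecified loss model
print(dr_target_risk(source, target_x, h_correct, p_correct))
print(dr_target_risk(source, target_x, h_wrong, p_correct))
```

Both calls return the same value, about 4.0. This illustrates the robustness property: when `p` is correct, the correction term compensates for a misspecified `h`.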
Submitted 4 October, 2022;
originally announced October 2022.
-
Selection on treatment in the target population of generalizability and transportability analyses
Authors:
Yu-Han Chiu,
Issa J. Dahabreh
Abstract:
Investigators are increasingly using novel methods for extending (generalizing or transporting) causal inferences from a trial to a target population. In many generalizability and transportability analyses, the trial and the observational data from the target population are separately sampled, following a non-nested trial design. In practical implementations of this design, non-randomized individuals from the target population are often identified by conditioning on the use of a particular treatment, while individuals who used other candidate treatments for the same indication or individuals who did not use any treatment are excluded. In this paper, we argue that conditioning on treatment in the target population changes the estimand of generalizability and transportability analyses and potentially introduces serious bias in the estimation of causal estimands in the target population or the subset of the target population using a specific treatment. Furthermore, we argue that the naive application of marginalization-based or weighting-based standardization methods does not produce estimates of any reasonable causal estimand. We use causal graphs and counterfactual arguments to characterize the identification problems induced by conditioning on treatment in the target population and illustrate the problems using simulated data. We conclude by considering the implications of our findings for applied work.
Submitted 19 September, 2022;
originally announced September 2022.
-
Tree-based Subgroup Discovery In Electronic Health Records: Heterogeneity of Treatment Effects for DTG-containing Therapies
Authors:
Jiabei Yang,
Ann W. Mwangi,
Rami Kantor,
Issa J. Dahabreh,
Monicah Nyambura,
Allison Delong,
Joseph W. Hogan,
Jon A. Steingrimsson
Abstract:
The rich longitudinal individual-level data available from electronic health records (EHRs) can be used to examine treatment effect heterogeneity. However, estimating treatment effects using EHR data poses several challenges, including time-varying confounding, repeated and temporally non-aligned measurements of covariates, treatment assignments and outcomes, and loss to follow-up due to dropout. Here, we develop the Subgroup Discovery for Longitudinal Data (SDLD) algorithm, a tree-based algorithm for discovering subgroups with heterogeneous treatment effects using longitudinal data by combining the generalized interaction tree algorithm, a general data-driven method for subgroup discovery, with longitudinal targeted maximum likelihood estimation. We apply the algorithm to EHR data to discover subgroups of people living with human immunodeficiency virus (HIV) who are at higher risk of weight gain when receiving dolutegravir-containing antiretroviral therapies (ARTs) versus when receiving non-dolutegravir-containing ARTs.
Submitted 30 August, 2022;
originally announced August 2022.
-
Global sensitivity analysis for studies extending inferences from a randomized trial to a target population
Authors:
Issa J. Dahabreh,
James M. Robins,
Sebastien J-P. A. Haneuse,
Sarah E. Robertson,
Jon A. Steingrimsson,
Miguel A. Hernán
Abstract:
When individuals participating in a randomized trial differ with respect to the distribution of effect modifiers compared with the target population where the trial results will be used, treatment effect estimates from the trial may not directly apply to the target population. Methods for extending -- generalizing or transporting -- causal inferences from the trial to the target population rely on conditional exchangeability assumptions between randomized and non-randomized individuals. The validity of these assumptions is often uncertain or controversial and investigators need to examine how violation of the assumptions would impact study conclusions. We describe methods for global sensitivity analysis that directly parameterize violations of the assumptions in terms of potential (counterfactual) outcome distributions. Our approach does not require detailed knowledge about the distribution of specific unmeasured effect modifiers or their relationship with the observed variables. We illustrate the methods using data from a trial nested within a cohort of trial-eligible individuals to compare coronary artery surgery plus medical therapy versus medical therapy alone for stable ischemic heart disease.
Submitted 20 July, 2022;
originally announced July 2022.
-
Systematically Missing Data in Causally Interpretable Meta-Analysis
Authors:
Jon A. Steingrimsson,
David H. Barker,
Ruofan Bie,
Issa J. Dahabreh
Abstract:
Causally interpretable meta-analysis combines information from a collection of randomized controlled trials to estimate treatment effects in a target population in which experimentation may not be possible but covariate information can be collected from a simple random sample. In such analyses, a key practical challenge is systematically missing data when some baseline covariates are not collected in all trials. Here, we provide identification results for potential (counterfactual) outcome means and average treatment effects in the target population when covariate data are systematically missing from some of the trials in the meta-analysis. We propose three estimators for the average treatment effect in the target population, examine their asymptotic properties, and show that they have good finite-sample performance in simulation studies. We use the estimators to analyze data from two large lung cancer screening trials and target population data from the National Health and Nutrition Examination Survey (NHANES). To accommodate the complex survey design of the NHANES, we modify the methods to incorporate survey sampling weights and allow for clustering.
Submitted 1 May, 2022;
originally announced May 2022.
-
Analyzing cluster randomized trials designed to support generalizable inferences
Authors:
Sarah E. Robertson,
Jon A. Steingrimsson,
Issa J. Dahabreh
Abstract:
Background: When planning a cluster randomized trial, evaluators often have access to an enumerated cohort representing the target population of clusters. Practicalities of conducting the trial, such as the need to oversample clusters with certain characteristics to improve trial economy or to support inference about subgroups of clusters, may preclude simple random sampling from the cohort into the trial, and thus interfere with the goal of producing generalizable inferences about the target population.
Methods: We describe a nested trial design where the randomized clusters are embedded within a cohort of trial-eligible clusters from the target population and where clusters are selected for inclusion in the trial with known sampling probabilities that may depend on cluster characteristics (e.g., allowing clusters to be chosen to facilitate trial conduct or to examine hypotheses related to their characteristics). We develop and evaluate methods for analyzing data from this design to generalize causal inferences to the target population underlying the cohort.
Results: We present identification and estimation results for the expectation of the average potential outcome and for the average treatment effect, in the entire target population of clusters and in its non-randomized subset. In simulation studies we show that different estimators have low bias but markedly different precision.
Conclusions: Cluster randomized trials where clusters are selected for inclusion with known sampling probabilities that depend on cluster characteristics, combined with efficient estimation methods, can precisely quantify treatment effects in the target population, while addressing objectives of trial conduct that require oversampling clusters on the basis of their characteristics.
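As a rough illustration of how known, covariate-dependent sampling probabilities enter such an analysis, the sketch below computes a normalized inverse-probability-weighted estimate of the target-population mean of cluster-level potential outcomes. It is not the authors' estimator: it assumes a cluster-level summary outcome, a known sampling probability pi(x) for each cluster type, and 1:1 randomization, and the names (Cluster, weighted_mean_potential_outcome) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Illustrative sketch only; all names are hypothetical.


@dataclass
class Cluster:
    x: str                         # cluster characteristic used to set sampling probabilities
    in_trial: bool                 # was the cluster selected into the trial?
    arm: Optional[int]             # randomized arm (1 or 0); None if not in the trial
    mean_outcome: Optional[float]  # cluster-level mean outcome; None if not in the trial


def weighted_mean_potential_outcome(
    cohort: Sequence[Cluster],
    a: int,
    sampling_prob: Callable[[str], float],  # assumed known design probability pi(x)
    prob_treated: float = 0.5,              # assumed known randomization probability for arm 1
) -> float:
    """Normalized (Hajek-type) IPW estimate of the mean cluster-level potential
    outcome under arm `a` in the cohort of trial-eligible clusters."""
    p_arm = prob_treated if a == 1 else 1.0 - prob_treated
    num = den = 0.0
    for c in cohort:
        if c.in_trial and c.arm == a:
            # Each selected cluster stands in for 1 / (pi(x) * Pr(arm)) cohort clusters.
            w = 1.0 / (sampling_prob(c.x) * p_arm)
            num += w * c.mean_outcome
            den += w
    return num / den
```

Oversampled cluster types are down-weighted: trial clusters of a type sampled with probability 0.8 count for less than those of a type sampled with probability 0.2. More efficient estimators would also exploit the covariate data on non-randomized clusters, which this sketch ignores.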
Submitted 6 April, 2022;
originally announced April 2022.
-
Randomized trials and their observational emulations: a framework for benchmarking and joint analysis
Authors:
Issa J. Dahabreh,
Jon A. Steingrimsson,
James M. Robins,
Miguel A. Hernán
Abstract:
A randomized trial and an analysis of observational data designed to emulate it sample observations separately, but they have the same eligibility criteria, collect information on some shared baseline covariates, and compare the effects of the same treatments on the same outcomes. Treatment effect estimates from the trial and its emulation can be compared to benchmark observational analysis methods. In a simplified setting with complete adherence to the assigned treatment strategy and no loss to follow-up, we show that benchmarking relies on an exchangeability condition between the populations underlying the trial and its emulation, to account for differences in the distribution of covariates between them. When this exchangeability condition holds, and the usual conditions needed for the estimates from the trial and its emulation to have a causal interpretation also hold, we derive restrictions on the law of the observed data. When the data are compatible with the restrictions, joint analysis of the trial and its emulation is possible. When the data are incompatible with the restrictions, a discrepancy between (1) estimates based on extending inferences from the trial to the population underlying the emulation and (2) the emulation itself may reflect either inability to benchmark (e.g., due to selective participation into the trial) or a failure of the emulation (e.g., due to unmeasured confounding), but we cannot use the data to determine which is the case. Our analysis reveals how benchmarking attempts combine causal assumptions, data analysis methods, and substantive knowledge to examine the validity of observational analysis methods.
Submitted 28 March, 2022;
originally announced March 2022.
-
Extending inferences from a cluster randomized trial to a target population
Authors:
Issa J. Dahabreh,
Sarah E. Robertson,
Jon A. Steingrimsson,
Stefan Gravenstein,
Nina Joyce
Abstract:
We describe methods that extend (generalize or transport) causal inferences from cluster randomized trials to a target population of clusters, under a general nonparametric model that allows for arbitrary within-cluster dependence. We propose doubly robust estimators of potential outcome means in the target population that exploit individual-level data on covariates and outcomes to improve efficiency and are appropriate for use with machine learning methods. We illustrate the methods using a cluster randomized trial of influenza vaccination strategies conducted in 818 nursing homes nested in a cohort of 4,475 trial-eligible Medicare-certified nursing homes.
Submitted 28 March, 2022;
originally announced March 2022.
-
Learning about treatment effects in a new target population under transportability assumptions for relative effect measures
Authors:
Issa J. Dahabreh,
Sarah E. Robertson,
Jon A. Steingrimsson
Abstract:
Epidemiologists and applied statisticians often believe that relative effect measures conditional on covariates, such as risk ratios and mean ratios, are "transportable" across populations. Here, we examine the identification of causal effects in a target population using an assumption that conditional relative effect measures (e.g., conditional risk ratios or mean ratios) are transportable from a trial to the target population. We show that transportability for relative effect measures is largely incompatible with transportability for difference effect measures, unless the treatment has no effect on average or one is willing to make even stronger transportability assumptions, which imply the transportability of both relative and difference effect measures. We then describe how marginal causal estimands in a target population can be identified under the assumption of transportability of relative effect measures, when we are interested in the effectiveness of a new experimental treatment in a target population where the only treatment in use is the control treatment evaluated in the trial. We extend these results to consider cases where the control treatment evaluated in the trial is only one of the treatments in use in the target population, under an additional partial exchangeability assumption in the target population (i.e., a partial assumption of no unmeasured confounding in the target population). We also develop identification results that allow for the covariates needed for transportability of relative effect measures to be only a small subset of the covariates needed to control confounding in the target population. Last, we propose estimators that can be easily implemented in standard statistical software.
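The simplest case above, where everyone in the target population receives the control treatment, can be sketched directly. Writing S = 1 for trial participants and S = 0 for the target sample (our notation), if the conditional mean ratio RR(X) = E[Y^1 | X] / E[Y^0 | X] is the same in the trial and the target population, and observed target outcomes are outcomes under control, then E[Y^1 | S = 0] = E[RR(X) Y | S = 0], with RR(X) estimated from the trial. The sketch below assumes a single discrete covariate, stratum-specific trial means as the working model, and positive outcome means; the function name is hypothetical and this is not the paper's estimator.

```python
from collections import defaultdict
from typing import Hashable, List, Tuple

# Illustrative sketch only; the function name is hypothetical.


def transported_treated_mean(
    trial: List[Tuple[Hashable, int, float]],  # (x, arm, y) from the randomized trial
    target: List[Tuple[Hashable, float]],      # (x, y) from the target sample, all on control
) -> float:
    """Estimate E[Y^1 | S=0] assuming the conditional mean ratio is transportable."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for x, arm, y in trial:
        sums[(x, arm)] += y
        counts[(x, arm)] += 1

    def trial_mean(x: Hashable, arm: int) -> float:
        # Raises ZeroDivisionError if a target stratum has no trial data in this arm
        # (a positivity violation).
        return sums[(x, arm)] / counts[(x, arm)]

    # Rescale each observed control outcome by the trial's conditional mean ratio.
    total = sum(y * trial_mean(x, 1) / trial_mean(x, 0) for x, y in target)
    return total / len(target)
```

For example, with trial = [("lo", 0, 2.0), ("lo", 1, 3.0), ("hi", 0, 4.0), ("hi", 1, 2.0)] and target = [("lo", 1.0), ("lo", 3.0), ("hi", 6.0)], the estimate is 3.0, compared with an observed control mean of 10/3 in the target sample.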
Submitted 23 February, 2022;
originally announced February 2022.
-
Regression-based estimation of heterogeneous treatment effects when extending inferences from a randomized trial to a target population
Authors:
Sarah E Robertson,
Jon A Steingrimsson,
Issa J Dahabreh
Abstract:
Methods for extending -- generalizing or transporting -- inferences from a randomized trial to a target population involve conditioning on a large set of covariates that is sufficient for rendering the randomized and non-randomized groups exchangeable. Yet, decision-makers are often interested in examining treatment effects in subgroups of the target population defined in terms of only a few discrete covariates. Here, we propose methods for estimating subgroup-specific potential outcome means and average treatment effects in generalizability and transportability analyses, using outcome model-based (g-formula), weighting, and augmented weighting estimators. We consider estimating subgroup-specific average treatment effects in the target population and its non-randomized subset, and provide methods that are appropriate both for nested and non-nested trial designs. As an illustration, we apply the methods to data from the Coronary Artery Surgery Study to compare the effect of surgery plus medical therapy versus medical therapy alone for chronic coronary artery disease in subgroups defined by history of myocardial infarction.
Submitted 30 September, 2021;
originally announced October 2021.
-
Estimating subgroup effects in generalizability and transportability analyses
Authors:
Sarah E. Robertson,
Jon A. Steingrimsson,
Nina R. Joyce,
Elizabeth A. Stuart,
Issa J. Dahabreh
Abstract:
Methods for extending -- generalizing or transporting -- inferences from a randomized trial to a target population involve conditioning on a large set of covariates that is sufficient for rendering the randomized and non-randomized groups exchangeable. Yet, decision-makers are often interested in examining treatment effects in subgroups of the target population defined in terms of only a few discrete covariates. Here, we propose methods for estimating subgroup-specific potential outcome means and average treatment effects in generalizability and transportability analyses, using outcome model-based (g-formula), weighting, and augmented weighting estimators. We consider estimating subgroup-specific average treatment effects in the target population and its non-randomized subset, and provide methods that are appropriate both for nested and non-nested trial designs. As an illustration, we apply the methods to data from the Coronary Artery Surgery Study to compare the effect of surgery plus medical therapy versus medical therapy alone for chronic coronary artery disease in subgroups defined by history of myocardial infarction.
Submitted 28 September, 2021;
originally announced September 2021.
-
Center-specific causal inference with multicenter trials: reinterpreting trial evidence in the context of each participating center
Authors:
Sarah E. Robertson,
Jon A. Steingrimsson,
Nina R. Joyce,
Elizabeth A. Stuart,
Issa J. Dahabreh
Abstract:
In multicenter randomized trials, when effect modifiers have a different distribution across centers, comparisons between treatment groups that average over centers may not apply to any of the populations underlying the individual centers. Here, we describe methods for reinterpreting the evidence produced by a multicenter trial in the context of the population underlying each center. We describe how to identify center-specific effects under identifiability conditions that are largely supported by the study design and when associations between center membership and the outcome may be present, given baseline covariates and treatment ("center-outcome associations"). We then consider an additional condition of no center-outcome associations given baseline covariates and treatment. We show that this condition can be assessed using the trial data; when it holds, center-specific treatment effects can be estimated using analyses that completely pool information across centers. We propose methods for estimating center-specific average treatment effects, when center-outcome associations may be present and when they are absent, and describe approaches for assessing whether center-specific treatment effects are homogeneous. We evaluate the performance of the methods in a simulation study and illustrate their implementation using data from the Hepatitis C Antiviral Long-Term Treatment Against Cirrhosis trial.
Submitted 25 April, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
Transporting a prediction model for use in a new target population
Authors:
Jon A. Steingrimsson,
Constantine Gatsonis,
Issa J. Dahabreh
Abstract:
We consider methods for transporting a prediction model and assessing its performance for use in a new target population, when outcome and covariate information for model development is available from a simple random sample from the source population, but only covariate information is available on a simple random sample from the target population. We discuss how to tailor the prediction model for use in the target population, how to assess model performance in the target population (e.g., by estimating the target population mean squared error), and how to perform model and tuning parameter selection in the context of the target population. We provide identifiability results for the target population mean squared error of a potentially misspecified prediction model under a sampling design where the source study and the target population samples are obtained separately. We also introduce the concept of prediction error modifiers that can be used to reason about the need for tailoring measures of model performance to the target population and provide an illustration of the methods using simulated data.
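One simple way to estimate the target-population mean squared error of a fixed prediction model, when outcomes are available only in the source sample, is to reweight source-sample squared errors by the estimated odds of belonging to the target sample rather than the source sample. The sketch below assumes a single discrete covariate, so those odds are estimated from stratum counts; it shows only a weighting estimator, the function name is hypothetical, and it is not necessarily the estimator proposed in the paper.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple

# Illustrative sketch only; the function name is hypothetical.


def transported_mse(
    source: List[Tuple[Hashable, float]],  # (x, y) pairs from the source sample
    target_x: Sequence[Hashable],          # covariates from the target sample
    predict: Callable[[Hashable], float],  # fitted prediction model
) -> float:
    """Inverse-odds weighted estimate of the prediction model's MSE in the target population."""
    n_source = Counter(x for x, _ in source)
    n_target = Counter(target_x)
    num = den = 0.0
    for x, y in source:
        # Estimated Pr(target | x) / Pr(source | x), up to a constant that cancels below.
        # Target strata absent from the source sample are silently ignored
        # (a positivity failure the analyst should check for separately).
        w = n_target[x] / n_source[x]
        num += w * (y - predict(x)) ** 2
        den += w
    return num / den
```

For example, with source = [("a", 1.0), ("a", 3.0), ("b", 5.0)] and predictions of 1.0 for "a" and 2.0 for "b", the source-sample MSE is 13/3, but with target_x = ["a", "b", "b", "b"] the estimate is 7.25, because stratum "b", where the model does worse, is more common in the target population.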
Submitted 14 April, 2021; v1 submitted 26 January, 2021;
originally announced January 2021.
-
Causal Interaction Trees: Tree-Based Subgroup Identification for Observational Data
Authors:
Jiabei Yang,
Issa J. Dahabreh,
Jon A. Steingrimsson
Abstract:
We propose Causal Interaction Trees for identifying subgroups of participants that have enhanced treatment effects using observational data. We extend the Classification and Regression Tree algorithm with splitting criteria that aim to maximize between-group treatment effect heterogeneity, using subgroup-specific treatment effect estimators to guide the splitting decisions. We derive properties of three subgroup-specific treatment effect estimators that account for the observational nature of the data: inverse probability weighting, g-formula, and doubly robust estimators. We study the performance of the proposed algorithms using simulations and implement the algorithms in an observational study that evaluates the effectiveness of right heart catheterization in critically ill patients.
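To make the splitting idea concrete, the sketch below searches over thresholds of one continuous covariate and picks the split that maximizes the squared difference between the two child nodes' treatment effect estimates, each computed with a normalized inverse probability weighting estimator. The squared-difference criterion, the minimum node size, and the function names are simplifying assumptions for illustration; the paper's splitting criteria and its g-formula and doubly robust variants are not reproduced here.

```python
from typing import Callable, List, Optional, Tuple

# Illustrative sketch only; criterion and names are assumptions.
Row = Tuple[float, int, float]  # (covariate x, treatment a in {0, 1}, outcome y)


def ipw_effect(rows: List[Row], propensity: Callable[[float], float]) -> Optional[float]:
    """Normalized IPW estimate of the average treatment effect within a node."""
    w1 = sum(a / propensity(x) for x, a, _ in rows)
    w0 = sum((1 - a) / (1 - propensity(x)) for x, a, _ in rows)
    if w1 == 0 or w0 == 0:
        return None  # not estimable without both treated and untreated units
    m1 = sum(a * y / propensity(x) for x, a, y in rows) / w1
    m0 = sum((1 - a) * y / (1 - propensity(x)) for x, a, y in rows) / w0
    return m1 - m0


def best_split(
    rows: List[Row], propensity: Callable[[float], float], min_node: int = 2
) -> Tuple[Optional[float], float]:
    """Return (threshold t, criterion) for the best split x < t versus x >= t."""
    best_t, best_score = None, float("-inf")
    for t in sorted({x for x, _, _ in rows})[1:]:
        left = [r for r in rows if r[0] < t]
        right = [r for r in rows if r[0] >= t]
        if len(left) < min_node or len(right) < min_node:
            continue
        tau_l, tau_r = ipw_effect(left, propensity), ipw_effect(right, propensity)
        if tau_l is None or tau_r is None:
            continue
        score = (tau_l - tau_r) ** 2  # between-node treatment effect heterogeneity
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

A full tree would apply best_split recursively to each child node, and with observational data the propensity argument would be an estimated propensity score model rather than a known constant.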
Submitted 6 March, 2020;
originally announced March 2020.
-
Efficient and robust methods for causally interpretable meta-analysis: transporting inferences from multiple randomized trials to a target population
Authors:
Issa J. Dahabreh,
Sarah E. Robertson,
Lucia C. Petito,
Miguel A. Hernán,
Jon A. Steingrimsson
Abstract:
We present methods for causally interpretable meta-analyses that combine information from multiple randomized trials to estimate potential (counterfactual) outcome means and average treatment effects in a target population. We consider identifiability conditions, derive implications of the conditions for the law of the observed data, and obtain identification results for transporting causal inferences from a collection of independent randomized trials to a new target population in which experimental data may not be available. We propose an estimator for the potential (counterfactual) outcome mean in the target population under each treatment studied in the trials. The estimator uses covariate, treatment, and outcome data from the collection of trials, but only covariate data from the target population sample. We show that it is doubly robust, in the sense that it is consistent and asymptotically normal when at least one of the models it relies on is correctly specified. We study the finite sample properties of the estimator in simulation studies and demonstrate its implementation using data from a multi-center randomized trial.
Submitted 4 February, 2022; v1 submitted 24 August, 2019;
originally announced August 2019.
-
Generalizing causal inferences from randomized trials: counterfactual and graphical identification
Authors:
Issa J. Dahabreh,
James M. Robins,
Sebastien J-P. A. Haneuse,
Miguel A. Hernán
Abstract:
When engagement with a randomized trial is driven by factors that affect the outcome or when trial engagement directly affects the outcome independent of treatment, the average treatment effect among trial participants is unlikely to generalize to a target population. In this paper, we use counterfactual and graphical causal models to examine under what conditions we can generalize causal inferences from a randomized trial to the target population of trial-eligible individuals. We offer an interpretation of generalizability analyses using the notion of a hypothetical intervention to "scale-up" trial engagement to the target population. We consider the interpretation of generalizability analyses when trial engagement does or does not directly affect the outcome, highlight connections with censoring in longitudinal studies, and discuss identification of the distribution of counterfactual outcomes via g-formula computation and inverse probability weighting. Last, we show how the methods can be extended to address time-varying treatments, non-adherence, and censoring.
Submitted 25 June, 2019;
originally announced June 2019.
-
Sensitivity analysis using bias functions for studies extending inferences from a randomized trial to a target population
Authors:
Issa J. Dahabreh,
James M. Robins,
Sebastien J-P. A. Haneuse,
Iman Saeed,
Sarah E. Robertson,
Elisabeth A. Stuart,
Miguel A. Hernán
Abstract:
Extending (generalizing or transporting) causal inferences from a randomized trial to a target population requires "generalizability" or "transportability" assumptions, which state that randomized and non-randomized individuals are exchangeable conditional on baseline covariates. These assumptions are made on the basis of background knowledge, which is often uncertain or controversial, and need to be subjected to sensitivity analysis. We present simple methods for sensitivity analyses that do not require detailed background knowledge about specific unknown or unmeasured determinants of the outcome or modifiers of the treatment effect. Instead, our methods directly parameterize violations of the assumptions using bias functions. We show how the methods can be applied to non-nested trial designs, where the trial data are combined with a separately obtained sample of non-randomized individuals, as well as to nested trial designs, where a clinical trial is embedded within a cohort sampled from the target population. We illustrate the methods using data from a clinical trial comparing treatments for chronic hepatitis C infection.
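A minimal sketch of the bias-function idea, under an additive parameterization assumed here for illustration: the conditional mean of the potential outcome under treatment a among non-randomized individuals is taken to equal the corresponding trial mean minus an analyst-specified bias function u(a, x), and the transported mean is a g-formula average of the bias-corrected trial means over the non-randomized sample. Setting u to zero recovers the analysis under exact exchangeability; varying it shows how conclusions change. The single discrete covariate and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

# Illustrative sketch only; the additive bias form and names are assumptions.
Row = Tuple[Hashable, int, float]  # (x, arm, y) for randomized individuals


def bias_adjusted_mean(
    trial: List[Row],
    target_x: Sequence[Hashable],            # covariates of non-randomized individuals
    a: int,
    bias: Callable[[int, Hashable], float],  # analyst-specified bias function u(a, x)
) -> float:
    """g-formula estimate of the mean outcome under arm `a` in the non-randomized
    group, after subtracting the assumed bias u(a, x) from each trial stratum mean."""
    sums: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for x, arm, y in trial:
        if arm == a:
            sums[x] += y
            counts[x] += 1
    # A target stratum with no trial data in arm `a` raises ZeroDivisionError.
    return sum(sums[x] / counts[x] - bias(a, x) for x in target_x) / len(target_x)


def effect_under_bias(trial: List[Row], target_x: Sequence[Hashable], delta: float) -> float:
    """Average treatment effect when u(1, x) = delta for every x and u(0, x) = 0."""
    def u(a: int, x: Hashable) -> float:
        return delta if a == 1 else 0.0

    return (bias_adjusted_mean(trial, target_x, 1, u)
            - bias_adjusted_mean(trial, target_x, 0, u))
```

Evaluating effect_under_bias over a grid of delta values shows how large a violation would need to be before the estimated effect changes sign; with a constant bias of this form the effect estimate simply shifts by minus delta, while bias functions that vary with x reweight the strata.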
Submitted 25 May, 2019;
originally announced May 2019.
-
Study designs for extending causal inferences from a randomized trial to a target population
Authors:
Issa J. Dahabreh,
Sebastien J-P. A. Haneuse,
James M. Robins,
Sarah E. Robertson,
Ashley L. Buchanan,
Elisabeth A. Stuart,
Miguel A. Hernán
Abstract:
We examine study designs for extending (generalizing or transporting) causal inferences from a randomized trial to a target population. Specifically, we consider nested trial designs, where randomized individuals are nested within a sample from the target population, and non-nested trial designs, including composite dataset designs, where a randomized trial is combined with a separately obtained sample of non-randomized individuals from the target population. We show that the causal quantities that can be identified in each study design depend on what is known about the probability of sampling non-randomized individuals. For each study design, we examine identification of potential outcome means via the g-formula and inverse probability weighting. Last, we explore the implications of the sampling properties underlying the designs for the identification and estimation of the probability of trial participation.
Submitted 19 May, 2019;
originally announced May 2019.
-
Towards causally interpretable meta-analysis: transporting inferences from multiple studies to a target population
Authors:
Issa J. Dahabreh,
Lucia C. Petito,
Sarah E. Robertson,
Miguel A. Hernán,
Jon A. Steingrimsson
Abstract:
We take steps towards causally interpretable meta-analysis by describing methods for transporting causal inferences from a collection of randomized trials to a new target population, one-trial-at-a-time and pooling all trials. We discuss identifiability conditions for average treatment effects in the target population and provide identification results. We show that assuming inferences are transportable from all trials in the collection to the same target population has implications for the law underlying the observed data. We propose average treatment effect estimators that rely on different working models and provide code for their implementation in statistical software. We discuss how to use the data to examine whether transported inferences are homogeneous across the collection of trials, sketch approaches for sensitivity analysis to violations of the identifiability conditions, and describe extensions to address non-adherence in the trials. Last, we illustrate the proposed methods using data from the HALT-C multi-center trial.
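The contrast between transporting from one trial at a time and pooling all trials can be sketched with a g-formula estimator that averages stratum-specific trial means over the target sample's covariate distribution. Pooling is only justified if the conditional outcome means are the same across trials, so markedly different one-trial-at-a-time estimates are a warning sign. The single discrete covariate, the stratum-mean working model, and the function names are assumptions for illustration; the paper's estimators rely on different working models and are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Illustrative sketch only; names and working model are assumptions.
Row = Tuple[Hashable, int, float]  # (x, arm, y)


def transported_mean(rows: List[Row], target_x: Sequence[Hashable], a: int) -> float:
    """g-formula: average the stratum-specific trial mean under arm `a` over the target sample."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for x, arm, y in rows:
        if arm == a:
            sums[x] += y
            counts[x] += 1
    return sum(sums[x] / counts[x] for x in target_x) / len(target_x)


def transported_effects(trials: Dict[str, List[Row]], target_x: Sequence[Hashable]):
    """Return per-trial transported effects and the effect from pooling all trials."""
    def effect(rows: List[Row]) -> float:
        return transported_mean(rows, target_x, 1) - transported_mean(rows, target_x, 0)

    per_trial = {name: effect(rows) for name, rows in trials.items()}
    pooled = effect([r for rows in trials.values() for r in rows])
    return per_trial, pooled
```

For example, two trials that agree in one covariate stratum but disagree in the other (treated means of 4.0 versus 6.0) give transported effects of 7/3 and 11/3 for a target sample with covariates ["x0", "x1", "x1"], and pooling them gives 3.0; the gap between the per-trial estimates is what a homogeneity assessment would examine.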
Submitted 8 February, 2020; v1 submitted 27 March, 2019;
originally announced March 2019.
-
Generalizing trial findings using nested trial designs with sub-sampling of non-randomized individuals
Authors:
Issa J. Dahabreh,
Miguel A. Hernan,
Sarah E. Robertson,
Ashley Buchanan,
Jon A. Steingrimsson
Abstract:
To generalize inferences from a randomized trial to the target population of all trial-eligible individuals, investigators can use nested trial designs, where the randomized individuals are nested within a cohort of trial-eligible individuals, including those who are not offered or refuse randomization. In these designs, data on baseline covariates are collected from the entire cohort, and treatment and outcome data need only be collected from randomized individuals. In this paper, we describe nested trial designs that improve research economy by collecting additional baseline covariate data after sub-sampling non-randomized individuals (i.e., a two-stage design), using sampling probabilities that may depend on the initial set of baseline covariates available from all individuals in the cohort. We propose an estimator for the potential outcome mean in the target population of all trial-eligible individuals and show that our estimator is doubly robust, in the sense that it is consistent when either the model for the conditional outcome mean among randomized individuals or the model for the probability of trial participation is correctly specified. We assess the impact of sub-sampling on the asymptotic variance of our estimator and examine the estimator's finite-sample performance in a simulation study. We illustrate the methods using data from the Coronary Artery Surgery Study (CASS).
Submitted 7 March, 2019; v1 submitted 16 February, 2019;
originally announced February 2019.
-
Extending inferences from a randomized trial to a new target population
Authors:
Issa J. Dahabreh,
Sarah E. Robertson,
Jon A. Steingrimsson,
Elizabeth A. Stuart,
Miguel A. Hernan
Abstract:
When treatment effect modifiers influence the decision to participate in a randomized trial, the average treatment effect in the population represented by the randomized individuals will differ from the effect in other populations. In this tutorial, we consider methods for extending causal inferences about time-fixed treatments from a trial to a new target population of non-participants, using data from a completed randomized trial and baseline covariate data from a sample from the target population. We examine methods based on modeling the expectation of the outcome, the probability of participation, or both (doubly robust). We compare the methods in a simulation study and show how they can be implemented in software. We apply the methods to a randomized trial nested within a cohort of trial-eligible patients to compare coronary artery surgery plus medical therapy versus medical therapy alone for patients with chronic coronary artery disease. We conclude by discussing issues that arise when using the methods in applied analyses.
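The three approaches can be compared in a small sketch for the potential outcome mean under treatment a among non-participants (S = 0) in a nested design, with S = 1 for trial participants (our notation). With a single discrete covariate and saturated, stratum-specific working models for the outcome mean, the probability of participation, and the probability of treatment in the trial, the g-formula, weighting, and doubly robust estimators coincide exactly, which is a useful check on an implementation; with parametric or machine learning working models they generally differ. The data layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Tuple

# Illustrative sketch only; layout and names are hypothetical.
# (x, s, a, y): s = 1 for trial participants; a and y are None when s = 0.
Row = Tuple[Hashable, int, Optional[int], Optional[float]]


def transported_estimates(rows: List[Row], a: int) -> Tuple[float, float, float]:
    """g-formula, weighting, and doubly robust estimates of E[Y^a | S = 0]."""
    n_x, n_trial, n_arm = defaultdict(int), defaultdict(int), defaultdict(int)
    sum_y = defaultdict(float)
    for x, s, arm, y in rows:
        n_x[x] += 1
        if s == 1:
            n_trial[x] += 1
            if arm == a:
                n_arm[x] += 1
                sum_y[x] += y

    g = {x: sum_y[x] / n_arm[x] for x in n_arm}    # outcome model E[Y | X, S=1, A=a]
    p = {x: n_trial[x] / n_x[x] for x in n_trial}  # participation model Pr(S=1 | X)
    e = {x: n_arm[x] / n_trial[x] for x in n_arm}  # treatment model Pr(A=a | X, S=1)

    target = [x for x, s, _, _ in rows if s == 0]
    treated = [(x, y) for x, s, arm, y in rows if s == 1 and arm == a]

    def w(x: Hashable) -> float:
        # Inverse odds of participation, divided by the probability of receiving arm a.
        return (1 - p[x]) / (p[x] * e[x])

    n0 = len(target)
    # g[x] raises KeyError for a target stratum with no trial data in arm a (positivity).
    g_formula = sum(g[x] for x in target) / n0
    weighting = sum(w(x) * y for x, y in treated) / n0
    augmentation = sum(w(x) * (y - g[x]) for x, y in treated)
    doubly_robust = (sum(g[x] for x in target) + augmentation) / n0
    return g_formula, weighting, doubly_robust
```

For example, with treated trial outcomes 3.0 and 5.0 in stratum "lo" and 12.0 in stratum "hi", and non-participants split one "lo" to two "hi", all three estimates of the treated mean equal 28/3, whereas the unadjusted trial mean under treatment is 20/3, because the "hi" stratum is more common among non-participants.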
Submitted 28 October, 2019; v1 submitted 1 May, 2018;
originally announced May 2018.
-
Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals
Authors:
Issa Dahabreh,
Sarah Robertson,
Eric Tchetgen Tchetgen,
Elizabeth Stuart,
Miguel Hernan
Abstract:
We consider methods for causal inference in randomized trials nested within cohorts of trial-eligible individuals, including those who are not randomized. We show how baseline covariate data from the entire cohort, and treatment and outcome data only from randomized individuals, can be used to identify potential (counterfactual) outcome means and average treatment effects in the target population of all eligible individuals. We review identifiability conditions, propose estimators, and assess the estimators' finite-sample performance in simulation studies. As an illustration, we apply the estimators in a trial nested within a cohort of trial-eligible individuals to compare coronary artery bypass grafting surgery plus medical therapy vs. medical therapy alone for chronic coronary artery disease.
Submitted 29 October, 2019; v1 submitted 13 September, 2017;
originally announced September 2017.