Measurement Error-Robust Causal Inference via Constructed Instrumental Variables

Authors: Caleb H. Miles, Linda Valeri, Brent Coull

Abstract: Measurement error can often be harmful when estimating causal effects. Two scenarios in which this is the case are in the estimation of (a) the average treatment effect when confounders are measured with error and (b) the natural indirect effect when the exposure and/or confounders are measured with error. Methods adjusting for measurement error typically require external data or knowledge about the measurement error distribution. Here, we propose methodology not requiring any such information. Instead, we show that when the outcome regression is linear in the error-prone variables, consistent estimation of these causal effects can be recovered using constructed instrumental variables under certain conditions. These variables, which are functions of only the observed data, behave like instrumental variables for the error-prone variables. Using data from a study of the effects of prenatal exposure to heavy metals on growth and neurodevelopment in Bangladeshi mother-infant pairs, we apply our methodology to estimate (a) the effect of lead exposure on birth length while controlling for maternal protein intake, and (b) lead exposure's role in mediating the effect of maternal protein intake on birth length. Protein intake is calculated from food journal entries, and is suspected to be highly prone to measurement error.

Submitted 2 June, 2024; originally announced June 2024.

Comments: 72 pages, 4 figures

MSC Class: 62

arXiv:2301.02904

Sensitivity analysis for transportability in multi-study, multi-outcome settings

Authors: Ngoc Q. Duong, Amy J. Pitts, Soohyun Kim, Caleb H. Miles

Abstract: Existing work in data fusion has covered identification of causal estimands when integrating data from heterogeneous sources. These results typically require additional assumptions to make valid estimation and inference. However, there is little literature on transporting and generalizing causal effects in multiple-outcome settings, where the primary outcome is systematically missing at the study level but for which other outcome variables may serve as proxies. We review an identification result developed in ongoing work that utilizes information from these proxies to obtain more efficient estimators and the corresponding key identification assumption. We then introduce methods for assessing the sensitivity of this approach to the identification assumption.

Submitted 7 January, 2023; originally announced January 2023.

arXiv:2211.10310

All models are wrong, but which are useful? Comparing parametric and nonparametric estimation of causal effects in finite samples

Authors: Kara E. Rudolph, Nicholas Williams, Caleb H. Miles, Joseph Antonelli, Ivan Diaz

Abstract: There is a long-standing debate in the statistical, epidemiological and econometric fields as to whether nonparametric estimation that uses data-adaptive methods, like machine learning algorithms in model fitting, confers any meaningful advantage over simpler, parametric approaches in real-world, finite sample estimation of causal effects. We address the question: when trying to estimate the effect of a treatment on an outcome, across a universe of reasonable data distributions, how much does the choice of nonparametric vs. parametric estimation matter? Instead of answering this question with simulations that reflect a few chosen data scenarios, we propose a novel approach evaluating performance across thousands of data-generating mechanisms drawn from non-parametric models with semi-informative priors. We call this approach a Universal Monte-Carlo Simulation. We compare performance of estimating the average treatment effect across two parametric estimators (a g-computation estimator that uses a parametric outcome model and an inverse probability of treatment weighted estimator) and two nonparametric estimators (Bayesian additive regression trees and a targeted minimum loss-based estimator that uses an ensemble of machine learning algorithms in model fitting). We summarize estimator performance in terms of bias, confidence interval coverage, and mean squared error. We find that the nonparametric estimators nearly always outperform the parametric estimators with the exception of having similar performance in terms of bias and similar-to-slightly-worse performance in terms of coverage under the smallest sample size of N=100.

Submitted 19 December, 2022; v1 submitted 18 November, 2022; originally announced November 2022.

arXiv:2203.00245

On the Causal Interpretation of Randomized Interventional Indirect Effects

Authors: Caleb H. Miles

Abstract: Identification of standard mediated effects such as the natural indirect effect relies on heavy causal assumptions. By circumventing such assumptions, so-called randomized interventional indirect effects have gained popularity in the mediation literature. Here, I introduce properties one might demand of an indirect effect measure in order for it to have a true mediational interpretation. For instance, the sharp null criterion requires an indirect effect measure to be null whenever no individual-level indirect effect exists. I show that without stronger assumptions, randomized interventional indirect effects do not satisfy such criteria. I additionally discuss alternative causal interpretations of such effects.

Submitted 29 September, 2023; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 59 pages, 3 figures

arXiv:2107.07575

Optimal tests of the composite null hypothesis arising in mediation analysis

Authors: Caleb H. Miles, Antoine Chambaz

Abstract: The indirect effect of an exposure on an outcome through an intermediate variable can be identified by a product of regression coefficients under certain causal and regression modeling assumptions. Thus, the null hypothesis of no indirect effect is a composite null hypothesis, as the null holds if either regression coefficient is zero. A consequence is that existing hypothesis tests are either severely underpowered near the origin (i.e., when both coefficients are small with respect to standard errors) or do not preserve type 1 error uniformly over the null hypothesis space. We propose hypothesis tests that (i) preserve level alpha type 1 error, (ii) meaningfully improve power when both true underlying effects are small relative to sample size, and (iii) preserve power when at least one is not. One approach gives a closed-form test that is minimax optimal with respect to local power over the alternative parameter space. Another uses sparse linear programming to produce an approximately optimal test for a Bayes risk criterion. We provide an R package that implements the minimax optimal test.

Submitted 15 July, 2021; originally announced July 2021.

Comments: 40 pages, 7 figures

MSC Class: 62F05

arXiv:1710.09588

Causal Inference When Counterfactuals Depend on the Proportion of All Subjects Exposed

Authors: Caleb H. Miles, Maya Petersen, Mark J. van der Laan

Abstract: The assumption that no subject's exposure affects another subject's outcome, known as the no-interference assumption, has long held a foundational position in the study of causal inference. However, this assumption may be violated in many settings, and in recent years has been relaxed considerably. Often this has been achieved with either the aid of a known underlying network, or the assumption that the population can be partitioned into separate groups, between which there is no interference, and within which each subject's outcome may be affected by all the other subjects in the group via the proportion exposed (the stratified interference assumption). In this paper, we instead consider a complete interference setting, in which each subject affects every other subject's outcome. In particular, we make the stratified interference assumption for a single group consisting of the entire sample. This can occur when the exposure is a shared resource whose efficacy is modified by the number of subjects among whom it is shared. We show that a targeted maximum likelihood estimator for the i.i.d. setting can be used to estimate a class of causal parameters that includes direct effects and overall effects under certain interventions. This estimator remains doubly-robust, semiparametric efficient, and continues to allow for incorporation of machine learning under our model. We conduct a simulation study, and present results from a data application where we study the effect of a nurse-based triage system on the outcomes of patients receiving HIV care in Kenyan health clinics.

Submitted 23 November, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

Comments: 23 pages main article, 23 pages supplementary materials + references, 4 tables, 1 figure

MSC Class: 62

arXiv:1710.02011

On semiparametric estimation of a path-specific effect in the presence of mediator-outcome confounding

Authors: Caleb H. Miles, Ilya Shpitser, Phyllis Kanki, Seema Meloni, Eric J. Tchetgen Tchetgen

Abstract: Path-specific effects are a broad class of mediated effects from an exposure to an outcome via one or more causal pathways with respect to some subset of intermediate variables. The majority of the literature concerning estimation of mediated effects has focused on parametric models with stringent assumptions regarding unmeasured confounding. We consider semiparametric inference of a path-specific effect when these assumptions are relaxed. In particular, we develop a suite of semiparametric estimators for the effect along a pathway through a mediator, but not some exposure-induced confounder of that mediator. These estimators have different robustness properties, as each depends on different parts of the observed data likelihood. One of our estimators may be viewed as combining the others, because it is locally semiparametric efficient and multiply robust. The latter property is illustrated in a simulation study. We apply our methodology to an HIV study, in which we estimate the effect comparing two drug treatments on a patient's average log CD4 count mediated by the patient's level of adherence, but not by previous experience of toxicity, which is clearly affected by which treatment the patient is assigned to, and may confound the effect of the patient's level of adherence on their virologic outcome.

Submitted 3 October, 2017; originally announced October 2017.

Comments: 17 pages + 9 pages of supplementary material + 4 pages of references, 3 figures, 1 table. arXiv admin note: text overlap with arXiv:1411.6028

MSC Class: 62

arXiv:1610.05005

A Class of Semiparametric Tests of Treatment Effect Robust to Confounder Classical Measurement Error

Authors: Caleb H. Miles, Joel Schwartz, Eric J. Tchetgen Tchetgen

Abstract: When assessing the presence of an exposure causal effect on a given outcome, it is well known that classical measurement error of the exposure can reduce the power of a test of the null hypothesis in question, although its type I error rate will generally remain at the nominal level. In contrast, classical measurement error of a confounder can inflate the type I error rate of a test of treatment effect. In this paper, we develop a large class of semiparametric test statistics of an exposure causal effect, which are completely robust to classical measurement error of a subset of confounders. A unique and appealing feature of our proposed methods is that they require no external information such as validation data or replicates of error-prone confounders. We present a doubly-robust form of this test that requires only one of two models to be correctly specified for the resulting test statistic to have correct type I error rate. We demonstrate validity and power within our class of test statistics through simulation studies. We apply the methods to a multi-U.S.-city, time-series data set to test for an effect of temperature on mortality while adjusting for atmospheric particulate matter with diameter of 2.5 micrometres or less (PM2.5), which is known to be measured with error.

Submitted 17 October, 2016; originally announced October 2016.

Comments: 35 pages, 1 figure

MSC Class: 62

arXiv:1509.01652

On Partial Identification of the Pure Direct Effect

Authors: Caleb H. Miles, Phyllis Kanki, Seema Meloni, Eric J. Tchetgen Tchetgen

Abstract: In causal mediation analysis, nonparametric identification of the pure (natural) direct effect typically relies on, in addition to no unobserved pre-exposure confounding, fundamental assumptions of (i) so-called "cross-world-counterfactuals" independence and (ii) no exposure-induced confounding. When the mediator is binary, bounds for partial identification have been given when neither assumption is made, or alternatively when assuming only (ii). We extend existing bounds to the case of a polytomous mediator, and provide bounds for the case assuming only (i). We apply these bounds to data from the Harvard PEPFAR program in Nigeria, where we evaluate the extent to which the effects of antiretroviral therapy on virological failure are mediated by a patient's adherence, and show that inference on this effect is somewhat sensitive to model assumptions.

Submitted 4 September, 2015; originally announced September 2015.

Comments: 24 pages, 4 figures

MSC Class: 62

Showing 1–9 of 9 results for author: Miles, C H