Search | arXiv e-print repository

Valid causal inference with unobserved confounding in high-dimensional settings

Authors: Niloofar Moosavi, Tetiana Gorbach, Xavier de Luna

Abstract: Various methods have recently been proposed to estimate causal effects with confidence intervals that are uniformly valid over a set of data generating processes when high-dimensional nuisance models are estimated by post-model-selection or machine learning estimators. These methods typically require that all the confounders are observed to ensure identification of the effects. We contribute by showing how valid semiparametric inference can be obtained in the presence of unobserved confounders and high-dimensional nuisance models. We propose uncertainty intervals which allow for unobserved confounding, and show that the resulting inference is valid when the amount of unobserved confounding is small relative to the sample size; the latter is formalized in terms of convergence rates. Simulation experiments illustrate the finite sample properties of the proposed intervals and investigate an alternative procedure that improves the empirical coverage of the intervals when the amount of unobserved confounding is large. Finally, a case study on the effect of smoking during pregnancy on birth weight is used to illustrate the use of the methods introduced to perform a sensitivity analysis to unobserved confounding.

Submitted 12 January, 2024; originally announced January 2024.

Comments: 20 pages, 2 figures

MSC Class: 62D20; 62D10

arXiv:2304.07113 [pdf, other]

doi 10.1093/jrsssc/qlad092

Causal inference with a functional outcome

Authors: Kreske Ecker, Xavier de Luna, Lina Schelin

Abstract: This paper presents methods to study the causal effect of a binary treatment on a functional outcome with observational data. We define a Functional Average Treatment Effect and develop an outcome regression estimator. We show how to obtain valid inference on the FATE using simultaneous confidence bands, which cover the FATE with a given probability over the entire domain. Simulation experiments illustrate how the simultaneous confidence bands take the multiple comparison problem into account. Finally, we use the methods to infer the effect of early adult location on subsequent income development for one Swedish birth cohort.

Submitted 10 November, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: 31 pages, 9 figures

Journal ref: Journal of the Royal Statistical Society Series C: Applied Statistics, 2023

arXiv:2301.11732 [pdf, other]

doi 10.1080/10618600.2023.2257247

Convolutional neural networks for valid and efficient causal inference

Authors: Mohammad Ghasempour, Niloofar Moosavi, Xavier de Luna

Abstract: Convolutional neural networks (CNN) have been successful in machine learning applications. Their success relies on their ability to consider space invariant local features. We consider the use of CNN to fit nuisance models in semiparametric estimation of the average causal effect of a treatment. In this setting, nuisance models are functions of pre-treatment covariates that need to be controlled for. In an application where we want to estimate the effect of early retirement on a health outcome, we propose to use CNN to control for time-structured covariates. Thus, CNN is used when fitting nuisance models explaining the treatment and the outcome. These fits are then combined into an augmented inverse probability weighting estimator yielding efficient and uniformly valid inference. Theoretically, we contribute by providing rates of convergence for CNN equipped with the rectified linear unit activation function and compare it to an existing result for feedforward neural networks. We also show when those rates guarantee uniformly valid inference. A Monte Carlo study is provided where the performance of the proposed estimator is evaluated and compared with other strategies. Finally, we give results on a study of the effect of early retirement on hospitalization using data covering the whole Swedish population.

Submitted 27 January, 2023; originally announced January 2023.

Comments: 22 pages, 2 figures

MSC Class: 62G05; 62G20

Journal ref: Journal of Computational and Graphical Statistics, 2023

arXiv:2111.15233 [pdf, other]

Contrasting Identifying Assumptions of Average Causal Effects: Robustness and Semiparametric Efficiency

Authors: Tetiana Gorbach, Xavier de Luna, Juha Karvanen, Ingeborg Waernbaum

Abstract: Semiparametric inference on average causal effects from observational data is based on assumptions yielding identification of the effects. In practice, several distinct identifying assumptions may be plausible; an analyst has to make a delicate choice between these models. In this paper, we study three identifying assumptions based on the potential outcome framework: the back-door assumption, which uses pre-treatment covariates, the front-door assumption, which uses mediators, and the two-door assumption using pre-treatment covariates and mediators simultaneously. We provide the efficient influence functions and the corresponding semiparametric efficiency bounds that hold under these assumptions, and their combinations. We demonstrate that neither of the identification models provides uniformly the most efficient estimation and give conditions under which some bounds are lower than others. We show when semiparametric estimating equation estimators based on influence functions attain the bounds, and study the robustness of the estimators to misspecification of the nuisance models. The theory is complemented with simulation experiments on the finite sample behavior of the estimators. The results obtained are relevant for an analyst facing a choice between several plausible identifying assumptions and corresponding estimators. Our results show that this choice implies a trade-off between efficiency and robustness to misspecification of the nuisance models.

Submitted 17 February, 2023; v1 submitted 30 November, 2021; originally announced November 2021.

Journal ref: Journal of Machine Learning Research 24 (197), 1-65, 2023

arXiv:2105.02499 [pdf, other]

SDRcausal: an R package for causal inference based on sufficient dimension reduction

Authors: Mohammad Ghasempour, Xavier de Luna

Abstract: SDRcausal is a package that implements sufficient dimension reduction methods for causal inference as proposed in Ghosh, Ma, and de Luna (2021). The package implements (augmented) inverse probability weighting and outcome regression (imputation) estimators of an average treatment effect (ATE) parameter. Nuisance models, both treatment assignment probability given the covariates (propensity score) and outcome regression models, are fitted by using semiparametric locally efficient dimension reduction estimators, thereby allowing for large sets of confounding covariates. Techniques including linear extrapolation, numerical differentiation, and truncation have been used to obtain a practicable implementation of the methods. Finding the suitable dimension reduction map (central mean subspace) requires solving an optimization problem, and several optimization algorithms are given as choices to the user. The package also provides estimators of the asymptotic variances of the causal effect estimators implemented. Plotting options are provided. The core of the methods is implemented in the C language, and parallelization is allowed for. The user-friendly and freeware R language is used as interface. The package can be downloaded from the GitHub repository: https://github.com/stat4reg.

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2105.02071 [pdf, ps, other]

doi 10.1214/21-STS843

The costs and benefits of uniformly valid causal inference with high-dimensional nuisance parameters

Authors: Niloofar Moosavi, Jenny Häggström, Xavier de Luna

Abstract: Important advances have recently been achieved in developing procedures yielding uniformly valid inference for a low dimensional causal parameter when high-dimensional nuisance models must be estimated. In this paper, we review the literature on uniformly valid causal inference and discuss the costs and benefits of using uniformly valid inference procedures. Naive estimation strategies based on regularisation, machine learning, or a preliminary model selection stage for the nuisance models have finite sample distributions which are badly approximated by their asymptotic distributions. To solve this serious problem, estimators which converge uniformly in distribution over a class of data generating mechanisms have been proposed in the literature. In order to obtain uniformly valid results in high-dimensional situations, sparsity conditions for the nuisance models need typically to be made, although a double robustness property holds, whereby if one of the nuisance models is more sparse, the other nuisance model is allowed to be less sparse. While uniformly valid inference is a highly desirable property, uniformly valid procedures pay a high price in terms of inflated variability. Our discussion of this dilemma is illustrated by the study of a double-selection outcome regression estimator, which we show is uniformly asymptotically unbiased, but is less variable than uniformly valid estimators in the numerical experiments conducted.

Submitted 5 May, 2021; originally announced May 2021.

Journal ref: Statistical Science 38(1): 1-12, 2023

arXiv:2103.00527 [pdf, ps, other]

Covariate balancing for causal inference on categorical and continuous treatments

Authors: Seong-ho Lee, Yanyuan Ma, Xavier de Luna

Abstract: We propose novel estimators for categorical and continuous treatments by using an optimal covariate balancing strategy for inverse probability weighting. The resulting estimators are shown to be consistent and asymptotically normal for causal contrasts of interest, either when the model explaining treatment assignment is correctly specified, or when the correct set of bases for the outcome models has been chosen and the assignment model is sufficiently rich. For the categorical treatment case, we show that the estimator attains the semiparametric efficiency bound when all models are correctly specified. For the continuous case, the causal parameter of interest is a function of the treatment dose. The latter is not parametrized and the estimators proposed are shown to have bias and variance of the classical nonparametric rate. Asymptotic results are complemented with simulations illustrating the finite sample properties. Our analysis of a data set suggests a nonlinear effect of BMI on the decline in self reported health.

Submitted 28 February, 2021; originally announced March 2021.

MSC Class: 62D20

arXiv:1811.01992 [pdf, other]

doi 10.5705/ss.202018.0416

Sufficient Dimension Reduction for Feasible and Robust Estimation of Average Causal Effect

Authors: Trinetri Ghosh, Yanyuan Ma, Xavier de Luna

Abstract: When estimating the treatment effect in an observational study, we use a semiparametric locally efficient dimension reduction approach to assess both the treatment assignment mechanism and the average responses in both treated and nontreated groups. We then integrate all results through imputation, inverse probability weighting and doubly robust augmentation estimators. Doubly robust estimators are locally efficient while imputation estimators are super-efficient when the response models are correct. To take advantage of both procedures, we introduce a shrinkage estimator to automatically combine the two, which retains the double robustness property while improving on the variance when the response model is correct. We demonstrate the performance of these estimators through simulated experiments and a real dataset concerning the effect of maternal smoking on baby birth weight. Key words and phrases: Average Treatment Effect, Doubly Robust Estimator, Efficiency, Inverse Probability Weighting, Shrinkage Estimator.

Submitted 5 November, 2018; originally announced November 2018.

Comments: 47 pages, 4 figures

Journal ref: Statistica Sinica 31 (2021), 1-22

arXiv:1803.08764 [pdf, other]

doi 10.1016/j.ecosta.2020.01.003

Robust semiparametric inference with missing data

Authors: Eva Cantoni, Xavier de Luna

Abstract: Classical semiparametric inference with missing outcome data is not robust to contamination of the observed data and a single observation can have arbitrarily large influence on estimation of a parameter of interest. This sensitivity is exacerbated when inverse probability weighting methods are used, which may overweight contaminated observations. We introduce inverse probability weighted, double robust and outcome regression estimators of location and scale parameters, which are robust to contamination in the sense that their influence function is bounded. We give asymptotic properties and study finite sample behaviour. Our simulated experiments show that contamination can be more serious a threat to the quality of inference than model misspecification. An interesting aspect of our results is that the auxiliary outcome model used to adjust for ignorable missingness by some of the estimators, is also useful to protect against contamination. We also illustrate through a case study how both adjustment to ignorable missingness and protection against contamination are achieved through weighting schemes, which can be contrasted to gain further insights.

Submitted 5 October, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

Comments: 51 pages with appendices

MSC Class: 62G35

Journal ref: Econometrics and Statistics, Volume 16, October 2020, Pages 108-120

arXiv:1712.00292 [pdf, other]

doi 10.1111/biom.13001

Causal inference taking into account unobserved confounding

Authors: Minna Genbäck, Xavier de Luna

Abstract: Causal inference with observational data can be performed under an assumption of no unobserved confounders (unconfoundedness assumption). There is, however, seldom clear subject-matter or empirical evidence for such an assumption. We therefore develop uncertainty intervals for average causal effects based on outcome regression estimators and doubly robust estimators, which provide inference taking into account both sampling variability and uncertainty due to unobserved confounders. In contrast with sampling variation, uncertainty due to unobserved confounding does not decrease with increasing sample size. The intervals introduced are obtained by deriving the bias of the estimators due to unobserved confounders. We are thus also able to contrast the size of the bias due to violation of the unconfoundedness assumption, with bias due to misspecification of the models used to explain potential outcomes. This is illustrated through numerical experiments where bias due to moderate unobserved confounding dominates misspecification bias for typical situations in terms of sample size and modeling assumptions. We also study the empirical coverage of the uncertainty intervals introduced and apply the results to a study of the effect of regular food intake on health. An R-package implementing the inference proposed is available.

Submitted 7 December, 2017; v1 submitted 1 December, 2017; originally announced December 2017.

Comments: Biometrics 2018

Journal ref: Biometrics. 2019; 75, 506-515

arXiv:1309.4054 [pdf, other]

doi 10.1016/j.csda.2016.08.012

Data-driven Algorithms for Dimension Reduction in Causal Inference

Authors: Emma Persson, Jenny Häggström, Ingeborg Waernbaum, Xavier de Luna

Abstract: In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. In order to identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness the algorithms search for minimal subsets of the covariate vector. Based, e.g., on the framework of sufficient dimension reduction or kernel smoothing, the algorithms perform a backward elimination procedure assessing the significance of each covariate. Their performance is evaluated in simulations and an application using data from the Swedish Childhood Diabetes Register is also presented.

Submitted 31 August, 2016; v1 submitted 16 September, 2013; originally announced September 2013.

Comments: 27 pages, 2 figures, 11 tables

Journal ref: Computational Statistics and Data Analysis, 2017, Vol. 105, p. 280-292

arXiv:1306.4509 [pdf, ps, other]

doi 10.1007/s00180-014-0515-0

Targeted smoothing parameter selection for estimating average causal effects

Authors: Jenny Häggström, Xavier de Luna

Abstract: The non-parametric estimation of average causal effects in observational studies often relies on controlling for confounding covariates through smoothing regression methods such as kernel, splines or local polynomial regression. Such regression methods are tuned via smoothing parameters which regulate the amount of degrees of freedom used in the fit. In this paper we propose data-driven methods for selecting smoothing parameters when the targeted parameter is an average causal effect. For this purpose, we propose to estimate the exact expression of the mean squared error of the estimators. Asymptotic approximations indicate that the smoothing parameters minimizing this mean squared error converge to zero faster than the optimal smoothing parameter for the estimation of the regression functions. In a simulation study we show that the proposed data-driven methods for selecting the smoothing parameters yield lower empirical mean squared error than other methods available such as, e.g., cross-validation.

Submitted 19 June, 2013; originally announced June 2013.

Journal ref: Computational Statistics 29 (2014) 1727-1748

Showing 1–12 of 12 results for author: de Luna, X