-
Impact of Non-Informative Censoring on Propensity Score Based Estimation of Marginal Hazard Ratios
Authors:
Guilherme W. F. Barros,
Jenny Häggström
Abstract:
In medical and epidemiological studies, one of the most common settings is studying the effect of a treatment on a time-to-event outcome, where the time to event might be censored before the end of the study. A common parameter of interest in such a setting is the marginal hazard ratio (MHR). When a study is based on observational data, propensity score (PS) based methods are often used in an attempt to make the treatment groups comparable despite the non-randomized treatment assignment. Previous studies have shown censoring to be a factor that induces bias when using PS based estimators. In this paper we study the magnitude of the bias under different rates of non-informative censoring when estimating the MHR using PS weighting or PS matching. A bias correction involving the probability of an event is suggested and compared with conventional PS based methods.
Submitted 14 February, 2024;
originally announced February 2024.
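To make "PS weighting for the MHR" concrete, here is a minimal pure-Python sketch, assuming the propensity scores are already estimated: it forms inverse probability of treatment weights and fits a weighted Cox model with the binary treatment as the only covariate (Breslow handling of ties, Newton-Raphson). The function names `ate_weights` and `weighted_cox_mhr` are hypothetical, and the paper's proposed bias correction is not implemented, since its details are not given here.

```python
import math


def ate_weights(treat, ps):
    """Inverse probability of treatment weights (whole-population target).

    `ps` holds already-estimated propensity scores P(treatment = 1 | X).
    """
    return [1.0 / p if z == 1 else 1.0 / (1.0 - p) for z, p in zip(treat, ps)]


def weighted_cox_mhr(time, event, treat, weights, iters=50, tol=1e-10):
    """Weighted Cox model with the binary treatment as the only covariate.

    Newton-Raphson on the weighted Breslow partial likelihood; returns exp(beta),
    the weighted (marginal) hazard ratio. Censored rows (event == 0) only
    enter the risk sets.
    """
    beta = 0.0
    event_times = sorted({t for t, d in zip(time, event) if d == 1})
    for _ in range(iters):
        score, info = 0.0, 0.0
        for s in event_times:
            # Weighted risk-set sums at time s; z is 0/1, so the second moment equals s1.
            s0 = s1 = 0.0
            for t, z, w in zip(time, treat, weights):
                if t >= s:
                    r = w * math.exp(beta * z)
                    s0 += r
                    s1 += r * z
            zbar = s1 / s0
            # Each weighted event at time s contributes to the score and information.
            for t, d, z, w in zip(time, event, treat, weights):
                if t == s and d == 1:
                    score += w * (z - zbar)
                    info += w * zbar * (1.0 - zbar)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return math.exp(beta)
```

With all weights equal to one this reduces to an ordinary Cox fit. The sketch does not guard against the case where every event falls in one arm (no finite maximizer), and it omits variance estimation, which for weighted fits would typically use a robust sandwich or bootstrap estimator.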
-
Covariate selection for the estimation of marginal hazard ratios in high-dimensional data
Authors:
Guilherme W. F. Barros,
Jenny Häggström
Abstract:
Hazard ratios are frequently reported in time-to-event and epidemiological studies to assess treatment effects. In observational studies, combining propensity score weights with the Cox proportional hazards model facilitates the estimation of the marginal hazard ratio (MHR). The methods for estimating the MHR are analogous to those employed for estimating common causal parameters, such as the average treatment effect. However, MHR estimation in the context of high-dimensional data remains unexplored. This paper seeks to address this gap through a simulation study that considers variable selection methods from causal inference combined with a recently proposed multiply robust approach for MHR estimation. Additionally, a case study using stroke register data is conducted to demonstrate the application of these methods. The results from the simulation study indicate that the double-selection method for covariate selection is preferable to several other strategies when estimating the MHR. Nevertheless, the estimation can be further improved by applying the multiply robust approach to the set of propensity score models obtained during the double-selection process.
Submitted 13 February, 2024;
originally announced February 2024.
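The double-selection idea can be sketched as taking the union of covariates selected by a treatment model and by an outcome model. This minimal version assumes linear LASSO fits for both (coordinate descent, centred variables, no intercept) with hand-picked penalties `lam_t` and `lam_y`; in the MHR setting the outcome model would more naturally be a penalized Cox model, so the linear outcome LASSO here is a simplifying stand-in. All function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=200, tol=1e-10):
    """Coordinate-descent LASSO minimizing (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 initially
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            col = [row[j] for row in X]
            sq = sum(c * c for c in col) / n
            if sq == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes b[j].
            rho = sum(c * r for c, r in zip(col, resid)) / n + sq * b[j]
            new = soft_threshold(rho, lam) / sq
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def double_selection(X, treat, outcome, lam_t, lam_y):
    """Union of covariate indices selected by a treatment LASSO and an outcome LASSO."""
    sel_t = {j for j, b in enumerate(lasso_cd(X, treat, lam_t)) if b != 0.0}
    sel_y = {j for j, b in enumerate(lasso_cd(X, outcome, lam_y)) if b != 0.0}
    return sorted(sel_t | sel_y)
```

A covariate enters the adjustment set if it predicts either the treatment or the outcome, which is what protects against dropping a confounder that one of the two fits misses. In practice the penalties would be tuned, for example by cross-validation or a theory-driven rule.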
-
The costs and benefits of uniformly valid causal inference with high-dimensional nuisance parameters
Authors:
Niloofar Moosavi,
Jenny Häggström,
Xavier de Luna
Abstract:
Important advances have recently been achieved in developing procedures that yield uniformly valid inference for a low-dimensional causal parameter when high-dimensional nuisance models must be estimated. In this paper, we review the literature on uniformly valid causal inference and discuss the costs and benefits of using uniformly valid inference procedures. Naive estimation strategies based on regularisation, machine learning, or a preliminary model selection stage for the nuisance models have finite sample distributions that are badly approximated by their asymptotic distributions. To solve this serious problem, estimators that converge uniformly in distribution over a class of data generating mechanisms have been proposed in the literature. In order to obtain uniformly valid results in high-dimensional situations, sparsity conditions typically need to be imposed on the nuisance models, although a double robustness property holds, whereby if one of the nuisance models is sparser, the other nuisance model is allowed to be less sparse. While uniform validity is a highly desirable property, uniformly valid procedures pay a high price in terms of inflated variability. Our discussion of this dilemma is illustrated by the study of a double-selection outcome regression estimator, which we show is uniformly asymptotically unbiased but is less variable than uniformly valid estimators in the numerical experiments conducted.
Submitted 5 May, 2021;
originally announced May 2021.
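Under the assumption of a linear outcome model, a double-selection outcome regression estimator can be illustrated by the familiar post-double-selection recipe: regress the outcome by OLS on an intercept, the treatment and the union of selected covariates, then read off the treatment coefficient. This sketch takes the selected index set as given (for example from a double-selection step) and solves the normal equations with a small hand-written Gaussian elimination. It is not the paper's code, the names are hypothetical, and it does not address the uniform-validity and variance questions the paper studies.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ols(Z, y):
    """OLS coefficients via the normal equations Z'Z beta = Z'y."""
    p = len(Z[0])
    ZtZ = [[sum(row[i] * row[j] for row in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(row[i] * yi for row, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def post_double_selection_effect(X, treat, outcome, selected):
    """Treatment coefficient from OLS of outcome on [1, treatment, selected covariates]."""
    Z = [[1.0, float(t)] + [row[j] for j in selected] for row, t in zip(X, treat)]
    return ols(Z, outcome)[1]
```

Solving the normal equations directly is fine for a small illustration; a QR-based solver would be numerically safer for larger or ill-conditioned designs.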
-
Collaborative-controlled LASSO for Constructing Propensity Score-based Estimators in High-Dimensional Data
Authors:
Cheng Ju,
Richard Wyss,
Jessica M. Franklin,
Sebastian Schneeweiss,
Jenny Häggström,
Mark J. van der Laan
Abstract:
Propensity score (PS) based estimators are increasingly used for causal inference in observational studies. However, model selection for PS estimation in high-dimensional data has received little attention. In these settings, PS models have traditionally been selected based on the goodness-of-fit for the treatment mechanism itself, without consideration of the causal parameter of interest. Collaborative targeted minimum loss-based estimation (C-TMLE) is a novel methodology for causal inference that takes into account information on the causal parameter of interest when selecting a PS model. This "collaborative learning" considers variable associations with both treatment and outcome when selecting a PS model in order to optimize the bias-variance trade-off in the estimated treatment effect. In this study, we introduce a novel approach for collaborative model selection when using the LASSO estimator for PS estimation in high-dimensional covariate settings. To demonstrate the importance of selecting the PS model collaboratively, we designed quasi-experiments based on a real electronic healthcare database, where only the potential outcomes were manually generated, and the treatment and baseline covariates remained unchanged. Results showed that the C-TMLE algorithm outperformed other competing estimators for both point estimation and confidence interval coverage. In addition, the PS model selected by C-TMLE could be applied to other PS-based estimators, which also resulted in substantial improvement for both point estimation and confidence interval coverage. We illustrate the discussed concepts through an empirical example comparing the effects of non-selective nonsteroidal anti-inflammatory drugs with selective COX-2 inhibitors on gastrointestinal complications in a population of Medicare beneficiaries.
Submitted 30 June, 2017;
originally announced June 2017.
-
Data-Driven Confounder Selection via Markov and Bayesian Networks
Authors:
Jenny Häggström
Abstract:
To estimate a causal effect on an outcome without bias, unconfoundedness is often assumed. If there is sufficient knowledge of the underlying causal structure, existing confounder selection criteria can be used to select subsets of the observed pretreatment covariates, $X$, that are sufficient for unconfoundedness, if such subsets exist. Here, estimation of these target subsets is considered when the underlying causal structure is unknown. The proposed method is to model the causal structure by a probabilistic graphical model, e.g., a Markov or Bayesian network, estimate this graph from observed data, and select the target subsets given the estimated graph. The approach is evaluated by simulation both in a high-dimensional setting where unconfoundedness holds given $X$ and in a setting where unconfoundedness only holds given subsets of $X$. Several common target subsets are investigated, and the selected subsets are compared with respect to accuracy in estimating the average causal effect. The proposed method is implemented with existing software that can easily handle high-dimensional data, in terms of both large samples and large numbers of covariates. The results from the simulation study show that, if unconfoundedness holds given $X$, this approach is very successful in selecting the target subsets, outperforming alternative approaches based on random forests and LASSO, and that the subset estimating the target subset containing all causes of the outcome yields the smallest MSE in the average causal effect estimation.
Submitted 17 March, 2017; v1 submitted 25 April, 2016;
originally announced April 2016.
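Once a graph has been estimated, selecting a target subset reduces to a graph query. The sketch below assumes the estimated DAG is supplied as a hypothetical `{child: [parents]}` dictionary and uses a simplified reading of two target subsets: covariates that are ancestors of the treatment, and covariates with a directed path to the outcome that does not pass through the treatment. The graph-learning step (Markov or Bayesian network estimation) is not shown, and the subset definitions used in the paper may be more refined than this.

```python
def ancestors(parents, node, blocked=frozenset()):
    """All ancestors of `node` in a DAG given as {child: [parents]},
    never walking through nodes in `blocked`."""
    seen = set()
    stack = [p for p in parents.get(node, []) if p not in blocked]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(p for p in parents.get(v, []) if p not in blocked and p not in seen)
    return seen


def target_subsets(parents, covariates, treatment, outcome):
    """Return (causes of treatment, causes of outcome) restricted to the covariates."""
    covs = set(covariates)
    causes_of_treatment = ancestors(parents, treatment) & covs
    # Causes of the outcome through directed paths that avoid the treatment node.
    causes_of_outcome = ancestors(parents, outcome, blocked={treatment}) & covs
    return causes_of_treatment, causes_of_outcome
```

For example, with edges X1 -> T, X2 -> T, X1 -> Y, X3 -> Y, X4 -> X3 and T -> Y, the two subsets are {X1, X2} and {X1, X3, X4}.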
-
Data-driven Algorithms for Dimension Reduction in Causal Inference
Authors:
Emma Persson,
Jenny Häggström,
Ingeborg Waernbaum,
Xavier de Luna
Abstract:
In observational studies, the causal effect of a treatment may be confounded with variables that are related to both the treatment and the outcome of interest. In order to identify a causal effect, such studies often rely on the unconfoundedness assumption, i.e., that all confounding variables are observed. The choice of covariates to control for, which is primarily based on subject matter knowledge, may result in a large covariate vector in the attempt to ensure that unconfoundedness holds. However, including redundant covariates can affect the bias and efficiency of nonparametric causal effect estimators, e.g., due to the curse of dimensionality. Data-driven algorithms for the selection of sufficient covariate subsets are investigated. Under the assumption of unconfoundedness, the algorithms search for minimal subsets of the covariate vector. Based, e.g., on the framework of sufficient dimension reduction or on kernel smoothing, the algorithms perform a backward elimination procedure that assesses the significance of each covariate. Their performance is evaluated in simulations, and an application using data from the Swedish Childhood Diabetes Register is also presented.
Submitted 31 August, 2016; v1 submitted 16 September, 2013;
originally announced September 2013.
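The backward elimination loop itself is simple; the substance lies in the significance test. In this sketch the test is left abstract as a hypothetical user-supplied `pvalue(j, current)` callable, standing in for the sufficient-dimension-reduction or kernel-smoothing based tests, which are not reproduced here. The loop drops the least significant covariate until every remaining covariate is significant at level `alpha`.

```python
from typing import Callable, List, Sequence


def backward_elimination(covariates: Sequence[str],
                         pvalue: Callable[[str, List[str]], float],
                         alpha: float = 0.05) -> List[str]:
    """Drop the least significant covariate until all remaining ones have p <= alpha.

    `pvalue(j, current)` tests whether covariate j can be removed given the
    covariates in `current` (hypothetical interface).
    """
    current = list(covariates)
    while current:
        # Re-test every remaining covariate given the current set.
        pvals = {j: pvalue(j, current) for j in current}
        worst = max(current, key=lambda j: pvals[j])
        if pvals[worst] <= alpha:
            break
        current.remove(worst)
    return current
```

Re-testing all remaining covariates after each removal matters because a covariate's significance can change once others have been eliminated.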
-
Targeted smoothing parameter selection for estimating average causal effects
Authors:
Jenny Häggström,
Xavier de Luna
Abstract:
The non-parametric estimation of average causal effects in observational studies often relies on controlling for confounding covariates through smoothing regression methods such as kernel, spline, or local polynomial regression. Such regression methods are tuned via smoothing parameters that regulate the number of degrees of freedom used in the fit. In this paper we propose data-driven methods for selecting smoothing parameters when the targeted parameter is an average causal effect. For this purpose, we propose to estimate the exact expression of the mean squared error of the estimators. Asymptotic approximations indicate that the smoothing parameters minimizing this mean squared error converge to zero faster than the optimal smoothing parameters for the estimation of the regression functions. In a simulation study we show that the proposed data-driven methods for selecting the smoothing parameters yield lower empirical mean squared error than other available methods such as cross-validation.
Submitted 19 June, 2013;
originally announced June 2013.
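To show what the smoothing parameters are tuning, here is a minimal sketch of an imputation estimator of the average causal effect, assuming Nadaraya-Watson kernel regression with a Gaussian kernel and a scalar covariate: each unit's two potential outcomes are imputed from the treated and control fits, with bandwidths `h1` and `h0`, and the differences are averaged. The names are hypothetical, and the paper's actual contribution, choosing the smoothing parameters by minimizing an estimate of this estimator's MSE, is not implemented here.

```python
import math


def nw_predict(x0, xs, ys, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel, bandwidth h."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def imputation_ate(x, treat, y, h1, h0):
    """Average over all units of mu1_hat(x_i) - mu0_hat(x_i).

    h1 and h0 are the smoothing parameters of the treated and control regressions.
    """
    x_t = [xi for xi, t in zip(x, treat) if t == 1]
    y_t = [yi for yi, t in zip(y, treat) if t == 1]
    x_c = [xi for xi, t in zip(x, treat) if t == 0]
    y_c = [yi for yi, t in zip(y, treat) if t == 0]
    diffs = [nw_predict(xi, x_t, y_t, h1) - nw_predict(xi, x_c, y_c, h0) for xi in x]
    return sum(diffs) / len(diffs)
```

Evaluating `imputation_ate` over a grid of `(h1, h0)` values shows how the estimate moves with the smoothing parameters. Tuning each regression separately by cross-validation targets the regression functions rather than the causal effect, which is the mismatch the abstract describes.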