-
A Bayesian joint model for mediation analysis with matrix-valued mediators
Authors:
Zi** Liu,
Zhihui Liu,
Ali Hosni,
John Kim,
Bei Jiang,
Olli Saarela
Abstract:
Unscheduled treatment interruptions may lead to reduced quality of care in radiation therapy (RT). Identifying the RT prescription dose effects on the outcome of treatment interruptions, mediated through doses distributed into different organs-at-risk (OARs), can inform future treatment planning. The radiation exposure to OARs can be summarized by a matrix of dose-volume histograms (DVH) for each patient. Although various methods for high-dimensional mediation analysis have been proposed recently, few studies investigated how matrix-valued data can be treated as mediators. In this paper, we propose a novel Bayesian joint mediation model for high-dimensional matrix-valued mediators. In this joint model, latent features are extracted from the matrix-valued data through an adaptation of probabilistic multilinear principal components analysis (MPCA), retaining the inherent matrix structure. We derive and implement a Gibbs sampling algorithm to jointly estimate all model parameters, and introduce a Varimax rotation method to identify active indicators of mediation among the matrix-valued data. Our simulation study finds that the proposed joint model has higher efficiency in estimating causal decomposition effects compared to an alternative two-step method, and demonstrates that the mediation effects can be identified and visualized in the matrix form. We apply the method to study the effect of prescription dose on treatment interruptions in anal canal cancer patients.
Submitted 27 June, 2024; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Multiply Robust Estimator Circumvents Hyperparameter Tuning of Neural Network Models in Causal Inference
Authors:
Mehdi Rostami,
Olli Saarela
Abstract:
Estimation of the Average Treatment Effect (ATE) is often carried out in 2 steps, where in the first step, the treatment and outcome are modeled, and in the second step the predictions are inserted into the ATE estimator. In the first step, numerous models can be fit to the treatment and outcome, including machine learning algorithms. However, it is difficult to choose among the hyperparameter sets the one which will result in the best causal effect estimation and inference. The Multiply Robust (MR) estimator allows us to leverage all the first-step models in a single estimator. We show that the MR estimator is $n^r$-consistent if one of the first-step treatment or outcome models is $n^r$-consistent. We also show that MR is the solution to a broad class of estimating equations, and is asymptotically normal if one of the treatment models is $\sqrt{n}$-consistent. We also calculate the standard error of MR, which does not require knowledge of the true models in the first step. Our simulation study supports the theoretical findings.
Submitted 19 July, 2023;
originally announced July 2023.
-
A marginal structural model for normal tissue complication probability
Authors:
Thai-Son Tang,
Zhihui Liu,
Ali Hosni,
John Kim,
Olli Saarela
Abstract:
The goal of radiation therapy for cancer is to deliver prescribed radiation dose to the tumor while minimizing dose to the surrounding healthy tissues. To evaluate treatment plans, the dose distribution to healthy organs is commonly summarized as dose-volume histograms (DVHs). Normal tissue complication probability (NTCP) modelling has centered around making patient-level risk predictions with features extracted from the DVHs, but few have considered adapting a causal framework to evaluate the safety of alternative treatment plans. We propose causal estimands for NTCP based on deterministic and stochastic interventions, as well as propose estimators based on marginal structural models that impose bivariable monotonicity between dose, volume, and toxicity risk. The properties of these estimators are studied through simulations, and their use is illustrated in the context of radiotherapy treatment of anal canal cancer patients.
Submitted 23 May, 2024; v1 submitted 9 March, 2023;
originally announced March 2023.
-
A Generalized Variable Importance Metric and Estimator for Black Box Machine Learning Models
Authors:
Mohammad Kaviul Anam Khan,
Olli Saarela,
Rafal Kustra
Abstract:
In this paper we define a population parameter, ``Generalized Variable Importance Metric (GVIM)'', to measure the importance of predictors for black box machine learning methods, where the importance is not represented by a model-based parameter. GVIM is defined for each input variable, using the true conditional expectation function, and it measures the variable's importance in affecting a continuous or a binary response. We extend previously published results to show that the defined GVIM can be represented as a function of the Conditional Average Treatment Effect (CATE) for any kind of predictor, which gives it a causal interpretation and further justification as an alternative to classical measures of significance that are only available in simple parametric models. An extensive set of simulations, using realistically complex relationships between covariates and outcomes and a number of regression techniques of varying degrees of complexity, shows the performance of our proposed estimator of the GVIM.
Submitted 23 December, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
A Feature Selection Method that Controls the False Discovery Rate
Authors:
Mehdi Rostami,
Olli Saarela
Abstract:
Selecting a handful of truly relevant variables in supervised machine learning algorithms is challenging because of the untestable assumptions that must hold and the lack of theoretical assurances that selection errors are under control. We propose a distribution-free feature selection method, referred to as Data Splitting Selection (DSS), which controls the False Discovery Rate (FDR) of feature selection while obtaining high power. Another version of DSS is proposed with higher power, which "almost" controls the FDR. No assumptions are made on the distribution of the response or on the joint distribution of the features. Extensive simulations are performed to compare the performance of the proposed methods with existing ones.
Submitted 9 November, 2023; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Normalized Augmented Inverse Probability Weighting with Neural Network Predictions
Authors:
Mehdi Rostami,
Olli Saarela
Abstract:
The estimation of the Average Treatment Effect (ATE) as a causal parameter is carried out in two steps, where in the first step, the treatment and outcome are modeled to incorporate the potential confounders, and in the second step, the predictions are inserted into ATE estimators such as the Augmented Inverse Probability Weighting (AIPW) estimator. Due to concerns regarding the nonlinear or unknown relationships between confounders and the treatment and outcome, there has been an interest in applying non-parametric methods such as Machine Learning (ML) algorithms instead. Some literature proposes to use two separate Neural Networks (NNs) where there is no regularization on the network's parameters except the Stochastic Gradient Descent (SGD) in the NN's optimization. Our simulations indicate that the AIPW estimator suffers extensively if no regularization is utilized. We propose the normalization of AIPW (referred to as nAIPW), which can be helpful in some scenarios. nAIPW, provably, has the same properties as AIPW, that is, the double-robustness and orthogonality properties. Further, if the first-step algorithms converge fast enough, under regularity conditions, nAIPW will be asymptotically normal. We also compare the performance of AIPW and nAIPW in terms of the bias and variance when small to moderate L1 regularization is imposed on the NNs.
Submitted 12 November, 2021; v1 submitted 3 August, 2021;
originally announced August 2021.
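The two-step procedure described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `aipw` and `naipw` are hypothetical, the first-step predictions (propensity scores `e` and outcome predictions `m1`, `m0`) are assumed to be supplied by some fitted models, and the normalization shown is the common Hájek-style rescaling of the inverse probability weights, which may differ in detail from the nAIPW defined in the paper.

```python
import numpy as np

def aipw(y, a, e, m1, m0):
    """Standard AIPW estimate of the ATE from first-step predictions.

    y: outcomes, a: binary treatment (0/1), e: predicted P(A=1 | X),
    m1, m0: predicted outcomes under treatment and control.
    """
    y, a, e, m1, m0 = (np.asarray(v, dtype=float) for v in (y, a, e, m1, m0))
    aug1 = a * (y - m1) / e              # weighted residuals, treated units
    aug0 = (1 - a) * (y - m0) / (1 - e)  # weighted residuals, control units
    return np.mean(m1 - m0) + np.mean(aug1) - np.mean(aug0)

def naipw(y, a, e, m1, m0):
    """Normalized AIPW (assumed Hajek-style): weights are rescaled to sum to one."""
    y, a, e, m1, m0 = (np.asarray(v, dtype=float) for v in (y, a, e, m1, m0))
    w1 = a / e
    w0 = (1 - a) / (1 - e)
    corr1 = np.sum(w1 * (y - m1)) / np.sum(w1)
    corr0 = np.sum(w0 * (y - m0)) / np.sum(w0)
    return np.mean(m1 - m0) + corr1 - corr0

# Toy inputs (made up for illustration only).
y = [2.0, 2.0, 3.0, 5.0]
a = [1, 0, 1, 0]
e = [0.8, 0.2, 0.5, 0.5]
m1 = [1.0, 1.0, 3.0, 3.0]
m0 = [0.0, 2.0, 0.0, 4.0]
ate_aipw = aipw(y, a, e, m1, m0)
ate_naipw = naipw(y, a, e, m1, m0)
```

Dividing by the sum of the weights rather than by the sample size keeps the correction terms bounded by the range of the residuals, so extreme propensity predictions tend to have less influence on the estimate.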
-
The Bias-Variance Tradeoff of Doubly Robust Estimator with Targeted $L_1$ regularized Neural Networks Predictions
Authors:
Mehdi Rostami,
Olli Saarela,
Michael Escobar
Abstract:
The Doubly Robust (DR) estimation of ATE can be carried out in 2 steps, where in the first step, the treatment and outcome are modeled, and in the second step the predictions are inserted into the DR estimator. The model misspecification in the first step has led researchers to utilize Machine Learning algorithms instead of parametric algorithms. However, the existence of strong confounders and/or Instrumental Variables (IVs) can lead complex ML algorithms to provide perfect predictions for the treatment model, which can violate the positivity assumption and elevate the variance of DR estimators. Thus the ML algorithms must be controlled to avoid perfect predictions for the treatment model while still learning the relationship between the confounders and the treatment and outcome.
We use two Neural Network (NN) architectures and investigate how their hyperparameters should be tuned in the presence of confounders and IVs to achieve a low bias-variance tradeoff for ATE estimators such as the DR estimator. Through simulation results, we provide recommendations as to how NNs can be employed for ATE estimation.
Submitted 2 August, 2021;
originally announced August 2021.
-
casebase: An Alternative Framework For Survival Analysis and Comparison of Event Rates
Authors:
Sahir Rai Bhatnagar,
Maxime Turgeon,
Jesse Islam,
James A. Hanley,
Olli Saarela
Abstract:
In epidemiological studies of time-to-event data, a quantity of interest to the clinician and the patient is the risk of an event given a covariate profile. However, methods relying on time matching or risk-set sampling (including Cox regression) eliminate the baseline hazard from the likelihood expression or the estimating function. The baseline hazard then needs to be estimated separately using a non-parametric approach. This leads to step-wise estimates of the cumulative incidence that are difficult to interpret. Using case-base sampling, Hanley & Miettinen (2009) explained how the parametric hazard functions can be estimated using logistic regression. Their approach naturally leads to estimates of the cumulative incidence that are smooth-in-time. In this paper, we present the casebase R package, a comprehensive and flexible toolkit for parametric survival analysis. We describe how the case-base framework can also be used in more complex settings: competing risks, time-varying exposure, and variable selection. Our package also includes an extensive array of visualization tools to complement the analysis of time-to-event data. We illustrate all these features through four different case studies. *SRB and MT contributed equally to this work.
Submitted 21 September, 2020;
originally announced September 2020.
-
Estimation of marriage incidence rates by combining two cross-sectional retrospective designs: Event history analysis of two dependent processes
Authors:
Sangita Kulathinal,
Minna Säävälä,
Kari Auranen,
Olli Saarela
Abstract:
The aim of this work is to develop methods for studying the determinants of marriage incidence using marriage histories collected under two different types of retrospective cross-sectional study designs. These designs are: sampling of ever married women before the cross-section, a prevalent cohort, and sampling of women irrespective of marital status, a general cross-sectional cohort. While retrospective histories from a prevalent cohort do not identify incidence rates without parametric modelling assumptions, the rates can be identified when combined with data from a general cohort. Moreover, education, a strong endogenous covariate, and marriage processes are correlated. Hence, they need to be modelled jointly in order to estimate the marriage incidence. For this purpose, we specify a multi-state model and propose a likelihood-based estimation method. We outline the assumptions under which a likelihood expression involving only marriage incidence parameters can be derived. This is of particular interest when either retrospective education histories are not available or related parameters are not of interest. Our simulation results confirm the gain in efficiency by combining data from the two designs, while demonstrating how the parameter estimates are affected by violations of the assumptions used in deriving the simplified likelihood expressions. Two Indian National Family Health Surveys are used as motivation for the methodological development and to demonstrate the application of the methods.
Submitted 3 September, 2020;
originally announced September 2020.
-
Causal mediation analysis decomposition of between-hospital variance
Authors:
Bo Chen,
Keith A. Lawson,
Antonio Finelli,
Olli Saarela
Abstract:
Causal variance decompositions for a given disease-specific quality indicator can be used to quantify differences in performance between hospitals or health care providers. While variance decompositions can demonstrate variation in quality of care, causal mediation analysis can be used to study care pathways leading to the differences in performance between the institutions. This raises the question of whether the two approaches can be combined to decompose between-hospital variation in an outcome-type indicator into that mediated through a given process (indirect effect) and the remaining variation due to all other pathways (direct effect). For this purpose, we derive a causal mediation analysis decomposition of between-hospital variance, discuss its interpretation, and propose an estimation approach based on generalized linear mixed models for the outcome and the mediator. We study the performance of the estimators in a simulation study and demonstrate their use in administrative data on kidney cancer care in Ontario.
Submitted 24 January, 2023; v1 submitted 28 August, 2020;
originally announced August 2020.
-
Bayesian non-parametric ordinal regression under a monotonicity constraint
Authors:
Olli Saarela,
Christian Rohrbeck,
Elja Arjas
Abstract:
Compared to the nominal scale, the ordinal scale for a categorical outcome variable has the property of making a monotonicity assumption for the covariate effects meaningful. This assumption is encoded in the commonly used proportional odds model, but there it is combined with other parametric assumptions such as linearity and additivity. Herein, the considered models are non-parametric and the only condition imposed is that the effects of the covariates on the outcome categories are stochastically monotone according to the ordinal scale. We are not aware of the existence of other comparable multivariable models that would be suitable for inference purposes. We generalize our previously proposed Bayesian monotonic multivariable regression model to ordinal outcomes, and propose an estimation procedure based on reversible jump Markov chain Monte Carlo. The model is based on a marked point process construction, which allows it to approximate arbitrary monotonic regression function shapes, and has a built-in covariate selection property. We study the performance of the proposed approach through extensive simulation studies, and demonstrate its practical application in two real data examples.
Submitted 11 February, 2022; v1 submitted 2 July, 2020;
originally announced July 2020.
-
The role of exchangeability in causal inference
Authors:
Olli Saarela,
David A. Stephens,
Erica E. M. Moodie
Abstract:
Though the notion of exchangeability has been discussed in the causal inference literature under various guises, it has rarely taken its original meaning as a symmetry property of probability distributions. As this property is a standard component of Bayesian inference, we argue that in Bayesian causal inference it is natural to link the causal model, including the notion of confounding and definition of causal contrasts of interest, to the concept of exchangeability. Here we propose a probabilistic between-group exchangeability property as an identifying condition for causal effects, relate it to alternative conditions for unconfounded inferences (commonly stated using potential outcomes) and define causal contrasts in the presence of exchangeability in terms of posterior predictive expectations for further exchangeable units. While our main focus is on a point treatment setting, we also investigate how this reasoning carries over to longitudinal settings.
Submitted 15 December, 2022; v1 submitted 2 June, 2020;
originally announced June 2020.
-
Instrumental variable estimation of early treatment effect in randomized screening trials
Authors:
Sudipta Saha,
Zhihui Liu,
Olli Saarela
Abstract:
The primary analysis of randomized screening trials for cancer typically adheres to the intention-to-screen principle, measuring cancer-specific mortality reductions between screening and control arms. These mortality reductions result from a combination of the screening regimen, screening technology and the effect of the early, screening-induced, treatment. This motivates addressing these different aspects separately. Here we are interested in the causal effect of early versus delayed treatments on cancer mortality among the screening-detectable subgroup, which under certain assumptions is estimable from a conventional randomized screening trial using instrumental variable type methods. To define the causal effect of interest, we formulate a simplified structural multi-state model for screening trials, based on a hypothetical intervention trial where screening-detected individuals would be randomized into early versus delayed treatments. The cancer-specific mortality reductions after screening detection are quantified by a cause-specific hazard ratio. For this, we propose two estimators, based on an estimating equation and a likelihood expression. The methods extend existing instrumental variable methods for time-to-event and competing risks outcomes to time-dependent intermediate variables. Using the multi-state model as the basis of a data generating mechanism, we investigate the performance of the new estimators through simulation studies. In addition, we illustrate the proposed method in the context of CT screening for lung cancer using the US National Lung Screening Trial (NLST) data.
Submitted 7 June, 2021; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Hierarchical causal variance decomposition for institution and provider comparisons in healthcare
Authors:
Bo Chen,
Olli Saarela
Abstract:
Disease-specific quality indicators (QIs) are used to compare institutions and health care providers in terms of processes or outcomes relevant to treatment of a particular condition. In the context of surgical cancer treatments, the performance variations can be due to hospital and/or surgeon level differences, creating a hierarchical clustering. We consider how the observed variation in care received at patient level can be decomposed into that causally explained by the hospital performance, surgeon performance within hospital, patient case-mix, and unexplained (residual) variation. For this purpose, we derive a four-way variance decomposition, with particular attention to the causal interpretation of the components. For estimation, we use inputs from a mixed-effect model with nested random hospital/surgeon-specific effects, and a multinomial logistic model for the hospital/surgeon-specific patient populations. We investigate the performance of our methods in a simulation study.
Submitted 21 May, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Causal variance decompositions for institutional comparisons in healthcare
Authors:
Bo Chen,
Keith A. Lawson,
Antonio Finelli,
Olli Saarela
Abstract:
There is increasing interest in comparing institutions delivering healthcare in terms of disease-specific quality indicators (QIs) that capture processes or outcomes showing variations in the care provided. Such comparisons can be framed in terms of causal models, where adjusting for patient case-mix is analogous to controlling for confounding, and exposure is being treated in a given hospital, for instance. Our goal here is to help identify good QIs rather than to compare hospitals in terms of an already chosen QI, and so we focus on the presence and magnitude of overall variation in care between the hospitals rather than the pairwise differences between any two hospitals. We consider how the observed variation in care received at patient level can be decomposed into that causally explained by the hospital performance adjusting for the case-mix, the case-mix itself, and residual variation. For this purpose, we derive a three-way variance decomposition, with particular attention to its causal interpretation in terms of potential outcome variables. We propose model-based estimators for the decomposition, accommodating different link functions and either fixed or random effect models. We evaluate their performance in a simulation study and demonstrate their use in a real data application.
Submitted 4 September, 2019; v1 submitted 20 February, 2019;
originally announced February 2019.
-
A Bayesian view of doubly robust causal inference
Authors:
Olli Saarela,
Léo R. Belzile,
David A. Stephens
Abstract:
In causal inference, confounding may be controlled either through regression adjustment in an outcome model, or through propensity score adjustment or inverse probability of treatment weighting, or both. The latter approaches, which are based on modelling of the treatment assignment mechanism, and their doubly robust extensions have been difficult to motivate using formal Bayesian arguments; in principle, for likelihood-based inferences, the treatment assignment model can play no part in inferences concerning the expected outcomes if the models are assumed to be correctly specified. On the other hand, forcing dependency between the outcome and treatment assignment models by allowing the former to be misspecified results in loss of the balancing property of the propensity scores and the loss of any double robustness. In this paper, we explain in the framework of misspecified models why doubly robust inferences cannot arise from purely likelihood-based arguments, and demonstrate this through simulations. As an alternative to Bayesian propensity score analysis, we propose a Bayesian posterior predictive approach for constructing doubly robust estimation procedures. Our approach appropriately decouples the outcome and treatment assignment models by incorporating the inverse treatment assignment probabilities in Bayesian causal inferences as importance sampling weights in Monte Carlo integration.
Submitted 15 January, 2017;
originally announced January 2017.