-
Bipartite causal inference with interference, time series data, and a random network
Authors:
Zhaoyan Song,
Georgia Papadogeorgou
Abstract:
In bipartite causal inference with interference there are two distinct sets of units: those that receive the treatment, termed interventional units, and those on which the outcome is measured, termed outcome units. Which interventional units' treatment can drive which outcome units' outcomes is often depicted in a bipartite network. We study bipartite causal inference with interference from observational data across time and with a changing bipartite network. Under an exposure mapping framework, we define causal effects specific to each outcome unit, representing average contrasts of potential outcomes across time. We establish unconfoundedness of the exposure received by the outcome units based on unconfoundedness assumptions on the interventional units' treatment assignment and the random graph, hence respecting the bipartite structure of the problem. By harvesting the time component of our setting, causal effects are estimable while controlling only for temporal trends and time-varying confounders. Our results hold for binary, continuous, and multivariate exposure mappings. In the case of a binary exposure, we propose three matching algorithms to estimate the causal effect based on matching exposed to unexposed time periods for the same outcome unit, and we show that the bias of the resulting estimators is bounded. We illustrate our approach with an extensive simulation study and an application on the effect of wildfire smoke on transportation by bicycle.
Submitted 6 April, 2024;
originally announced April 2024.
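The matching idea in the abstract above (compare exposed and unexposed time periods of the same outcome unit) can be illustrated with a minimal sketch. It assumes a single nearest-neighbour rule, with replacement and Euclidean distance on time-varying covariates; this is a stand-in, not one of the paper's three matching algorithms, and the helper `within_unit_matching_effect` and the toy data are hypothetical.

```python
import numpy as np


def within_unit_matching_effect(y, exposed, covariates):
    """Average (exposed - matched unexposed) outcome for ONE outcome unit.

    Hypothetical helper: each exposed period is matched, with replacement,
    to the unexposed period of the same unit that is closest in Euclidean
    distance on the time-varying covariates (e.g. time trend, weather).
    """
    y = np.asarray(y, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)
    X = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    treated_idx = np.flatnonzero(exposed)
    control_idx = np.flatnonzero(~exposed)
    if treated_idx.size == 0 or control_idx.size == 0:
        raise ValueError("need both exposed and unexposed periods")
    diffs = []
    for t in treated_idx:
        dist = np.linalg.norm(X[control_idx] - X[t], axis=1)
        match = control_idx[np.argmin(dist)]  # ties go to the earliest period
        diffs.append(y[t] - y[match])
    return float(np.mean(diffs))


# Toy unit: linear time trend plus a true exposure effect of 2.0
periods = np.arange(10)
exposed = periods % 2 == 1
y = 0.5 * periods + 2.0 * exposed
estimate = within_unit_matching_effect(y, exposed, covariates=periods)
print(estimate)  # 2.5
```

The 0.5 excess over the true effect arises because every matched pair is one period apart on the trend; closer matches on the time-varying covariates shrink that gap, which shows why match quality governs the bias.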
-
Addressing selection bias in cluster randomized experiments via weighting
Authors:
Georgia Papadogeorgou,
Bo Liu,
Fan Li,
Fan Li
Abstract:
In cluster randomized experiments, units are often recruited after the random cluster assignment, and data are only available for the recruited sample. Post-randomization recruitment can lead to selection bias, inducing systematic differences between the overall and the recruited populations, and between the recruited intervention and control arms. In this setting, we define causal estimands for the overall and the recruited populations. We first show that if units select their cluster independently of the treatment assignment, cluster randomization implies individual randomization in the overall population. We then prove that under the assumption of ignorable recruitment, the average treatment effect on the recruited population can be consistently estimated from the recruited sample using inverse probability weighting. In general, the average treatment effect on the overall population is not identifiable. Nonetheless, we show, via a principal stratification formulation, that one can use weighting of the recruited sample to identify treatment effects on two meaningful subpopulations of the overall population: units who would be recruited into the study regardless of the assignment, and units who would be recruited into the study under treatment but not under control. We develop a corresponding estimation strategy and a sensitivity analysis method for checking the ignorable recruitment assumption.
Submitted 13 September, 2023;
originally announced September 2023.
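The inverse probability weighting step mentioned in the abstract above can be sketched with the normalised (Hajek) IPW estimator. This is a minimal illustration under stated assumptions: the propensity P(Z = 1 | X, recruited) is treated as known rather than estimated, the cluster structure and sensitivity analysis are ignored, and the function name and toy data are hypothetical.

```python
import numpy as np


def hajek_ipw_ate(y, z, e):
    """Normalised (Hajek) inverse probability weighting estimate of the ATE.

    y : outcomes of recruited units
    z : treatment indicator (1 = intervention arm, 0 = control)
    e : propensity P(Z = 1 | X, recruited) for each unit, assumed known here
    """
    y, z, e = (np.asarray(a, dtype=float) for a in (y, z, e))
    w1 = z / e
    w0 = (1 - z) / (1 - e)
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))


# Toy recruited sample: units with x = 1 are over-recruited into the treated arm.
x = np.array([1] * 10 + [0] * 10)
z = np.array([1] * 8 + [0] * 2 + [1] * 3 + [0] * 7)
e = np.where(x == 1, 0.8, 0.3)   # arm imbalance by x induced by recruitment
y = 1.0 + 2.0 * x + 1.0 * z      # true effect = 1.0 for every unit

naive = y[z == 1].mean() - y[z == 0].mean()
ipw = hajek_ipw_ate(y, z, e)
print(round(naive, 3), round(ipw, 3))  # 2.01 1.0
```

The naive difference in arm means absorbs the recruitment imbalance in x, while the weighted contrast recovers the effect on the recruited sample.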
-
Spatial causal inference in the presence of unmeasured confounding and interference
Authors:
Georgia Papadogeorgou,
Srijata Samanta
Abstract:
This manuscript bridges the divide between causal inference and spatial statistics, presenting novel insights for causal inference in spatial data analysis, and establishing how tools from spatial statistics can be used to draw causal inferences. We introduce spatial causal graphs to highlight that spatial confounding and interference can be entangled, in that investigating the presence of one can lead to wrongful conclusions in the presence of the other. Moreover, we show that spatial dependence in the exposure variable can render standard analyses invalid, which can lead to erroneous conclusions. To remedy these issues, we propose a Bayesian parametric approach based on tools commonly used in spatial statistics. This approach simultaneously accounts for interference and mitigates bias resulting from local and neighborhood unmeasured spatial confounding. From a Bayesian perspective, we show that incorporating an exposure model is necessary, and we theoretically prove that all model parameters are identifiable, even in the presence of unmeasured confounding. To illustrate the approach's effectiveness, we provide results from a simulation study and a case study involving the impact of sulfur dioxide emissions from power plants on cardiovascular mortality.
Submitted 2 February, 2024; v1 submitted 14 March, 2023;
originally announced March 2023.
-
Bayesian inference for aggregated Hawkes processes
Authors:
Lingxiao Zhou,
Georgia Papadogeorgou
Abstract:
The Hawkes process, a self-exciting point process, has a wide range of applications in modeling earthquakes, social networks and stock markets. The established estimation process requires that researchers have access to the exact time stamps and spatial information. However, available data are often rounded or aggregated. We develop a Bayesian estimation procedure for the parameters of a Hawkes process based on aggregated data. Our approach is developed for temporal, spatio-temporal, and mutually exciting Hawkes processes where data are available over discrete time periods and regions. We show theoretically that the parameters of the Hawkes process are identifiable from aggregated data under general specifications. We demonstrate the method on simulated temporal and spatio-temporal data with various model specifications in the presence of one or more interacting processes, and under varying coarseness of data aggregation. Finally, we examine the internal and cross-excitation effects of airstrikes and insurgent violence events from February 2007 to June 2008, with some data aggregated by day.
Submitted 16 June, 2024; v1 submitted 29 November, 2022;
originally announced November 2022.
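The setting in the abstract above, where exact event times are coarsened into period counts, can be illustrated on the data-generating side for the simplest temporal case. The sketch assumes an exponential triggering kernel, simulates the process with Ogata's thinning algorithm, and then aggregates the time stamps into daily counts; it does not implement the paper's Bayesian estimation procedure, and the parameter values and helper names are hypothetical.

```python
import numpy as np


def simulate_hawkes(mu, alpha, beta, t_max, rng):
    """Ogata thinning for a temporal Hawkes process with exponential kernel:

        lambda(t) = mu + alpha * beta * sum_{t_i < t} exp(-beta * (t - t_i))

    alpha < 1 is the branching ratio (expected offspring per event).
    """
    events = []
    t = 0.0
    excite = 0.0  # sum of exp(-beta * (t - t_i)) over past events, updated recursively
    while True:
        lam_bar = mu + alpha * beta * excite  # intensity only decays until the next event
        wait = rng.exponential(1.0 / lam_bar)
        t += wait
        if t >= t_max:
            break
        excite *= np.exp(-beta * wait)
        if rng.uniform() * lam_bar <= mu + alpha * beta * excite:
            events.append(t)
            excite += 1.0  # the accepted event now excites the future
    return np.array(events)


def aggregate_counts(events, t_max, width=1.0):
    """Coarsen exact time stamps into counts per bin (e.g. per day)."""
    edges = np.arange(0.0, t_max + width, width)
    counts, _ = np.histogram(events, bins=edges)
    return counts


rng = np.random.default_rng(0)
events = simulate_hawkes(mu=2.0, alpha=0.5, beta=1.0, t_max=500, rng=rng)
daily = aggregate_counts(events, t_max=500)
# Only `daily` would be observed; the stationary mean rate is mu / (1 - alpha) = 4
print(len(daily), daily.mean())
```

The recursive `excite` update keeps each step O(1), which is a property specific to the exponential kernel.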
-
Clarifying Selection Bias in Cluster Randomized Trials: Estimands and Estimation
Authors:
Fan Li,
Zizhong Tian,
Jennifer Bobb,
Georgia Papadogeorgou,
Fan Li
Abstract:
In cluster randomized trials, patients are typically recruited after clusters are randomized, and the recruiters and patients may not be blinded to the assignment. This often leads to differential recruitment and consequently systematic differences in baseline characteristics of the recruited patients between intervention and control arms, inducing post-randomization selection bias. We rigorously define causal estimands in the presence of selection bias. We elucidate the conditions under which standard covariate adjustment methods can validly estimate these estimands. We further discuss the additional data and assumptions necessary for estimating causal effects when such conditions are not met. Adopting the principal stratification framework in causal inference, we clarify that there are two average treatment effect (ATE) estimands in cluster randomized trials: one for the overall population and one for the recruited population. We derive analytical formulas for the two estimands in terms of principal-stratum-specific causal effects. Using simulation studies, we assess the empirical performance of the multivariable regression adjustment method under different data generating processes leading to selection bias. When treatment effects are heterogeneous across principal strata, the ATE on the overall population generally differs from the ATE on the recruited population. An intention-to-treat analysis of the recruited sample leads to biased estimates of both ATEs. In the presence of post-randomization selection and without additional data on the non-recruited subjects, the ATE on the recruited population is estimable only when the treatment effects are homogeneous between principal strata, and the ATE on the overall population is generally not estimable. The extent to which covariate adjustment can remove selection bias depends on the degree of effect heterogeneity across principal strata.
Submitted 4 October, 2021; v1 submitted 16 July, 2021;
originally announced July 2021.
-
Discussion of the manuscript: Spatial+ a novel approach to spatial confounding
Authors:
Georgia Papadogeorgou
Abstract:
I congratulate Dupont, Wood and Augustin (DWA hereon) for providing an easy-to-implement method for estimation in the presence of spatial confounding, and for addressing some of the complicated aspects of the topic. The method regresses the covariate of interest on spatial basis functions and uses the residuals of this model in an outcome regression. The authors show that, if the covariate is not completely spatial, this approach leads to consistent estimation of the conditional association between the exposure and the outcome. Below I discuss conceptual and operational issues that are fundamental to inference in spatial settings: (i) the target quantity and its interpretability, (ii) the non-spatial aspect of covariates and their relative spatial scales, and (iii) the impact of spatial smoothing. While DWA provide some insights on these issues, I believe that the audience might benefit from a deeper discussion. In what follows, I focus on the setting where a researcher is interested in interpreting the relationship between a given covariate and an outcome. I refer to the covariate of interest as the exposure to differentiate it from the rest.
Submitted 4 July, 2021;
originally announced July 2021.
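The two-step procedure summarised above (regress the exposure on spatial basis functions, then use the residuals in the outcome regression) can be sketched with plain least squares. This is a minimal illustration, not DWA's implementation: an unpenalised Fourier basis on one-dimensional locations stands in for their spatial basis functions, and the simulated data and helper names are hypothetical.

```python
import numpy as np


def fourier_basis(s, k_max):
    """Intercept plus sine/cosine pairs on 1-D locations s in [0, 1]."""
    cols = [np.ones_like(s)]
    for k in range(1, k_max + 1):
        cols += [np.sin(2 * np.pi * k * s), np.cos(2 * np.pi * k * s)]
    return np.column_stack(cols)


def ols(design, response):
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def spatial_plus_slope(y, x, s, k_max=5):
    B = fourier_basis(s, k_max)
    x_resid = x - B @ ols(B, x)              # step 1: strip the spatial part of the exposure
    design = np.column_stack([x_resid, B])   # step 2: outcome on residual + spatial basis
    return float(ols(design, y)[0])


rng = np.random.default_rng(1)
n = 2000
s = rng.uniform(size=n)
u = np.sin(2 * np.pi * s)                             # unmeasured spatial confounder
x = u + 0.5 * rng.standard_normal(n)                  # exposure: spatial + non-spatial part
y = 1.0 * x + 2.0 * u + 0.5 * rng.standard_normal(n)  # true exposure coefficient = 1.0

naive = ols(np.column_stack([np.ones(n), x]), y)[1]
adjusted = spatial_plus_slope(y, x, s)
print(round(naive, 2), round(adjusted, 2))
```

Here the confounder lies exactly in the span of the basis and the exposure keeps a non-spatial component, which is the "not completely spatial" condition in miniature; the naive slope is pulled well above 1 while the residual-based slope is close to it.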
-
Covariate-informed latent interaction models: Addressing geographic & taxonomic bias in predicting bird-plant interactions
Authors:
Georgia Papadogeorgou,
Carolina Bello,
Otso Ovaskainen,
David B. Dunson
Abstract:
Reductions in natural habitats urge that we better understand species' interconnection and how biological communities respond to environmental changes. However, ecological studies of species' interactions are limited by their geographic and taxonomic focus, which can distort our understanding of interaction dynamics. We focus on bird-plant interactions that refer to situations of potential fruit consumption and seed dispersal. We develop an approach for predicting species' interactions that accounts for errors in the recorded interaction networks, addresses the geographic and taxonomic biases of existing studies, is based on latent factors to increase flexibility and borrow information across species, incorporates covariates in a flexible manner to inform the latent factors, and uses a meta-analysis data set from 85 individual studies. We focus on interactions among 232 birds and 511 plants in the Atlantic Forest, and identify 5% of species pairs that have no recorded interaction but whose posterior probability of a possible interaction exceeds 80%. Finally, we develop a permutation-based variable importance procedure for latent factor network models and identify that a bird's body mass and a plant's fruit diameter are important in driving the presence of species interactions, with a multiplicative relationship that exhibits both a thresholding and a matching behavior.
Submitted 20 February, 2023; v1 submitted 9 March, 2021;
originally announced March 2021.
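The general idea behind permutation-based variable importance, as named in the abstract above, is to shuffle one covariate and measure how much the predictive loss increases. The sketch below applies that generic recipe to a fixed, known predictor of interaction probability; the paper's procedure is tailored to latent factor network models and is not reproduced here, and the predictor, covariates and helper names are hypothetical.

```python
import numpy as np


def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in log loss when one covariate column is shuffled."""
    rng = np.random.default_rng(seed)
    base = log_loss(y, predict(X))
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link for column j only
            scores[j] += log_loss(y, predict(X_perm)) - base
    return scores / n_repeats


def predict(X):
    # Hypothetical interaction model with a multiplicative effect of columns 0 and 1;
    # column 2 is ignored entirely.
    return 1.0 / (1.0 + np.exp(-1.5 * X[:, 0] * X[:, 1]))


rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 3))  # e.g. body mass, fruit diameter, an irrelevant trait
y = (rng.uniform(size=2000) < predict(X)).astype(float)
print(permutation_importance(predict, X, y))
```

Because the predictor ignores column 2, shuffling it leaves the loss unchanged and its importance is exactly zero, while both factors of the multiplicative term register as important.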
-
Propensity Score Weighting for Causal Subgroup Analysis
Authors:
Siyun Yang,
Elizabeth Lorenzi,
Georgia Papadogeorgou,
Daniel M. Wojdyla,
Fan Li,
Laine E. Thomas
Abstract:
A common goal in comparative effectiveness research is to estimate treatment effects on pre-specified subpopulations of patients. Although subgroup analysis is widely used in medical research, causal inference methods for it remain underdeveloped, particularly in observational studies. In this article, we develop a suite of analytical methods and visualization tools for causal subgroup analysis. First, we introduce the estimand of subgroup weighted average treatment effect and provide the corresponding propensity score weighting estimator. We show that balancing covariates within a subgroup bounds the bias of the estimator of subgroup causal effects. Second, we design a new diagnostic graph -- the Connect-S plot -- for visualizing the subgroup covariate balance. Finally, we propose to use the overlap weighting method to achieve exact balance within subgroups. We further propose a method that combines overlap weighting and LASSO to balance the bias-variance tradeoff in subgroup analysis. Extensive simulation studies are presented to compare the proposed method with several existing methods. We apply the proposed methods to the Patient-centered Results for Uterine Fibroids (COMPARE-UF) registry data to evaluate alternative management options for uterine fibroids for relief of symptoms and quality of life.
Submitted 20 March, 2021; v1 submitted 5 October, 2020;
originally announced October 2020.
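The exact-balance property of overlap weighting mentioned above can be checked numerically. For a logistic propensity model fit by maximum likelihood, overlap weights (1 - e for treated, e for controls) exactly balance the weighted means of every covariate in the model; including the subgroup indicator and its interactions therefore gives exact balance within each subgroup. The sketch below uses a hand-rolled Newton-Raphson fit and simulated data (both assumptions for illustration), and omits the paper's LASSO combination.

```python
import numpy as np


def fit_logistic_propensity(D, z, n_iter=50):
    """Newton-Raphson maximum likelihood for P(Z = 1 | D); returns fitted probabilities."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ beta))
        grad = D.T @ (z - p)
        hess = D.T @ (D * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-D @ beta))


def overlap_weights(z, e):
    # treated units get 1 - e, control units get e
    return np.where(z == 1, 1.0 - e, e)


def weighted_diff(v, z, w):
    """Weighted mean of v among treated minus among controls."""
    return (np.sum(w * z * v) / np.sum(w * z)
            - np.sum(w * (1 - z) * v) / np.sum(w * (1 - z)))


rng = np.random.default_rng(3)
n = 1000
x = rng.standard_normal(n)
g = rng.integers(0, 2, size=n)  # pre-specified subgroup indicator
z = (rng.uniform(size=n) < 1 / (1 + np.exp(-(0.5 * x + 0.5 * g - 0.8 * x * g)))).astype(float)

# Propensity model includes the subgroup indicator and its interaction with x
D = np.column_stack([np.ones(n), x, g, x * g])
w = overlap_weights(z, fit_logistic_propensity(D, z))

y = 1.0 + x + 2.0 * z  # noise-free toy outcome with a treatment effect of 2
for k in (0, 1):
    m = g == k
    print(k, weighted_diff(x[m], z[m], w[m]), weighted_diff(y[m], z[m], w[m]))
```

Within each subgroup the covariate difference is zero up to floating-point error, so with this noise-free linear outcome the weighted contrast equals the effect of 2 exactly.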
-
Causal Inference with Spatio-temporal Data: Estimating the Effects of Airstrikes on Insurgent Violence in Iraq
Authors:
Georgia Papadogeorgou,
Kosuke Imai,
Jason Lyall,
Fan Li
Abstract:
Many causal processes have spatial and temporal dimensions. Yet the classic causal inference framework is not directly applicable when the treatment and outcome variables are generated by spatio-temporal point processes. We extend the potential outcomes framework to these settings by formulating the treatment point process as a stochastic intervention. Our causal estimands include the expected number of outcome events in a specified area under a particular stochastic treatment assignment strategy. Our methodology allows for arbitrary patterns of spatial spillover and temporal carryover effects. Using martingale theory, we show that the proposed estimator is consistent and asymptotically normal as the number of time periods increases. We propose a sensitivity analysis for the possible existence of unmeasured confounders, and extend it to the Hajek estimator. Simulation studies are conducted to examine the estimators' finite sample performance. Finally, we illustrate the proposed methods by estimating the effects of American airstrikes on insurgent violence in Iraq from February 2007 to July 2008. Our analysis suggests that increasing the average number of daily airstrikes for up to one month may result in more insurgent attacks. We also find some evidence that airstrikes can displace attacks from Baghdad to new locations up to 400 kilometers away.
Submitted 8 June, 2022; v1 submitted 30 March, 2020;
originally announced March 2020.
-
Soft Tensor Regression
Authors:
Georgia Papadogeorgou,
Zhengwu Zhang,
David B. Dunson
Abstract:
Statistical methods relating tensor predictors to scalar outcomes in a regression model generally vectorize the tensor predictor and estimate the coefficients of its entries employing some form of regularization, use summaries of the tensor covariate, or use a low dimensional approximation of the coefficient tensor. However, low rank approximations of the coefficient tensor can suffer if the true rank is not small. We propose a tensor regression framework which assumes a soft version of the parallel factors (PARAFAC) approximation. In contrast to classic PARAFAC, where each entry of the coefficient tensor is the sum of products of row-specific contributions across the tensor modes, the soft tensor regression (Softer) framework allows the row-specific contributions to vary around an overall mean. We follow a Bayesian approach to inference, and show that softening the PARAFAC increases model flexibility, leads to improved estimation of coefficient tensors, more accurate identification of important predictor entries, and more precise predictions, even for a low approximation rank. From a theoretical perspective, we show that employing Softer leads to a weakly consistent posterior distribution of the coefficient tensor, irrespective of the true or approximation tensor rank, a result that is not true when employing the classic PARAFAC for tensor regression. In the context of our motivating application, we adapt Softer to symmetric and semi-symmetric tensor predictors and analyze the relationship between brain network characteristics and human traits.
Submitted 28 July, 2021; v1 submitted 21 October, 2019;
originally announced October 2019.
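For a matrix (two-mode tensor) predictor, the contrast drawn above between classic PARAFAC and its softened version can be shown generatively. The sketch assumes that "vary around an overall mean" is modelled as Gaussian perturbations with a hypothetical scale `sigma`, giving each entry its own copy of the row-specific contributions; it omits the paper's Bayesian priors and inference, and the helper names are hypothetical.

```python
import numpy as np


def parafac_matrix(b1, b2):
    """Rank-R PARAFAC coefficient matrix: B[j1, j2] = sum_r b1[j1, r] * b2[j2, r]."""
    return b1 @ b2.T


def soft_parafac_matrix(b1, b2, sigma, rng):
    """Entry-specific contributions scattered around the row-specific ones.

    Hypothetical illustration: each entry (j1, j2) gets its own copies of the
    mode-1 and mode-2 contributions, drawn as Normal(row contribution, sigma^2).
    """
    p1, rank = b1.shape
    p2 = b2.shape[0]
    g1 = b1[:, None, :] + sigma * rng.standard_normal((p1, p2, rank))
    g2 = b2[None, :, :] + sigma * rng.standard_normal((p1, p2, rank))
    return np.einsum("abr,abr->ab", g1, g2)


rng = np.random.default_rng(4)
b1 = rng.standard_normal((5, 1))  # rank-1 row contributions, mode 1
b2 = rng.standard_normal((6, 1))  # rank-1 row contributions, mode 2
hard = parafac_matrix(b1, b2)
soft = soft_parafac_matrix(b1, b2, sigma=0.3, rng=rng)
print(np.linalg.matrix_rank(hard), np.linalg.matrix_rank(soft))
```

With `sigma = 0` the two constructions coincide; with `sigma > 0` the softened matrix is generally no longer capped at rank R, which gives some intuition for the added flexibility at a low approximation rank.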
-
Mitigating Unobserved Spatial Confounding when Estimating the Effect of Supermarket Access on Cardiovascular Disease Deaths
Authors:
Patrick Schnell,
Georgia Papadogeorgou
Abstract:
Confounding by unmeasured spatial variables has received some attention in the spatial statistics and causal inference literatures, but concepts and approaches have remained largely separated. In this paper, we aim to bridge these distinct strands of statistics by considering unmeasured spatial confounding within a causal inference framework, and estimating effects using outcome regression tools popular within the spatial literature. First, we show that using spatially correlated random effects in the outcome model, an approach common among spatial statisticians, does not necessarily mitigate bias due to spatial confounding, a previously published but not universally known result. Motivated by the bias term of commonly-used estimators, we propose an affine estimator which addresses this deficiency. We discuss how unbiased estimation of causal parameters in the presence of unmeasured spatial confounding can only be achieved under an untestable set of assumptions which will often be application-specific. We provide a set of assumptions which describe how the exposure and outcome of interest relate to the unmeasured variables, and we show that this set of assumptions is sufficient for identification of the causal effect based on the observed data when spatial dependencies can be represented by a ring graph. We implement our method using a fully Bayesian approach applicable to any type of outcome variable. This work is motivated by and used to estimate the effect of county-level limited access to supermarkets on the rate of cardiovascular disease deaths in the elderly across the whole continental United States. Even though standard approaches return null or protective effects, our approach uncovers evidence of unobserved spatial confounding, and indicates that limited supermarket access has a harmful effect on cardiovascular mortality.
Submitted 1 June, 2020; v1 submitted 28 July, 2019;
originally announced July 2019.
-
Evaluating Federal Policies Using Bayesian Time Series Models: Estimating the Causal Impact of the Hospital Readmissions Reduction Program
Authors:
Georgia Papadogeorgou,
Fiammetta Menchetti,
Christine Choirat,
Jason H. Wasfy,
Corwin M. Zigler,
Fabrizia Mealli
Abstract:
Researchers are often faced with evaluating the effect of a policy or program that was simultaneously initiated across an entire population of units at a single point in time, and whose effects on the targeted population can manifest at any time period afterwards. In the presence of data measured over time, Bayesian time series models have been used to impute what would have happened after the policy was initiated, had the policy not taken place, in order to estimate causal effects. However, the considerations regarding the definition of the target estimands, the underlying assumptions, the plausibility of such assumptions, and the choice of an appropriate model have not been thoroughly investigated. In this paper, we establish useful estimands for the evaluation of large-scale policies. We argue that imputation of missing potential outcomes relies on an assumption which, even though untestable, can be partially evaluated using observed data. We illustrate an approach to evaluate this key causal assumption and facilitate model elicitation based on data from the time interval before policy initiation and using classic statistical techniques. As an illustration, we study the Hospital Readmissions Reduction Program (HRRP), a US federal intervention aiming to improve health outcomes for patients with pneumonia, acute myocardial infarction, or congestive heart failure admitted to a hospital. We evaluate the effect of the HRRP on population mortality among the elderly across the US and in four geographic subregions, and at different time windows. We find that the HRRP increased mortality from pneumonia and acute myocardial infarction across at least one geographical region and time horizon, and is likely to have had a detrimental effect on public health.
Submitted 28 October, 2022; v1 submitted 13 September, 2018;
originally announced September 2018.
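The workflow described above (fit a model to pre-policy data, impute the no-policy series afterwards, and probe the key assumption on a held-out pre-policy window) can be sketched with ordinary least squares on a trend-plus-seasonal design. This is a deliberately simple stand-in for the Bayesian time series models the abstract refers to; the noise-free monthly series and all function names are hypothetical.

```python
import numpy as np


def trend_season_design(t, period=12):
    return np.column_stack([np.ones_like(t), t,
                            np.sin(2 * np.pi * t / period),
                            np.cos(2 * np.pi * t / period)])


def forecast_effect(y, t0, period=12):
    """Fit on periods t < t0, impute the no-policy series afterwards, return observed - imputed."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    pre = t < t0
    X = trend_season_design(t, period)
    coef, *_ = np.linalg.lstsq(X[pre], y[pre], rcond=None)
    return y[~pre] - X[~pre] @ coef


def placebo_check(y, t0, holdout, period=12):
    """Pretend the policy began `holdout` periods early; 'effects' on pre-policy data should be near 0."""
    return forecast_effect(np.asarray(y)[:t0], t0 - holdout, period)


months = np.arange(84, dtype=float)
policy_start = 60
y = 10 + 0.05 * months + np.sin(2 * np.pi * months / 12) + 1.5 * (months >= policy_start)
print(forecast_effect(y, policy_start).mean(), placebo_check(y, policy_start, 12).mean())
```

A placebo "effect" far from zero on the held-out pre-policy window would signal that the imputation model cannot be trusted to reproduce the counterfactual series after the policy.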
-
Bipartite Causal Inference with Interference
Authors:
Corwin M. Zigler,
Georgia Papadogeorgou
Abstract:
Statistical methods to evaluate the effectiveness of interventions are increasingly challenged by the inherent interconnectedness of units. Specifically, a recent flurry of methods research has addressed the problem of interference between observations, which arises when one observational unit's outcome depends not only on its treatment but also on the treatments assigned to other units. We introduce the setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Basic definitions and formulations are provided for this setting, highlighting similarities and differences with more commonly considered settings of causal inference with interference. Several types of causal estimands are discussed, and a simple inverse probability of treatment weighted estimator is developed for a subset of simplified estimands. The estimators are deployed to evaluate how interventions to reduce air pollution from 473 power plants in the U.S. causally affect cardiovascular hospitalization among Medicare beneficiaries residing at 23,458 zip code locations.
Submitted 23 July, 2018;
originally announced July 2018.
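The bipartite structure described above can be made concrete with an adjacency matrix whose rows are outcome units and whose columns are interventional units. The sketch computes two illustrative exposure definitions from the interventional units' treatments, plus the probability of the binary exposure when interventional units are treated independently, the kind of quantity an inverse probability weighted estimator would divide by. The exposure definitions, the independence assumption, the toy network and the helper names are all hypothetical, not the paper's estimands.

```python
import numpy as np


def any_treated_exposure(A, z):
    """1 if an outcome unit is linked to at least one treated interventional unit."""
    return (np.asarray(A) @ np.asarray(z) > 0).astype(int)


def treated_fraction(A, z):
    """Share of linked interventional units that are treated (nan if no links)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (A @ np.asarray(z, dtype=float)) / deg


def prob_any_treated(A, pi):
    """P(exposure = 1) if interventional units are treated independently with probs pi."""
    A = np.asarray(A, dtype=bool)
    pi = np.asarray(pi, dtype=float)
    return np.array([1.0 - np.prod(1.0 - pi[row]) for row in A])


# rows: outcome units (e.g. zip codes); columns: interventional units (e.g. power plants)
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 0]])
z = np.array([1, 0, 1])
pi = np.array([0.5, 0.2, 0.4])
print(any_treated_exposure(A, z), treated_fraction(A, z), prob_any_treated(A, pi))
```

An outcome unit with no linked interventional unit can never be exposed, so its exposure probability is 0 and its treated fraction is undefined.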
-
A causal exposure response function with local adjustment for confounding: Estimating health effects of exposure to low levels of ambient fine particulate matter
Authors:
Georgia Papadogeorgou,
Francesca Dominici
Abstract:
The Clean Air Act mandates that the National Ambient Air Quality Standards (NAAQS) must be routinely assessed to protect populations based on the latest science. Therefore, researchers should continue to address whether exposure to levels of air pollution below the NAAQS is harmful to human health. The contentious nature surrounding environmental regulations urges us to cast this question within a causal inference framework. Parametric and semi-parametric regression approaches have been used to estimate the exposure-response (ER) curve between ambient air pollution and health outcomes. Most of these approaches are not formulated within a causal framework, adjust for the same covariates across all levels of exposure, and do not account for model uncertainty. We introduce a Bayesian framework for the estimation of a causal ER curve called LERCA (Local Exposure Response Confounding Adjustment), which allows for different confounders and different strengths of confounding at different exposure levels, and propagates uncertainty regarding confounder selection and the shape of the ER curve. LERCA provides a principled way of assessing the covariates' confounding importance at different exposure levels, providing researchers with information regarding the variables to adjust for in regression models. Using simulations, we show that state-of-the-art approaches perform poorly in estimating the ER curve in the presence of local confounding. LERCA is used to evaluate the relationship between exposure to ambient PM2.5 and cardiovascular hospitalizations for 5,362 zip codes in the US, while adjusting for a potentially varying set of confounders across the exposure range. Ambient PM2.5 leads to an increase in cardiovascular hospitalization rates when focusing on the low exposure range. Our results indicate that there is no threshold for the effect of PM2.5 on cardiovascular hospitalizations.
Submitted 8 January, 2020; v1 submitted 3 June, 2018;
originally announced June 2018.
-
Causal Inference in high dimensions: A marriage between Bayesian modeling and good frequentist properties
Authors:
Joseph Antonelli,
Georgia Papadogeorgou,
Francesca Dominici
Abstract:
We introduce a framework for estimating causal effects of binary and continuous treatments in high dimensions. We show how posterior distributions of treatment and outcome models can be used together with doubly robust estimators. We propose an approach to uncertainty quantification for the doubly robust estimator which utilizes posterior distributions of model parameters and (1) results in good frequentist properties in small samples, (2) is based on a single MCMC, and (3) improves over frequentist measures of uncertainty which rely on asymptotic properties. We show that our proposed variance estimation strategy is consistent when both models are correctly specified and that it is conservative in finite samples or when one or both models are misspecified. We consider a flexible framework for modeling the treatment and outcome processes within the Bayesian paradigm that reduces model dependence, accommodates nonlinearity, and achieves dimension reduction of the covariate space. We illustrate the ability of the proposed approach to flexibly estimate causal effects in high dimensions and appropriately quantify uncertainty, and show that it performs well relative to existing approaches. Finally, we estimate the effect of continuous environmental exposures on cholesterol and triglyceride levels. An R package is available at github.com/jantonelli111/DoublyRobustHD.
Submitted 2 October, 2020; v1 submitted 13 May, 2018;
originally announced May 2018.
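The abstract above builds on doubly robust estimators of a treatment effect. As a point of reference, the sketch below shows the standard augmented inverse probability weighting (AIPW) estimator of the average treatment effect for a binary treatment, given fitted propensity scores and outcome-model predictions. It does not reproduce the paper's Bayesian, posterior-based uncertainty quantification or its continuous-treatment setting. The function name aipw_ate and its inputs (y, a, e, m1, m0) are illustrative assumptions.

```python
def aipw_ate(y, a, e, m1, m0):
    """Standard AIPW estimate of the average treatment effect.

    y:  observed outcomes
    a:  binary treatment indicators (0 or 1)
    e:  fitted propensity scores P(A = 1 | X), strictly inside (0, 1)
    m1: outcome-model predictions E[Y | A = 1, X]
    m0: outcome-model predictions E[Y | A = 0, X]
    (All names are hypothetical; any fitting method may supply e, m1, m0.)
    """
    n = len(y)
    if not all(len(v) == n for v in (a, e, m1, m0)):
        raise ValueError("all inputs must have the same length")
    mu1 = mu0 = 0.0
    for yi, ai, ei, m1i, m0i in zip(y, a, e, m1, m0):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model prediction plus an inverse-probability-weighted
        # residual correction; consistent if either model is correct.
        mu1 += ai * (yi - m1i) / ei + m1i
        mu0 += (1 - ai) * (yi - m0i) / (1 - ei) + m0i
    return (mu1 - mu0) / n


# Usage: four units, constant propensity 0.5.
print(aipw_ate([1, 0, 2, 1], [1, 0, 1, 0], [0.5] * 4,
               [1, 1, 2, 2], [0, 0, 1, 1]))  # prints 1.0
```

Setting m1 and m0 to zero reduces the expression to the plain inverse probability weighting estimator, which makes the "augmentation" role of the outcome model explicit.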
-
Causal inference for interfering units with cluster and population level treatment allocation programs
Authors:
Georgia Papadogeorgou,
Fabrizia Mealli,
Corwin M. Zigler
Abstract:
Interference arises when an individual's potential outcome depends not only on the individual's own treatment level but also on the treatment levels of others. A common assumption in the causal inference literature in the presence of interference is partial interference, implying that the population can be partitioned into clusters of individuals whose potential outcomes only depend on the treatment of units within the same cluster. Previous literature has defined average potential outcomes under counterfactual scenarios where treatments are randomly allocated to units within a cluster. However, within clusters there may be units that are more or less likely to receive treatment based on covariates or neighbors' treatment. We define new estimands that describe average potential outcomes for realistic counterfactual treatment allocation programs, extending existing estimands to take into consideration the units' covariates and dependence between units' treatment assignment. We further propose entirely new estimands for population-level interventions over the collection of clusters, which correspond in the motivating setting to regulations at the federal (vs. cluster or regional) level. We discuss these estimands, propose unbiased estimators and derive asymptotic results as the number of clusters grows. Finally, we estimate effects in a comparative effectiveness study of power plant emission reduction technologies on ambient ozone pollution.
Submitted 14 May, 2018; v1 submitted 3 November, 2017;
originally announced November 2017.
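To make the notion of an average potential outcome under a counterfactual allocation program concrete, the sketch below computes, for one unit in one cluster, the expected potential outcome when its own treatment is fixed at a_i and every other unit j in the cluster is treated independently with probability p_j. Letting p_j vary by unit is a simple stand-in for covariate-dependent allocation. This is an assumption-laden illustration: it ignores the dependence between units' treatment assignment and the population-level programs the paper also considers, and the names avg_potential_outcome and outcome are hypothetical. It enumerates all assignments, so it is only practical for small clusters.

```python
from itertools import product


def avg_potential_outcome(i, a_i, probs, outcome):
    """Expected Y_i(a_i, A_{-i}) when the other units' treatments are
    independent Bernoulli draws with unit-specific probabilities.

    i:       index of the unit of interest within the cluster
    a_i:     the unit's own (fixed) treatment, 0 or 1
    probs:   treatment probabilities p_j for every unit in the cluster
    outcome: hypothetical potential-outcome function outcome(i, a) -> Y_i(a)
             taking the full treatment vector a of the cluster
    """
    others = [j for j in range(len(probs)) if j != i]
    total = 0.0
    for assignment in product((0, 1), repeat=len(others)):
        a = [0] * len(probs)
        a[i] = a_i
        weight = 1.0
        for j, a_j in zip(others, assignment):
            a[j] = a_j
            # Probability of this unit's draw under the allocation program.
            weight *= probs[j] if a_j == 1 else 1.0 - probs[j]
        total += weight * outcome(i, tuple(a))
    return total


# Toy potential outcomes: own treatment adds 2, each treated neighbor adds 1.
def toy_outcome(i, a):
    return 2 * a[i] + sum(a[j] for j in range(len(a)) if j != i)


treated = avg_potential_outcome(0, 1, [0.5, 0.5, 0.5], toy_outcome)
control = avg_potential_outcome(0, 0, [0.5, 0.5, 0.5], toy_outcome)
print(treated - control)  # prints 2.0, the toy direct effect
```

Averaging such quantities over units and clusters gives cluster- and population-level averages; estimating them from observational data is what the paper's estimators address.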
-
Adjusting for Unmeasured Spatial Confounding with Distance Adjusted Propensity Score Matching
Authors:
Georgia Papadogeorgou,
Christine Choirat,
Corwin Zigler
Abstract:
Propensity score matching is a common tool for adjusting for observed confounding in observational studies, but is known to have limitations in the presence of unmeasured confounding. In many settings, researchers are confronted with spatially-indexed data where the relative locations of the observational units may serve as a useful proxy for unmeasured confounding that varies according to a spatial pattern. We develop a new method, termed Distance Adjusted Propensity Score Matching (DAPSm), that incorporates information on units' spatial proximity into a propensity score matching procedure. We show that DAPSm can adjust for both observed and some forms of unobserved confounding and evaluate its performance relative to several other reasonable alternatives for incorporating spatial information into propensity score adjustment. The method is motivated by and applied to a comparative effectiveness investigation of power plant emission reduction technologies designed to reduce population exposure to ambient ozone pollution. Ultimately, DAPSm provides a framework for augmenting a "standard" propensity score analysis with information on spatial proximity and provides a transparent and principled way to assess the relative trade-offs of prioritizing observed confounding adjustment versus spatial proximity adjustment.
Submitted 6 December, 2017; v1 submitted 24 October, 2016;
originally announced October 2016.
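The DAPSm abstract above describes trading off propensity score similarity against spatial proximity when forming matches. The sketch below illustrates that idea under stated assumptions: each treated/control pair gets the score w·|PS_i − PS_j| + (1 − w)·D_ij, where D_ij is the pair's Euclidean distance divided by the largest treated-control distance, and pairs are formed by greedy 1:1 matching without replacement. The function name dapsm_greedy_match, this standardization, and the greedy scheme are illustrative assumptions rather than the paper's exact procedure. The weight w is taken as given, and the sketch does not reproduce how the paper chooses it.

```python
import math


def dapsm_greedy_match(treated, controls, w):
    """Greedy 1:1 matching on a weighted propensity-score/distance score.

    treated, controls: lists of (propensity_score, (x, y)) tuples
    w: weight in [0, 1]; w = 1 matches on the propensity score only,
       w = 0 matches on (standardized) spatial distance only.
    Returns a dict mapping treated index -> matched control index.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    # Pairwise Euclidean distances between every treated and control unit.
    dists = [[math.dist(t[1], c[1]) for c in controls] for t in treated]
    # Standardize by the largest distance (guard against all-zero distances).
    max_d = max((d for row in dists for d in row), default=0.0) or 1.0
    pairs = []
    for i, (ps_t, _) in enumerate(treated):
        for j, (ps_c, _) in enumerate(controls):
            score = w * abs(ps_t - ps_c) + (1 - w) * dists[i][j] / max_d
            pairs.append((score, i, j))
    pairs.sort()  # best (smallest) scores first
    matches, used_controls = {}, set()
    for _, i, j in pairs:
        if i not in matches and j not in used_controls:
            matches[i] = j
            used_controls.add(j)
    return matches


# One treated unit: control 0 has the same propensity score but is far away,
# control 1 has a slightly different score but is nearby.
treated = [(0.5, (0.0, 0.0))]
controls = [(0.5, (10.0, 0.0)), (0.6, (0.0, 1.0))]
print(dapsm_greedy_match(treated, controls, w=1.0))  # prints {0: 0}
print(dapsm_greedy_match(treated, controls, w=0.0))  # prints {0: 1}
```

Sweeping w between 0 and 1 and checking covariate balance at each value is one transparent way to inspect the trade-off between observed confounding adjustment and spatial proximity that the abstract describes.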