Search | arXiv e-print repository

Calibrated sensitivity models

Authors: Alec McClean, Zach Branson, Edward H. Kennedy

Abstract: In causal inference, sensitivity models assess how unmeasured confounders could alter causal analyses, but the sensitivity parameter -- which quantifies the degree of unmeasured confounding -- is often difficult to interpret. For this reason, researchers sometimes compare the sensitivity parameter to an estimate for measured confounding. This is known as calibration. Although calibration can aid interpretation, it is typically conducted post hoc, and uncertainty in the point estimate for measured confounding is rarely accounted for. To address these limitations, we propose novel calibrated sensitivity models, which directly bound the degree of unmeasured confounding by a multiple of measured confounding. The calibrated sensitivity parameter is interpretable as an intuitive unit-less ratio of unmeasured to measured confounding, and uncertainty due to estimating measured confounding can be incorporated. Incorporating this uncertainty shows causal analyses can be less or more robust to unmeasured confounding than would have been suggested by standard approaches. We develop efficient estimators and inferential methods for bounds on the average treatment effect with three calibrated sensitivity models, establishing parametric efficiency and asymptotic normality under doubly robust style nonparametric conditions. We illustrate our methods with a data analysis of the effect of mothers' smoking on infant birthweight.

Submitted 7 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2403.12815 [pdf, ps, other]

A Unified Framework for Rerandomization using Quadratic Forms

Authors: Kyle Schindl, Zach Branson

Abstract: In the design stage of a randomized experiment, one way to ensure treatment and control groups exhibit similar covariate distributions is to randomize treatment until some prespecified level of covariate balance is satisfied. This experimental design strategy is known as rerandomization. Most rerandomization methods utilize balance metrics based on a quadratic form $v^TAv$, where $v$ is a vector of covariate mean differences and $A$ is a positive semi-definite matrix. In this work, we derive general results for treatment-versus-control rerandomization schemes that employ quadratic forms for covariate balance. In addition to allowing researchers to quickly derive properties of rerandomization schemes not previously considered, our theoretical results provide guidance on how to choose the matrix $A$ in practice. We find the Mahalanobis and Euclidean distances optimize different measures of covariate balance. Furthermore, we establish how the covariates' eigenstructure and their relationship to the outcomes dictate which matrix $A$ yields the most precise mean-difference estimator for the average treatment effect. We find that the Euclidean distance is minimax optimal, in the sense that the mean-difference estimator's precision is never too far from the optimal choice, regardless of the relationship between covariates and outcomes. Our theoretical results are verified via simulation, where we find that rerandomization using the Euclidean distance has better performance in high-dimensional settings and typically achieves greater variance reduction to the mean-difference estimator than other quadratic forms.

Submitted 19 March, 2024; originally announced March 2024.
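
The rerandomization idea described in this abstract can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function names (`mean_differences`, `quadratic_form`, `rerandomize`), the acceptance threshold, the `max_draws` cap, and the simulated covariates are all assumptions made for the example. It redraws a complete randomization until the quadratic form $v^TAv$ of the covariate mean differences falls at or below a chosen threshold.

```python
import random


def mean_differences(X, assignment):
    """Covariate mean differences v (treated minus control)."""
    treated = [row for row, z in zip(X, assignment) if z == 1]
    control = [row for row, z in zip(X, assignment) if z == 0]
    return [
        sum(r[j] for r in treated) / len(treated)
        - sum(r[j] for r in control) / len(control)
        for j in range(len(X[0]))
    ]


def quadratic_form(v, A):
    """Balance metric v^T A v for a square matrix A given as nested lists."""
    k = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(k) for j in range(k))


def rerandomize(X, n_treated, A, threshold, rng, max_draws=100_000):
    """Redraw complete randomizations until v^T A v <= threshold (hypothetical rule)."""
    assignment = [1] * n_treated + [0] * (len(X) - n_treated)
    for _ in range(max_draws):
        rng.shuffle(assignment)
        if quadratic_form(mean_differences(X, assignment), A) <= threshold:
            return list(assignment)
    raise RuntimeError("no acceptable assignment within max_draws")


# Illustrative data: 20 units, 2 simulated covariates.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # A = I gives the squared Euclidean distance
assignment = rerandomize(X, n_treated=10, A=identity, threshold=0.1, rng=rng)
```

Choosing `A` as the identity corresponds to the (squared) Euclidean distance mentioned in the abstract; passing the inverse covariance matrix of the mean differences instead would give the Mahalanobis distance.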

arXiv:2309.00706 [pdf, other]

Causal Effect Estimation after Propensity Score Trimming with Continuous Treatments

Authors: Zach Branson, Edward H. Kennedy, Sivaraman Balakrishnan, Larry Wasserman

Abstract: Most works in causal inference focus on binary treatments where one estimates a single treatment-versus-control effect. When treatment is continuous, one must estimate a curve representing the causal relationship between treatment and outcome (the "dose-response curve"), which makes causal inference more challenging. This work proposes estimators using efficient influence functions (EIFs) for causal dose-response curves after propensity score trimming. Trimming involves estimating causal effects among subjects with propensity scores above a threshold, which addresses positivity violations that complicate estimation. Several challenges arise with continuous treatments. First, EIFs for trimmed dose-response curves do not exist, due to a lack of pathwise differentiability induced by trimming and a continuous treatment. Second, if the trimming threshold is not prespecified and is instead a parameter that must be estimated, then estimation uncertainty in the threshold must be accounted for. To address these challenges, we target a smoothed version of the trimmed dose-response curve for which an EIF exists. We allow the trimming threshold to be a user-specified quantile of the propensity score distribution, and we construct confidence intervals which reflect uncertainty involved in threshold estimation. Our resulting EIF-based estimators exhibit doubly-robust style guarantees, with error involving products or squares of errors for the outcome regression and propensity score. Thus, our estimators can exhibit parametric convergence rates even when the outcome regression and propensity score are estimated at slower nonparametric rates with flexible estimators. These findings are validated via simulation and an application, thereby showing how to efficiently-but-flexibly estimate a dose-response curve after trimming.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2305.14040 [pdf]

doi 10.1007/s10940-024-09582-7

Incremental Propensity Score Effects for Criminology: An Application Assessing the Relationship Between Homelessness, Behavioral Health Problems, and Recidivism

Authors: Leah A. Jacobs, Alec McClean, Zach Branson, Edward H. Kennedy, Alex Fixler

Abstract: This study examines the relationship between homelessness and recidivism among people on probation with and without behavioral health problems. The study also illustrates a new way to summarize the effect of an exposure on an outcome, the Incremental Propensity Score (IPS) effect, which avoids pitfalls of other approaches commonly used in criminology. We assessed the impact of homelessness at probation start on rearrest within one year among a cohort of people on probation (n = 2,453). We estimated IPS effects, considering general and crime-specific recidivism if subjects were more or less likely to be unhoused, and assessed effect variation by behavioral health problem status. We used a doubly robust machine learning estimator to flexibly but efficiently estimate effects. A substantial intervention -- reducing homelessness by roughly 65% -- corresponded to a 9% reduction in the estimated average rate of recidivism (p < .05). Milder interventions showed smaller, non-significant effect sizes. Stratifying by behavioral health problem and rearrest type led to similar results without statistical significance. Minding limitations related to observational data and generalizability, this study suggests large reductions in homelessness lead to significant reductions in rearrest rates. Efforts to reduce recidivism should include interventions that make homelessness less likely, but notable differences in recidivism will require these interventions be sizable. Meanwhile, efforts to establish recidivism risk factors should consider alternative effects, like IPS effects, to maximize validity and reduce bias.

Submitted 8 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2212.03578 [pdf, other]

Nonparametric Estimation of Conditional Incremental Effects

Authors: Alec McClean, Zach Branson, Edward H. Kennedy

Abstract: Conditional effect estimation has great scientific and policy importance because interventions may impact subjects differently depending on their characteristics. Most research has focused on estimating the conditional average treatment effect (CATE). However, identification of the CATE requires all subjects have a non-zero probability of receiving treatment, or positivity, which may be unrealistic in practice. Instead, we propose conditional effects based on incremental propensity score interventions, which are stochastic interventions where the odds of treatment are multiplied by some factor. These effects do not require positivity for identification and can be better suited for modeling scenarios in which people cannot be forced into treatment. We develop a projection estimator and a flexible nonparametric estimator that can each estimate all the conditional effects we propose and derive model-agnostic error guarantees showing both estimators satisfy a form of double robustness. Further, we propose a summary of treatment effect heterogeneity and a test for any effect heterogeneity based on the variance of a conditional derivative effect and derive a nonparametric estimator that also satisfies a form of double robustness. Finally, we demonstrate our estimators by analyzing the effect of intensive care unit admission on mortality using a dataset from the (SPOT)light study.

Submitted 24 April, 2023; v1 submitted 7 December, 2022; originally announced December 2022.
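
To make "the odds of treatment are multiplied by some factor" concrete, here is a minimal sketch of the shifted treatment probability under such an intervention. The function name `shifted_propensity` and the symbols `pi` (the original propensity score) and `delta` (the odds multiplier) are assumptions chosen for this illustration; the sketch covers only the intervention itself, not the estimators proposed in the paper.

```python
def shifted_propensity(pi, delta):
    """Treatment probability after multiplying the odds pi / (1 - pi) by delta.

    Solving q / (1 - q) = delta * pi / (1 - pi) for q gives
    q = delta * pi / (delta * pi + 1 - pi).
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    return delta * pi / (delta * pi + 1.0 - pi)


# Doubling the odds (delta = 2) for a few illustrative propensity scores.
for pi in (0.0, 0.2, 0.5, 1.0):
    print(pi, shifted_propensity(pi, delta=2.0))
```

Subjects whose propensity score is exactly 0 or 1 keep that score for any `delta`, which gives some intuition for why these interventions do not force anyone into or out of treatment.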

arXiv:2210.08272 [pdf, other]

Heterogeneous interventional indirect effects with multiple mediators: non-parametric and semi-parametric approaches

Authors: Max Rubinstein, Zach Branson, Edward H. Kennedy

Abstract: We propose semi- and non-parametric methods to estimate conditional interventional effects in the setting of two discrete mediators whose causal ordering is unknown. Average interventional indirect effects have been shown to decompose an average treatment effect into a direct effect and interventional indirect effects that quantify effects of hypothetical interventions on mediator distributions. Yet these effects may be heterogeneous across the covariate distribution. We consider the problem of estimating these effects at particular points. First, we propose an influence-function based estimator of the projection of the conditional effects onto a working model, and show under some conditions that we can achieve root-n consistent and asymptotically normal estimates. Second, we propose a fully non-parametric approach to estimation and show the conditions where this approach can achieve oracle rates of convergence. Finally, we propose a sensitivity analysis for the conditional effects in the presence of mediator-outcome confounding. We propose estimating bounds on the conditional effects using these same methods, and show that these results easily extend to allow for influence-function based estimates of the bounds on the average effects. We conclude by examining heterogeneous effects with respect to the effect of COVID-19 vaccinations on depression during February 2021.

Submitted 18 April, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2201.02486 [pdf, other]

Power and Sample Size Calculations for Rerandomization

Authors: Zach Branson, Xinran Li, Peng Ding

Abstract: Power analyses are an important aspect of experimental design, because they help determine how experiments are implemented in practice. It is common to specify a desired level of power and compute the sample size necessary to obtain that power. Such calculations are well-known for completely randomized experiments, but there can be many benefits to using other experimental designs. For example, it has recently been established that rerandomization, where subjects are randomized until covariate balance is obtained, increases the precision of causal effect estimators. This work establishes the power of rerandomized treatment-control experiments, thereby allowing for sample size calculators. We find the surprising result that, while power is often greater under rerandomization than complete randomization, the opposite can occur for very small treatment effects. The reason is that inference under rerandomization can be relatively more conservative, in the sense that it can have a lower type-I error at the same nominal significance level, and this additional conservativeness adversely affects power. This surprising result is due to treatment effect heterogeneity, a quantity often ignored in power analyses. We find that heterogeneity increases power for large effect sizes but decreases power for small effect sizes.

Submitted 8 December, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: 35 pages, 6 figures

arXiv:2110.10532 [pdf, other]

Incremental causal effects: an introduction and review

Authors: Matteo Bonvini, Alec McClean, Zach Branson, Edward H. Kennedy

Abstract: In this chapter, we review the class of causal effects based on incremental propensity score interventions proposed by Kennedy [2019]. The aim of incremental propensity score interventions is to estimate the effect of increasing or decreasing subjects' odds of receiving treatment; this differs from the average treatment effect, where the aim is to estimate the effect of everyone deterministically receiving versus not receiving treatment. We first present incremental causal effects for the case when there is a single binary treatment, such that they can be compared to average treatment effects and thus shed light on key concepts. In particular, a benefit of incremental effects is that positivity, a common assumption in causal inference, is not needed to identify causal effects. Then we discuss the more general case where treatment is measured at multiple time points, where positivity is more likely to be violated and thus incremental effects can be especially useful. Throughout, we motivate incremental effects with real-world applications, present nonparametric estimators for these effects, and discuss their efficiency properties, while also briefly reviewing the role of influence functions in functional estimation. Finally, we show how to interpret and analyze results using these estimators in practice, and discuss extensions and future directions.

Submitted 20 October, 2021; originally announced October 2021.

Comments: Matteo Bonvini and Alec McClean contributed equally

arXiv:1907.01943 [pdf, other]

Evaluating A Key Instrumental Variable Assumption Using Randomization Tests

Authors: Zach Branson, Luke Keele

Abstract: Instrumental variable (IV) analyses are becoming common in health services research and epidemiology. Most IV analyses use naturally occurring instruments, such as distance to a hospital. In these analyses, investigators must assume the instrument is as-if randomly assigned. This assumption cannot be tested directly, but it can be falsified. Most falsification tests in the literature compare relative prevalence or bias in observed covariates between the instrument and the exposure. These tests require investigators to make a covariate-by-covariate judgment about the validity of the IV design. Often, only some of the covariates are well-balanced, making it unclear if as-if randomization can be assumed for the instrument across all covariates. We propose an alternative falsification test that compares IV balance or bias to the balance or bias that would have been produced under randomization. A key advantage of our test is that it allows for global balance measures as well as easily interpretable graphical comparisons. Furthermore, our test does not rely on any parametric assumptions and can be used to validly assess if the instrument is significantly closer to being as-if randomized than the exposure. We demonstrate our approach on a recent IV application that uses bed availability in the intensive care unit (ICU) as an instrument for admission to the ICU.

Submitted 3 July, 2019; originally announced July 2019.

Comments: 19 pages

arXiv:1810.02761 [pdf, other]

The Local Randomization Framework for Regression Discontinuity Designs: A Review and Some Extensions

Authors: Zach Branson, Fabrizia Mealli

Abstract: Regression discontinuity designs (RDDs) are a common quasi-experiment in economics and statistics. The most popular methodologies for analyzing RDDs utilize continuity-based assumptions and local polynomial regression, but recent works have developed alternative assumptions based on local randomization. The local randomization framework avoids modeling assumptions by instead placing assumptions on the assignment mechanism near the cutoff. However, most works have focused on completely randomized assignment mechanisms, which posit that propensity scores are equal for all units near the cutoff. In our review of the local randomization framework, we extend the framework to allow for any assignment mechanism, such that propensity scores may differ. We outline randomization tests that can be used to select a window around the cutoff where a particular assignment mechanism is most plausible, as well as methodologies for estimating causal effects after a window and assignment mechanism are chosen. We apply our methodology to a fuzzy RDD assessing the effects of financial aid on college dropout rates in Italy. We find that positing different assignment mechanisms within a single RDD can provide more nuanced sensitivity analyses as well as more precise inferences for causal effects.

Submitted 5 November, 2019; v1 submitted 5 October, 2018; originally announced October 2018.

Comments: 28 pages, 2 figures

arXiv:1808.04513 [pdf, other]

Ridge Rerandomization: An Experimental Design Strategy in the Presence of Collinearity

Authors: Zach Branson, Stephane Shao

Abstract: Randomization ensures that observed and unobserved covariates are balanced, on average. However, randomizing units to treatment and control often leads to covariate imbalances in realization, and such imbalances can inflate the variance of estimators of the treatment effect. One solution to this problem is rerandomization, an experimental design strategy that randomizes units until some balance criterion is fulfilled, which yields more precise estimators of the treatment effect if covariates are correlated with the outcome. Most rerandomization schemes in the literature utilize the Mahalanobis distance, which may not be preferable when covariates are correlated or vary in importance. As an alternative, we introduce an experimental design strategy called ridge rerandomization, which utilizes a modified Mahalanobis distance that addresses collinearities among covariates and automatically places a hierarchy of importance on the covariates according to their eigenstructure. This modified Mahalanobis distance has connections to principal components and the Euclidean distance, and, to our knowledge, has remained unexplored. We establish several theoretical properties of this modified Mahalanobis distance and our ridge rerandomization scheme. These results guarantee that ridge rerandomization is preferable over randomization and suggest when ridge rerandomization is preferable over standard rerandomization schemes. We also provide simulation evidence that suggests that ridge rerandomization is particularly preferable over typical rerandomization schemes in high-dimensional or high-collinearity settings.

Submitted 9 February, 2020; v1 submitted 13 August, 2018; originally announced August 2018.

Comments: 33 pages, 8 figures

arXiv:1808.01691 [pdf, other]

Sampling-based randomized designs for causal inference under the potential outcomes framework

Authors: Zach Branson, Tirthankar Dasgupta

Abstract: We establish the inferential properties of the mean-difference estimator for the average treatment effect in randomized experiments where each unit in a population is randomized to one of two treatments and then units within treatment groups are randomly sampled. The properties of this estimator are well-understood in the experimental design scenario where first units are randomly sampled and then treatment is randomly assigned, but not for the aforementioned scenario where the sampling and treatment assignment stages are reversed. We find that the inferential properties of the mean-difference estimator under this experimental design scenario are identical to those under the more common sample-first-randomize-second design. This finding clarifies sampling-based randomized designs for causal inference, particularly for settings where there is a finite super-population. Finally, we explore to what extent pre-treatment measurements can be used to improve upon the mean-difference estimator for this randomize-first-sample-second design. Unfortunately, we find that pre-treatment measurements are often unhelpful in improving the precision of average treatment effect estimators under this design, unless a large number of pre-treatment measurements that are highly associated with the post-treatment measurements can be obtained. We confirm these results using a simulation study based on a real experiment in nanomaterials.

Submitted 16 February, 2019; v1 submitted 5 August, 2018; originally announced August 2018.

Comments: 30 pages, 3 figures

arXiv:1807.04516 [pdf, other]

A Bayesian Nonparametric Approach to Geographic Regression Discontinuity Designs: Do School Districts Affect NYC House Prices?

Authors: Maxime Rischard, Zach Branson, Luke Miratrix, Luke Bornn

Abstract: Most research on regression discontinuity designs (RDDs) has focused on univariate cases, where only those units with a "forcing" variable on one side of a threshold value receive a treatment. Geographical regression discontinuity designs (GeoRDDs) extend the RDD to multivariate settings with spatial forcing variables. We propose a framework for analysing GeoRDDs, which we implement using Gaussian process regression. This yields a Bayesian posterior distribution of the treatment effect at every point along the border. We address nuances of having a functional estimand defined on a border with potentially intricate topology, particularly when defining and estimating causal estimands of the local average treatment effect (LATE). The Bayesian estimate of the LATE can also be used as a test statistic in a hypothesis test with good frequentist properties, which we validate using simulations and placebo tests. We demonstrate our methodology with a dataset of property sales in New York City, to assess whether there is a discontinuity in housing prices at the border between two school districts. We find a statistically significant difference in price across the border between the districts with $p = 0.002$, and estimate a 20% higher price on average for a house on the more desirable side.

Submitted 11 July, 2018; originally announced July 2018.

Comments: 40 pages, 12 figures

arXiv:1804.08760 [pdf, other]

Randomization Tests to Assess Covariate Balance When Designing and Analyzing Matched Datasets

Authors: Zach Branson

Abstract: Causal analyses for observational studies are often complicated by covariate imbalances among treatment groups, and matching methodologies alleviate this complication by finding subsets of treatment groups that exhibit covariate balance. It is widely agreed upon that covariate balance can serve as evidence that a matched dataset approximates a randomized experiment, but what kind of experiment does a matched dataset approximate? In this work, we develop a randomization test for the hypothesis that a matched dataset approximates a particular experimental design, such as complete randomization, block randomization, or rerandomization. Our test can incorporate any experimental design, and it allows for a graphical display that puts several designs on the same univariate scale, thereby allowing researchers to pinpoint which design -- if any -- is most appropriate for a matched dataset. After researchers determine a plausible design, we recommend a randomization-based approach for analyzing the matched data, which can incorporate any design and treatment effect estimator. Through simulation, we find that our test can frequently detect violations of randomized assignment that harm inferential results. Furthermore, through simulation and a real application in political science, we find that matched datasets with high levels of covariate balance tend to approximate balance-constrained designs like rerandomization, and analyzing them as such can lead to precise causal analyses. However, assuming a precise design should be done with caution, because it can harm inferential results if there are still substantial biases due to remaining imbalances after matching. Our approach is implemented in the randChecks R package, available on CRAN.

Submitted 15 February, 2021; v1 submitted 23 April, 2018; originally announced April 2018.

Comments: 28 pages, 3 figures

Journal ref: Observational Studies 7 (2021) 1-36

arXiv:1802.01018 [pdf, other]

doi 10.1515/jci-2018-0004

Randomization Tests that Condition on Non-Categorical Covariate Balance

Authors: Zach Branson, Luke Miratrix

Abstract: A benefit of randomized experiments is that covariate distributions of treatment and control groups are balanced on average, resulting in simple unbiased estimators for treatment effects. However, it is possible that a particular randomization yields covariate imbalances that researchers want to address in the analysis stage through adjustment or other methods. Here we present a randomization test that conditions on covariate balance by only considering treatment assignments that are similar to the observed one in terms of covariate balance. Previous conditional randomization tests have only allowed for categorical covariates, while our randomization test allows for any type of covariate. Through extensive simulation studies, we find that our conditional randomization test is more powerful than unconditional randomization tests and other conditional tests. Furthermore, we find that our conditional randomization test is valid (1) unconditionally across levels of covariate balance, and (2) conditional on particular levels of covariate balance. Meanwhile, unconditional randomization tests are valid for (1) but not (2). Finally, we find that our conditional randomization test is similar to a randomization test that uses a model-adjusted test statistic.

Submitted 4 October, 2018; v1 submitted 3 February, 2018; originally announced February 2018.

Comments: 54 pages, 12 Figures

arXiv:1707.04136 [pdf, other]

doi 10.1177/0962280218756689

Randomization-based Inference for Bernoulli-Trial Experiments and Implications for Observational Studies

Authors: Zach Branson, Marie-Abele Bind

Abstract: We present a randomization-based inferential framework for experiments characterized by a strongly ignorable assignment mechanism where units have independent probabilities of receiving treatment. Previous works on randomization tests often assume these probabilities are equal within blocks of units. We consider the general case where they differ across units and show how to perform randomization tests and obtain point estimates and confidence intervals. Furthermore, we develop a rejection-sampling algorithm to conduct randomization-based inference conditional on ancillary statistics, covariate balance, or other statistics of interest. Through simulation we demonstrate how our algorithm can yield powerful randomization tests and thus precise inference. Our work also has implications for observational studies, which commonly assume a strongly ignorable assignment mechanism. Most methodologies for observational studies make additional modeling or asymptotic assumptions, while our framework only assumes the strongly ignorable assignment mechanism, and thus can be considered a minimal-assumption approach.

Submitted 9 December, 2017; v1 submitted 13 July, 2017; originally announced July 2017.

Comments: 39 Pages, 4 Figures, 3 Tables

arXiv:1704.04858 [pdf, other]

doi 10.1016/j.jspi.2019.01.003

A Nonparametric Bayesian Methodology for Regression Discontinuity Designs

Authors: Zach Branson, Maxime Rischard, Luke Bornn, Luke Miratrix

Abstract: One of the most popular methodologies for estimating the average treatment effect at the threshold in a regression discontinuity design is local linear regression (LLR), which places larger weight on units closer to the threshold. We propose a Gaussian process regression methodology that acts as a Bayesian analog to LLR for regression discontinuity designs. Our methodology provides a flexible fit for treatment and control responses by placing a general prior on the mean response functions. Furthermore, unlike LLR, our methodology can incorporate uncertainty in how units are weighted when estimating the treatment effect. We prove our method is consistent in estimating the average treatment effect at the threshold. Furthermore, we find via simulation that our method exhibits promising coverage, interval length, and mean squared error properties compared to standard LLR and state-of-the-art LLR methodologies. Finally, we explore the performance of our method on a real-world example by studying the impact of being a first-round draft pick on the performance and playing time of basketball players in the National Basketball Association.

Submitted 30 September, 2018; v1 submitted 16 April, 2017; originally announced April 2017.

Comments: 40 pages, 5 figures, 5 tables

arXiv:1511.01973 [pdf, other]

Improving Covariate Balance in 2^K Factorial Designs via Rerandomization

Authors: Zach Branson, Tirthankar Dasgupta, Donald B. Rubin

Abstract: Factorial designs are widely used in agriculture, engineering, and the social sciences to study the causal effects of several factors simultaneously on a response. The objective of such a design is to estimate all factorial effects of interest, which typically include main effects and interactions among factors. To estimate factorial effects with high precision when a large number of pre-treatment covariates are present, balance among covariates across treatment groups should be ensured. We propose utilizing rerandomization to ensure covariate balance in factorial designs. Although both factorial designs and rerandomization have been discussed before, the combination has not. Here, theoretical properties of rerandomization for factorial designs are established, and empirical results are explored using an application from the New York Department of Education.

Submitted 5 November, 2015; originally announced November 2015.

Showing 1–18 of 18 results for author: Branson, Z