-
Improving instrumental variable estimators with post-stratification
Authors:
Nicole E. Pashley,
Luke Keele,
Luke W. Miratrix
Abstract:
Experiments studying get-out-the-vote (GOTV) efforts estimate the causal effect of various mobilization efforts on voter turnout. However, there is often substantial noncompliance in these studies. A usual approach is to use an instrumental variable (IV) analysis to estimate impacts for compliers, here being those actually contacted by the investigators. Unfortunately, popular IV estimators can be…
▽ More
Experiments studying get-out-the-vote (GOTV) efforts estimate the causal effect of various mobilization efforts on voter turnout. However, there is often substantial noncompliance in these studies. A usual approach is to use an instrumental variable (IV) analysis to estimate impacts for compliers, here being those actually contacted by the investigators. Unfortunately, popular IV estimators can be unstable in studies with a small fraction of compliers. We explore post-stratifying the data (e.g., taking a weighted average of IV estimates within each stratum) using variables that predict complier status (and, potentially, the outcome) to mitigate this. We present the benefits of post-stratification in terms of bias, variance, and improved standard error estimates, and provide a finite-sample asymptotic variance formula. We also compare the performance of different IV approaches and discuss the advantages of our design-based post-stratification approach over incorporating compliance-predictive covariates into the two-stage least squares estimator. In the end, we show that covariates predictive of compliance can increase precision, but only if one is willing to make a bias-variance trade-off by down-weighting or drop** strata with few compliers. By contrast, standard approaches such as two-stage least squares fail to use such information. We finally examine the benefits of our approach in two GOTV applications.
△ Less
Submitted 28 June, 2024; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Precise Unbiased Estimation in Randomized Experiments using Auxiliary Observational Data
Authors:
Johann A. Gagnon-Bartsch,
Adam C. Sales,
Edward Wu,
Anthony F. Botelho,
John A. Erickson,
Luke W. Miratrix,
Neil T. Heffernan
Abstract:
Randomized controlled trials (RCTs) are increasingly prevalent in education research, and are often regarded as a gold standard of causal inference. Two main virtues of randomized experiments are that they (1) do not suffer from confounding, thereby allowing for an unbiased estimate of an intervention's causal impact, and (2) allow for design-based inference, meaning that the physical act of rando…
▽ More
Randomized controlled trials (RCTs) are increasingly prevalent in education research, and are often regarded as a gold standard of causal inference. Two main virtues of randomized experiments are that they (1) do not suffer from confounding, thereby allowing for an unbiased estimate of an intervention's causal impact, and (2) allow for design-based inference, meaning that the physical act of randomization largely justifies the statistical assumptions made. However, RCT sample sizes are often small, leading to low precision; in many cases RCT estimates may be too imprecise to guide policy or inform science. Observational studies, by contrast, have strengths and weaknesses complementary to those of RCTs. Observational studies typically offer much larger sample sizes, but may suffer confounding. In many contexts, experimental and observational data exist side by side, allowing the possibility of integrating "big observational data" with "small but high-quality experimental data" to get the best of both. Such approaches hold particular promise in the field of education, where RCT sample sizes are often small due to cost constraints, but automatic collection of observational data, such as in computerized educational technology applications, or in state longitudinal data systems (SLDS) with administrative data on hundreds of thousand of students, has made rich, high-dimensional observational data widely available. We outline an approach that allows one to employ machine learning algorithms to learn from the observational data, and use the resulting models to improve precision in randomized experiments. Importantly, there is no requirement that the machine learning models are "correct" in any sense, and the final experimental results are guaranteed to be exactly unbiased. Thus, there is no danger of confounding biases in the observational data leaking into the experiment.
△ Less
Submitted 19 May, 2023; v1 submitted 7 May, 2021;
originally announced May 2021.
-
Block what you can, except when you shouldn't
Authors:
Nicole E. Pashley,
Luke W. Miratrix
Abstract:
Several branches of the potential outcome causal inference literature have discussed the merits of blocking versus complete randomization. Some have concluded it can never hurt the precision of estimates, and some have concluded it can hurt. In this paper, we reconcile these apparently conflicting views, give a more thorough discussion of what guarantees no harm, and discuss how other aspects of a…
▽ More
Several branches of the potential outcome causal inference literature have discussed the merits of blocking versus complete randomization. Some have concluded it can never hurt the precision of estimates, and some have concluded it can hurt. In this paper, we reconcile these apparently conflicting views, give a more thorough discussion of what guarantees no harm, and discuss how other aspects of a blocked design can cost, all in terms of precision. We discuss how the different findings are due to different sampling models and assumptions of how the blocks were formed. We also connect these ideas to common misconceptions, for instance showing that analyzing a blocked experiment as if it were completely randomized, a seemingly conservative method, can actually backfire in some cases. Overall, we find that blocking can have a price, but that this price is usually small and the potential for gain can be large. It is hard to go too far wrong with blocking.
△ Less
Submitted 27 May, 2021; v1 submitted 26 October, 2020;
originally announced October 2020.
-
Conditional As-If Analyses in Randomized Experiments
Authors:
Nicole E. Pashley,
Guillaume W. Basse,
Luke W. Miratrix
Abstract:
The injunction to `analyze the way you randomize' is well-known to statisticians since Fisher advocated for randomization as the basis of inference. Yet even those convinced by the merits of randomization-based inference seldom follow this injunction to the letter. Bernoulli randomized experiments are often analyzed as completely randomized experiments, and completely randomized experiments are an…
▽ More
The injunction to `analyze the way you randomize' is well-known to statisticians since Fisher advocated for randomization as the basis of inference. Yet even those convinced by the merits of randomization-based inference seldom follow this injunction to the letter. Bernoulli randomized experiments are often analyzed as completely randomized experiments, and completely randomized experiments are analyzed as if they had been stratified; more generally, it is not uncommon to analyze an experiment as if it had been randomized differently. This paper examines the theoretical foundation behind this practice within a randomization-based framework. Specifically, we ask when is it legitimate to analyze an experiment randomized according to one design as if it had been randomized according to some other design. We show that a sufficient condition for this type of analysis to be valid is that the design used for analysis be derived from the original design by an appropriate form of conditioning. We use our theory to justify certain existing methods, question others, and finally suggest new methodological insights such as conditioning on approximate covariate balance.
△ Less
Submitted 19 August, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Design-Based Ratio Estimators and Central Limit Theorems for Clustered, Blocked RCTs
Authors:
Peter Z. Schochet,
Nicole E. Pashley,
Luke W. Miratrix,
Tim Kautz
Abstract:
This article develops design-based ratio estimators for clustered, blocked randomized controlled trials (RCTs), with an application to a federally funded, school-based RCT testing the effects of behavioral health interventions. We consider finite population weighted least squares estimators for average treatment effects (ATEs), allowing for general weighting schemes and covariates. We consider mod…
▽ More
This article develops design-based ratio estimators for clustered, blocked randomized controlled trials (RCTs), with an application to a federally funded, school-based RCT testing the effects of behavioral health interventions. We consider finite population weighted least squares estimators for average treatment effects (ATEs), allowing for general weighting schemes and covariates. We consider models with block-by-treatment status interactions as well as restricted models with block indicators only. We prove new finite population central limit theorems for each block specification. We also discuss simple variance estimators that share features with commonly used cluster-robust standard error estimators. Simulations show that the design-based ATE estimator yields nominal rejection rates with standard errors near true ones, even with few clusters.
△ Less
Submitted 25 February, 2021; v1 submitted 4 February, 2020;
originally announced February 2020.
-
Identifying and Estimating Principal Causal Effects in Multi-site Trials
Authors:
Lo-Hua Yuan,
Avi Feller,
Luke W. Miratrix
Abstract:
Randomized trials are often conducted with separate randomizations across multiple sites such as schools, voting districts, or hospitals. These sites can differ in important ways, including the site's implementation, local conditions, and the composition of individuals. An important question in practice is whether---and under what assumptions---researchers can leverage this cross-site variation to…
▽ More
Randomized trials are often conducted with separate randomizations across multiple sites such as schools, voting districts, or hospitals. These sites can differ in important ways, including the site's implementation, local conditions, and the composition of individuals. An important question in practice is whether---and under what assumptions---researchers can leverage this cross-site variation to learn more about the intervention. We address these questions in the principal stratification framework, which describes causal effects for subgroups defined by post-treatment quantities. We show that researchers can estimate certain principal causal effects via the multi-site design if they are willing to impose the strong assumption that the site-specific effects are uncorrelated with the site-specific distribution of stratum membership. We motivate this approach with a multi-site trial of the Early College High School Initiative, a unique secondary education program with the goal of increasing high school graduation rates and college enrollment. Our analyses corroborate previous studies suggesting that the initiative had positive effects for students who would have otherwise attended a low-quality high school, although power is limited.
△ Less
Submitted 15 March, 2018;
originally announced March 2018.
-
Insights on Variance Estimation for Blocked and Matched Pairs Designs
Authors:
Nicole E. Pashley,
Luke W. Miratrix
Abstract:
Evaluating blocked randomized experiments from a potential outcomes perspective has two primary branches of work. The first focuses on larger blocks, with multiple treatment and control units in each block. The second focuses on matched pairs, with a single treatment and control unit in each block. These literatures not only provide different estimators for the standard errors of the estimated ave…
▽ More
Evaluating blocked randomized experiments from a potential outcomes perspective has two primary branches of work. The first focuses on larger blocks, with multiple treatment and control units in each block. The second focuses on matched pairs, with a single treatment and control unit in each block. These literatures not only provide different estimators for the standard errors of the estimated average impact, but they are also built on different sets of assumptions. Neither literature handles cases with blocks of varying size that contain singleton treatment or control units, a case which can occur in a variety of contexts, such as with different forms of matching or post-stratification. In this paper, we reconcile the literatures by carefully examining the performance of variance estimators under several different frameworks. We then use these insights to derive novel variance estimators for experiments containing blocks of different sizes.
△ Less
Submitted 29 June, 2020; v1 submitted 27 October, 2017;
originally announced October 2017.
-
Shape-constrained partial identification of a population mean under unknown probabilities of sample selection
Authors:
Luke W. Miratrix,
Stefan Wager,
Jose R. Zubizarreta
Abstract:
A prevailing challenge in the biomedical and social sciences is to estimate a population mean from a sample obtained with unknown selection probabilities. Using a well-known ratio estimator, Aronow and Lee (2013) proposed a method for partial identification of the mean by allowing the unknown selection probabilities to vary arbitrarily between two fixed extreme values. In this paper, we show how t…
▽ More
A prevailing challenge in the biomedical and social sciences is to estimate a population mean from a sample obtained with unknown selection probabilities. Using a well-known ratio estimator, Aronow and Lee (2013) proposed a method for partial identification of the mean by allowing the unknown selection probabilities to vary arbitrarily between two fixed extreme values. In this paper, we show how to leverage auxiliary shape constraints on the population outcome distribution, such as symmetry or log-concavity, to obtain tighter bounds on the population mean. We use this method to estimate the performance of Aymara students---an ethnic minority in the north of Chile---in a national educational standardized test. We implement this method in the new statistical software package scbounds for R.
△ Less
Submitted 22 June, 2017;
originally announced June 2017.
-
Model-free causal inference of binary experimental data
Authors:
Peng Ding,
Luke W. Miratrix
Abstract:
For binary experimental data, we discuss randomization-based inferential procedures that do not need to invoke any modeling assumptions. We also introduce methods for likelihood and Bayesian inference based solely on the physical randomization without any hypothetical super population assumptions about the potential outcomes. These estimators have some properties superior to moment-based ones such…
▽ More
For binary experimental data, we discuss randomization-based inferential procedures that do not need to invoke any modeling assumptions. We also introduce methods for likelihood and Bayesian inference based solely on the physical randomization without any hypothetical super population assumptions about the potential outcomes. These estimators have some properties superior to moment-based ones such as only giving estimates in regions of feasible support. Due to the lack of identification of the causal model, we also propose a sensitivity analysis approach which allows for the characterization of the impact of the association between the potential outcomes on statistical inference.
△ Less
Submitted 23 May, 2017;
originally announced May 2017.
-
Worth Weighting? How to Think About and Use Weights in Survey Experiments
Authors:
Luke W. Miratrix,
Jasjeet S. Sekhon,
Alexander G. Theodoridis,
Luis F. Campos
Abstract:
The popularity of online surveys has increased the prominence of using weights that capture units' probabilities of inclusion for claims of representativeness. Yet, much uncertainty remains regarding how these weights should be employed in the analysis of survey experiments: Should they be used or ignored? If they are used, which estimators are preferred? We offer practical advice, rooted in the N…
▽ More
The popularity of online surveys has increased the prominence of using weights that capture units' probabilities of inclusion for claims of representativeness. Yet, much uncertainty remains regarding how these weights should be employed in the analysis of survey experiments: Should they be used or ignored? If they are used, which estimators are preferred? We offer practical advice, rooted in the Neyman-Rubin model, for researchers producing and working with survey experimental data. We examine simple, efficient estimators for analyzing these data, and give formulae for their biases and variances. We provide simulations that examine these estimators as well as real examples from experiments administered online through YouGov. We find that for examining the existence of population treatment effects using high-quality, broadly representative samples recruited by top online survey firms, sample quantities, which do not rely on weights, are often sufficient. We found that Sample Average Treatment Effect (SATE) estimates did not appear to differ substantially from their weighted counterparts, and they avoided the substantial loss of statistical power that accompanies weighting. When precise estimates of Population Average Treatment Effects (PATE) are essential, we analytically show post-stratifying on survey weights and/or covariates highly correlated with the outcome to be a conservative choice. While we show these substantial gains in simulations, we find limited evidence of them in practice.
△ Less
Submitted 15 August, 2017; v1 submitted 20 March, 2017;
originally announced March 2017.
-
Bridging Finite and Super Population Causal Inference
Authors:
Peng Ding,
Xinran Li,
Luke W. Miratrix
Abstract:
There are two general views in causal analysis of experimental data: the super population view that the units are an independent sample from some hypothetical infinite populations, and the finite population view that the potential outcomes of the experimental units are fixed and the randomness comes solely from the physical randomization of the treatment assignment. These two views differs concept…
▽ More
There are two general views in causal analysis of experimental data: the super population view that the units are an independent sample from some hypothetical infinite populations, and the finite population view that the potential outcomes of the experimental units are fixed and the randomness comes solely from the physical randomization of the treatment assignment. These two views differs conceptually and mathematically, resulting in different sampling variances of the usual difference-in-means estimator of the average causal effect. Practically, however, these two views result in identical variance estimators. By recalling a variance decomposition and exploiting a completeness-type argument, we establish a connection between these two views in completely randomized experiments. This alternative formulation could serve as a template for bridging finite and super population causal inference in other scenarios.
△ Less
Submitted 27 February, 2017;
originally announced February 2017.
-
Implementing Risk-Limiting Post-Election Audits in California
Authors:
Joseph Lorenzo Hall,
Luke W. Miratrix,
Philip B. Stark,
Melvin Briones,
Elaine Ginnold,
Freddie Oakley,
Martin Peaden,
Gail Pellerin,
Tom Stanionis,
Tricia Webber
Abstract:
Risk-limiting post-election audits limit the chance of certifying an electoral outcome if the outcome is not what a full hand count would show. Building on previous work, we report on pilot risk-limiting audits in four elections during 2008 in three California counties: one during the February 2008 Primary Election in Marin County and three during the November 2008 General Elections in Marin, Sa…
▽ More
Risk-limiting post-election audits limit the chance of certifying an electoral outcome if the outcome is not what a full hand count would show. Building on previous work, we report on pilot risk-limiting audits in four elections during 2008 in three California counties: one during the February 2008 Primary Election in Marin County and three during the November 2008 General Elections in Marin, Santa Cruz and Yolo Counties. We explain what makes an audit risk-limiting and how existing and proposed laws fall short. We discuss the differences among our four pilot audits. We identify challenges to practical, efficient risk-limiting audits and conclude that current approaches are too complex to be used routinely on a large scale. One important logistical bottleneck is the difficulty of exporting data from commercial election management systems in a format amenable to audit calculations. Finally, we propose a bare-bones risk-limiting audit that is less efficient than these pilot audits, but avoids many practical problems.
△ Less
Submitted 10 July, 2009; v1 submitted 28 May, 2009;
originally announced May 2009.