-
Finely Stratified Rerandomization Designs
Authors:
Max Cytrynbaum
Abstract:
We study estimation and inference on causal parameters under finely stratified rerandomization designs, which use baseline covariates to match units into groups (e.g. matched pairs), then rerandomize within-group treatment assignments until a balance criterion is satisfied. We show that finely stratified rerandomization does partially linear regression adjustment by design, providing nonparametric…
▽ More
We study estimation and inference on causal parameters under finely stratified rerandomization designs, which use baseline covariates to match units into groups (e.g. matched pairs), then rerandomize within-group treatment assignments until a balance criterion is satisfied. We show that finely stratified rerandomization does partially linear regression adjustment by design, providing nonparametric control over the covariates used for stratification, and linear control over the rerandomization covariates. We also introduce novel rerandomization criteria, allowing for nonlinear imbalance metrics and proposing a minimax scheme that optimizes the balance criterion using pilot data or prior information provided by the researcher. While the asymptotic distribution of generalized method of moments (GMM) estimators under stratified rerandomization is generically non-Gaussian, we show how to restore asymptotic normality using optimal ex-post linear adjustment. This allows us to provide simple asymptotically exact inference methods for superpopulation parameters, as well as efficient conservative inference methods for finite population parameters.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
Covariate Adjustment in Stratified Experiments
Authors:
Max Cytrynbaum
Abstract:
This paper studies covariate adjusted estimation of the average treatment effect in stratified experiments. We work in a general framework that includes matched tuples designs, coarse stratification, and complete randomization as special cases. Regression adjustment with treatment-covariate interactions is known to weakly improve efficiency for completely randomized designs. By contrast, we show t…
▽ More
This paper studies covariate adjusted estimation of the average treatment effect in stratified experiments. We work in a general framework that includes matched tuples designs, coarse stratification, and complete randomization as special cases. Regression adjustment with treatment-covariate interactions is known to weakly improve efficiency for completely randomized designs. By contrast, we show that for stratified designs such regression estimators are generically inefficient, potentially even increasing estimator variance relative to the unadjusted benchmark. Motivated by this result, we derive the asymptotically optimal linear covariate adjustment for a given stratification. We construct several feasible estimators that implement this efficient adjustment in large samples. In the special case of matched pairs, for example, the regression including treatment, covariates, and pair fixed effects is asymptotically optimal. We also provide novel asymptotically exact inference methods that allow researchers to report smaller confidence intervals, fully reflecting the efficiency gains from both stratification and adjustment. Simulations and an empirical application demonstrate the value of our proposed methods.
△ Less
Submitted 18 September, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Optimal Stratification of Survey Experiments
Authors:
Max Cytrynbaum
Abstract:
This paper studies a two-stage model of experimentation, where the researcher first samples representative units from an eligible pool, then assigns each sampled unit to treatment or control. To implement balanced sampling and assignment, we introduce a new family of finely stratified designs that generalize matched pairs randomization to propensities p(x) not equal to 1/2. We show that two-stage…
▽ More
This paper studies a two-stage model of experimentation, where the researcher first samples representative units from an eligible pool, then assigns each sampled unit to treatment or control. To implement balanced sampling and assignment, we introduce a new family of finely stratified designs that generalize matched pairs randomization to propensities p(x) not equal to 1/2. We show that two-stage stratification nonparametrically dampens the variance of treatment effect estimation. We formulate and solve the optimal stratification problem with heterogeneous costs and fixed budget, providing simple heuristics for the optimal design. In settings with pilot data, we show that implementing a consistent estimate of this design is also efficient, minimizing asymptotic variance subject to the budget constraint. We also provide new asymptotically exact inference methods, allowing experimenters to fully exploit the efficiency gains from both stratified sampling and assignment. An application to nine papers recently published in top economics journals demonstrates the value of our methods.
△ Less
Submitted 19 August, 2023; v1 submitted 15 November, 2021;
originally announced November 2021.
-
Blocked Clusterwise Regression
Authors:
Max Cytrynbaum
Abstract:
A recent literature in econometrics models unobserved cross-sectional heterogeneity in panel data by assigning each cross-sectional unit a one-dimensional, discrete latent type. Such models have been shown to allow estimation and inference by regression clustering methods. This paper is motivated by the finding that the clustered heterogeneity models studied in this literature can be badly misspec…
▽ More
A recent literature in econometrics models unobserved cross-sectional heterogeneity in panel data by assigning each cross-sectional unit a one-dimensional, discrete latent type. Such models have been shown to allow estimation and inference by regression clustering methods. This paper is motivated by the finding that the clustered heterogeneity models studied in this literature can be badly misspecified, even when the panel has significant discrete cross-sectional structure. To address this issue, we generalize previous approaches to discrete unobserved heterogeneity by allowing each unit to have multiple, imperfectly-correlated latent variables that describe its response-type to different covariates. We give inference results for a k-means style estimator of our model and develop information criteria to jointly select the number clusters for each latent variable. Monte Carlo simulations confirm our theoretical results and give intuition about the finite-sample performance of estimation and model selection. We also contribute to the theory of clustering with an over-specified number of clusters and derive new convergence rates for this setting. Our results suggest that over-fitting can be severe in k-means style estimators when the number of clusters is over-specified.
△ Less
Submitted 29 January, 2020;
originally announced January 2020.