
Identification and Inference on Treatment Effects under Covariate-Adaptive Randomization and Imperfect Compliance

Authors: Federico A. Bugni, Mengsi Gao, Filip Obradovic, Amilcar Velez

Abstract: Randomized controlled trials (RCTs) frequently utilize covariate-adaptive randomization (CAR) (e.g., stratified block randomization) and commonly suffer from imperfect compliance. This paper studies the identification and inference for the average treatment effect (ATE) and the average treatment effect on the treated (ATT) in such RCTs with a binary treatment. We first develop characterizations of the identified sets for both estimands. Since data are generally not i.i.d. under CAR, these characterizations do not follow from existing results. We then provide consistent estimators of the identified sets and asymptotically valid confidence intervals for the parameters. Our asymptotic analysis leads to concrete practical recommendations regarding how to estimate the treatment assignment probabilities that enter in estimated bounds. In the case of the ATE, using sample analog assignment frequencies is more efficient than using the true assignment probabilities. In contrast, using the true assignment probabilities is preferable for the ATT.

Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 62 pages and 3 tables

arXiv:2311.09972 [pdf, other]

Inference in Auctions with Many Bidders Using Transaction Prices

Authors: Federico A. Bugni, Yulong Wang

Abstract: This paper considers inference in first-price and second-price sealed-bid auctions in empirical settings where we observe auctions with a large number of bidders. Relevant applications include online auctions, treasury auctions, spectrum auctions, art auctions, and IPO auctions, among others. Given the abundance of bidders in each auction, we propose an asymptotic framework in which the number of bidders diverges while the number of auctions remains fixed. This framework allows us to perform asymptotically exact inference on key model features using only transaction price data. Specifically, we examine inference on the expected utility of the auction winner, the expected revenue of the seller, and the tail properties of the valuation distribution. Simulations confirm the accuracy of our inference methods in finite samples. Finally, we also apply them to Hong Kong car license auction data.

Submitted 10 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2302.11505 [pdf, ps, other]

Decomposition and Interpretation of Treatment Effects in Settings with Delayed Outcomes

Authors: Federico A. Bugni, Ivan A. Canay, Steve McBride

Abstract: This paper studies settings where the analyst is interested in identifying and estimating the average causal effect of a binary treatment on an outcome. We consider a setup in which the outcome is not realized immediately after the treatment assignment, a feature that is ubiquitous in empirical settings. The period between the treatment and the realization of the outcome allows other observed actions to occur and affect the outcome. In this context, we study several regression-based estimands routinely used in empirical work to capture the average treatment effect and shed light on interpreting them in terms of ceteris paribus effects, indirect causal effects, and selection terms. We obtain three main and related takeaways. First, the three most popular estimands do not generally satisfy what we call strong sign preservation, in the sense that these estimands may be negative even when the treatment positively affects the outcome conditional on any possible combination of other actions. Second, the most popular regression that includes the other actions as controls satisfies strong sign preservation if and only if these actions are mutually exclusive binary variables. Finally, we show that a linear regression that fully stratifies the other actions leads to estimands that satisfy strong sign preservation.

Submitted 9 October, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

arXiv:2204.08356 [pdf, other]

Inference for Cluster Randomized Experiments with Non-ignorable Cluster Sizes

Authors: Federico Bugni, Ivan Canay, Azeem Shaikh, Max Tabord-Meehan

Abstract: This paper considers the problem of inference in cluster randomized experiments when cluster sizes are non-ignorable. Here, by a cluster randomized experiment, we mean one in which treatment is assigned at the cluster level. By non-ignorable cluster sizes, we refer to the possibility that the treatment effects may depend non-trivially on the cluster sizes. We frame our analysis in a super-population framework in which cluster sizes are random. In this way, our analysis departs from earlier analyses of cluster randomized experiments in which cluster sizes are treated as non-random. We distinguish between two different parameters of interest: the equally-weighted cluster-level average treatment effect, and the size-weighted cluster-level average treatment effect. For each parameter, we provide methods for inference in an asymptotic framework where the number of clusters tends to infinity and treatment is assigned using a covariate-adaptive stratified randomization procedure. We additionally permit the experimenter to sample only a subset of the units within each cluster rather than the entire cluster and demonstrate the implications of such sampling for some commonly used estimators. A small simulation study and empirical demonstration show the practical relevance of our theoretical results.

Submitted 9 April, 2024; v1 submitted 18 April, 2022; originally announced April 2022.

arXiv:2102.03937 [pdf, ps, other]

Inference under Covariate-Adaptive Randomization with Imperfect Compliance

Authors: Federico A. Bugni, Mengsi Gao

Abstract: This paper studies inference in a randomized controlled trial (RCT) with covariate-adaptive randomization (CAR) and imperfect compliance of a binary treatment. In this context, we study inference on the LATE. As in Bugni et al. (2018,2019), CAR refers to randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve "balance" within each stratum. In contrast to these papers, however, we allow participants of the RCT to endogenously decide to comply or not with the assigned treatment status. We study the properties of an estimator of the LATE derived from a "fully saturated" IV linear regression, i.e., a linear regression of the outcome on all indicators for all strata and their interaction with the treatment decision, with the latter instrumented with the treatment assignment. We show that the proposed LATE estimator is asymptotically normal, and we characterize its asymptotic variance in terms of primitives of the problem. We provide consistent estimators of the standard errors and asymptotically exact hypothesis tests. In the special case when the target proportion of units assigned to each treatment does not vary across strata, we can also consider two other estimators of the LATE, including the one based on the "strata fixed effects" IV linear regression, i.e., a linear regression of the outcome on indicators for all strata and the treatment decision, with the latter instrumented with the treatment assignment.
Our characterization of the asymptotic variance of the LATE estimators allows us to understand the influence of the parameters of the RCT. We use this to propose strategies to minimize their asymptotic variance in a hypothetical RCT based on data from a pilot study. We illustrate the practical relevance of these results using a simulation study and an empirical application based on Dupas et al. (2018).

Submitted 24 July, 2023; v1 submitted 7 February, 2021; originally announced February 2021.

Comments: 61 pages, 6 tables

arXiv:2010.02297 [pdf, other]

Testing homogeneity in dynamic discrete games in finite samples

Authors: Federico A. Bugni, Jackson Bunting, Takuya Ura

Abstract: The literature on dynamic discrete games often assumes that the conditional choice probabilities and the state transition probabilities are homogeneous across markets and over time. We refer to this as the "homogeneity assumption" in dynamic discrete games. This assumption enables empirical studies to estimate the game's structural parameters by pooling data from multiple markets and from many time periods. In this paper, we propose a hypothesis test to evaluate whether the homogeneity assumption holds in the data. Our hypothesis test is the result of an approximate randomization test, implemented via a Markov chain Monte Carlo (MCMC) algorithm. We show that our hypothesis test becomes valid as the (user-defined) number of MCMC draws diverges, for any fixed number of markets, time periods, and players. We apply our test to the empirical study of the U.S. Portland cement industry in Ryan (2012).

Submitted 2 May, 2023; v1 submitted 5 October, 2020; originally announced October 2020.

arXiv:2007.09837 [pdf, other]

Permutation-based tests for discontinuities in event studies

Authors: Federico A. Bugni, Jia Li, Qiyuan Li

Abstract: We propose using a permutation test to detect discontinuities in an underlying economic model at a known cutoff point. Relative to the existing literature, we show that this test is well suited for event studies based on time-series data. The test statistic measures the distance between the empirical distribution functions of observed data in two local subsamples on the two sides of the cutoff. Critical values are computed via a standard permutation algorithm. Under a high-level condition that the observed data can be coupled by a collection of conditionally independent variables, we establish the asymptotic validity of the permutation test, allowing the sizes of the local subsamples to either be fixed or grow to infinity. In the latter case, we also establish that the permutation test is consistent. We demonstrate that our high-level condition can be verified in a broad range of problems in the infill asymptotic time-series setting, which justifies using the permutation test to detect jumps in economic variables such as volatility, trading activity, and liquidity. These potential applications are illustrated in an empirical case study for selected FOMC announcements during the ongoing COVID-19 pandemic.

Submitted 10 July, 2022; v1 submitted 19 July, 2020; originally announced July 2020.

Comments: 35 pages, 3 tables, 2 figures
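
Illustrative sketch (not from the paper): the abstract above describes a statistic that measures the distance between the empirical distribution functions of two local subsamples, with critical values from a standard permutation algorithm. The Python below is a minimal version of that idea under stated assumptions. The sup (Kolmogorov-Smirnov-type) distance, the function names `ecdf_distance` and `permutation_pvalue`, and the defaults `n_perm` and `seed` are all hypothetical choices made here for illustration; the paper's actual statistic and subsample selection may differ.

```python
import random


def ecdf_distance(left, right):
    """Sup distance between the empirical CDFs of two samples.

    Assumption: a Kolmogorov-Smirnov-type distance is used purely for
    illustration; the source only says "distance between the empirical
    distribution functions".
    """
    points = sorted(set(left) | set(right))

    def ecdf(sample, x):
        # Fraction of the sample that is less than or equal to x.
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(left, x) - ecdf(right, x)) for x in points)


def permutation_pvalue(left, right, n_perm=999, seed=0):
    """Permutation p-value for equality of the two local distributions.

    `left` and `right` are the observations just before and just after
    the cutoff (hypothetical inputs chosen by the analyst).
    """
    rng = random.Random(seed)
    observed = ecdf_distance(left, right)
    pooled = list(left) + list(right)
    n_left = len(left)
    exceed = 0
    for _ in range(n_perm):
        # Reassign observations to the two sides at random.
        rng.shuffle(pooled)
        stat = ecdf_distance(pooled[:n_left], pooled[n_left:])
        if stat >= observed:
            exceed += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (1 + exceed) / (1 + n_perm)
```

A small p-value suggests that the distribution of the data changes at the cutoff. The subsample sizes are left to the caller because the abstract allows them to be either fixed or growing.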

arXiv:1806.11466 [pdf, ps, other]

Subvector Inference in Partially Identified Models with Many Moment Inequalities

Authors: Alexandre Belloni, Federico Bugni, Victor Chernozhukov

Abstract: This paper considers inference for a function of a parameter vector in a partially identified model with many moment inequalities. This framework allows the number of moment conditions to grow with the sample size, possibly at exponential rates. Our main motivating application is subvector inference, i.e., inference on a single component of the partially identified parameter vector associated with a treatment effect or a policy variable of interest. Our inference method compares a MinMax test statistic (minimum over parameters satisfying $H_0$ and maximum over moment inequalities) against critical values that are based on bootstrap approximations or analytical bounds. We show that this method controls asymptotic size uniformly over a large class of data generating processes despite the partially identified many moment inequality setting. The finite sample analysis allows us to obtain explicit rates of convergence on the size control. Our results are based on combining non-asymptotic approximations and new high-dimensional central limit theorems for the MinMax of the components of random matrices. Unlike the previous literature on functional inference in partially identified models, our results do not rely on weak convergence results based on Donsker's class assumptions and, in fact, our test statistic may not even converge in distribution. Our bootstrap approximation requires the choice of a tuning parameter sequence that can avoid the excessive concentration of our test statistic.
To this end, we propose an asymptotically valid data-driven method to select this tuning parameter sequence. This method generalizes the selection of tuning parameter sequences to problems outside the Donsker's class assumptions and may also be of independent interest. Our procedures based on self-normalized moderate deviation bounds are relatively more conservative but easier to implement.

Submitted 29 June, 2018; originally announced June 2018.

arXiv:1806.04206 [pdf, other]

Inference under Covariate-Adaptive Randomization with Multiple Treatments

Authors: Federico A. Bugni, Ivan A. Canay, Azeem M. Shaikh

Abstract: This paper studies inference in randomized controlled trials with covariate-adaptive randomization when there are multiple treatments. More specifically, we study inference about the average effect of one or more treatments relative to other treatments or a control. As in Bugni et al. (2018), covariate-adaptive randomization refers to randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve balance within each stratum. In contrast to Bugni et al. (2018), we not only allow for multiple treatments, but further allow for the proportion of units being assigned to each of the treatments to vary across strata. We first study the properties of estimators derived from a fully saturated linear regression, i.e., a linear regression of the outcome on all interactions between indicators for each of the treatments and indicators for each of the strata. We show that tests based on these estimators using the usual heteroskedasticity-consistent estimator of the asymptotic variance are invalid; on the other hand, tests based on these estimators and suitable estimators of the asymptotic variance that we provide are exact. For the special case in which the target proportion of units being assigned to each of the treatments does not vary across strata, we additionally consider tests based on estimators derived from a linear regression with strata fixed effects, i.e., a linear regression of the outcome on indicators for each of the treatments and indicators for each of the strata.
We show that tests based on these estimators using the usual heteroskedasticity-consistent estimator of the asymptotic variance are conservative, but tests based on these estimators and suitable estimators of the asymptotic variance that we provide are exact. A simulation study illustrates the practical relevance of our theoretical results.

Submitted 17 January, 2019; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: 33 pages, 8 tables

arXiv:1803.07951 [pdf, other]

Testing Continuity of a Density via g-order statistics in the Regression Discontinuity Design

Authors: Federico A. Bugni, Ivan A. Canay

Abstract: In the regression discontinuity design (RDD), it is common practice to assess the credibility of the design by testing the continuity of the density of the running variable at the cut-off, e.g., McCrary (2008). In this paper we propose an approximate sign test for continuity of a density at a point based on the so-called g-order statistics, and study its properties under two complementary asymptotic frameworks. In the first asymptotic framework, the number q of observations local to the cut-off is fixed as the sample size n diverges to infinity, while in the second framework q diverges to infinity slowly as n diverges to infinity. Under both of these frameworks, we show that the test we propose is asymptotically valid in the sense that it has limiting rejection probability under the null hypothesis not exceeding the nominal level. More importantly, the test is easy to implement, asymptotically valid under weaker conditions than those used by competing methods, and exhibits finite sample validity under stronger conditions than those needed for its asymptotic validity. In a simulation study, we find that the approximate sign test provides good control of the rejection probability under the null hypothesis while remaining competitive under the alternative hypothesis. We finally apply our test to the design in Lee (2008), a well-known application of the RDD to study incumbency advantage.

Submitted 11 February, 2020; v1 submitted 21 March, 2018; originally announced March 2018.

Comments: 32 pages, 3 figures, and 4 tables
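
Illustrative sketch (not from the paper): the abstract above describes an approximate sign test that uses the q observations local to the cut-off. One simplified reading, written here as an assumption, is to take the q observations closest to the cut-off, count how many fall at or above it, and compare that count with a Binomial(q, 1/2) distribution. The function name `sign_test_pvalue`, the two-sided non-randomized p-value, and the default q are hypothetical choices for illustration and are not taken from the paper, which also discusses how to choose q.

```python
from math import comb


def sign_test_pvalue(running, cutoff=0.0, q=20):
    """Two-sided binomial sign-test p-value near a cut-off.

    Assumption: "local" means the q observations with the smallest
    absolute distance to the cut-off; ties in distance are broken by
    input order.
    """
    nearest = sorted(running, key=lambda x: abs(x - cutoff))[:q]
    m = len(nearest)  # may be smaller than q if the sample is small
    right = sum(x >= cutoff for x in nearest)
    # Under a continuous density at the cut-off, each of the nearest
    # observations is roughly equally likely to land on either side.
    k = max(right, m - right)
    tail = sum(comb(m, j) for j in range(k, m + 1)) / 2 ** m
    return min(1.0, 2 * tail)
```

A small p-value indicates that the nearest observations are split unevenly across the cut-off, which is the pattern a jump in the density would produce.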

arXiv:1803.00798 [pdf]

Permutation Tests for Equality of Distributions of Functional Data

Authors: Federico A. Bugni, Joel L. Horowitz

Abstract: Economic data are often generated by stochastic processes that take place in continuous time, though observations may occur only at discrete times. For example, electricity and gas consumption take place in continuous time. Data generated by a continuous time stochastic process are called functional data. This paper is concerned with comparing two or more stochastic processes that generate functional data. The data may be produced by a randomized experiment in which there are multiple treatments. The paper presents a method for testing the hypothesis that the same stochastic process generates all the functional data. The test described here applies to both functional data and multiple treatments. It is implemented as a combination of two permutation tests. This ensures that in finite samples, the true and nominal probabilities that each test rejects a correct null hypothesis are equal. The paper presents upper and lower bounds on the asymptotic power of the test under alternative hypotheses. The results of Monte Carlo experiments and an application to an experiment on billing and pricing of natural gas illustrate the usefulness of the test.

Submitted 14 June, 2021; v1 submitted 2 March, 2018; originally announced March 2018.

Comments: 49 pages, 5 figures, 5 tables

arXiv:1802.06665 [pdf, other]

On the iterated estimation of dynamic discrete choice games

Authors: Federico A. Bugni, Jackson Bunting

Abstract: We study the asymptotic properties of a class of estimators of the structural parameters in dynamic discrete choice games. We consider K-stage policy iteration (PI) estimators, where K denotes the number of policy iterations employed in the estimation. This class nests several estimators proposed in the literature such as those in Aguirregabiria and Mira (2002, 2007), Pesendorfer and Schmidt-Dengler (2008), and Pakes et al. (2007). First, we establish that the K-PML estimator is consistent and asymptotically normal for all K. This complements findings in Aguirregabiria and Mira (2007), who focus on K=1 and K large enough to induce convergence of the estimator. Furthermore, we show under certain conditions that the asymptotic variance of the K-PML estimator can exhibit arbitrary patterns as a function of K. Second, we establish that the K-MD estimator is consistent and asymptotically normal for all K. For a specific weight matrix, the K-MD estimator has the same asymptotic distribution as the K-PML estimator. Our main result provides an optimal sequence of weight matrices for the K-MD estimator and shows that the optimally weighted K-MD estimator has an asymptotic distribution that is invariant to K. The invariance result is especially unexpected given the findings in Aguirregabiria and Mira (2007) for K-PML estimators. Our main result implies two new corollaries about the optimal 1-MD estimator (derived by Pesendorfer and Schmidt-Dengler (2008)). First, the optimal 1-MD estimator is optimal in the class of K-MD estimators.
In other words, additional policy iterations do not provide asymptotic efficiency gains relative to the optimal 1-MD estimator. Second, the optimal 1-MD estimator is at least as asymptotically efficient as any K-PML estimator for all K. Finally, the appendix provides appropriate conditions under which the optimal 1-MD estimator is asymptotically efficient.

Submitted 24 May, 2020; v1 submitted 19 February, 2018; originally announced February 2018.

Comments: 46 pages, 6 figures, 5 tables. JEL: C13, C61, C73

arXiv:1604.02309 [pdf, other]

Inference in partially identified models with many moment inequalities using Lasso

Authors: Federico A. Bugni, Mehmet Caner, Anders Bredahl Kock, Soumendra Lahiri

Abstract: This paper considers inference in a partially identified moment (in)equality model with many moment inequalities. We propose a novel two-step inference procedure that combines the methods proposed by Chernozhukov, Chetverikov and Kato (2018a) (CCK18, hereafter) with a first-step moment inequality selection based on the Lasso. Our method controls asymptotic size uniformly in both the underlying parameter and the data distribution. Also, the power of our method compares favorably with that of the corresponding two-step method in CCK18 for large parts of the parameter space, both in theory and in simulations. Finally, we show that our Lasso-based first step can be implemented by thresholding standardized sample averages, and so it is straightforward to implement.

Submitted 29 June, 2019; v1 submitted 8 April, 2016; originally announced April 2016.

Comments: 1 figure

arXiv:1603.07987 [pdf, ps, other]

Inference in Dynamic Discrete Choice Problems under Local Misspecification

Authors: Federico A. Bugni, Takuya Ura

Abstract: Single-agent dynamic discrete choice models are typically estimated using heavily parametrized econometric frameworks, making them susceptible to model misspecification. This paper investigates how misspecification affects the results of inference in these models. Specifically, we consider a local misspecification framework in which specification errors are assumed to vanish at an arbitrary and unknown rate with the sample size. Relative to global misspecification, the local misspecification analysis has two important advantages. First, it yields tractable and general results. Second, it allows us to focus on parameters with structural interpretation, instead of "pseudo-true" parameters. We consider a general class of two-step estimators based on the K-stage sequential policy function iteration algorithm, where K denotes the number of iterations employed in the estimation. This class includes Hotz and Miller (1993)'s conditional choice probability estimator, Aguirregabiria and Mira (2002)'s pseudo-likelihood estimator, and Pesendorfer and Schmidt-Dengler (2008)'s asymptotic least squares estimator. We show that local misspecification can affect the asymptotic distribution and even the rate of convergence of these estimators. In principle, one might expect that the effect of the local misspecification could change with the number of iterations K. One of our main findings is that this is not the case, i.e., the effect of local misspecification is invariant to K.
In practice, this means that researchers cannot eliminate or even alleviate problems of model misspecification by changing K.

Submitted 7 February, 2018; v1 submitted 25 March, 2016; originally announced March 2016.

Showing 1–14 of 14 results for author: Bugni, F