-
Extremal quantiles of intermediate orders under two-way clustering
Authors:
Harold D. Chiang,
Ryutah Kato,
Yuya Sasaki
Abstract:
This paper investigates extremal quantiles under two-way cluster dependence. We demonstrate that the limiting distribution of the unconditional intermediate order quantiles in the tails converges to a Gaussian distribution. This is remarkable as two-way cluster dependence entails potential non-Gaussianity in general, but extremal quantiles do not suffer from this issue. Building upon this result, we extend our analysis to extremal quantile regressions of intermediate order.
Submitted 4 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
On the Inconsistency of Cluster-Robust Inference and How Subsampling Can Fix It
Authors:
Harold D. Chiang,
Yuya Sasaki,
Yulong Wang
Abstract:
Conventional methods of cluster-robust inference are inconsistent in the presence of unignorably large clusters. We formalize this claim by establishing a necessary and sufficient condition for the consistency of the conventional methods. We find that this consistency condition is rejected for a majority of empirical research papers. In this light, we propose a novel score subsampling method that achieves uniform size control over a broad class of data generating processes, including those for which the conventional methods fail. Simulation studies support these claims. With real data used by an empirical paper, we showcase that the conventional methods conclude significance while our proposed method concludes insignificance.
Submitted 23 March, 2024; v1 submitted 19 August, 2023;
originally announced August 2023.
-
Regression adjustment in randomized controlled trials with many covariates
Authors:
Harold D Chiang,
Yukitoshi Matsushita,
Taisuke Otsu
Abstract:
This paper is concerned with estimation and inference on average treatment effects in randomized controlled trials when researchers observe potentially many covariates. By employing Neyman's (1923) finite population perspective, we propose a bias-corrected regression adjustment estimator using cross-fitting, and show that the proposed estimator has favorable properties over existing alternatives. For inference, we derive the first and second order terms in the stochastic component of the regression adjustment estimators, study higher order properties of the existing inference methods, and propose a bias-corrected version of the HC3 standard error. The proposed methods readily extend to stratified experiments with large strata. Simulation studies show our cross-fitted estimator, combined with the bias-corrected HC3, delivers precise point estimates and robust size control over a wide range of DGPs. To illustrate, the proposed methods are applied to a real dataset on randomized experiments of incentives and services for college achievement following Angrist, Lang, and Oreopoulos (2009).
Submitted 13 November, 2023; v1 submitted 1 February, 2023;
originally announced February 2023.
-
On Using The Two-Way Cluster-Robust Standard Errors
Authors:
Harold D Chiang,
Yuya Sasaki
Abstract:
Thousands of papers have reported two-way cluster-robust (TWCR) standard errors. However, the recent econometrics literature points out the potential non-Gaussianity of two-way cluster sample means, and thus the invalidity of inference based on the TWCR standard errors. Fortunately, simulation studies nonetheless show that Gaussianity is common rather than exceptional. This paper provides theoretical support for this encouraging observation. Specifically, we derive a novel central limit theorem for two-way clustered triangular arrays that justifies the use of the TWCR under very mild and interpretable conditions. We, therefore, hope that this paper will provide a theoretical justification for the legitimacy of most, if not all, of the thousands of those empirical papers that have used the TWCR standard errors. We provide practical guidance as to when a researcher can employ the TWCR standard errors.
△ Less
Submitted 31 January, 2023;
originally announced January 2023.
-
Standard errors for two-way clustering with serially correlated time effects
Authors:
Harold D Chiang,
Bruce E Hansen,
Yuya Sasaki
Abstract:
We propose improved standard errors and an asymptotic distribution theory for two-way clustered panels. Our proposed estimator and theory allow for arbitrary serial dependence in the common time effects, which is excluded by existing two-way methods, including the popular two-way cluster standard errors of Cameron, Gelbach, and Miller (2011) and the cluster bootstrap of Menzel (2021). Our asymptotic distribution theory is the first which allows for this level of inter-dependence among the observations. Under weak regularity conditions, we demonstrate that the least squares estimator is asymptotically normal, our proposed variance estimator is consistent, and t-ratios are asymptotically standard normal, permitting conventional inference. We present simulation evidence that confidence intervals constructed with our proposed standard errors obtain superior coverage performance relative to existing methods. We illustrate the relevance of the proposed method in an empirical application to a standard Fama-French three-factor regression.
△ Less
Submitted 13 December, 2023; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Dyadic double/debiased machine learning for analyzing determinants of free trade agreements
Authors:
Harold D Chiang,
Yukun Ma,
Joel Rodrigue,
Yuya Sasaki
Abstract:
This paper presents novel methods and theories for estimation and inference about parameters in econometric models using machine learning for nuisance parameter estimation when data are dyadic. We propose a dyadic cross fitting method to remove over-fitting biases under arbitrary dyadic dependence. Together with the use of Neyman orthogonal scores, this novel cross fitting method enables root-$n$ consistent estimation and inference robustly against dyadic dependence. We illustrate an application of our general framework to high-dimensional network link formation models. With this method applied to empirical data of international economic networks, we reexamine determinants of free trade agreements (FTA) viewed as links formed in the dyad composed of world economies. We document that standard methods may lead to misleading conclusions for numerous classic determinants of FTA formation due to biased point estimates or standard errors which are too small.
Submitted 19 December, 2022; v1 submitted 8 October, 2021;
originally announced October 2021.
-
Inference in high-dimensional regression models without the exact or $L^p$ sparsity
Authors:
Jooyoung Cha,
Harold D. Chiang,
Yuya Sasaki
Abstract:
This paper proposes a new method of inference in high-dimensional regression models and high-dimensional IV regression models. Estimation is based on a combined use of the orthogonal greedy algorithm, high-dimensional Akaike information criterion, and double/debiased machine learning. The method of inference for any low-dimensional subvector of high-dimensional parameters is based on a root-$N$ asymptotic normality, which is shown to hold without requiring the exact sparsity condition or the $L^p$ sparsity condition. Simulation studies demonstrate superior finite-sample performance of this proposed method over those based on the LASSO or the random forest, especially under less sparse models. We illustrate an application to production analysis with a panel of Chilean firms.
Submitted 31 December, 2022; v1 submitted 21 August, 2021;
originally announced August 2021.
-
Multiway empirical likelihood
Authors:
Harold D Chiang,
Yukitoshi Matsushita,
Taisuke Otsu
Abstract:
This paper develops a general methodology to conduct statistical inference for observations indexed by multiple sets of entities. We propose a novel multiway empirical likelihood statistic that converges to a chi-square distribution under the non-degenerate case, where the corresponding Hoeffding-type decomposition is dominated by linear terms. Our methodology is related to the notion of jackknife empirical likelihood, but the leave-out pseudo values are constructed by leaving out columns or rows. We further develop a modified version of our multiway empirical likelihood statistic, which converges to a chi-square distribution regardless of the degeneracy, and discover its desirable higher-order property compared to the t-ratio by the conventional Eicker-White type variance estimator. The proposed methodology is illustrated by several important statistical problems, such as bipartite networks, generalized estimating equations, and three-way observations.
Submitted 6 December, 2023; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Algorithmic subsampling under multiway clustering
Authors:
Harold D. Chiang,
Jiatong Li,
Yuya Sasaki
Abstract:
This paper proposes a novel method of algorithmic subsampling (data sketching) for multiway cluster dependent data. We establish a new uniform weak law of large numbers and a new central limit theorem for the multiway algorithmic subsample means. Consequently, we discover an additional advantage of the algorithmic subsampling that it allows for robustness against potential degeneracy, and even non-Gaussian degeneracy, of the asymptotic distribution under multiway clustering. Simulation studies support this novel result, and demonstrate that inference with the algorithmic subsampling entails more accuracy than that without the algorithmic subsampling. Applying these basic asymptotic theories, we derive the consistency and the asymptotic normality for the multiway algorithmic subsampling generalized method of moments estimator and for the multiway algorithmic subsampling M-estimator. We illustrate an application to scanner data.
Submitted 30 October, 2022; v1 submitted 28 February, 2021;
originally announced March 2021.
-
Linear programming approach to nonparametric inference under shape restrictions: with an application to regression kink designs
Authors:
Harold D. Chiang,
Kengo Kato,
Yuya Sasaki,
Takuya Ura
Abstract:
We develop a novel method of constructing confidence bands for nonparametric regression functions under shape constraints. This method can be implemented via linear programming, and it is thus computationally appealing. We illustrate the use of our proposed method with an application to the regression kink design (RKD). Econometric analyses based on the RKD often suffer from wide confidence intervals due to slow convergence rates of nonparametric derivative estimators. We demonstrate that economic models and structures motivate shape restrictions, which in turn contribute to shrinking the confidence interval for an analysis of the causal effects of unemployment insurance benefits on unemployment durations.
Submitted 12 February, 2021;
originally announced February 2021.
-
Empirical likelihood and uniform convergence rates for dyadic kernel density estimation
Authors:
Harold D. Chiang,
Bing Yang Tan
Abstract:
This paper studies the asymptotic properties of and alternative inference methods for kernel density estimation (KDE) for dyadic data. We first establish uniform convergence rates for dyadic KDE. Secondly, we propose a modified jackknife empirical likelihood procedure for inference. The proposed test statistic is asymptotically pivotal regardless of presence of dyadic clustering. The results are further extended to cover the practically relevant case of incomplete dyadic data. Simulations show that this modified jackknife empirical likelihood-based inference procedure delivers precise coverage probabilities even with modest sample sizes and with incomplete dyadic data. Finally, we illustrate the method by studying airport congestion in the United States.
Submitted 13 May, 2022; v1 submitted 17 October, 2020;
originally announced October 2020.
-
Inference for high-dimensional exchangeable arrays
Authors:
Harold D. Chiang,
Kengo Kato,
Yuya Sasaki
Abstract:
We consider inference for high-dimensional separately and jointly exchangeable arrays where the dimensions may be much larger than the sample sizes. For both exchangeable arrays, we first derive high-dimensional central limit theorems over the rectangles and subsequently develop novel multiplier bootstraps with theoretical guarantees. These theoretical results rely on new technical tools such as Hoeffding-type decomposition and maximal inequalities for the degenerate components in the Hoeffding-type decomposition for the exchangeable arrays. We exhibit applications of our methods to uniform confidence bands for density estimation under joint exchangeability and penalty choice for $\ell_1$-penalized regression under separate exchangeability. Extensive simulations demonstrate precise uniform coverage rates. We illustrate by constructing uniform confidence bands for international trade network densities.
Submitted 9 July, 2021; v1 submitted 10 September, 2020;
originally announced September 2020.
-
Multiway Cluster Robust Double/Debiased Machine Learning
Authors:
Harold D. Chiang,
Kengo Kato,
Yukun Ma,
Yuya Sasaki
Abstract:
This paper investigates double/debiased machine learning (DML) under multiway clustered sampling environments. We propose a novel multiway cross fitting algorithm and a multiway DML estimator based on this algorithm. We also develop a multiway cluster robust standard error formula. Simulations indicate that the proposed procedure has favorable finite sample performance. Applying the proposed method to market share data for demand analysis, we obtain larger two-way cluster robust standard errors than non-robust ones.
Submitted 4 March, 2020; v1 submitted 8 September, 2019;
originally announced September 2019.
-
Lasso under Multi-way Clustering: Estimation and Post-selection Inference
Authors:
Harold D. Chiang,
Yuya Sasaki
Abstract:
This paper studies high-dimensional regression models with lasso when data are sampled under multi-way clustering. First, we establish convergence rates for the lasso and post-lasso estimators. Second, we propose a novel inference method based on a post-double-selection procedure and show its asymptotic validity. Our procedure can be easily implemented with existing statistical packages. Simulation results demonstrate that the proposed procedure works well in finite samples. We illustrate the proposed method with a couple of empirical applications to development and growth economics.
Submitted 21 August, 2019; v1 submitted 6 May, 2019;
originally announced May 2019.
-
Post-Selection Inference in Three-Dimensional Panel Data
Authors:
Harold D. Chiang,
Joel Rodrigue,
Yuya Sasaki
Abstract:
Three-dimensional panel models are widely used in empirical analysis. Researchers use various combinations of fixed effects for three-dimensional panels. When one imposes a parsimonious model and the true model is rich, it incurs mis-specification biases. When one employs a rich model and the true model is parsimonious, it incurs larger standard errors than necessary. It is therefore useful for researchers to know the correct model. In this light, Lu, Miao, and Su (2018) propose methods of model selection. We advance this literature by proposing a method of post-selection inference for regression parameters. Despite our use of the lasso technique as a means of model selection, our assumptions allow for many and even all fixed effects to be nonzero. Simulation studies demonstrate that the proposed method is more precise than under-fitting fixed effect estimators, is more efficient than over-fitting fixed effect estimators, and allows for as accurate inference as the oracle estimator.
Submitted 30 April, 2019; v1 submitted 30 March, 2019;
originally announced April 2019.
-
Many Average Partial Effects: with An Application to Text Regression
Authors:
Harold D. Chiang
Abstract:
We study estimation, pointwise and simultaneous inference, and confidence intervals for many average partial effects of lasso Logit. Focusing on high-dimensional, cluster-sampling environments, we propose a new average partial effect estimator and explore its asymptotic properties. Practical penalty choices compatible with our asymptotic theory are also provided. The proposed estimator allows for valid inference without requiring the oracle property. We provide easy-to-implement algorithms for cluster-robust high-dimensional hypothesis testing and construction of simultaneously valid confidence intervals using a multiplier cluster bootstrap. We apply the proposed algorithms to the text regression model of Wu (2018) to examine the presence of gendered language on the internet.
Submitted 17 January, 2022; v1 submitted 21 December, 2018;
originally announced December 2018.
-
Quantile Treatment Effects in Regression Kink Designs
Authors:
Heng Chen,
Harold D. Chiang,
Yuya Sasaki
Abstract:
The literature on regression kink designs develops identification results for average effects of continuous treatments (Card, Lee, Pei, and Weber, 2015), average effects of binary treatments (Dong, 2018), and quantile-wise effects of continuous treatments (Chiang and Sasaki, 2019), but there has been no identification result for quantile-wise effects of binary treatments to date. In this paper, we fill this void in the literature by providing an identification of quantile treatment effects in regression kink designs with binary treatment variables. For completeness, we also develop large sample theories for statistical inference and a practical guideline on estimation and inference.
Submitted 18 March, 2019; v1 submitted 15 March, 2017;
originally announced March 2017.
-
Robust Uniform Inference for Quantile Treatment Effects in Regression Discontinuity Designs
Authors:
Harold D. Chiang,
Yu-Chin Hsu,
Yuya Sasaki
Abstract:
The practical importance of inference with robustness against large bandwidths for causal effects in regression discontinuity and kink designs is widely recognized. Existing robust methods cover many cases, but do not handle uniform inference for CDF and quantile processes in fuzzy designs, despite its use in the recent literature in empirical microeconomics. In this light, this paper extends the literature by developing a unified framework of inference with robustness against large bandwidths that applies to uniform inference for quantile treatment effects in fuzzy designs, as well as all the other cases of sharp/fuzzy mean/quantile regression discontinuity/kink designs. We present Monte Carlo simulation studies and an empirical application for evaluations of the Oklahoma pre-K program.
Submitted 23 February, 2019; v1 submitted 14 February, 2017;
originally announced February 2017.
-
Causal Inference by Quantile Regression Kink Designs
Authors:
Harold D. Chiang,
Yuya Sasaki
Abstract:
The quantile regression kink design (QRKD) is proposed by empirical researchers as a potential method to assess heterogeneous treatment effects under suitable research designs, but its causal interpretation remains unknown. We propose a causal interpretation of the QRKD estimand. Under flexible heterogeneity and endogeneity, the QRKD estimand measures a weighted average of heterogeneous marginal effects at respective conditional quantiles of outcome given a designed kink point. In addition, we develop weak convergence results for the QRKD estimator as a local quantile process for the purpose of conducting statistical inference on heterogeneous treatment effects using the QRKD. Applying our methods to the Continuous Wage and Benefit History Project (CWBH) data, we find significantly heterogeneous positive causal effects of unemployment insurance benefits on unemployment durations in Louisiana between 1981 and 1983. These effects are larger for individuals with longer unemployment durations.
Submitted 14 December, 2017; v1 submitted 31 May, 2016;
originally announced May 2016.