-
Efficient Bias Correction for Cross-section and Panel Data
Authors:
Jinyong Hahn,
David W. Hughes,
Guido Kuersteiner,
Whitney K. Newey
Abstract:
Bias correction can often improve the finite sample performance of estimators. We show that the choice of bias correction method has no effect on the higher-order variance of semiparametrically efficient parametric estimators, so long as the estimate of the bias is asymptotically linear. It is also shown that bootstrap, jackknife, and analytical bias estimates are asymptotically linear for estimators with higher-order expansions of a standard form. In particular, we find that for a variety of estimators the straightforward bootstrap bias correction gives the same higher-order variance as more complicated analytical or jackknife bias corrections. In contrast, bias corrections that do not estimate the bias at the parametric rate, such as the split-sample jackknife, result in larger higher-order variances in the i.i.d. setting we focus on. For both a cross-sectional MLE and a panel model with individual fixed effects, we show that the split-sample jackknife has a higher-order variance term that is twice as large as that of the leave-one-out jackknife.
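As a rough illustration of the two jackknife corrections the abstract contrasts, the sketch below applies both to the Gaussian variance MLE, which divides by n and is therefore biased downward. The estimator, the function names, and the simulated data are assumptions chosen for this example; none of them come from the paper.

```python
import numpy as np

def var_mle(x):
    """Gaussian MLE of the variance (divides by n, so it is biased downward)."""
    return float(np.mean((x - np.mean(x)) ** 2))

def leave_one_out_jackknife(estimator, x):
    """Bias-corrected estimate: n * theta_hat - (n - 1) * mean of leave-one-out estimates."""
    n = len(x)
    theta_full = estimator(x)
    theta_loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    return n * theta_full - (n - 1) * float(np.mean(theta_loo))

def split_sample_jackknife(estimator, x):
    """Bias-corrected estimate: 2 * theta_hat - average of the two half-sample estimates."""
    half = len(x) // 2
    theta_full = estimator(x)
    theta_halves = 0.5 * (estimator(x[:half]) + estimator(x[half:]))
    return 2.0 * theta_full - theta_halves

# Illustrative data only (hypothetical; not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=200)

loo = leave_one_out_jackknife(var_mle, x)
ss = split_sample_jackknife(var_mle, x)
print("MLE:", var_mle(x), "leave-one-out:", loo, "split-sample:", ss)
```

For this particular estimator the leave-one-out correction reproduces the usual unbiased sample variance exactly. Both corrections remove the first-order bias here. The abstract's claim concerns their higher-order variance, which this sketch does not measure.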
Submitted 26 January, 2024; v1 submitted 20 July, 2022;
originally announced July 2022.
-
RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests
Authors:
Victor Chernozhukov,
Whitney K. Newey,
Victor Quintas-Martinez,
Vasilis Syrgkanis
Abstract:
Many causal and policy effects of interest are defined by linear functionals of high-dimensional or non-parametric regression functions. $\sqrt{n}$-consistent and asymptotically normal estimation of the object of interest requires debiasing to reduce the effects of regularization and/or model selection on the object of interest. Debiasing is typically achieved by adding a correction term to the plug-in estimator of the functional, which leads to properties such as semi-parametric efficiency, double robustness, and Neyman orthogonality. We implement an automatic debiasing procedure based on automatically learning the Riesz representation of the linear functional using Neural Nets and Random Forests. Our method only relies on black-box evaluation oracle access to the linear functional and does not require knowledge of its analytic form. We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. We also propose a Random Forest method which learns a locally linear representation of the Riesz function. Even though our method applies to arbitrary functionals, we experimentally find that it performs well compared to the state-of-the-art neural-net-based algorithm of Shi et al. (2019) for the case of the average treatment effect functional. We also evaluate our method on the problem of estimating average marginal effects with continuous treatments, using semi-synthetic data of gasoline price changes on gasoline demand.
Submitted 15 June, 2022; v1 submitted 6 October, 2021;
originally announced October 2021.
-
A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees
Authors:
Victor Chernozhukov,
Whitney K. Newey,
Rahul Singh
Abstract:
Debiased machine learning is a meta algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals, i.e. scalar summaries, of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is $n^{-1/2}$ for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a general double robustness property for ill posed inverse problems.
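The combination of bias correction and sample splitting described above can be sketched for one familiar global functional, the average treatment effect. The example below is a minimal illustration under strong assumptions: a randomized treatment with a known propensity score of 0.5, ordinary least squares standing in for the machine learner, and simulated data. The function names and the data-generating process are hypothetical and are not taken from the paper.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients; stands in for an arbitrary regression learner."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cross_fit_aipw(y, d, X, propensity, n_folds=2, seed=0):
    """Cross-fitted doubly robust (AIPW) estimate of the average treatment effect."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # random fold labels 0..n_folds-1
    psi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Fit outcome regressions on the training folds only (sample splitting).
        b1 = ols_fit(X[train & (d == 1)], y[train & (d == 1)])
        b0 = ols_fit(X[train & (d == 0)], y[train & (d == 0)])
        m1, m0 = X[test] @ b1, X[test] @ b0
        p = propensity[test]
        # Plug-in term plus the bias-correction term, evaluated on the held-out fold.
        psi[test] = (m1 - m0
                     + d[test] * (y[test] - m1) / p
                     - (1 - d[test]) * (y[test] - m0) / (1 - p))
    ate = float(psi.mean())
    se = float(psi.std(ddof=1) / np.sqrt(n))
    return ate, se

# Simulated data with a true effect of 2.0 (assumption for illustration only).
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
d = rng.binomial(1, 0.5, size=n)
y = 1.0 + x + 2.0 * d + rng.normal(size=n)

ate, se = cross_fit_aipw(y, d, X, propensity=np.full(n, 0.5))
print(f"ATE estimate: {ate:.3f}, 95% CI: [{ate - 1.96 * se:.3f}, {ate + 1.96 * se:.3f}]")
```

The confidence interval uses the sample standard deviation of the debiased scores, which is the standard large-sample construction for this estimator. The finite sample guarantees stated in the abstract are not reproduced here.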
Submitted 21 October, 2022; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Automatic Debiased Machine Learning via Riesz Regression
Authors:
Victor Chernozhukov,
Whitney K. Newey,
Victor Quintas-Martinez,
Vasilis Syrgkanis
Abstract:
A variety of interesting parameters may depend on high dimensional regressions. Machine learning can be used to estimate such parameters. However, estimators based on machine learners can be severely biased by regularization and/or model selection. Debiased machine learning uses Neyman orthogonal estimating equations to reduce such biases. Debiased machine learning generally requires estimation of unknown Riesz representers. A primary innovation of this paper is to provide Riesz regression estimators of Riesz representers that depend on the parameter of interest, rather than explicit formulae, and that can employ any machine learner, including neural nets and random forests. End-to-end algorithms emerge where the researcher chooses the parameter of interest and the machine learner and the debiasing follows automatically. Another innovation here is debiased machine learners of parameters depending on generalized regressions, including high-dimensional generalized linear models. An empirical example of automatic debiased machine learning using neural nets is given. We find in Monte Carlo examples that automatic debiasing sometimes performs better than debiasing via inverse propensity scores and never worse. Finite sample mean square error bounds for Riesz regression estimators and asymptotic theory are also given.
Submitted 14 March, 2024; v1 submitted 29 April, 2021;
originally announced April 2021.
-
Heterogeneous Coefficients, Control Variables, and Identification of Multiple Treatment Effects
Authors:
Whitney K. Newey,
Sami Stouli
Abstract:
Multidimensional heterogeneity and endogeneity are important features of models with multiple treatments. We consider a heterogeneous coefficients model where the outcome is a linear combination of dummy treatment variables, with each variable representing a different kind of treatment. We use control variables to give necessary and sufficient conditions for identification of average treatment effects. With mutually exclusive treatments we find that, provided the heterogeneous coefficients are mean independent from treatments given the controls, a simple identification condition is that the generalized propensity scores (Imbens, 2000) be bounded away from zero and that their sum be bounded away from one, with probability one. Our analysis extends to distributional and quantile treatment effects, as well as corresponding treatment effects on the treated. These results generalize the classical identification result of Rosenbaum and Rubin (1983) for binary treatments.
Submitted 22 November, 2021; v1 submitted 4 September, 2020;
originally announced September 2020.
-
Minimax Semiparametric Learning With Approximate Sparsity
Authors:
Jelena Bradic,
Victor Chernozhukov,
Whitney K. Newey,
Yinchu Zhu
Abstract:
This paper is about the feasibility and means of root-n consistently estimating linear, mean-square continuous functionals of a high dimensional, approximately sparse regression. Such objects include a wide variety of interesting parameters such as regression coefficients, average derivatives, and the average treatment effect. We give lower bounds on the convergence rate of estimators of a regression slope and an average derivative and find that these bounds are substantially larger than in a low dimensional, semiparametric setting. We also give debiased machine learners that are root-n consistent under either a minimal approximate sparsity condition or rate double robustness. These estimators improve on existing estimators in being root-n consistent under more general conditions than previously known.
Submitted 8 August, 2022; v1 submitted 27 December, 2019;
originally announced December 2019.
-
Testing the Drift-Diffusion Model
Authors:
Drew Fudenberg,
Whitney K. Newey,
Philipp Strack,
Tomasz Strzalecki
Abstract:
The drift diffusion model (DDM) is a model of sequential sampling with diffusion (Brownian) signals, where the decision maker accumulates evidence until the process hits a stopping boundary, and then stops and chooses the alternative that corresponds to that boundary. This model has been widely used in psychology, neuroeconomics, and neuroscience to explain the observed patterns of choice and response times in a range of binary choice decision problems. This paper provides a statistical test for DDMs with general boundaries. We first prove a characterization theorem: we find a condition on choice probabilities that is satisfied if and only if the choice probabilities are generated by some DDM. Moreover, we show that the drift and the boundary are uniquely identified. We then use our condition to nonparametrically estimate the drift and the boundary and construct a test statistic.
Submitted 15 August, 2019;
originally announced August 2019.
-
Heterogenous Coefficients, Discrete Instruments, and Identification of Treatment Effects
Authors:
Whitney K. Newey,
Sami Stouli
Abstract:
Multidimensional heterogeneity and endogeneity are important features of a wide class of econometric models. We consider heterogenous coefficients models where the outcome is a linear combination of known functions of treatment and heterogenous coefficients. We use control variables to obtain identification results for average treatment effects. With discrete instruments in a triangular model we find that average treatment effects cannot be identified when the number of support points is less than or equal to the number of coefficients. A sufficient condition for identification is that the second moment matrix of the treatment functions given the control is nonsingular with probability one. We relate this condition to identification of average treatment effects with multiple treatments.
Submitted 24 November, 2018;
originally announced November 2018.
-
Automatic Debiased Machine Learning of Causal and Structural Effects
Authors:
Victor Chernozhukov,
Whitney K. Newey,
Rahul Singh
Abstract:
Many causal and structural effects depend on regressions. Examples include policy effects, average derivatives, regression decompositions, average treatment effects, causal mediation, and parameters of economic structural models. The regressions may be high dimensional, making machine learning useful. Plugging machine learners into identifying equations can lead to poor inference due to bias from regularization and/or model selection. This paper gives automatic debiasing for linear and nonlinear functions of regressions. The debiasing is automatic in using Lasso and the function of interest without the full form of the bias correction. The debiasing can be applied to any regression learner, including neural nets, random forests, Lasso, boosting, and other high dimensional methods. In addition to providing the bias correction we give standard errors that are robust to misspecification, convergence rates for the bias correction, and primitive conditions for asymptotic inference for a variety of estimators of structural and causal effects. The automatic debiased machine learning is used to estimate the average treatment effect on the treated for the NSW job training data and to estimate demand elasticities from Nielsen scanner data while allowing preferences to be correlated with prices and income.
Submitted 21 October, 2022; v1 submitted 13 September, 2018;
originally announced September 2018.
-
Locally Robust Semiparametric Estimation
Authors:
Victor Chernozhukov,
Juan Carlos Escanciano,
Hidehiko Ichimura,
Whitney K. Newey,
James M. Robins
Abstract:
Many economic and causal parameters depend on nonparametric or high dimensional first steps. We give a general construction of locally robust/orthogonal moment functions for GMM, where moment conditions have zero derivative with respect to first steps. We show that orthogonal moment functions can be constructed by adding to identifying moments the nonparametric influence function for the effect of the first step on identifying moments. Orthogonal moments reduce model selection and regularization bias, as is very important in many applications, especially for machine learning first steps.
We give debiased machine learning estimators of functionals of high dimensional conditional quantiles and of dynamic discrete choice parameters with high dimensional state variables. We show that adding to identifying moments the nonparametric influence function provides a general construction of orthogonal moments, including regularity conditions, and show that the nonparametric influence function is robust to additional unknown functions on which it depends. We give a general approach to estimating the unknown functions in the nonparametric influence function and use it to automatically debias estimators of functionals of high dimensional conditional location learners. We give a variety of new doubly robust moment equations and characterize double robustness. We give general and simple regularity conditions and apply these for asymptotic inference on functionals of high dimensional regression quantiles and dynamic discrete choice parameters with high dimensional state variables.
Submitted 3 August, 2020; v1 submitted 29 July, 2016;
originally announced August 2016.
-
Inference in Linear Regression Models with Many Covariates and Heteroskedasticity
Authors:
Matias D. Cattaneo,
Michael Jansson,
Whitney K. Newey
Abstract:
The linear regression model is widely used in empirical work in Economics, Statistics, and many other disciplines. Researchers often include many covariates in their linear model specification in an attempt to control for confounders. We give inference methods that allow for many covariates and heteroskedasticity. Our results are obtained using high-dimensional approximations, where the number of included covariates is allowed to grow as fast as the sample size. We find that all of the usual versions of Eicker-White heteroskedasticity consistent standard error estimators for linear models are inconsistent under these asymptotics. We then propose a new heteroskedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroskedasticity of unknown form and the inclusion of possibly many covariates. We apply our findings to three settings: parametric linear models with many covariates, linear panel models with many fixed effects, and semiparametric semi-linear models with many technical regressors. Simulation evidence consistent with our theoretical results is also provided. The proposed methods are also illustrated with an empirical application.
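For reference, the simplest of the usual Eicker-White estimators mentioned above (often labeled HC0) can be sketched as follows. This is the conventional sandwich formula that the abstract reports to be inconsistent when the number of covariates grows with the sample size. It is not the new standard error formula the paper proposes, which is not stated in the abstract. The function name and the simulated low-dimensional data are assumptions for illustration.

```python
import numpy as np

def ols_hc0(X, y):
    """OLS coefficients with Eicker-White (HC0) heteroskedasticity-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: X' diag(resid^2) X, computed without forming the diagonal matrix.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Simulated heteroskedastic data with few covariates (hypothetical example).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1.0 + np.abs(x))

beta, se = ols_hc0(X, y)
print("coefficients:", beta, "HC0 standard errors:", se)
```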
Submitted 16 January, 2017; v1 submitted 9 July, 2015;
originally announced July 2015.
-
Alternative Asymptotics and the Partially Linear Model with Many Regressors
Authors:
Matias D. Cattaneo,
Michael Jansson,
Whitney K. Newey
Abstract:
Non-standard distributional approximations have received considerable attention in recent years. They often provide more accurate approximations in small samples, and theoretical improvements in some cases. This paper shows that the seemingly unrelated "many instruments asymptotics" and "small bandwidth asymptotics" share a common structure, where the object determining the limiting distribution is a V-statistic with a remainder that is an asymptotically normal degenerate U-statistic. We illustrate how this general structure can be used to derive new results by obtaining a new asymptotic distribution of a series estimator of the partially linear model when the number of terms in the series approximation possibly grows as fast as the sample size, which we call "many terms asymptotics".
Submitted 28 May, 2015;
originally announced May 2015.