Search | arXiv e-print repository

Estimating Causal Effects of Discrete and Continuous Treatments with Binary Instruments

Authors: Victor Chernozhukov, Iván Fernández-Val, Sukjin Han, Kaspar Wüthrich

Abstract: We propose an instrumental variable framework for identifying and estimating average and quantile effects of discrete and continuous treatments with binary instruments. The basis of our approach is a local copula representation of the joint distribution of the potential outcomes and unobservables determining treatment assignment. This representation allows us to introduce an identifying assumption, so-called copula invariance, that restricts the local dependence of the copula with respect to the treatment propensity. We show that copula invariance identifies treatment effects for the entire population and other subpopulations such as the treated. The identification results are constructive and lead to straightforward semiparametric estimation procedures based on distribution regression. An application to the effect of sleep on well-being uncovers interesting patterns of heterogeneity.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.02467 [pdf]

Applied Causal Inference Powered by ML and AI

Authors: Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

Abstract: An introduction to the emerging fusion of machine learning and causal inference. The book presents ideas from classical structural equation models (SEMs) and their modern AI equivalent, directed acyclical graphs (DAGs) and structural causal models (SCMs), and covers Double/Debiased Machine Learning methods to do inference in such models using modern predictive tools.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.04674 [pdf, other]

Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study

Authors: Philipp Bach, Oliver Schacht, Victor Chernozhukov, Sven Klaassen, Martin Spindler

Abstract: Proper hyperparameter tuning is essential for achieving optimal performance of modern machine learning (ML) methods in predictive tasks. While there is an extensive literature on tuning ML learners for prediction, there is only little guidance available on tuning ML learners for causal machine learning and how to select among different ML learners. In this paper, we empirically assess the relationship between the predictive performance of ML methods and the resulting causal estimation based on the Double Machine Learning (DML) approach by Chernozhukov et al. (2018). DML relies on estimating so-called nuisance parameters by treating them as supervised learning problems and using them as plug-in estimates to solve for the (causal) parameter. We conduct an extensive simulation study using data from the 2019 Atlantic Causal Inference Conference Data Challenge. We provide empirical insights on the role of hyperparameter tuning and other practical decisions for causal estimation with DML. First, we assess the importance of data splitting schemes for tuning ML learners within Double Machine Learning. Second, we investigate how the choice of ML methods and hyperparameters, including recent AutoML frameworks, impacts the estimation performance for a causal parameter of interest. Third, we assess to what extent the choice of a particular causal model, as characterized by incorporated parametric assumptions, can be based on predictive performance metrics.

Submitted 7 February, 2024; originally announced February 2024.
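The plug-in step described in the abstract above can be illustrated with a minimal sketch of cross-fitted partialling-out in a partially linear model. Everything here is an assumption made for illustration and is not taken from the paper: the simulated data, the function names (`ols_fit`, `simulate`, `dml_plr`), and the use of one-covariate least squares as a stand-in for the ML nuisance learners. It is not the DoubleML API.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of a one-covariate least-squares fit."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n, theta, seed=0):
    """Hypothetical data: Y = theta * D + X + noise, with D = 0.5 * X + noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    d = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [theta * di + xi + rng.gauss(0, 1) for di, xi in zip(d, x)]
    return x, d, y


def dml_plr(x, d, y, n_folds=2):
    """Cross-fitted partialling-out estimate of theta.

    The nuisance functions E[Y|X] and E[D|X] are fit on the training folds
    and used to residualize the held-out fold; theta is then the slope of
    the outcome residuals on the treatment residuals.
    """
    n = len(x)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    res_y = [0.0] * n
    res_d = [0.0] * n
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        x_tr = [x[i] for i in train_idx]
        a_y, b_y = ols_fit(x_tr, [y[i] for i in train_idx])
        a_d, b_d = ols_fit(x_tr, [d[i] for i in train_idx])
        for i in test_idx:
            res_y[i] = y[i] - (a_y + b_y * x[i])
            res_d[i] = d[i] - (a_d + b_d * x[i])
    num = sum(rd * ry for rd, ry in zip(res_d, res_y))
    den = sum(rd * rd for rd in res_d)
    return num / den


x, d, y = simulate(2000, theta=2.0)
theta_hat = dml_plr(x, d, y)
print(round(theta_hat, 3))
```

In practice the two least-squares fits would be replaced by tuned ML learners, which is where the choice of hyperparameters and data-splitting scheme studied in the paper enters.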

arXiv:2402.01785 [pdf, other]

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

Authors: Sven Klaassen, Jan Teichert-Kluge, Philipp Bach, Victor Chernozhukov, Martin Spindler, Suhas Vijaykumar

Abstract: This paper explores the use of unstructured, multimodal data, namely text and images, in causal inference and treatment effect estimation. We propose a neural network architecture that is adapted to the double machine learning (DML) framework, specifically the partially linear model. An additional contribution of our paper is a new method to generate a semi-synthetic dataset which can be used to evaluate the performance of causal effect estimation in the presence of text and images as confounders. The proposed methods and architectures are evaluated on the semi-synthetic dataset and compared to standard approaches, highlighting the potential benefit of using text and images directly in causal studies. Our findings have implications for researchers and practitioners in economics, marketing, finance, medicine and data science in general who are interested in estimating causal quantities using non-traditional data.

Submitted 1 February, 2024; originally announced February 2024.

MSC Class: 62; 91 ACM Class: I.2.0

arXiv:2307.04527 [pdf, other]

Automatic Debiased Machine Learning for Covariate Shifts

Authors: Victor Chernozhukov, Michael Newey, Whitney K Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: In this paper we address the problem of bias in machine learning of parameters following covariate shifts. Covariate shift occurs when the distribution of input features changes between the training and deployment stages. Regularization and model selection associated with machine learning biases many parameter estimates. In this paper, we propose an automatic debiased machine learning approach to correct for this bias under covariate shifts. The proposed approach leverages state-of-the-art techniques in debiased machine learning to debias estimators of policy and causal parameters when covariate shift is present. The debiasing is automatic in only relying on the parameter of interest and not requiring the form of the bias. We show that our estimator is asymptotically normal as the sample size grows. Finally, we demonstrate the proposed method on a regression problem using a Monte-Carlo simulation.

Submitted 19 April, 2024; v1 submitted 10 July, 2023; originally announced July 2023.

arXiv:2301.07782 [pdf, other]

doi 10.1016/S0304-4076(03)00100-3

An MCMC Approach to Classical Estimation

Authors: Victor Chernozhukov, Han Hong

Abstract: This paper studies computationally and theoretically attractive estimators called the Laplace type estimators (LTE), which include means and quantiles of Quasi-posterior distributions defined as transformations of general (non-likelihood-based) statistical criterion functions, such as those in GMM, nonlinear IV, empirical likelihood, and minimum distance methods. The approach generates an alternative to classical extremum estimation and also falls outside the parametric Bayesian approach. For example, it offers a new attractive estimation method for such important semi-parametric problems as censored and instrumental quantile, nonlinear GMM and value-at-risk models. The LTE's are computed using Markov Chain Monte Carlo methods, which help circumvent the computational curse of dimensionality. A large sample theory is obtained for regular cases.

Submitted 18 January, 2023; originally announced January 2023.

Comments: This is an archival version of the article "An MCMC approach to classical estimation", Journal of econometrics 115 (2), August 2003, pages 293-346. This version does not reflect the corrections made to the article during the publication process; it contains additional two remarks added, as indicated in the text. 62 pages, 7 figures

Journal ref: Journal of econometrics 115 (2), August 2003, pages 293-346

arXiv:2207.13081 [pdf, other]

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs

Authors: Masatoshi Uehara, Haruka Kiyohara, Andrew Bennett, Victor Chernozhukov, Nan Jiang, Nathan Kallus, Chengchun Shi, Wen Sun

Abstract: We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play similar roles as classical value functions in fully-observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain the PAC result, which implies our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states, and the Bellman completeness. Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.

Submitted 14 November, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

Comments: This paper was accepted in NeurIPS 2023

arXiv:2203.13887 [pdf, other]

Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: We extend the idea of automated debiased machine learning to the dynamic treatment regime and more generally to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation learning algorithm that estimates de-biasing corrections without the need to characterize what the correction terms look like, such as for instance, products of inverse probability weighting terms, as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems, whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need for solving auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. We provide further applications of our approach to estimation of dynamic discrete choice models and estimation of long-term effects with surrogates.

Submitted 20 June, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2112.13398 [pdf, other]

Long Story Short: Omitted Variable Bias in Causal Machine Learning

Authors: Victor Chernozhukov, Carlos Cinelli, Whitney Newey, Amit Sharma, Vasilis Syrgkanis

Abstract: We develop a general theory of omitted variable bias for a wide range of common causal parameters, including (but not limited to) averages of potential outcomes, average treatment effects, average causal derivatives, and policy effects from covariate shifts. Our theory applies to nonparametric models, while naturally allowing for (semi-)parametric restrictions (such as partial linearity) when such assumptions are made. We show how simple plausibility judgments on the maximum explanatory power of omitted variables are sufficient to bound the magnitude of the bias, thus facilitating sensitivity analysis in otherwise complex, nonlinear models. Finally, we provide flexible and efficient statistical inference methods for the bounds, which can leverage modern machine learning algorithms for estimation. These results allow empirical researchers to perform sensitivity analyses in a flexible class of machine-learned causal models using very simple, and interpretable, tools. We demonstrate the utility of our approach with two empirical examples.

Submitted 26 May, 2024; v1 submitted 26 December, 2021; originally announced December 2021.

Comments: This is an extended version of the paper prepared for the NeurIPS-2021 Workshop "Causal Inference & Machine Learning: Why now?"; 55 pages; 10 figures

MSC Class: 62G

arXiv:2110.06136 [pdf, other]

A Response to Philippe Lemoine's Critique on our Paper "Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S."

Authors: Victor Chernozhukov, Hiroyuki Kasahara, Paul Schrimpf

Abstract: Recently, Philippe Lemoine posted a critique of our paper "Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S." [arXiv:2005.14168] at his post titled "Lockdowns, econometrics and the art of putting lipstick on a pig." Although Lemoine's critique appears ideologically driven and overly emotional, some of his points are worth addressing. In particular, the sensitivity of our estimation results for (i) including "masks in public spaces" and (ii) updating the data seem to be important critiques and, therefore, we decided to analyze the updated data ourselves. This note summarizes our findings from re-examining the updated data and responds to Philippe Lemoine's critique on these two important points. We also briefly discuss other points Lemoine raised in his post. After analyzing the updated data, we find evidence that reinforces the conclusions reached in the original study.

Submitted 10 October, 2021; originally announced October 2021.

arXiv:2110.03031 [pdf, other]

RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests

Authors: Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, Vasilis Syrgkanis

Abstract: Many causal and policy effects of interest are defined by linear functionals of high-dimensional or non-parametric regression functions. $\sqrt{n}$-consistent and asymptotically normal estimation of the object of interest requires debiasing to reduce the effects of regularization and/or model selection on the object of interest. Debiasing is typically achieved by adding a correction term to the plug-in estimator of the functional, which leads to properties such as semi-parametric efficiency, double robustness, and Neyman orthogonality. We implement an automatic debiasing procedure based on automatically learning the Riesz representation of the linear functional using Neural Nets and Random Forests. Our method only relies on black-box evaluation oracle access to the linear functional and does not require knowledge of its analytic form. We propose a multitasking Neural Net debiasing method with stochastic gradient descent minimization of a combined Riesz representer and regression loss, while sharing representation layers for the two functions. We also propose a Random Forest method which learns a locally linear representation of the Riesz function. Even though our method applies to arbitrary functionals, we experimentally find that it performs well compared to the state-of-the-art neural net based algorithm of Shi et al. (2019) for the case of the average treatment effect functional. We also evaluate our method on the problem of estimating average marginal effects with continuous treatments, using semi-synthetic data of gasoline price changes on gasoline demand.

Submitted 15 June, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Accepted for a long presentation at the ICML. Code available at https://github.com/victor5as/RieszLearning
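The correction term mentioned in the abstract above can be written out as a short worked equation. This is a sketch in standard debiased machine learning notation; the symbols ($W$, $X$, $g_0$, $\alpha_0$, $m$, $\pi_0$) are assumptions chosen for illustration and do not come from this listing.

```latex
% Target: a linear functional of the regression g_0(X) = E[Y | X]
\theta_0 = \mathbb{E}\big[ m(W; g_0) \big]

% Riesz representer \alpha_0: for every square-integrable g,
\mathbb{E}\big[ m(W; g) \big] = \mathbb{E}\big[ \alpha_0(X)\, g(X) \big]

% Debiased moment: plug-in term plus correction term
\theta_0 = \mathbb{E}\big[ m(W; g_0) + \alpha_0(X)\,\big( Y - g_0(X) \big) \big]

% Example (average treatment effect), with X = (D, Z) and \pi_0(Z) = P(D = 1 | Z):
m(W; g) = g(1, Z) - g(0, Z), \qquad
\alpha_0(X) = \frac{D}{\pi_0(Z)} - \frac{1 - D}{1 - \pi_0(Z)}
```

In the average treatment effect example, the debiased moment reduces to the familiar doubly robust (augmented inverse probability weighting) score; learning $\alpha_0$ directly avoids having to derive its analytic form for each functional.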

arXiv:2107.02602 [pdf, ps, other]

Inference for Low-Rank Models

Authors: Victor Chernozhukov, Christian Hansen, Yuan Liao, Yinchu Zhu

Abstract: This paper studies inference in linear models with a high-dimensional parameter matrix that can be well-approximated by a "spiked low-rank matrix." A spiked low-rank matrix has rank that grows slowly compared to its dimensions and nonzero singular values that diverge to infinity. We show that this framework covers a broad class of models of latent-variables which can accommodate matrix completion problems, factor models, varying coefficient models, and heterogeneous treatment effects. For inference, we apply a procedure that relies on an initial nuclear-norm penalized estimation step followed by two ordinary least squares regressions. We consider the framework of estimating incoherent eigenvectors and use a rotation argument to argue that the eigenspace estimation is asymptotically unbiased. Using this framework we show that our procedure provides asymptotically normal inference and achieves the semiparametric efficiency bound. We illustrate our framework by providing low-level conditions for its application in a treatment effects context where treatment assignment might be strongly dependent.

Submitted 2 January, 2023; v1 submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.09762 [pdf, other]

Causal Bias Quantification for Continuous Treatments

Authors: Gianluca Detommaso, Michael Brückner, Philip Schulz, Victor Chernozhukov

Abstract: We extend the definition of the marginal causal effect to the continuous treatment setting and develop a novel characterization of causal bias in the framework of structural causal models. We prove that our derived bias expression is zero if, and only if, the causal effect is identifiable via covariate adjustment. We show that under some restrictions on the structural equations, the causal bias can be estimated efficiently and allows for causal regularization of predictive probabilistic models. We demonstrate the effectiveness of our method for causal bias quantification in various settings where (not) controlling for certain covariates would introduce causal bias.

Submitted 30 January, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

arXiv:2105.15197 [pdf, ps, other]

A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees

Authors: Victor Chernozhukov, Whitney K. Newey, Rahul Singh

Abstract: Debiased machine learning is a meta algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals, i.e. scalar summaries, of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is $n^{-1/2}$ for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a general double robustness property for ill posed inverse problems.

Submitted 21 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: Biometrika 2022

arXiv:2105.07424 [pdf, other]

Uniform Inference on High-dimensional Spatial Panel Networks

Authors: Victor Chernozhukov, Chen Huang, Weining Wang

Abstract: We propose employing a debiased-regularized, high-dimensional generalized method of moments (GMM) framework to perform inference on large-scale spatial panel networks. In particular, network structure with a flexible sparse deviation, which can be regarded either as latent or as misspecified from a predetermined adjacency matrix, is estimated using a debiased machine learning approach. The theoretical analysis establishes the consistency and asymptotic normality of our proposed estimator, taking into account general temporal and spatial dependency inherent in the data-generating processes. The dimensionality allowance in the presence of dependency is discussed. A primary contribution of our study is the development of uniform inference theory that enables hypothesis testing on the parameters of interest, including zero or non-zero elements in the network structure. Additionally, the asymptotic properties for the estimator are derived for both linear and nonlinear moments. Simulations demonstrate superior performance of our proposed approach. Lastly, we apply our methodology to investigate the spatial network effect of stock returns.

Submitted 7 September, 2023; v1 submitted 16 May, 2021; originally announced May 2021.

arXiv:2105.04646 [pdf, other]

Deeply-Debiased Off-Policy Interval Estimation

Authors: Chengchun Shi, Runzhe Wan, Victor Chernozhukov, Rui Song

Abstract: Off-policy evaluation learns a target policy's value with a historical dataset generated by a different behavior policy. In addition to a point estimate, many applications would benefit significantly from having a confidence interval (CI) that quantifies the uncertainty of the point estimate. In this paper, we propose a novel deeply-debiasing procedure to construct an efficient, robust, and flexible CI on a target policy's value. Our method is justified by theoretical results and numerical experiments. A Python implementation of the proposed procedure is available at https://github.com/RunzheStat/D2OPE.

Submitted 7 June, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

arXiv:2104.03220 [pdf, other]

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python

Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler

Abstract: DoubleML is an open-source Python library implementing the double machine learning framework of Chernozhukov et al. (2018) for a variety of causal models. It contains functionalities for valid statistical inference on causal parameters when the estimation of nuisance parameters is based on machine learning methods. The object-oriented implementation of DoubleML provides a high flexibility in terms of model specifications and makes it easily extendable. The package is distributed under the MIT license and relies on core libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas, scipy, statsmodels and joblib. Source code, documentation and an extensive user guide can be found at https://github.com/DoubleML/doubleml-for-py and https://docs.doubleml.org.

Submitted 20 December, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: 6 pages, 2 figures

MSC Class: 62-04

Journal ref: Journal of Machine Learning Research 23 (53), 2022, 1-6

arXiv:2103.09603 [pdf, other]

doi 10.18637/jss.v108.i03

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R

Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler, Sven Klaassen

Abstract: The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables a high flexibility for the model specification and makes it easily extendable. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.

Submitted 5 June, 2024; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: 56 pages, 8 Figures, 1 Table; Updated version for DoubleML 1.0.0; Updated version due to changes in R package paradox (for parameter tuning with mlr3)

MSC Class: 62-04

Journal ref: Journal of Statistical Software 2024

arXiv:2101.00009 [pdf, other]

Adversarial Estimation of Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: Many causal parameters are linear functionals of an underlying regression. The Riesz representer is a key component in the asymptotic variance of a semiparametrically estimated linear functional. We propose an adversarial framework to estimate the Riesz representer using general function spaces. We prove a nonasymptotic mean square rate in terms of an abstract quantity called the critical radius, then specialize it for neural networks, random forests, and reproducing kernel Hilbert spaces as leading cases. Our estimators are highly compatible with targeted and debiased machine learning with sample splitting; our guarantees directly verify general conditions for inference that allow mis-specification. We also use our guarantees to prove inference without sample splitting, based on stability or complexity. Our estimators achieve nominal coverage in highly nonlinear simulations where some previous methods break down. They shed new light on the heterogeneous effects of matching grants.

Submitted 26 April, 2024; v1 submitted 30 December, 2020; originally announced January 2021.

arXiv:2011.01092 [pdf, other]

Insights from Optimal Pandemic Shielding in a Multi-Group SEIR Framework

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

Abstract: The COVID-19 pandemic constitutes one of the largest threats in recent decades to the health and economic welfare of populations globally. In this paper, we analyze different types of policy measures designed to fight the spread of the virus and minimize economic losses. Our analysis builds on a multi-group SEIR model, which extends the multi-group SIR model introduced by Acemoglu et al. (2020). We adjust the underlying social interaction patterns and consider an extended set of policy measures. The model is calibrated for Germany. Despite the trade-off between COVID-19 prevention and economic activity that is inherent to shielding policies, our results show that efficiency gains can be achieved by targeting such policies towards different age groups. Alternative policies such as physical distancing can be employed to reduce the degree of targeting and the intensity and duration of shielding. Our results show that a comprehensive approach that combines multiple policy measures simultaneously can effectively mitigate population mortality and economic harm.

Submitted 2 November, 2020; originally announced November 2020.

Comments: 39 pages, 23 figures

arXiv:2005.14168 [pdf, other]

doi 10.1016/j.jeconom.2020.09.003

Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S

Authors: Victor Chernozhukov, Hiroyuki Kasahara, Paul Schrimpf

Abstract: This paper evaluates the dynamic impact of various policies adopted by US states on the growth rates of confirmed Covid-19 cases and deaths as well as social distancing behavior measured by Google Mobility Reports, where we take into consideration people's voluntary behavioral response to new information of transmission risks. Our analysis finds that both policies and information on transmission risks are important determinants of Covid-19 cases and deaths and shows that a change in policies explains a large fraction of observed changes in social distancing behavior. Our counterfactual experiments suggest that nationally mandating face masks for employees on April 1st could have reduced the growth rate of cases and deaths by more than 10 percentage points in late April, and could have led to as much as 17 to 55 percent fewer deaths nationally by the end of May, which roughly translates into 17 to 55 thousand saved lives. Our estimates imply that removing non-essential business closures (while maintaining school closures, restrictions on movie theaters and restaurants) could have led to -20 to 60 percent more cases and deaths by the end of May. We also find that, without stay-at-home orders, cases would have been larger by 25 to 170 percent, which implies that 0.5 to 3.4 million more Americans could have been infected if stay-at-home orders had not been implemented. Finally, not having implemented any policies could have led to at least a 7-fold increase with an uninformative upper bound in cases (and deaths) by the end of May in the US, with considerable uncertainty over the effects of school closures, which had little cross-sectional variation.

Submitted 19 October, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

Journal ref: Journal of Econometrics (2020)

arXiv:1912.12213 [pdf, other]

Minimax Semiparametric Learning With Approximate Sparsity

Authors: Jelena Bradic, Victor Chernozhukov, Whitney K. Newey, Yinchu Zhu

Abstract: This paper is about the feasibility and means of root-n consistently estimating linear, mean-square continuous functionals of a high dimensional, approximately sparse regression. Such objects include a wide variety of interesting parameters such as regression coefficients, average derivatives, and the average treatment effect. We give lower bounds on the convergence rate of estimators of a regression slope and an average derivative and find that these bounds are substantially larger than in a low dimensional, semiparametric setting. We also give debiased machine learners that are root-n consistent under either a minimal approximate sparsity condition or rate double robustness. These estimators improve on existing estimators in being root-n consistent under more general conditions than previously known.

Submitted 8 August, 2022; v1 submitted 27 December, 2019; originally announced December 2019.

arXiv:1909.07889 [pdf, other]

Distributional conformal prediction

Authors: Victor Chernozhukov, Kaspar Wüthrich, Yinchu Zhu

Abstract: We propose a robust method for constructing conditionally valid prediction intervals based on models for conditional distributions such as quantile and distribution regression. Our approach can be applied to important prediction problems including cross-sectional prediction, k-step-ahead forecasts, synthetic controls and counterfactual prediction, and individual treatment effects prediction. Our method exploits the probability integral transform and relies on permuting estimated ranks. Unlike regression residuals, ranks are independent of the predictors, allowing us to construct conditionally valid prediction intervals under heteroskedasticity. We establish approximate conditional validity under consistent estimation and provide approximate unconditional validity under model misspecification, overfitting, and with time series data. We also propose a simple "shape" adjustment of our baseline method that yields optimal prediction intervals.

Submitted 21 August, 2021; v1 submitted 17 September, 2019; originally announced September 2019.

Journal ref: PNAS November 30, 2021 118 (48) e2107794118

arXiv:1909.05782 [pdf, ps, other]

Fast Algorithms for the Quantile Regression Process

Authors: Victor Chernozhukov, Iván Fernández-Val, Blaise Melly

Abstract: The widespread use of quantile regression methods depends crucially on the existence of fast algorithms. Despite numerous algorithmic improvements, the computation time is still non-negligible because researchers often estimate many quantile regressions and use the bootstrap for inference. We suggest two new fast algorithms for the estimation of a sequence of quantile regressions at many quantile indexes. The first algorithm applies the preprocessing idea of Portnoy and Koenker (1997) but exploits a previously estimated quantile regression to guess the sign of the residuals. This step allows for a reduction of the effective sample size. The second algorithm starts from a previously estimated quantile regression at a similar quantile index and updates it using a single Newton-Raphson iteration. The first algorithm is exact, while the second is only asymptotically equivalent to the traditional quantile regression estimator. We also apply the preprocessing idea to the bootstrap by using the sample estimates to guess the sign of the residuals in the bootstrap sample. Simulations show that our new algorithms provide very large improvements in computation time without significant (if any) cost in the quality of the estimates. For instance, we divide by 100 the time required to estimate 99 quantile regressions with 20 regressors and 50,000 observations.

Submitted 6 April, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Comments: 29 pages, 3 figures, 4 tables; for associated Stata package, see https://sites.google.com/site/blaisemelly/home/computer-programs/fast

arXiv:1909.00836 [pdf, other]

SortedEffects: Sorted Causal Effects in R

Authors: Shuowen Chen, Victor Chernozhukov, Iván Fernández-Val, Ye Luo

Abstract: Chernozhukov et al. (2018) proposed the sorted effect method for nonlinear regression models. This method consists of reporting percentiles of the partial effects in addition to the average commonly used to summarize the heterogeneity in the partial effects. They also proposed to use the sorted effects to carry out classification analysis where the observational units are classified as most and least affected if their causal effects are above or below some tail sorted effects. The R package SortedEffects implements the estimation and inference methods therein and provides tools to visualize the results. This vignette serves as an introduction to the package and displays basic functionality of the functions within.

Submitted 6 November, 2019; v1 submitted 2 September, 2019; originally announced September 2019.

Comments: 15 pages, 6 figures, 8 tables

MSC Class: 62-07; 62E20

arXiv:1908.09173 [pdf, ps, other]

Inference on weighted average value function in high-dimensional state space

Authors: Victor Chernozhukov, Whitney Newey, Vira Semenova

Abstract: This paper gives a consistent, asymptotically normal estimator of the expected value function when the state space is high-dimensional and the first-stage nuisance functions are estimated by modern machine learning tools. First, we show that the value function is orthogonal to the conditional choice probability; therefore, this nuisance function needs to be estimated only at the $n^{-1/4}$ rate. Second, we give a correction term for the transition density of the state variable. The resulting orthogonal moment is robust to misspecification of the transition density and does not require this nuisance function to be consistently estimated. Third, we generalize this result by considering the weighted expected value. In this case, the orthogonal moment is doubly robust in the transition density and additional second-stage nuisance functions entering the correction term. We complete the asymptotic theory by providing bounds on second-order asymptotic terms.

Submitted 24 August, 2019; originally announced August 2019.

arXiv:1905.10116 [pdf, other]

Semi-Parametric Efficient Policy Learning with Continuous Actions

Authors: Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

Abstract: We consider off-policy evaluation and optimization with continuous action spaces. We focus on observational data where the data collection policy is unknown and needs to be estimated. We take a semi-parametric approach where the value function takes a known parametric form in the treatment, but we are agnostic on how it depends on the observed contexts. We propose a doubly robust off-policy estimate for this setting and show that off-policy optimization based on this estimate is robust to estimation errors of the policy function or the regression model. Our results also apply if the model does not satisfy our semi-parametric form, but rather we measure regret in terms of the best projection of the true value function to this functional space. Our work extends prior approaches of policy optimization from observational data that only considered discrete actions. We provide an experimental evaluation of our method in a synthetic data example motivated by optimal personalized pricing and costly resource allocation.

Submitted 20 July, 2019; v1 submitted 24 May, 2019; originally announced May 2019.

arXiv:1901.03821 [pdf, ps, other]

Mastering Panel 'Metrics: Causal Impact of Democracy on Growth

Authors: Shuowen Chen, Victor Chernozhukov, Iván Fernández-Val

Abstract: The relationship between democracy and economic growth is of long-standing interest. We revisit the panel data analysis of this relationship by Acemoglu, Naidu, Restrepo and Robinson (forthcoming) using state-of-the-art econometric methods. We argue that this and many other panel data settings in economics are in fact high-dimensional, causing the principal estimators -- the fixed effects (FE) and Arellano-Bond (AB) estimators -- to be biased to a degree that invalidates statistical inference. We can, however, remove these biases by using simple analytical and sample-splitting methods, and thereby restore valid statistical inference. We find that the debiased FE and AB estimators produce substantially higher estimates of the long-run effect of democracy on growth, providing even stronger support for the key hypothesis in Acemoglu, Naidu, Restrepo and Robinson (forthcoming). Given the ubiquitous nature of panel data, we conclude that the use of debiased panel data estimators should substantially improve the quality of empirical inference in economics.

Submitted 12 January, 2019; originally announced January 2019.

Comments: 8 pages, 2 tables, includes supplementary appendix

MSC Class: 62P20

arXiv:1812.04345 [pdf, other]

Closing the U.S. gender wage gap requires understanding its heterogeneity

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

Abstract: In 2016, the majority of full-time employed women in the U.S. earned significantly less than comparable men. The extent to which women were affected by gender inequality in earnings, however, depended greatly on socio-economic characteristics, such as marital status or educational attainment. In this paper, we analyzed data from the 2016 American Community Survey using a high-dimensional wage regression and applying double lasso to quantify heterogeneity in the gender wage gap. We found that the gap varied substantially across women and was driven primarily by marital status, having children at home, race, occupation, industry, and educational attainment. We recommend that policy makers use these insights to design policies that will reduce discrimination and unequal pay more effectively.

Submitted 7 June, 2021; v1 submitted 11 December, 2018; originally announced December 2018.

Comments: Main text: 8 pages, 3 figures; Supplementary Material available online

arXiv:1811.11603 [pdf, other]

Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK

Authors: Victor Chernozhukov, Iván Fernández-Val, Siyi Luo

Abstract: We develop a distribution regression model under endogenous sample selection. This model is a semi-parametric generalization of the Heckman selection model. It accommodates much richer effects of the covariates on outcome distribution and patterns of heterogeneity in the selection process, and allows for drastic departures from the Gaussian error structure, while maintaining the same level of tractability as the classical model. The model applies to continuous, discrete and mixed outcomes. We provide identification, estimation, and inference methods, and apply them to obtain a wage decomposition for the UK. Here we decompose the difference between the male and female wage distributions into composition, wage structure, selection structure, and selection sorting effects. After controlling for endogenous employment selection, we still find a substantial gender wage gap -- ranging from 21% to 40% throughout the (latent) offered wage distribution that is not explained by composition. We also uncover positive sorting for single men and negative sorting for married women that accounts for a substantive fraction of the gender wage gap at the top of the distribution.

Submitted 18 December, 2023; v1 submitted 28 November, 2018; originally announced November 2018.

Comments: 86 pages, 4 tables, 40 figures, includes supplement

MSC Class: 62P20; 91B40

arXiv:1809.04951 [pdf, other]

Valid Simultaneous Inference in High-Dimensional Settings (with the hdm package for R)

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

Abstract: Due to the increasing availability of high-dimensional empirical applications in many research disciplines, valid simultaneous inference becomes more and more important. For instance, high-dimensional settings might arise in economic studies due to very rich data sets with many potential covariates or in the analysis of treatment heterogeneities. Also, the evaluation of potentially more complicated (non-linear) functional forms of the regression relationship leads to many potential variables for which simultaneous inferential statements might be of interest. Here we provide a review of classical and modern methods for simultaneous inference in (high-dimensional) settings and illustrate their use by a case study using the R package hdm. The R package hdm implements valid joint powerful and efficient hypothesis tests for a potentially large number of coefficients as well as the construction of simultaneous confidence intervals and, therefore, provides useful methods to perform valid post-selection inference based on the LASSO.

Submitted 13 September, 2018; originally announced September 2018.

Comments: 25 pages, 2 figures, 4 tables

arXiv:1809.01038 [pdf, other]

Shape-Enforcing Operators for Point and Interval Estimators

Authors: Xi Chen, Victor Chernozhukov, Iván Fernández-Val, Scott Kostyshak, Ye Luo

Abstract: A common problem in econometrics, statistics, and machine learning is to estimate and make inference on functions that satisfy shape restrictions. For example, distribution functions are nondecreasing and range between zero and one, height growth charts are nondecreasing in age, and production functions are nondecreasing and quasi-concave in input quantities. We propose a method to enforce these restrictions ex post on point and interval estimates of the target function by applying functional operators. If an operator satisfies certain properties that we make precise, the shape-enforced point estimates are closer to the target function than the original point estimates and the shape-enforced interval estimates have greater coverage and shorter length than the original interval estimates. We show that these properties hold for six different operators that cover commonly used shape restrictions in practice: range, convexity, monotonicity, monotone convexity, quasi-convexity, and monotone quasi-convexity. We illustrate the results with two empirical applications to the estimation of a height growth chart for infants in India and a production function for chemical firms in China.

Submitted 12 February, 2021; v1 submitted 4 September, 2018; originally announced September 2018.

Comments: 42 pages, 5 figures, 3 tables, v5 includes changes in the main text

MSC Class: 62F10; 62F25; 62G05; 62G15

arXiv:1808.10532 [pdf, other]

Uniform Inference in High-Dimensional Gaussian Graphical Models

Authors: Sven Klaassen, Jannis Kück, Martin Spindler, Victor Chernozhukov

Abstract: Graphical models have become a very popular tool for representing dependencies within a large set of variables and are key for representing causal structures. We provide results for uniform inference on high-dimensional graphical models with the number of target parameters $d$ being possibly much larger than sample size. This is in particular important when certain features or structures of a causal model should be recovered. Our results highlight how in high-dimensional settings graphical models can be estimated and recovered with modern machine learning methods in complex data sets. To construct simultaneous confidence regions on many target parameters, sufficiently fast estimation rates of the nuisance functions are crucial. In this context, we establish uniform estimation rates and sparsity guarantees of the square-root estimator in a random design under approximate sparsity conditions that might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties.

Submitted 3 December, 2018; v1 submitted 30 August, 2018; originally announced August 2018.

Comments: 59 pages, 2 figures, 6 tables

MSC Class: 62H15; 62J07

arXiv:1806.05081 [pdf, other]

LASSO-Driven Inference in Time and Space

Authors: Victor Chernozhukov, Wolfgang K. Härdle, Chen Huang, Weining Wang

Abstract: We consider the estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak temporal dependence. A sequence of regressions with many regressors using LASSO (Least Absolute Shrinkage and Selection Operator) is applied for variable selection purposes, and an overall penalty level is carefully chosen by a block multiplier bootstrap procedure to account for multiplicity of the equations and dependencies in the data. Correspondingly, oracle properties with a jointly selected tuning parameter are derived. We further provide high-quality de-biased simultaneous inference on the many target parameters of the system. We provide bootstrap consistency results of the test procedure, which are based on a general Bahadur representation for the $Z$-estimators with dependent data. Simulations demonstrate good performance of the proposed inference procedure. Finally, we apply the method to quantify spillover effects of textual sentiment indices in a financial market and to test the connectedness among sectors.

Submitted 15 May, 2020; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1803.08154 [pdf, other]

Network and Panel Quantile Effects Via Distribution Regression

Authors: Victor Chernozhukov, Iván Fernández-Val, Martin Weidner

Abstract: This paper provides a method to construct simultaneous confidence bands for quantile functions and quantile effects in nonlinear network and panel models with unobserved two-way effects, strictly exogenous covariates, and possibly discrete outcome variables. The method is based upon projection of simultaneous confidence bands for distribution functions constructed from fixed effects distribution regression estimators. These fixed effects estimators are debiased to deal with the incidental parameter problem. Under asymptotic sequences where both dimensions of the data set grow at the same rate, the confidence bands for the quantile functions and effects have correct joint coverage in large samples. An empirical application to gravity models of trade illustrates the applicability of the methods to network data.

Submitted 8 June, 2020; v1 submitted 21 March, 2018; originally announced March 2018.

Comments: 71 pages, 8 figures, 3 tables, includes supplementary appendix

arXiv:1802.08667 [pdf, ps, other]

De-Biased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh

Abstract: We provide adaptive inference methods, based on $\ell_1$ regularization, for regular (semi-parametric) and non-regular (nonparametric) linear functionals of the conditional expectation function. Examples of regular functionals include average treatment effects, policy effects, and derivatives. Examples of non-regular functionals include average treatment effects, policy effects, and derivatives conditional on a covariate subvector fixed at a point. We construct a Neyman orthogonal equation for the target parameter that is approximately invariant to small perturbations of the nuisance parameters. To achieve this property, we include the Riesz representer for the functional as an additional nuisance parameter. Our analysis yields weak "double sparsity robustness": either the approximation to the regression or the approximation to the representer can be "completely dense" as long as the other is sufficiently "sparse". Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models, translating into honest confidence bands for both global and local parameters.

Submitted 21 October, 2022; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: The Econometrics Journal, 2022

arXiv:1802.06300 [pdf, other]

Exact and Robust Conformal Inference Methods for Predictive Machine Learning With Dependent Data

Authors: Victor Chernozhukov, Kaspar Wüthrich, Yinchu Zhu

Abstract: We extend conformal inference to general settings that allow for time series data. Our proposal is developed as a randomization method and accounts for potential serial dependence by including block structures in the permutation scheme. As a result, the proposed method retains the exact, model-free validity when the data are i.i.d. or more generally exchangeable, similar to usual conformal inference methods. When exchangeability fails, as is the case for common time series data, the proposed approach is approximately valid under weak assumptions on the conformity score.

Submitted 12 July, 2018; v1 submitted 17 February, 2018; originally announced February 2018.

Journal ref: Proceedings of COLT 2018 (PMLR 75:732-749)

arXiv:1801.05305 [pdf, other]

Censored Quantile Instrumental Variable Estimation with Stata

Authors: Victor Chernozhukov, Iván Fernández-Val, Sukjin Han, Amanda Kowalski

Abstract: Many applications involve a censored dependent variable and an endogenous independent variable. Chernozhukov et al. (2015) introduced a censored quantile instrumental variable estimator (CQIV) for use in those applications, which has been applied by Kowalski (2016), among others. In this article, we introduce a Stata command, cqiv, that simplifies application of the CQIV estimator in Stata. We summarize the CQIV estimator and algorithm, we describe the use of the cqiv command, and we provide empirical examples.

Submitted 24 September, 2019; v1 submitted 13 January, 2018; originally announced January 2018.

Comments: 12 pages, 1 table, associated software can be found at https://ideas.repec.org/c/boc/bocode/s457478.html. arXiv admin note: text overlap with arXiv:1104.4580. We have updated the command to report standard errors and bootstrap percentile-t confidence intervals

MSC Class: 62P20

arXiv:1712.09988 [pdf, other]

Estimation and Inference on Heterogeneous Treatment Effects in High-Dimensional Dynamic Panels under Weak Dependence

Authors: Vira Semenova, Matt Goldman, Victor Chernozhukov, Matt Taddy

Abstract: This paper provides estimation and inference methods for conditional average treatment effects (CATE) characterized by a high-dimensional parameter in both homogeneous cross-sectional and unit-heterogeneous dynamic panel data settings. In our leading example, we model CATE by interacting the base treatment variable with explanatory variables. The first step of our procedure is orthogonalization, where we partial out the controls and unit effects from the outcome and the base treatment and take the cross-fitted residuals. This step uses a novel generic cross-fitting method we design for weakly dependent time series and panel data. This method "leaves out the neighbors" when fitting nuisance components, and we theoretically power it by using Strassen's coupling. As a result, we can rely on any modern machine learning method in the first step, provided it learns the residuals well enough. Second, we construct an orthogonal (or residual) learner of CATE -- the Lasso CATE -- that regresses the outcome residual on the vector of interactions of the residualized treatment with explanatory variables. If the CATE function is simpler than the first-stage regression, the orthogonal learner converges faster than the single-stage regression-based learner. Third, we perform simultaneous inference on parameters of the CATE function using debiasing. We can also use ordinary least squares in the last two steps when CATE is low-dimensional.
In heterogeneous panel data settings, we model the unobserved unit heterogeneity as a weakly sparse deviation from Mundlak (1978)'s model of correlated unit effects as a linear function of time-invariant covariates and make use of L1-penalization to estimate these models. We demonstrate our methods by estimating price elasticities of groceries based on scanner data. We note that our results are new even for the cross-sectional (i.i.d.) case.

Submitted 10 December, 2022; v1 submitted 28 December, 2017; originally announced December 2017.

arXiv:1712.09089 [pdf, other]

An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls

Authors: Victor Chernozhukov, Kaspar Wüthrich, Yinchu Zhu

Abstract: We introduce new inference procedures for counterfactual and synthetic control methods for policy evaluation. We recast the causal inference problem as a counterfactual prediction and a structural breaks testing problem. This allows us to exploit insights from conformal prediction and structural breaks testing to develop permutation inference procedures that accommodate modern high-dimensional estimators, are valid under weak and easy-to-verify conditions, and are provably robust against misspecification. Our methods work in conjunction with many different approaches for predicting counterfactual mean outcomes in the absence of the policy intervention. Examples include synthetic controls, difference-in-differences, factor and matrix completion models, and (fused) time series panel data models. Our approach demonstrates an excellent small-sample performance in simulations and is taken to a data application where we re-evaluate the consequences of decriminalizing indoor prostitution. Open-source software for implementing our conformal inference methods is available.
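
Illustrative sketch (not the authors' software): one simple way to picture permutation inference on a residual series is to compare a post-period statistic with the same statistic under cyclic shifts of the series. The function name, the mean-absolute-residual statistic, and the choice of cyclic shifts are assumptions made for illustration; the residuals are assumed to come from some counterfactual prediction model fitted elsewhere.

```python
def shift_permutation_p_value(residuals, n_post):
    """P-value from cyclic shifts of a residual series (illustrative only).

    residuals: list of prediction residuals, pre-period first, post-period last.
    n_post:    number of post-intervention periods at the end of the list.
    """
    n_total = len(residuals)

    def post_stat(series):
        # Mean absolute residual over the last n_post positions.
        return sum(abs(x) for x in series[-n_post:]) / n_post

    observed = post_stat(residuals)
    # Count shifts (including the identity shift) whose statistic is at
    # least as large as the observed one.
    at_least_as_large = 0
    for shift in range(n_total):
        shifted = residuals[shift:] + residuals[:shift]
        if post_stat(shifted) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_total


if __name__ == "__main__":
    print(shift_permutation_p_value([0.1, -0.2, 0.1, 0.0, 3.0], n_post=1))
```

With five periods and one large post-period residual, only the identity shift matches the observed statistic, so the example prints 0.2, the smallest p-value attainable with five shifts.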

Submitted 20 May, 2021; v1 submitted 25 December, 2017; originally announced December 2017.

Journal ref: Journal of the American Statistical Association 2021, 116:536, 1849-1864

arXiv:1712.04802 [pdf, other]

Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India

Authors: Victor Chernozhukov, Mert Demirer, Esther Duflo, Iván Fernández-Val

Abstract: We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of most and least impacted units. The approach is valid in high dimensional settings, where the effects are proxied (but not necessarily consistently estimated) by predictive and causal machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, neural networks, random forests, boosted trees, and ensemble methods, both predictive and causal. Estimation and inference are based on repeated data splitting to avoid overfitting and achieve validity. We use quantile aggregation of the results across many potential splits, in particular taking medians of p-values and medians and other quantiles of confidence intervals. We show that quantile aggregation lowers estimation risks over a single split procedure, and establish its principal inferential properties. Finally, our analysis reveals ways to build provably better machine learning proxies through causal learning: we can use the objective functions that we develop to construct the best linear predictors of the effects, to obtain better machine learning proxies in the initial step.
We illustrate the use of both inferential tools and causal learners with a randomized field experiment that evaluates a combination of nudges to stimulate demand for immunization in India.
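
Illustrative sketch: the quantile aggregation step mentioned in this abstract can be pictured as taking medians of per-split results. The function name and input layout below are hypothetical, and any adjustment of nominal levels that the paper applies to the aggregated quantities is not reproduced here.

```python
from statistics import median


def aggregate_over_splits(p_values, ci_lowers, ci_uppers):
    """Median-aggregate results from repeated data splits (illustrative only).

    Each argument is a list with one entry per random split.
    """
    return {
        "p_value": median(p_values),
        "ci": (median(ci_lowers), median(ci_uppers)),
    }


if __name__ == "__main__":
    result = aggregate_over_splits(
        p_values=[0.01, 0.03, 0.20],
        ci_lowers=[0.1, 0.2, 0.3],
        ci_uppers=[0.9, 1.0, 1.1],
    )
    print(result)
```

Taking the median rather than the mean keeps a single unlucky split from dominating the reported result.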

Submitted 23 October, 2023; v1 submitted 13 December, 2017; originally announced December 2017.

Comments: 81 pages, 8 figures, 17 tables, includes Online Appendix, minor revision with respect to previous version

arXiv:1711.02184 [pdf, other]

Semiparametric Estimation of Structural Functions in Nonseparable Triangular Models

Authors: Victor Chernozhukov, Iván Fernández-Val, Whitney Newey, Sami Stouli, Francis Vella

Abstract: Triangular systems with nonadditively separable unobserved heterogeneity provide a theoretically appealing framework for the modelling of complex structural relationships. However, they are not commonly used in practice due to the need for exogenous variables with large support for identification, the curse of dimensionality in estimation, and the lack of inferential tools. This paper introduces two classes of semiparametric nonseparable triangular models that address these limitations. They are based on distribution and quantile regression modelling of the reduced form conditional distributions of the endogenous variables. We show that average, distribution and quantile structural functions are identified in these systems through a control function approach that does not require a large support condition. We propose a computationally attractive three-stage procedure to estimate the structural functions where the first two stages consist of quantile or distribution regressions. We provide asymptotic theory and uniform inference methods for each stage. In particular, we derive functional central limit theorems and bootstrap functional central limit theorems for the distribution regression estimators of the structural functions. These results establish the validity of the bootstrap for three-stage estimators of structural functions, and lead to simple inference algorithms. We illustrate the implementation and applicability of all our methods with numerical simulations and an empirical application to demand analysis.

Submitted 5 October, 2019; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: 45 pages, 4 figures, 1 table, we have added grant funding acknowledgement to v3

MSC Class: 62P20; 91B82

arXiv:1706.08418 [pdf, ps, other]

Nonseparable Multinomial Choice Models in Cross-Section and Panel Data

Authors: Victor Chernozhukov, Iván Fernández-Val, Whitney Newey

Abstract: Multinomial choice models are fundamental for empirical modeling of economic choices among discrete alternatives. We analyze identification of binary and multinomial choice models when the choice utilities are nonseparable in observed attributes and multidimensional unobserved heterogeneity with cross-section and panel data. We show that derivatives of choice probabilities with respect to continuous attributes are weighted averages of utility derivatives in cross-section models with exogenous heterogeneity. In the special case of random coefficient models with an independent additive effect, we further characterize that the probability derivative at zero is proportional to the population mean of the coefficients. We extend the identification results to models with endogenous heterogeneity using either a control function or panel data. In time stationary panel models with two periods, we find that differences over time of derivatives of choice probabilities identify utility derivatives "on the diagonal," i.e. when the observed attributes take the same values in the two periods. We also show that time stationarity does not identify structural derivatives "off the diagonal" both in continuous and multinomial choice panel models.

Submitted 9 May, 2018; v1 submitted 26 June, 2017; originally announced June 2017.

Comments: 23 pages

arXiv:1702.06240 [pdf, other]

Debiased Machine Learning of Conditional Average Treatment Effects and Other Causal Functions

Authors: Vira Semenova, Victor Chernozhukov

Abstract: This paper provides estimation and inference methods for the best linear predictor (approximation) of a structural function, such as conditional average structural and treatment effects, and structural derivatives, based on modern machine learning (ML) tools. We represent this structural function as a conditional expectation of an unbiased signal that depends on a nuisance parameter, which we estimate by modern machine learning techniques. We first adjust the signal to make it insensitive (Neyman-orthogonal) with respect to the first-stage regularization bias. We then project the signal onto a set of basis functions, growing with sample size, which gives us the best linear predictor of the structural function. We derive a complete set of results for estimation and simultaneous inference on all parameters of the best linear predictor, conducting inference by Gaussian bootstrap. When the structural function is smooth and the basis is sufficiently rich, our estimation and inference result automatically targets this function. When basis functions are group indicators, the best linear predictor reduces to group average treatment/structural effect, and our inference automatically targets these parameters. We demonstrate our method by estimating uniform confidence bands for the average price elasticity of gasoline demand conditional on income.

Submitted 14 August, 2020; v1 submitted 20 February, 2017; originally announced February 2017.

arXiv:1701.08687 [pdf, ps, other]

Double/Debiased/Neyman Machine Learning of Treatment Effects

Authors: Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey

Abstract: Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016) provide a generic double/de-biased machine learning (DML) approach for obtaining valid inferential statements about focal parameters, using Neyman-orthogonal scores and cross-fitting, in settings where nuisance parameters are estimated using a new generation of nonparametric fitting methods for high-dimensional data, called machine learning methods. In this note, we illustrate the application of this method in the context of estimating average treatment effects (ATE) and average treatment effects on the treated (ATTE) using observational data. A more general discussion and references to the existing literature are available in Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, and Newey (2016).
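
For orientation, a standard example of a Neyman-orthogonal score for the ATE is the doubly robust (augmented inverse-probability-weighting) score. The notation is an assumption introduced here and does not appear in the listing: $W = (Y, D, X)$ is the observed data with binary treatment $D$, $g(d, X) = E[Y \mid D = d, X]$ is the outcome regression, $m(X) = P(D = 1 \mid X)$ is the propensity score, and $\eta = (g, m)$ collects the nuisance functions.

$$
\psi(W; \theta, \eta) = g(1, X) - g(0, X) + \frac{D \, (Y - g(1, X))}{m(X)} - \frac{(1 - D) \, (Y - g(0, X))}{1 - m(X)} - \theta
$$

Under cross-fitting, $g$ and $m$ would be estimated on folds that exclude the observation being scored, and $\theta$ would be chosen so that the sample average of $\psi$ is zero.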

Submitted 30 January, 2017; originally announced January 2017.

Comments: Conference paper, forthcoming in American Economic Review, Papers and Proceedings, 2017. arXiv admin note: text overlap with arXiv:1608.00060

arXiv:1612.06850 [pdf, ps, other]

doi 10.1201/9781315120256

Extremal Quantile Regression: An Overview

Authors: Victor Chernozhukov, Iván Fernández-Val, Tetsuya Kaji

Abstract: Extremal quantile regression, i.e. quantile regression applied to the tails of the conditional distribution, has an increasing number of economic and financial applications such as value-at-risk, production frontiers, determinants of low infant birth weights, and auction models. This chapter provides an overview of recent developments in the theory and empirics of extremal quantile regression. The advances in the theory have relied on the use of extreme value approximations to the law of the Koenker and Bassett (1978) quantile regression estimator. Extreme value laws not only have been shown to provide more accurate approximations than Gaussian laws at the tails, but also have served as the basis to develop bias corrected estimators and inference methods using simulation and suitable variations of bootstrap and subsampling. The applicability of these methods is illustrated with two empirical examples on conditional value-at-risk and financial contagion.

Submitted 8 February, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

Comments: 32 pages, 4 tables, 7 figures, forthcoming in the Handbook of Quantile Regression

arXiv:1610.08329 [pdf, other]

quantreg.nonpar: An R Package for Performing Nonparametric Series Quantile Regression

Authors: Michael Lipsitz, Alexandre Belloni, Victor Chernozhukov, Iván Fernández-Val

Abstract: The R package quantreg.nonpar implements nonparametric quantile regression methods to estimate and make inference on partially linear quantile models. quantreg.nonpar obtains point estimates of the conditional quantile function and its derivatives based on series approximations to the nonparametric part of the model. It also provides pointwise and uniform confidence intervals over a region of covariate values and/or quantile indices for the same functions using analytical and resampling methods. This paper serves as an introduction to the package and displays basic functionality of the functions contained within.

Submitted 26 October, 2016; originally announced October 2016.

Comments: 12 pages, 5 figures

arXiv:1610.07894 [pdf, other]

Counterfactual: An R Package for Counterfactual Analysis

Authors: Mingli Chen, Victor Chernozhukov, Iván Fernández-Val, Blaise Melly

Abstract: The Counterfactual package implements the estimation and inference methods of Chernozhukov, Fernández-Val and Melly (2013) for counterfactual analysis. The counterfactual distributions considered are the result of changing either the marginal distribution of covariates related to the outcome variable of interest, or the conditional distribution of the outcome given the covariates. They can be applied to estimate quantile treatment effects and wage decompositions. This paper serves as an introduction to the package and displays basic functionality of the commands contained within.

Submitted 25 October, 2016; originally announced October 2016.

Comments: 15 pages, 4 figures

arXiv:1610.06833 [pdf, ps, other]

Vector quantile regression beyond correct specification

Authors: Guillaume Carlier, Victor Chernozhukov, Alfred Galichon

Abstract: This paper studies vector quantile regression (VQR), which is a way to model the dependence of a random vector of interest with respect to a vector of explanatory variables so as to capture the whole conditional distribution, and not only the conditional mean. The problem of vector quantile regression is formulated as an optimal transport problem subject to an additional mean-independence condition. This paper provides a new set of results on VQR beyond the case with correct specification which had been the focus of previous work. First, we show that even under misspecification, the VQR problem still has a solution which provides a general representation of the conditional dependence between random vectors. Second, we provide a detailed comparison with the classical approach of Koenker and Bassett in the case when the dependent variable is univariate and we show that in that case, VQR is equivalent to classical quantile regression with an additional monotonicity constraint.

Submitted 21 October, 2016; originally announced October 2016.

arXiv:1608.05142 [pdf, other]

Generic Inference on Quantile and Quantile Effect Functions for Discrete Outcomes

Authors: Victor Chernozhukov, Iván Fernández-Val, Blaise Melly, Kaspar Wüthrich

Abstract: Quantile and quantile effect functions are important tools for descriptive and causal analyses due to their natural and intuitive interpretation. Existing inference methods for these functions do not apply to discrete random variables. This paper offers a simple, practical construction of simultaneous confidence bands for quantile and quantile effect functions of possibly discrete random variables. It is based on a natural transformation of simultaneous confidence bands for distribution functions, which are readily available for many problems. The construction is generic and does not depend on the nature of the underlying problem. It works in conjunction with parametric, semiparametric, and nonparametric modeling methods for observed and counterfactual distributions, and does not depend on the sampling scheme. We apply our method to characterize the distributional impact of insurance coverage on health care utilization and obtain the distributional decomposition of the racial test score gap. We find that universal insurance coverage increases the number of doctor visits across the entire distribution, and that the racial test score gap is small at early ages but grows with age due to socioeconomic factors affecting child development, especially at the top of the distribution. These are new, interesting empirical findings that complement previous analyses that focused on mean effects only.
In both applications, the outcomes of interest are discrete, rendering existing inference methods invalid for obtaining uniform confidence bands for observed and counterfactual quantile functions and for their difference -- the quantile effects functions.

Submitted 30 August, 2018; v1 submitted 17 August, 2016; originally announced August 2016.

Comments: 38 pages, 6 figures, 4 tables

MSC Class: 62F25; 62G15; 62P20

Showing 1–50 of 88 results for author: Chernozhukov, V