Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: We extend the idea of automated debiased machine learning to the dynamic treatment regime and more generally to nested functionals. We show that the multiply robust formula for the dynamic treatment regime with discrete treatments can be re-stated in terms of a recursive Riesz representer characterization of nested mean regressions. We then apply a recursive Riesz representer estimation learning algorithm that estimates de-biasing corrections without the need to characterize what the correction terms look like, such as, for instance, products of inverse probability weighting terms, as is done in prior work on doubly robust estimation in the dynamic regime. Our approach defines a sequence of loss minimization problems, whose minimizers are the multipliers of the de-biasing correction, hence circumventing the need for solving auxiliary propensity models and directly optimizing for the mean squared error of the target de-biasing correction. We provide further applications of our approach to estimation of dynamic discrete choice models and estimation of long-term effects with surrogates.

Submitted 20 June, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2201.05139 [pdf, ps, other]

Generalized Kernel Ridge Regression for Long Term Causal Inference: Treatment Effects, Dose Responses, and Counterfactual Distributions

Authors: Rahul Singh

Abstract: I propose kernel ridge regression estimators for long term causal inference, where a short term experimental data set containing randomized treatment and short term surrogates is fused with a long term observational data set containing short term surrogates and long term outcomes. I propose estimators of treatment effects, dose responses, and counterfactual distributions with closed form solutions in terms of kernel matrix operations. I allow covariates, treatment, and surrogates to be discrete or continuous, and low, high, or infinite dimensional. For long term treatment effects, I prove $\sqrt{n}$ consistency, Gaussian approximation, and semiparametric efficiency. For long term dose responses, I prove uniform consistency with finite sample rates. For long term counterfactual distributions, I prove convergence in distribution.

Submitted 13 January, 2022; originally announced January 2022.

Comments: 30 pages. arXiv admin note: substantial text overlap with arXiv:2111.05277; text overlap with arXiv:2010.04855, arXiv:2111.03950

arXiv:2112.14249 [pdf, other]

Nested Nonparametric Instrumental Variable Regression: Long Term, Mediated, and Time Varying Treatment Effects

Authors: Isaac Meza, Rahul Singh

Abstract: Several causal parameters in short panel data models are scalar summaries of a function called a nested nonparametric instrumental variable regression (nested NPIV). Examples include long term, mediated, and time varying treatment effects identified using proxy variables. However, it appears that no prior estimators or guarantees for nested NPIV exist, preventing flexible estimation and inference for these causal parameters. A major challenge is compounding ill posedness due to the nested inverse problems. We analyze adversarial estimators of nested NPIV, and provide sufficient conditions for efficient inference on the causal parameter. Our nonasymptotic analysis has three salient features: (i) introducing techniques that limit how ill posedness compounds; (ii) accommodating neural networks, random forests, and reproducing kernel Hilbert spaces; and (iii) extending to causal functions, e.g. long term heterogeneous treatment effects. We measure long term heterogeneous treatment effects of Project STAR and mediated proximal treatment effects of the Job Corps.

Submitted 10 March, 2024; v1 submitted 28 December, 2021; originally announced December 2021.

arXiv:2111.05277 [pdf, ps, other]

Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection

Authors: Rahul Singh

Abstract: I propose kernel ridge regression estimators for nonparametric dose response curves and semiparametric treatment effects in the setting where an analyst has access to a selected sample rather than a random sample; only for select observations, the outcome is observed. I assume selection is as good as random conditional on treatment and a sufficiently rich set of observed covariates, where the covariates are allowed to cause treatment or be caused by treatment -- an extension of missingness-at-random (MAR). I propose estimators of means, increments, and distributions of counterfactual outcomes with closed form solutions in terms of kernel matrix operations, allowing treatment and covariates to be discrete or continuous, and low, high, or infinite dimensional. For the continuous treatment case, I prove uniform consistency with finite sample rates. For the discrete treatment case, I prove root-n consistency, Gaussian approximation, and semiparametric efficiency.

Submitted 9 November, 2021; originally announced November 2021.

Comments: 75 pages. arXiv admin note: text overlap with arXiv:2010.04855, arXiv:2111.03950

arXiv:2111.03950 [pdf, other]

Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves

Authors: Rahul Singh, Liyuan Xu, Arthur Gretton

Abstract: We propose simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression. By embedding Pearl's mediation formula and Robins' g-formula with kernels, we allow treatments, mediators, and covariates to be continuous in general spaces, and also allow for nonlinear treatment-confounder feedback. Our key innovation is a reproducing kernel Hilbert space technique called sequential kernel embedding, which we use to construct simple estimators for complex causal estimands. Our estimators preserve the generality of classic identification while also achieving nonasymptotic uniform rates. In nonlinear simulations with many covariates, we demonstrate strong performance. We estimate mediated and time-varying dose response curves of the US Job Corps, and clean data that may serve as a benchmark in future work. We extend our results to mediated and time-varying treatment effects and counterfactual distributions, verifying semiparametric efficiency and weak convergence.

Submitted 19 July, 2023; v1 submitted 6 November, 2021; originally announced November 2021.

Comments: 87 pages. Material in this draft previously appeared in a working paper presented at the 2020 NeurIPS Workshop on ML for Economic Policy (arXiv:2010.04855v1). We have divided the original working paper (arXiv:2010.04855v1) into two projects: one paper focusing on time-fixed settings (arXiv:2010.04855) and this paper focusing on time-varying settings

arXiv:2107.02780 [pdf, other]

Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy

Authors: Anish Agarwal, Rahul Singh

Abstract: The US Census Bureau will deliberately corrupt data sets derived from the 2020 US Census, enhancing the privacy of respondents while potentially reducing the precision of economic analysis. To investigate whether this trade-off is inevitable, we formulate a semiparametric model of causal inference with high dimensional corrupted data. We propose a procedure for data cleaning, estimation, and inference with data cleaning-adjusted confidence intervals. We prove consistency and Gaussian approximation by finite sample arguments, with a rate of $n^{-1/2}$ for semiparametric estimands that degrades gracefully for nonparametric estimands. Our key assumption is that the true covariates are approximately low rank, which we interpret as approximate repeated measurements and empirically validate. Our analysis provides nonasymptotic theoretical contributions to matrix completion, statistical learning, and semiparametric statistics. Calibrated simulations verify the coverage of our data cleaning-adjusted confidence intervals and demonstrate the relevance of our results for Census-derived data.

Submitted 12 February, 2024; v1 submitted 6 July, 2021; originally announced July 2021.

ACM Class: G.3; J.4

arXiv:2107.01098 [pdf, other]

Temporal Analysis of Worldwide War

Authors: Devansh Bajpai, Rishi Ranjan Singh

Abstract: Analysis of wars and conflicts between regions has been an important topic of interest throughout the history of humankind. In the latter part of the 20th century, in the aftermath of two World Wars and the shadow of nuclear, biological, and chemical holocaust, more was written on the subject than ever before. Wars have a negative impact on a country's economy, social order, infrastructure, and public health. In this paper, we study the wars fought in history and draw conclusions from them. We explore the participation of countries in wars and the nature of relationships between various countries during different timelines. A big part of today's wars is fought against terrorism. Therefore, this study also attempts to shed light on different countries' exposure to terrorist encounters and analyses the impact of wars on a country's economy in terms of change in GDP.

Submitted 27 June, 2021; originally announced July 2021.

arXiv:2105.15197 [pdf, ps, other]

A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees

Authors: Victor Chernozhukov, Whitney K. Newey, Rahul Singh

Abstract: Debiased machine learning is a meta algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals, i.e. scalar summaries, of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is $n^{-1/2}$ for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a general double robustness property for ill posed inverse problems.

Submitted 21 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

Comments: Biometrika 2022

arXiv:2102.11076 [pdf, other]

Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension

Authors: Rahul Singh

Abstract: Kernel balancing weights provide confidence intervals for average treatment effects, based on the idea of balancing covariates for the treated group and untreated group in feature space, often with ridge regularization. Previous works on the classical kernel ridge balancing weights have certain limitations: (i) not articulating generalization error for the balancing weights, (ii) typically requiring correct specification of features, and (iii) justifying Gaussian approximation for only average effects. I interpret kernel balancing weights as kernel ridge Riesz representers (KRRR) and address these limitations via a new characterization of the counterfactual effective dimension. KRRR is an exact generalization of kernel ridge regression and kernel ridge balancing weights. I prove strong properties similar to kernel ridge regression: population $L_2$ rates controlling generalization error, and a standalone closed form solution that can interpolate. The framework relaxes the stringent assumption that the underlying regression model is correctly specified by the features. It extends Gaussian approximation beyond average effects to heterogeneous effects, justifying confidence sets for causal functions. I use KRRR to quantify uncertainty for heterogeneous treatment effects, by age, of 401(k) eligibility on assets.

Submitted 4 July, 2024; v1 submitted 22 February, 2021; originally announced February 2021.

MSC Class: 62G15; 62D20; 46E22

ACM Class: G.3; J.4

arXiv:2101.00009 [pdf, other]

Adversarial Estimation of Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Abstract: Many causal parameters are linear functionals of an underlying regression. The Riesz representer is a key component in the asymptotic variance of a semiparametrically estimated linear functional. We propose an adversarial framework to estimate the Riesz representer using general function spaces. We prove a nonasymptotic mean square rate in terms of an abstract quantity called the critical radius, then specialize it for neural networks, random forests, and reproducing kernel Hilbert spaces as leading cases. Our estimators are highly compatible with targeted and debiased machine learning with sample splitting; our guarantees directly verify general conditions for inference that allow mis-specification. We also use our guarantees to prove inference without sample splitting, based on stability or complexity. Our estimators achieve nominal coverage in highly nonlinear simulations where some previous methods break down. They shed new light on the heterogeneous effects of matching grants.

Submitted 26 April, 2024; v1 submitted 30 December, 2020; originally announced January 2021.

arXiv:2012.10315 [pdf, ps, other]

Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments

Authors: Rahul Singh

Abstract: Negative control is a strategy for learning the causal relationship between treatment and outcome in the presence of unmeasured confounding. The treatment effect can nonetheless be identified if two auxiliary variables are available: a negative control treatment (which has no effect on the actual outcome), and a negative control outcome (which is not affected by the actual treatment). These auxiliary variables can also be viewed as proxies for a traditional set of control variables, and they bear resemblance to instrumental variables. I propose a family of algorithms based on kernel ridge regression for learning nonparametric treatment effects with negative controls. Examples include dose response curves, dose response curves with distribution shift, and heterogeneous treatment effects. Data may be discrete or continuous, and low, high, or infinite dimensional. I prove uniform consistency and provide finite sample rates of convergence. I estimate the dose response curve of cigarette smoking on infant birth weight adjusting for unobserved confounding due to household income, using a data set of singleton births in the state of Pennsylvania between 1989 and 1991.

Submitted 23 March, 2023; v1 submitted 18 December, 2020; originally announced December 2020.

MSC Class: 62G05; 62P10; 62P10

ACM Class: G.3; J.3; J.4

arXiv:2010.04855 [pdf, other]

Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves

Authors: Rahul Singh, Liyuan Xu, Arthur Gretton

Abstract: We propose estimators based on kernel ridge regression for nonparametric causal functions such as dose, heterogeneous, and incremental response curves. Treatment and covariates may be discrete or continuous in general spaces. Due to a decomposition property specific to the RKHS, our estimators have simple closed form solutions. We prove uniform consistency with finite sample rates via original analysis of generalized kernel ridge regression. We extend our main results to counterfactual distributions and to causal functions identified by front and back door criteria. We achieve state-of-the-art performance in nonlinear simulations with many covariates, and conduct a policy evaluation of the US Job Corps training program for disadvantaged youths.

Submitted 21 October, 2022; v1 submitted 9 October, 2020; originally announced October 2020.

Comments: Formerly "Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning" (2020)

MSC Class: 62P2; 62G08; 68T05

ACM Class: G.3; I.2; J.4

arXiv:1909.05244 [pdf, ps, other]

Double Robustness for Complier Parameters and a Semiparametric Test for Complier Characteristics

Authors: Rahul Singh, Liyang Sun

Abstract: We propose a semiparametric test to evaluate (i) whether different instruments induce subpopulations of compliers with the same observable characteristics on average, and (ii) whether compliers have observable characteristics that are the same as the full population on average. The test is a flexible robustness check for the external validity of instruments. We use it to reinterpret the difference in LATE estimates that Angrist and Evans (1998) obtain when using different instrumental variables. To justify the test, we characterize the doubly robust moment for Abadie (2003)'s class of complier parameters, and we analyze a machine learning update to $\kappa$ weighting.

Submitted 11 December, 2022; v1 submitted 10 September, 2019; originally announced September 2019.

Comments: 36 pages, 4 figures, 4 tables

arXiv:1906.00232 [pdf, ps, other]

Kernel Instrumental Variable Regression

Authors: Rahul Singh, Maneesh Sahani, Arthur Gretton

Abstract: Instrumental variable (IV) regression is a strategy for learning causal relationships in observational data. If measurements of input X and output Y are confounded, the causal relationship can nonetheless be identified if an instrumental variable Z is available that influences X directly, but is conditionally independent of Y given X and the unmeasured confounder. The classic two-stage least squares algorithm (2SLS) simplifies the estimation problem by modeling all relationships as linear functions. We propose kernel instrumental variable regression (KIV), a nonparametric generalization of 2SLS, modeling relations among X, Y, and Z as nonlinear functions in reproducing kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild assumptions, and derive conditions under which convergence occurs at the minimax optimal rate for unconfounded, single-stage RKHS regression. In doing so, we obtain an efficient ratio between training sample sizes used in the algorithm's first and second stages. In experiments, KIV outperforms state of the art alternatives for nonparametric IV regression.

Submitted 15 July, 2020; v1 submitted 1 June, 2019; originally announced June 2019.

Comments: 41 pages, 11 figures. Advances in Neural Information Processing Systems. 2019

arXiv:1809.05224 [pdf, ps, other]

Automatic Debiased Machine Learning of Causal and Structural Effects

Authors: Victor Chernozhukov, Whitney K Newey, Rahul Singh

Abstract: Many causal and structural effects depend on regressions. Examples include policy effects, average derivatives, regression decompositions, average treatment effects, causal mediation, and parameters of economic structural models. The regressions may be high dimensional, making machine learning useful. Plugging machine learners into identifying equations can lead to poor inference due to bias from regularization and/or model selection. This paper gives automatic debiasing for linear and nonlinear functions of regressions. The debiasing is automatic in using Lasso and the function of interest without the full form of the bias correction. The debiasing can be applied to any regression learner, including neural nets, random forests, Lasso, boosting, and other high dimensional methods. In addition to providing the bias correction we give standard errors that are robust to misspecification, convergence rates for the bias correction, and primitive conditions for asymptotic inference for a variety of estimators of structural and causal effects. The automatic debiased machine learning is used to estimate the average treatment effect on the treated for the NSW job training data and to estimate demand elasticities from Nielsen scanner data while allowing preferences to be correlated with prices and income.

Submitted 21 October, 2022; v1 submitted 13 September, 2018; originally announced September 2018.

Comments: Econometrica 2022

arXiv:1802.08667 [pdf, ps, other]

De-Biased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh

Abstract: We provide adaptive inference methods, based on $\ell_1$ regularization, for regular (semi-parametric) and non-regular (nonparametric) linear functionals of the conditional expectation function. Examples of regular functionals include average treatment effects, policy effects, and derivatives. Examples of non-regular functionals include average treatment effects, policy effects, and derivatives conditional on a covariate subvector fixed at a point. We construct a Neyman orthogonal equation for the target parameter that is approximately invariant to small perturbations of the nuisance parameters. To achieve this property, we include the Riesz representer for the functional as an additional nuisance parameter. Our analysis yields weak "double sparsity robustness": either the approximation to the regression or the approximation to the representer can be "completely dense" as long as the other is sufficiently "sparse". Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models, translating into honest confidence bands for both global and local parameters.

Submitted 21 October, 2022; v1 submitted 23 February, 2018; originally announced February 2018.

Comments: The Econometrics Journal, 2022

Showing 1–16 of 16 results for author: Singh, R