-
Causal Inference with High-dimensional Discrete Covariates
Authors:
Zhenghao Zeng,
Sivaraman Balakrishnan,
Yanjun Han,
Edward H. Kennedy
Abstract:
When estimating causal effects from observational studies, researchers often need to adjust for many covariates to deconfound the non-causal relationship between exposure and outcome, among which many covariates are discrete. The behavior of commonly used estimators in the presence of many discrete covariates is not well understood since their properties are often analyzed under structural assumptions including sparsity and smoothness, which do not apply in discrete settings. In this work, we study the estimation of causal effects in a model where the covariates required for confounding adjustment are discrete but high-dimensional, meaning the number of categories $d$ is comparable with or even larger than sample size $n$. Specifically, we show the mean squared error of commonly used regression, weighting and doubly robust estimators is bounded by $\frac{d^2}{n^2}+\frac{1}{n}$. We then prove the minimax lower bound for the average treatment effect is of order $\frac{d^2}{n^2 \log^2 n}+\frac{1}{n}$, which characterizes the fundamental difficulty of causal effect estimation in the high-dimensional discrete setting, and shows the estimators mentioned above are rate-optimal up to log-factors. We further consider additional structures that can be exploited, namely effect homogeneity and prior knowledge of the covariate distribution, and propose new estimators that enjoy faster convergence rates of order $\frac{d}{n^2} + \frac{1}{n}$, which achieve consistency in a broader regime. The results are illustrated empirically via simulation studies.
Submitted 5 May, 2024; v1 submitted 30 April, 2024;
originally announced May 2024.
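To make the rates quoted in this abstract concrete, the following Python sketch evaluates the three stated orders for a given number of categories $d$ and sample size $n$. Constants are ignored because the abstract does not give them, and the function names are illustrative rather than taken from the paper.

```python
import math

def upper_bound_rate(d, n):
    """Order of the MSE upper bound for standard estimators: d^2/n^2 + 1/n."""
    return d**2 / n**2 + 1 / n

def minimax_lower_rate(d, n):
    """Order of the minimax lower bound: d^2/(n^2 log^2 n) + 1/n."""
    return d**2 / (n**2 * math.log(n) ** 2) + 1 / n

def structured_rate(d, n):
    """Order of the faster rate under extra structure: d/n^2 + 1/n."""
    return d / n**2 + 1 / n

if __name__ == "__main__":
    n = 10_000
    for d in (100, 10_000, 1_000_000):
        # Print each order side by side for a small, matched, and large d.
        print(d, upper_bound_rate(d, n), minimax_lower_rate(d, n), structured_rate(d, n))
```

When $d = n$, the first order stays near 1 (no consistency) while the third is $2/n$, which is the "broader regime" the abstract refers to.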
-
Double Cross-fit Doubly Robust Estimators: Beyond Series Regression
Authors:
Alec McClean,
Sivaraman Balakrishnan,
Edward H. Kennedy,
Larry Wasserman
Abstract:
Doubly robust estimators with cross-fitting have gained popularity in causal inference due to their favorable structure-agnostic error guarantees. However, when additional structure, such as Hölder smoothness, is available then more accurate "double cross-fit doubly robust" (DCDR) estimators can be constructed by splitting the training data and undersmoothing nuisance function estimators on independent samples. We study a DCDR estimator of the Expected Conditional Covariance, a functional of interest in causal inference and conditional independence testing, and derive a series of increasingly powerful results with progressively stronger assumptions. We first provide a structure-agnostic error analysis for the DCDR estimator with no assumptions on the nuisance functions or their estimators. Then, assuming the nuisance functions are Hölder smooth, but without assuming knowledge of the true smoothness level or the covariate density, we establish that DCDR estimators with several linear smoothers are semiparametric efficient under minimal conditions and achieve fast convergence rates in the non-$\sqrt{n}$ regime. When the covariate density and smoothnesses are known, we propose a minimax rate-optimal DCDR estimator based on undersmoothed kernel regression. Moreover, we show an undersmoothed DCDR estimator satisfies a slower-than-$\sqrt{n}$ central limit theorem, and that inference is possible even in the non-$\sqrt{n}$ regime. Finally, we support our theoretical results with simulations, providing intuition for double cross-fitting and undersmoothing, demonstrating where our estimator achieves semiparametric efficiency while the usual "single cross-fit" estimator fails, and illustrating asymptotic normality for the undersmoothed DCDR estimator.
Submitted 15 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
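A rough sketch of the double cross-fitting idea for the Expected Conditional Covariance $E[(A - \pi(X))(Y - \mu(X))]$ follows: the two nuisance functions are fit on separate folds and the residual product is averaged on a third. This is an assumption-laden illustration, not the paper's estimator: it uses a single fold assignment, omits undersmoothing and fold rotation, and `fit_pi`, `fit_mu`, and `fit_constant` are hypothetical helpers.

```python
import numpy as np

def dcdr_ecc(X, A, Y, fit_pi, fit_mu):
    """Estimate E[(A - pi(X)) (Y - mu(X))] using three disjoint folds.

    fit_pi(X, A) and fit_mu(X, Y) are hypothetical user-supplied fitters that
    return a prediction function x -> array of fitted values.
    """
    X = np.asarray(X)
    A = np.asarray(A, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Deterministic split into three contiguous folds (shuffle beforehand if needed).
    f_pi, f_mu, f_eval = np.array_split(np.arange(len(Y)), 3)
    pi_hat = fit_pi(X[f_pi], A[f_pi])  # propensity fit on fold 1 only
    mu_hat = fit_mu(X[f_mu], Y[f_mu])  # outcome regression fit on fold 2 only
    # Average the product of residuals on the held-out third fold.
    resid_a = A[f_eval] - pi_hat(X[f_eval])
    resid_y = Y[f_eval] - mu_hat(X[f_eval])
    return float(np.mean(resid_a * resid_y))

def fit_constant(X, target):
    """Toy fitter: predicts the training mean everywhere (illustration only)."""
    m = float(np.mean(target))
    return lambda x: np.full(len(x), m)
```

In practice the toy fitter would be replaced by a smoother such as a kernel or series regression, which is where the undersmoothing discussed in the abstract would enter.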
-
Distribution-uniform anytime-valid sequential inference
Authors:
Ian Waudby-Smith,
Edward H. Kennedy,
Aaditya Ramdas
Abstract:
Are asymptotic confidence sequences and anytime $p$-values uniformly valid for a nontrivial class of distributions $\mathcal{P}$? We give a positive answer to this question by deriving distribution-uniform anytime-valid inference procedures. Historically, anytime-valid methods -- including confidence sequences, anytime $p$-values, and sequential hypothesis tests that enable inference at stopping times -- have been justified nonasymptotically. Nevertheless, asymptotic procedures such as those based on the central limit theorem occupy an important part of the statistical toolbox due to their simplicity, universality, and weak assumptions. While recent work has derived asymptotic analogues of anytime-valid methods with the aforementioned benefits, these were not shown to be $\mathcal{P}$-uniform, meaning that their asymptotics are not uniformly valid in a class of distributions $\mathcal{P}$. Indeed, the anytime-valid inference literature currently has no central limit theory to draw from that is both uniform in $\mathcal{P}$ and in the sample size $n$. This paper fills that gap by deriving a novel $\mathcal{P}$-uniform strong Gaussian approximation theorem. We apply some of these results to obtain an anytime-valid test of conditional independence without the Model-X assumption, as well as a $\mathcal{P}$-uniform law of the iterated logarithm.
Submitted 18 April, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Causal Effect Estimation after Propensity Score Trimming with Continuous Treatments
Authors:
Zach Branson,
Edward H. Kennedy,
Sivaraman Balakrishnan,
Larry Wasserman
Abstract:
Most works in causal inference focus on binary treatments where one estimates a single treatment-versus-control effect. When treatment is continuous, one must estimate a curve representing the causal relationship between treatment and outcome (the "dose-response curve"), which makes causal inference more challenging. This work proposes estimators using efficient influence functions (EIFs) for causal dose-response curves after propensity score trimming. Trimming involves estimating causal effects among subjects with propensity scores above a threshold, which addresses positivity violations that complicate estimation. Several challenges arise with continuous treatments. First, EIFs for trimmed dose-response curves do not exist, due to a lack of pathwise differentiability induced by trimming and a continuous treatment. Second, if the trimming threshold is not prespecified and is instead a parameter that must be estimated, then estimation uncertainty in the threshold must be accounted for. To address these challenges, we target a smoothed version of the trimmed dose-response curve for which an EIF exists. We allow the trimming threshold to be a user-specified quantile of the propensity score distribution, and we construct confidence intervals which reflect uncertainty involved in threshold estimation. Our resulting EIF-based estimators exhibit doubly-robust style guarantees, with error involving products or squares of errors for the outcome regression and propensity score. Thus, our estimators can exhibit parametric convergence rates even when the outcome regression and propensity score are estimated at slower nonparametric rates with flexible estimators. These findings are validated via simulation and an application, thereby showing how to efficiently-but-flexibly estimate a dose-response curve after trimming.
Submitted 1 September, 2023;
originally announced September 2023.
-
The Fundamental Limits of Structure-Agnostic Functional Estimation
Authors:
Sivaraman Balakrishnan,
Edward H. Kennedy,
Larry Wasserman
Abstract:
Many recent developments in causal inference, and functional estimation problems more generally, have been motivated by the fact that classical one-step (first-order) debiasing methods, or their more recent sample-split double machine-learning avatars, can outperform plugin estimators under surprisingly weak conditions. These first-order corrections improve on plugin estimators in a black-box fashion, and consequently are often used in conjunction with powerful off-the-shelf estimation methods. These first-order methods are however provably suboptimal in a minimax sense for functional estimation when the nuisance functions live in Hölder-type function spaces. This suboptimality of first-order debiasing has motivated the development of "higher-order" debiasing methods. The resulting estimators are, in some cases, provably optimal over Hölder-type spaces, but both the estimators which are minimax-optimal and their analyses are crucially tied to properties of the underlying function space.
In this paper we investigate the fundamental limits of structure-agnostic functional estimation, where relatively weak conditions are placed on the underlying nuisance functions. We show that there is a strong sense in which existing first-order methods are optimal. We achieve this goal by providing a formalization of the problem of functional estimation with black-box nuisance function estimates, and deriving minimax lower bounds for this problem. Our results highlight some clear tradeoffs in functional estimation -- if we wish to remain agnostic to the underlying nuisance function spaces, impose only high-level rate conditions, and maintain compatibility with black-box nuisance estimators then first-order methods are optimal. When we have an understanding of the structure of the underlying nuisance functions then carefully constructed higher-order estimators can outperform first-order estimators.
Submitted 6 May, 2023;
originally announced May 2023.
-
Nonparametric Estimation of Conditional Incremental Effects
Authors:
Alec McClean,
Zach Branson,
Edward H. Kennedy
Abstract:
Conditional effect estimation has great scientific and policy importance because interventions may impact subjects differently depending on their characteristics. Most research has focused on estimating the conditional average treatment effect (CATE). However, identification of the CATE requires all subjects have a non-zero probability of receiving treatment, or positivity, which may be unrealistic in practice. Instead, we propose conditional effects based on incremental propensity score interventions, which are stochastic interventions where the odds of treatment are multiplied by some factor. These effects do not require positivity for identification and can be better suited for modeling scenarios in which people cannot be forced into treatment. We develop a projection estimator and a flexible nonparametric estimator that can each estimate all the conditional effects we propose and derive model-agnostic error guarantees showing both estimators satisfy a form of double robustness. Further, we propose a summary of treatment effect heterogeneity and a test for any effect heterogeneity based on the variance of a conditional derivative effect and derive a nonparametric estimator that also satisfies a form of double robustness. Finally, we demonstrate our estimators by analyzing the effect of intensive care unit admission on mortality using a dataset from the (SPOT)light study.
Submitted 24 April, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
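The incremental propensity score intervention described in this abstract multiplies the odds of treatment by a factor. A minimal Python sketch of that shift follows; the function name and the symbol `delta` for the factor are assumptions made for illustration.

```python
def shifted_propensity(pi, delta):
    """Return the treatment probability after multiplying the odds by delta.

    If the original odds are pi / (1 - pi), the shifted odds are
    delta * pi / (1 - pi), which corresponds to the probability
    delta * pi / (delta * pi + 1 - pi).
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be a probability in [0, 1]")
    if delta <= 0:
        raise ValueError("delta must be positive")
    return delta * pi / (delta * pi + 1.0 - pi)
```

Note that a propensity of exactly 0 or 1 is left unchanged by the shift, which gives some intuition for why these effects can be identified without positivity.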
-
Minimax rates for heterogeneous causal effect estimation
Authors:
Edward H. Kennedy,
Sivaraman Balakrishnan,
James M. Robins,
Larry Wasserman
Abstract:
Estimation of heterogeneous causal effects - i.e., how effects of policies and treatments vary across subjects - is a fundamental task in causal inference. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but questions surrounding optimality have remained largely unanswered. In particular, a minimax theory of optimality has yet to be developed, with the minimax rate of convergence and construction of rate-optimal estimators remaining open problems. In this paper we derive the minimax rate for CATE estimation, in a Hölder-smooth nonparametric model, and present a new local polynomial estimator, giving high-level conditions under which it is minimax optimal. Our minimax lower bound is derived via a localized version of the method of fuzzy hypotheses, combining lower bound constructions for nonparametric regression and functional estimation. Our proposed estimator can be viewed as a local polynomial R-Learner, based on a localized modification of higher-order influence function methods. The minimax rate we find exhibits several interesting features, including a non-standard elbow phenomenon and an unusual interpolation between nonparametric regression and functional estimation rates. The latter quantifies how the CATE, as an estimand, can be viewed as a regression/functional hybrid.
Submitted 22 December, 2023; v1 submitted 1 March, 2022;
originally announced March 2022.
-
Doubly robust capture-recapture methods for estimating population size
Authors:
Manjari Das,
Edward H. Kennedy,
Nicholas P. Jewell
Abstract:
Estimation of population size using incomplete lists (also called the capture-recapture problem) has a long history across many biological and social sciences. For example, human rights and other groups often construct partial and overlapping lists of victims of armed conflicts, with the hope of using this information to estimate the total number of victims. Earlier statistical methods for this setup either use potentially restrictive parametric assumptions, or else rely on typically suboptimal plug-in-type nonparametric estimators; however, both approaches can lead to substantial bias, the former via model misspecification and the latter via smoothing. Under an identifying assumption that two lists are conditionally independent given measured covariate information, we make several contributions. First, we derive the nonparametric efficiency bound for estimating the capture probability, which indicates the best possible performance of any estimator, and sheds light on the statistical limits of capture-recapture methods. Then we present a new estimator, and study its finite-sample properties, showing that it has a double robustness property new to capture-recapture, and that it is near-optimal in a non-asymptotic sense, under relatively mild nonparametric conditions. Next, we give a method for constructing confidence intervals for total population size from generic capture probability estimators, and prove non-asymptotic near-validity. Finally, we study our methods in simulations, and apply them to estimate the number of killings and disappearances attributable to different groups in Peru during its internal armed conflict between 1980 and 2000.
Submitted 31 July, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
Time-uniform central limit theory and asymptotic confidence sequences
Authors:
Ian Waudby-Smith,
David Arbour,
Ritwik Sinha,
Edward H. Kennedy,
Aaditya Ramdas
Abstract:
Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under weak assumptions and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals, adding to the literature on confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time -- which provide valid inference at arbitrary stopping times and incur no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, enjoying finite-sample guarantees but not the aforementioned broad applicability of asymptotic confidence intervals. This work provides a definition for "asymptotic CSs" and a general recipe for deriving them. Asymptotic CSs forgo nonasymptotic validity for CLT-like versatility and (asymptotic) time-uniform guarantees. While the CLT approximates the distribution of a sample average by that of a Gaussian for a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen) to uniformly approximate the entire sample average process by an implicit Gaussian process. As an illustration, we derive asymptotic CSs for the average treatment effect in observational studies (for which nonasymptotic bounds are essentially impossible to derive even in the fixed-time regime) as well as randomized experiments, enabling causal inference in sequential environments.
Submitted 13 March, 2024; v1 submitted 11 March, 2021;
originally announced March 2021.
-
Semiparametric counterfactual density estimation
Authors:
Edward H. Kennedy,
Sivaraman Balakrishnan,
Larry Wasserman
Abstract:
Causal effects are often characterized with averages, which can give an incomplete picture of the underlying counterfactual distributions. Here we consider estimating the entire counterfactual density and generic functionals thereof. We focus on two kinds of target parameters. The first is a density approximation, defined by a projection onto a finite-dimensional model using a generalized distance metric, which includes f-divergences as well as $L_p$ norms. The second is the distance between counterfactual densities, which can be used as a more nuanced effect measure than the mean difference, and as a tool for model selection. We study nonparametric efficiency bounds for these targets, giving results for smooth but otherwise generic models and distances. Importantly, we show how these bounds connect to means of particular non-trivial functions of counterfactuals, linking the problems of density and mean estimation. We go on to propose doubly robust-style estimators for the density approximations and distances, and study their rates of convergence, showing they can be optimally efficient in large nonparametric models. We also give analogous methods for model selection and aggregation, when many models may be available and of interest. Our results all hold for generic models and distances, but throughout we highlight what happens for particular choices, such as $L_2$ projections on linear models, and KL projections on exponential families. Finally we illustrate by estimating the density of CD4 count among patients with HIV, had all been treated with combination therapy versus zidovudine alone, as well as a density effect. Our results suggest combination therapy may have increased CD4 count most for high-risk patients. Our methods are implemented in the freely available R package npcausal on GitHub.
Submitted 23 February, 2021;
originally announced February 2021.
-
Towards optimal doubly robust estimation of heterogeneous causal effects
Authors:
Edward H. Kennedy
Abstract:
Heterogeneous effect estimation plays a crucial role in causal inference, with applications across medicine and social science. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but there are important theoretical gaps in understanding if and when such methods are optimal. This is especially true when the CATE has nontrivial structure (e.g., smoothness or sparsity). Our work contributes in several main ways. First, we study a two-stage doubly robust CATE estimator and give a generic model-free error bound, which, despite its generality, yields sharper results than those in the current literature. We apply the bound to derive error rates in nonparametric models with smoothness or sparsity, and give sufficient conditions for oracle efficiency. Underlying our error bound is a general oracle inequality for regression with estimated or imputed outcomes, which is of independent interest; this is the second main contribution. The third contribution is aimed at understanding the fundamental statistical limits of CATE estimation. To that end, we propose and study a local polynomial adaptation of double-residual regression. We show that this estimator can be oracle efficient under even weaker conditions, if used with a specialized form of sample splitting and careful choices of tuning parameters. These are the weakest conditions currently found in the literature, and we conjecture that they are minimal in a minimax sense. We go on to give error bounds in the non-trivial regime where oracle rates cannot be achieved. Some finite-sample properties are explored with simulations.
Submitted 21 August, 2023; v1 submitted 29 April, 2020;
originally announced April 2020.
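One common way to build a two-stage doubly robust CATE estimator is to form a pseudo-outcome from estimated nuisance functions and then regress it on covariates. The sketch below shows the first stage only, using the standard AIPW-style pseudo-outcome; that this matches the paper's exact construction is an assumption, and all argument names are illustrative.

```python
import numpy as np

def dr_pseudo_outcome(a, y, pi_hat, mu0_hat, mu1_hat):
    """Compute the AIPW-style pseudo-outcome for each unit.

    a        : binary treatment indicators (0 or 1)
    y        : observed outcomes
    pi_hat   : estimated propensity scores P(A = 1 | X)
    mu0_hat  : estimated outcome regression under control
    mu1_hat  : estimated outcome regression under treatment
    """
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    mu0_hat = np.asarray(mu0_hat, dtype=float)
    mu1_hat = np.asarray(mu1_hat, dtype=float)
    # Regression prediction under the treatment actually received.
    mu_a = np.where(a == 1, mu1_hat, mu0_hat)
    # Inverse-probability-weighted residual plus the plug-in contrast.
    weight = (a - pi_hat) / (pi_hat * (1.0 - pi_hat))
    return weight * (y - mu_a) + mu1_hat - mu0_hat
```

The second stage would regress these pseudo-outcomes on the covariates with any regression method, ideally with the nuisance estimates fit on a separate sample.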
-
Visually Communicating and Teaching Intuition for Influence Functions
Authors:
Aaron Fisher,
Edward H. Kennedy
Abstract:
Estimators based on influence functions (IFs) have been shown to be effective in many settings, especially when combined with machine learning techniques. By focusing on estimating a specific target of interest (e.g., the average effect of a treatment), rather than on estimating the full underlying data generating distribution, IF-based estimators are often able to achieve asymptotically optimal mean-squared error. Still, many researchers find IF-based estimators to be opaque or overly technical, which makes their use less prevalent and their benefits less available. To help foster understanding and trust in IF-based estimators, we present tangible, visual illustrations of when and how IF-based estimators can outperform standard "plug-in" estimators. The figures we show are based on connections between IFs, gradients, linear approximations, and Newton-Raphson.
Submitted 27 October, 2019; v1 submitted 7 October, 2018;
originally announced October 2018.
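As a reminder of the connection this abstract draws, the usual one-step IF-based estimator adds the sample average of the estimated influence function to the plug-in estimate, much as a Newton-Raphson update adds a gradient-based correction to a current guess. The notation below ($\hat P$ for the estimated distribution, $\varphi$ for the influence function, $Z_i$ for the observations) is assumed for illustration and is not taken from the paper.

```latex
% One-step (first-order) correction of a plug-in estimate
\hat\psi_{\text{one-step}} = \psi(\hat P) + \frac{1}{n}\sum_{i=1}^{n} \varphi(Z_i; \hat P)

% Newton-Raphson update for solving f(\theta) = 0, shown for comparison
\theta_{\text{new}} = \theta_{\text{old}} - \frac{f(\theta_{\text{old}})}{f'(\theta_{\text{old}})}
```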
-
Semiparametric theory and empirical processes in causal inference
Authors:
Edward H. Kennedy
Abstract:
In this paper we review important aspects of semiparametric theory and empirical processes that arise in causal inference problems. We begin with a brief introduction to the general problem of causal inference, and go on to discuss estimation and inference for causal effects under semiparametric models, which allow parts of the data-generating process to be unrestricted if they are not of particular interest (i.e., nuisance functions). These models are very useful in causal problems because the outcome process is often complex and difficult to model, and there may only be information available about the treatment process (at best). Semiparametric theory gives a framework for benchmarking efficiency and constructing estimators in such settings. In the second part of the paper we discuss empirical process theory, which provides powerful tools for understanding the asymptotic behavior of semiparametric estimators that depend on flexible nonparametric estimators of nuisance functions. These tools are crucial for incorporating machine learning and other modern methods into causal inference analyses. We conclude by examining related extensions and future directions for work in semiparametric causal inference.
Submitted 22 July, 2016; v1 submitted 15 October, 2015;
originally announced October 2015.