Search | arXiv e-print repository

Calibrated sensitivity models

Authors: Alec McClean, Zach Branson, Edward H. Kennedy

Abstract: In causal inference, sensitivity models assess how unmeasured confounders could alter causal analyses, but the sensitivity parameter -- which quantifies the degree of unmeasured confounding -- is often difficult to interpret. For this reason, researchers sometimes compare the sensitivity parameter to an estimate for measured confounding. This is known as calibration. Although calibration can aid interpretation, it is typically conducted post hoc, and uncertainty in the point estimate for measured confounding is rarely accounted for. To address these limitations, we propose novel calibrated sensitivity models, which directly bound the degree of unmeasured confounding by a multiple of measured confounding. The calibrated sensitivity parameter is interpretable as an intuitive unit-less ratio of unmeasured to measured confounding, and uncertainty due to estimating measured confounding can be incorporated. Incorporating this uncertainty shows causal analyses can be less or more robust to unmeasured confounding than would have been suggested by standard approaches. We develop efficient estimators and inferential methods for bounds on the average treatment effect with three calibrated sensitivity models, establishing parametric efficiency and asymptotic normality under doubly robust style nonparametric conditions. We illustrate our methods with a data analysis of the effect of mothers' smoking on infant birthweight.

Submitted 7 June, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.08727

Intervention effects based on potential benefit

Authors: Alexander W. Levis, Eli Ben-Michael, Edward H. Kennedy

Abstract: Optimal treatment rules are mappings from individual patient characteristics to tailored treatment assignments that maximize mean outcomes. In this work, we introduce a conditional potential benefit (CPB) metric that measures the expected improvement under an optimally chosen treatment compared to the status quo, within covariate strata. The potential benefit combines (i) the magnitude of the treatment effect, and (ii) the propensity for subjects to naturally select a suboptimal treatment. As a consequence, heterogeneity in the CPB can provide key insights into the mechanism by which a treatment acts and/or highlight potential barriers to treatment access or adverse effects. Moreover, we demonstrate that CPB is the natural prioritization score for individualized treatment policies when intervention capacity is constrained. That is, in the resource-limited setting where treatment options are freely accessible, but the ability to intervene on a portion of the target population is constrained (e.g., if the population is large, and follow-up and encouragement of treatment uptake is labor-intensive), targeting subjects with highest CPB maximizes the mean outcome. Focusing on this resource-limited setting, we derive formulas for optimal constrained treatment rules, and for any given budget, quantify the loss compared to the optimal unconstrained rule. We describe sufficient identification assumptions, and propose nonparametric, robust, and efficient estimators of the proposed quantities emerging from our framework.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 32 pages, 1 figure
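For a binary treatment, the CPB can be made concrete as follows, assuming larger outcomes are better and that the regressions mu1(x), mu0(x) and the propensity score pi(x) = P(A = 1 | X = x) are known. Under these assumptions the CPB is the optimal-arm mean max(mu1, mu0) minus the status-quo mean pi*mu1 + (1 - pi)*mu0, which equals |mu1 - mu0| times the probability of naturally choosing the suboptimal arm. The budgeted rule then treats strata in decreasing order of CPB. This is a minimal sketch; `Stratum`, `cpb` and `constrained_gain` are hypothetical names, and estimation of the nuisance functions is not shown.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    share: float  # fraction of the population in this covariate stratum
    mu1: float    # E[Y^1 | X = x], assumed known here
    mu0: float    # E[Y^0 | X = x], assumed known here
    pi: float     # P(A = 1 | X = x), the status-quo treatment propensity


def cpb(s: Stratum) -> float:
    """Conditional potential benefit: optimal-arm mean minus status-quo mean."""
    status_quo = s.pi * s.mu1 + (1 - s.pi) * s.mu0
    return max(s.mu1, s.mu0) - status_quo


def constrained_gain(strata: list[Stratum], budget: float) -> float:
    """Mean-outcome gain from intervening on at most `budget` (a population
    fraction), filling strata greedily in decreasing order of CPB."""
    remaining = budget
    gain = 0.0
    for s in sorted(strata, key=cpb, reverse=True):
        if remaining <= 0:
            break
        used = min(s.share, remaining)  # portion of this stratum we can reach
        gain += used * cpb(s)
        remaining -= used
    return gain
```

For example, with `strata = [Stratum(0.5, 3.0, 1.0, 0.25), Stratum(0.5, 2.0, 1.5, 0.9)]`, the first stratum has CPB 1.5 and the second only 0.05 (most of its members already choose the better arm), so `constrained_gain(strata, 0.5)` returns 0.75.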

arXiv:2405.08525

Doubly-robust inference and optimality in structure-agnostic models with smoothness

Authors: Matteo Bonvini, Edward H. Kennedy, Oliver Dukes, Sivaraman Balakrishnan

Abstract: We study the problem of constructing an estimator of the average treatment effect (ATE) that exhibits doubly-robust asymptotic linearity (DRAL). This is a stronger requirement than doubly-robust consistency. A DRAL estimator can yield asymptotically valid Wald-type confidence intervals even when the propensity score or the outcome model is inconsistently estimated. By contrast, the celebrated doubly-robust, augmented-IPW (AIPW) estimator generally requires consistent estimation of both nuisance functions for standard root-n inference. We make three main contributions. First, we propose a new hybrid class of distributions that combines the structure-agnostic class introduced in Balakrishnan et al. (2023) with additional smoothness constraints. While DRAL is generally not possible in the pure structure-agnostic class, we show that it can be attained in the new hybrid one. Second, we calculate minimax lower bounds for estimating the ATE in the new class, as well as in the pure structure-agnostic one. Third, building upon the literature on doubly-robust inference (van der Laan, 2014; Benkeser et al., 2017; Dukes et al., 2021), we propose a new estimator of the ATE that enjoys DRAL. Under certain conditions, we show that its rate of convergence in the new class can be much faster than that achieved by the AIPW estimator and, in particular, matches the minimax lower bound rate, thereby establishing its optimality. Finally, we clarify the connection between DRAL estimators and those based on higher-order influence functions (Robins et al., 2017) and complement our theoretical findings with simulations.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 54 pages, 2 figures
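For reference, the standard AIPW estimator mentioned above fits in a few lines once nuisance estimates are available: pi_hat(X) for the propensity score and mu0_hat(X), mu1_hat(X) for the outcome regressions, each evaluated at every unit and assumed to come from models fit on separate folds. This is a minimal sketch of the classical AIPW estimator with a Wald-type interval, not the DRAL estimator proposed in the paper; `aipw_ate` is a hypothetical name.

```python
import math
from statistics import NormalDist, fmean, stdev


def aipw_ate(y, a, pi_hat, mu0_hat, mu1_hat, alpha=0.05):
    """Standard AIPW estimate of the ATE with a Wald-type (1 - alpha) interval.

    All arguments are equal-length sequences; a is a 0/1 treatment indicator
    and pi_hat must lie strictly between 0 and 1."""
    # Per-unit influence-function values: regression contrast plus
    # inverse-propensity-weighted residual corrections for each arm.
    phi = [
        m1 - m0 + ai * (yi - m1) / p - (1 - ai) * (yi - m0) / (1 - p)
        for yi, ai, p, m0, m1 in zip(y, a, pi_hat, mu0_hat, mu1_hat)
    ]
    est = fmean(phi)
    se = stdev(phi) / math.sqrt(len(phi))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est, (est - z * se, est + z * se)
```

The bias of this mean is driven by a product of the propensity-score and outcome-model errors, which is why its usual root-n Wald interval typically relies on both nuisance functions being estimated consistently, the limitation the abstract contrasts DRAL estimators against.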

arXiv:2405.03083

Causal K-Means Clustering

Authors: Kwangho Kim, Jisu Kim, Edward H. Kennedy

Abstract: Causal effects are often characterized with population summaries. These might provide an incomplete picture when there are heterogeneous treatment effects across subgroups. Since the subgroup structure is typically unknown, it is more challenging to identify and evaluate subgroup effects than population effects. We propose a new solution to this problem: Causal k-Means Clustering, which harnesses the widely-used k-means clustering algorithm to uncover the unknown subgroup structure. Our problem differs significantly from the conventional clustering setup since the variables to be clustered are unknown counterfactual functions. We present a plug-in estimator which is simple and readily implementable using off-the-shelf algorithms, and study its rate of convergence. We also develop a new bias-corrected estimator based on nonparametric efficiency theory and double machine learning, and show that this estimator achieves fast root-n rates and asymptotic normality in large nonparametric models. Our proposed methods are especially useful for modern outcome-wide studies with multiple treatment levels. Further, our framework is extensible to clustering with generic pseudo-outcomes, such as partially observed outcomes or otherwise unknown functions. Finally, we explore finite sample properties via simulation, and illustrate the proposed methods in a study of treatment programs for adolescent substance abuse.

Submitted 29 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.
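The plug-in idea can be sketched as follows, assuming the clustered variable for unit i is its vector of conditional counterfactual means (mu_1(X_i), ..., mu_K(X_i)) across the K treatment levels: estimate those vectors with any regression method, then run ordinary k-means (Lloyd's algorithm) on them. Below, the estimated vectors are taken as given. The function name, the initialization from the first k points and the iteration cap are illustrative assumptions, not the paper's implementation, and the bias-corrected estimator is not shown.

```python
def lloyd_kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means on a list of equal-length tuples.

    Centers are initialized from the first k points (a simplifying choice);
    returns (centers, labels)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center in squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:  # converged: assignments no longer move centers
            break
        centers = new_centers
    return centers, labels
```

For instance, with hypothetical estimated pairs `(mu_0(X_i), mu_1(X_i))` given by `[(1.0, 2.0), (1.1, 2.1), (5.0, 1.0), (5.2, 0.9)]`, `lloyd_kmeans(pts, 2)` separates the first two units (which benefit from treatment) from the last two (which are harmed by it).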

arXiv:2405.00118

Causal Inference with High-dimensional Discrete Covariates

Authors: Zhenghao Zeng, Sivaraman Balakrishnan, Yanjun Han, Edward H. Kennedy

Abstract: When estimating causal effects from observational studies, researchers often need to adjust for many covariates, many of which are discrete, to deconfound the non-causal relationship between exposure and outcome. The behavior of commonly used estimators in the presence of many discrete covariates is not well understood, since their properties are often analyzed under structural assumptions, including sparsity and smoothness, that do not apply in discrete settings. In this work, we study the estimation of causal effects in a model where the covariates required for confounding adjustment are discrete but high-dimensional, meaning the number of categories $d$ is comparable with or even larger than sample size $n$. Specifically, we show the mean squared error of commonly used regression, weighting and doubly robust estimators is bounded by $\frac{d^2}{n^2}+\frac{1}{n}$. We then prove the minimax lower bound for the average treatment effect is of order $\frac{d^2}{n^2 \log^2 n}+\frac{1}{n}$, which characterizes the fundamental difficulty of causal effect estimation in the high-dimensional discrete setting, and shows the estimators mentioned above are rate-optimal up to log-factors. We further consider additional structures that can be exploited, namely effect homogeneity and prior knowledge of the covariate distribution, and propose new estimators that enjoy faster convergence rates of order $\frac{d}{n^2} + \frac{1}{n}$, which achieve consistency in a broader regime. The results are illustrated empirically via simulation studies.

Submitted 5 May, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

Comments: 66 pages, 5 figures
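A short calculation shows why the faster rate widens the regime of consistency. Taking the stated bounds at face value (constants dropped), each bound vanishes exactly in the following regimes:

```latex
\frac{d^2}{n^2} + \frac{1}{n} \to 0 \iff d = o(n),
\qquad
\frac{d}{n^2} + \frac{1}{n} \to 0 \iff d = o(n^2).
```

For example, with $d = n^{3/2}$ categories the minimax lower bound $\frac{d^2}{n^2 \log^2 n}$ is of order $n / \log^2 n$, so no estimator is uniformly consistent without extra structure, whereas the bound $\frac{d}{n^2} + \frac{1}{n}$ for the estimators exploiting effect homogeneity or a known covariate distribution is of order $n^{-1/2}$.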

arXiv:2404.09119

Causal Inference for Genomic Data with Multiple Heterogeneous Outcomes

Authors: **-Hong Du, Zhenghao Zeng, Edward H. Kennedy, Larry Wasserman, Kathryn Roeder

Abstract: With the evolution of single-cell RNA sequencing techniques into a standard approach in genomics, it has become possible to conduct cohort-level causal inferences based on single-cell-level measurements. However, the individual gene expression levels of interest are not directly observable; instead, only repeated proxy measurements from each individual's cells are available, providing a derived outcome to estimate the underlying outcome for each of many genes. In this paper, we propose a generic semiparametric inference framework for doubly robust estimation with multiple derived outcomes, which also encompasses the usual setting of multiple outcomes when the response of each unit is available. To reliably quantify the causal effects of heterogeneous outcomes, we specialize the analysis to standardized average treatment effects and quantile treatment effects. Through this, we demonstrate the use of the semiparametric inferential results for doubly robust estimators derived from both Von Mises expansions and estimating equations. A multiple testing procedure based on Gaussian multiplier bootstrap is tailored for doubly robust estimators to control the false discovery exceedance rate. Applications in single-cell CRISPR perturbation analysis and individual-level differential expression analysis demonstrate the utility of the proposed methods and offer insights into the usage of different estimands for causal inference in genomics.

Submitted 16 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

Comments: 26 pages and 6 figures for the main text, 30 pages and 3 figures for the supplement
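As a rough illustration of the Gaussian multiplier bootstrap as a building block, the sketch below computes one simultaneous critical value for many doubly robust estimates at once. Given each unit's estimated influence-function value for each outcome, it multiplies the centered values by independent N(0, 1) draws and takes a quantile of the maximum absolute standardized sum. This is the generic max-type construction, not the paper's false discovery exceedance procedure; the function name and the input layout (one row per unit, one column per outcome) are assumptions.

```python
import math
import random


def multiplier_bootstrap_crit(phi, alpha=0.05, n_boot=500, seed=0):
    """(1 - alpha) quantile of max_j |n^{-1/2} sum_i xi_i (phi_ij - mean_j)| / sd_j
    over n_boot draws of i.i.d. N(0, 1) multipliers xi_i.

    phi: list of n rows, each a list of m influence-function values."""
    rng = random.Random(seed)
    n, m = len(phi), len(phi[0])
    means = [sum(row[j] for row in phi) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in phi]
    sds = [math.sqrt(sum(r[j] ** 2 for r in centered) / (n - 1)) for j in range(m)]
    max_stats = []
    for _ in range(n_boot):
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one multiplier per unit
        max_stats.append(max(
            abs(sum(x * r[j] for x, r in zip(xi, centered))) / (math.sqrt(n) * sds[j])
            for j in range(m)
        ))
    max_stats.sort()
    return max_stats[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
```

With critical value c, simultaneous intervals take the form est_j ± c * sd_j / sqrt(n); because c accounts for the maximum over outcomes, it grows with the number of outcomes, unlike the single-outcome value near 1.96.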

arXiv:2403.15175

Double Cross-fit Doubly Robust Estimators: Beyond Series Regression

Authors: Alec McClean, Sivaraman Balakrishnan, Edward H. Kennedy, Larry Wasserman

Abstract: Doubly robust estimators with cross-fitting have gained popularity in causal inference due to their favorable structure-agnostic error guarantees. However, when additional structure, such as Hölder smoothness, is available, more accurate "double cross-fit doubly robust" (DCDR) estimators can be constructed by splitting the training data and undersmoothing nuisance function estimators on independent samples. We study a DCDR estimator of the Expected Conditional Covariance, a functional of interest in causal inference and conditional independence testing, and derive a series of increasingly powerful results with progressively stronger assumptions. We first provide a structure-agnostic error analysis for the DCDR estimator with no assumptions on the nuisance functions or their estimators. Then, assuming the nuisance functions are Hölder smooth, but without assuming knowledge of the true smoothness level or the covariate density, we establish that DCDR estimators with several linear smoothers are semiparametric efficient under minimal conditions and achieve fast convergence rates in the non-$\sqrt{n}$ regime. When the covariate density and smoothnesses are known, we propose a minimax rate-optimal DCDR estimator based on undersmoothed kernel regression. Moreover, we show an undersmoothed DCDR estimator satisfies a slower-than-$\sqrt{n}$ central limit theorem, and that inference is possible even in the non-$\sqrt{n}$ regime. Finally, we support our theoretical results with simulations, providing intuition for double cross-fitting and undersmoothing, demonstrating where our estimator achieves semiparametric efficiency while the usual "single cross-fit" estimator fails, and illustrating asymptotic normality for the undersmoothed DCDR estimator.

Submitted 15 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.
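To make double cross-fitting concrete, the sketch below estimates the expected conditional covariance E[Cov(A, Y | X)] = E[(A - pi(X))(Y - mu(X))], where pi(x) = E[A | X = x] and mu(x) = E[Y | X = x], for a scalar covariate in [0, 1). Following the description above, pi_hat and mu_hat are trained on two different folds and the residual product is averaged on a third, rotating roles so each fold is evaluated once. The bin-average smoother, the three-fold layout and the bin count are simplifying assumptions for illustration; they are not the undersmoothed kernel or series estimators analyzed in the paper.

```python
import random


def bin_smoother(x, v, n_bins):
    """Fit a regressogram: average v within equal-width bins of x in [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, vi in zip(x, v):
        b = min(int(xi * n_bins), n_bins - 1)
        sums[b] += vi
        counts[b] += 1
    overall = sum(v) / len(v)  # fallback prediction for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda xi: means[min(int(xi * n_bins), n_bins - 1)]


def dcdr_ecc(x, a, y, n_bins=20):
    """Double cross-fit estimate of E[(A - pi(X)) (Y - mu(X))] using three folds."""
    folds = [list(range(k, len(x), 3)) for k in range(3)]
    total, count = 0.0, 0
    # Each tuple names (fold for pi_hat, fold for mu_hat, evaluation fold).
    for f_pi, f_mu, f_eval in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        pi_hat = bin_smoother([x[i] for i in folds[f_pi]], [a[i] for i in folds[f_pi]], n_bins)
        mu_hat = bin_smoother([x[i] for i in folds[f_mu]], [y[i] for i in folds[f_mu]], n_bins)
        for i in folds[f_eval]:
            total += (a[i] - pi_hat(x[i])) * (y[i] - mu_hat(x[i]))
            count += 1
    return total / count
```

Because pi_hat and mu_hat are fit on independent samples, their errors are independent of each other and of the evaluation fold, which is the property the abstract's error analysis exploits when both nuisance estimators are undersmoothed.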

arXiv:2402.09332

Nonparametric identification and efficient estimation of causal effects with instrumental variables

Authors: Alexander W. Levis, Edward H. Kennedy, Luke Keele

Abstract: Instrumental variables are widely used in econometrics and epidemiology for identifying and estimating causal effects when an exposure of interest is confounded by unmeasured factors. Despite this popularity, the assumptions invoked to justify the use of instruments differ substantially across the literature. Similarly, statistical approaches for estimating the resulting causal quantities vary considerably, and often rely on strong parametric assumptions. In this work, we compile and organize structural conditions that nonparametrically identify conditional average treatment effects, average treatment effects among the treated, and local average treatment effects, with a focus on identification formulae invoking the conditional Wald estimand. Moreover, we build upon existing work and propose nonparametric efficient estimators of functionals corresponding to marginal and conditional causal contrasts resulting from the various identification paradigms. We illustrate the proposed methods on an observational study examining the effects of operative care on adverse events for cholecystitis patients, and a randomized trial assessing the effects of market participation on political views.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 46 pages, 2 figures
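For a binary instrument Z and binary treatment A, the conditional Wald estimand is the ratio of the instrument's effect on the outcome to its effect on treatment uptake within covariate strata. Writing mu_z(x) = E(Y | Z = z, X = x) and lambda_z(x) = E(A | Z = z, X = x) (notation assumed here, not taken from the paper):

```latex
\gamma(x) = \frac{\mu_1(x) - \mu_0(x)}{\lambda_1(x) - \lambda_0(x)},
\qquad \lambda_1(x) \neq \lambda_0(x).
```

Which causal quantity $\gamma(x)$ equals (a conditional average treatment effect, an effect among the treated, or a local effect among compliers) depends on the structural conditions invoked, which is what the identification results above organize.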

arXiv:2402.00168

Continuous Treatment Effects with Surrogate Outcomes

Authors: Zhenghao Zeng, David Arbour, Avi Feller, Raghavendra Addanki, Ryan Rossi, Ritwik Sinha, Edward H. Kennedy

Abstract: In many real-world causal inference applications, the primary outcomes (labels) are often partially missing, especially if they are expensive or difficult to collect. If the missingness depends on covariates (i.e., missingness is not completely at random), analyses based on fully observed samples alone may be biased. Incorporating surrogates, which are fully observed post-treatment variables related to the primary outcome, can improve estimation in this case. In this paper, we study the role of surrogates in estimating continuous treatment effects and propose a doubly robust method to efficiently incorporate surrogates in the analysis, which uses both labeled and unlabeled data and does not suffer from the above selection bias problem. Importantly, we establish the asymptotic normality of the proposed estimator and show possible improvements on the variance compared with methods that solely use labeled data. Extensive simulations show our methods enjoy appealing empirical performance.

Submitted 21 May, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: 30 pages, 7 figures

arXiv:2311.04359

Flexibly Estimating and Interpreting Heterogeneous Treatment Effects of Laparoscopic Surgery for Cholecystitis Patients

Authors: Matteo Bonvini, Zhenghao Zeng, Miaoqing Yu, Edward H. Kennedy, Luke Keele

Abstract: Laparoscopic surgery has been shown through a number of randomized trials to be an effective form of treatment for cholecystitis. Given this evidence, one natural question for clinical practice is: does the effectiveness of laparoscopic surgery vary among patients? It might be the case that, while the overall effect is positive, some patients treated with laparoscopic surgery may respond positively to the intervention while others do not or may be harmed. In our study, we focus on conditional average treatment effects to understand whether treatment effects vary systematically with patient characteristics. Recent methodological work has developed a meta-learner framework for flexible estimation of conditional causal effects. In this framework, nonparametric estimation methods can be used to avoid bias from model misspecification while preserving statistical efficiency. In addition, researchers can flexibly and effectively explore whether treatment effects vary with a large number of possible effect modifiers. However, these methods have certain limitations. For example, conducting inference can be challenging if black-box models are used. Further, interpreting and visualizing the effect estimates can be difficult when there are multi-valued effect modifiers. In this paper, we develop new methods that allow for interpretable results and inference from the meta-learner framework for heterogeneous treatment effects estimation. We also demonstrate methods that allow for an exploratory analysis to identify possible effect modifiers. We apply our methods to a large database for the use of laparoscopic surgery in treating cholecystitis. We also conduct a series of simulation studies to understand the relative performance of the methods we develop. Our study provides key guidelines for the interpretation of conditional causal effects from the meta-learner framework.

Submitted 7 November, 2023; originally announced November 2023.

Comments: 48 pages, 7 figures

arXiv:2311.03343

Distribution-uniform anytime-valid sequential inference

Authors: Ian Waudby-Smith, Edward H. Kennedy, Aaditya Ramdas

Abstract: Are asymptotic confidence sequences and anytime $p$-values uniformly valid for a nontrivial class of distributions $\mathcal{P}$? We give a positive answer to this question by deriving distribution-uniform anytime-valid inference procedures. Historically, anytime-valid methods -- including confidence sequences, anytime $p$-values, and sequential hypothesis tests that enable inference at stopping times -- have been justified nonasymptotically. Nevertheless, asymptotic procedures such as those based on the central limit theorem occupy an important part of the statistical toolbox due to their simplicity, universality, and weak assumptions. While recent work has derived asymptotic analogues of anytime-valid methods with the aforementioned benefits, these were not shown to be $\mathcal{P}$-uniform, meaning that their asymptotics are not uniformly valid in a class of distributions $\mathcal{P}$. Indeed, the anytime-valid inference literature currently has no central limit theory to draw from that is both uniform in $\mathcal{P}$ and in the sample size $n$. This paper fills that gap by deriving a novel $\mathcal{P}$-uniform strong Gaussian approximation theorem. We apply some of these results to obtain an anytime-valid test of conditional independence without the Model-X assumption, as well as a $\mathcal{P}$-uniform law of the iterated logarithm.

Submitted 18 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.
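For orientation, a well-known asymptotic confidence sequence for a mean replaces the fixed-n Wald interval with a time-uniform boundary obtained from a Gaussian mixture martingale (Robbins' normal-mixture boundary): after t observations, mean_t ± sigma_t * sqrt(2(t rho^2 + 1) / (t^2 rho^2) * log(sqrt(t rho^2 + 1) / alpha)), where rho > 0 is a user-chosen tuning parameter. The sketch computes this boundary with running estimates. It illustrates the kind of asymptotic procedure whose $\mathcal{P}$-uniform validity the paper studies, not the paper's new construction; the function names are hypothetical.

```python
import math


def mixture_boundary(t, sigma, alpha=0.05, rho=1.0):
    """Half-width of the Gaussian-mixture boundary for a running mean
    after t observations with (estimated) standard deviation sigma."""
    v = t * rho ** 2 + 1
    return sigma * math.sqrt(2 * v / (t ** 2 * rho ** 2) * math.log(math.sqrt(v) / alpha))


def running_cs(xs, alpha=0.05, rho=1.0):
    """Yield (t, lower, upper) after each observation, starting at t = 2
    so that a sample standard deviation is available."""
    n = 0
    mean = m2 = 0.0
    for x in xs:
        # Welford's update for the running mean and sum of squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n < 2:
            continue
        half = mixture_boundary(n, math.sqrt(m2 / (n - 1)), alpha, rho)
        yield n, mean - half, mean + half
```

Because the boundary is designed to hold simultaneously over all t (here only asymptotically, through the estimated sigma), the interval may be monitored after every observation and the analysis stopped at a data-dependent time; the price is a half-width shrinking like sqrt(log t / t) rather than 1/sqrt(t).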

arXiv:2309.16129

Counterfactual Density Estimation using Kernel Stein Discrepancies

Authors: Diego Martinez-Taboada, Edward H. Kennedy

Abstract: Causal effects are usually studied in terms of the means of counterfactual distributions, which may be insufficient in many scenarios. Given a class of densities known up to normalizing constants, we propose to model counterfactual distributions by minimizing kernel Stein discrepancies in a doubly robust manner. This enables the estimation of counterfactuals over large classes of distributions while exploiting the desired double robustness. We present a theoretical analysis of the proposed estimator, providing sufficient conditions for consistency and asymptotic normality, as well as an examination of its empirical performance.

Submitted 18 February, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.12595

Effects of Adolescent Victimization on Offending: Flexible Methods for Missing Data & Unmeasured Confounding

Authors: Mateo Dulce Rubio, Edward H. Kennedy, Valerio Baćak, Daniel S. Nagin

Abstract: The causal link between victimization and violence later in life is largely accepted but has been understudied for victimized adolescents. In this work we use the Add Health dataset, the largest nationally representative longitudinal survey of adolescents, to estimate the relationship between victimization and future offending in this population. To accomplish this, we derive a new doubly robust estimator for the average treatment effect on the treated (ATT) when the treatment and outcome are not always observed. We then find that the offending rate among victimized individuals would have been 3.86 percentage points lower if none of them had been victimized (95% CI: [0.28, 7.45]). This contributes positive evidence of a causal effect of victimization on future offending among adolescents. We further present statistical evidence of heterogeneous effects by age, under which the ATT decreases with the age at which victimization is experienced. We then devise a novel risk-ratio-based sensitivity analysis and conclude that our results are robust to modest unmeasured confounding. Finally, we show that the estimated effect is mainly driven by non-violent offending.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.00706

Causal Effect Estimation after Propensity Score Trimming with Continuous Treatments

Authors: Zach Branson, Edward H. Kennedy, Sivaraman Balakrishnan, Larry Wasserman

Abstract: Most works in causal inference focus on binary treatments where one estimates a single treatment-versus-control effect. When treatment is continuous, one must estimate a curve representing the causal relationship between treatment and outcome (the "dose-response curve"), which makes causal inference more challenging. This work proposes estimators using efficient influence functions (EIFs) for causal dose-response curves after propensity score trimming. Trimming involves estimating causal effects among subjects with propensity scores above a threshold, which addresses positivity violations that complicate estimation. Several challenges arise with continuous treatments. First, EIFs for trimmed dose-response curves do not exist, due to a lack of pathwise differentiability induced by trimming and a continuous treatment. Second, if the trimming threshold is not prespecified and is instead a parameter that must be estimated, then estimation uncertainty in the threshold must be accounted for. To address these challenges, we target a smoothed version of the trimmed dose-response curve for which an EIF exists. We allow the trimming threshold to be a user-specified quantile of the propensity score distribution, and we construct confidence intervals which reflect uncertainty involved in threshold estimation. Our resulting EIF-based estimators exhibit doubly-robust style guarantees, with error involving products or squares of errors for the outcome regression and propensity score. Thus, our estimators can exhibit parametric convergence rates even when the outcome regression and propensity score are estimated at slower nonparametric rates with flexible estimators. These findings are validated via simulation and an application, thereby showing how to efficiently-but-flexibly estimate a dose-response curve after trimming.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2306.17464

Minimax optimal subgroup identification

Authors: Matteo Bonvini, Edward H. Kennedy, Luke J. Keele

Abstract: Quantifying treatment effect heterogeneity is a crucial task in many areas of causal inference, e.g. optimal treatment allocation and estimation of subgroup effects. We study the problem of estimating the level sets of the conditional average treatment effect (CATE), identified under the no-unmeasured-confounders assumption. Given a user-specified threshold, the goal is to estimate the set of all units for whom the treatment effect exceeds that threshold. For example, if the cutoff is zero, the estimand is the set of all units who would benefit from receiving treatment. Assigning treatment just to this set represents the optimal treatment rule that maximises the mean population outcome. Similarly, cutoffs greater than zero represent optimal rules under resource constraints. The level set estimator that we study follows the plug-in principle and consists of simply thresholding a good estimator of the CATE. While many CATE estimators have been recently proposed and analysed, how their properties relate to those of the corresponding level set estimators remains unclear. Our first goal is thus to fill this gap by deriving the asymptotic properties of level set estimators depending on which estimator of the CATE is used. Next, we identify a minimax optimal estimator in a model where the CATE, the propensity score and the outcome model are Hölder-smooth of varying orders. We consider data generating processes that satisfy a margin condition governing the probability of observing units for whom the CATE is close to the threshold. We investigate the performance of the estimators in simulations and illustrate our methods on a dataset used to study the effects on mortality of laparoscopic vs open surgery in the treatment of various conditions of the colon.

Submitted 30 June, 2023; originally announced June 2023.

Comments: 38 pages, 4 figures

arXiv:2305.14040

doi 10.1007/s10940-024-09582-7

Incremental Propensity Score Effects for Criminology: An Application Assessing the Relationship Between Homelessness, Behavioral Health Problems, and Recidivism

Authors: Leah A. Jacobs, Alec McClean, Zach Branson, Edward H. Kennedy, Alex Fixler

Abstract: This study examines the relationship between homelessness and recidivism among people on probation with and without behavioral health problems. The study also illustrates a new way to summarize the effect of an exposure on an outcome, the Incremental Propensity Score (IPS) effect, which avoids pitfalls of other approaches commonly used in criminology. We assessed the impact of homelessness at probation start on rearrest within one year among a cohort of people on probation (n = 2,453). We estimated IPS effects, considering general and crime-specific recidivism if subjects were more or less likely to be unhoused, and assessed effect variation by behavioral health problem status. We used a doubly robust machine learning estimator to flexibly but efficiently estimate effects. A substantial intervention -- reducing homelessness by roughly 65% -- corresponded to a 9% reduction in the estimated average rate of recidivism (p < .05). Milder interventions showed smaller, non-significant effect sizes. Stratifying by behavioral health problem and rearrest type led to similar results without statistical significance. Minding limitations related to observational data and generalizability, this study suggests large reductions in homelessness lead to significant reductions in rearrest rates. Efforts to reduce recidivism should include interventions that make homelessness less likely, but notable differences in recidivism will require these interventions be sizable.
Meanwhile, efforts to establish recidivism risk factors should consider alternative effects, like IPS effects, to maximize validity and reduce bias.

Submitted 8 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.04116

The Fundamental Limits of Structure-Agnostic Functional Estimation

Authors: Sivaraman Balakrishnan, Edward H. Kennedy, Larry Wasserman

Abstract: Many recent developments in causal inference, and functional estimation problems more generally, have been motivated by the fact that classical one-step (first-order) debiasing methods, or their more recent sample-split double machine-learning avatars, can outperform plugin estimators under surprisingly weak conditions. These first-order corrections improve on plugin estimators in a black-box fashion, and consequently are often used in conjunction with powerful off-the-shelf estimation methods. These first-order methods are however provably suboptimal in a minimax sense for functional estimation when the nuisance functions live in Hölder-type function spaces. This suboptimality of first-order debiasing has motivated the development of "higher-order" debiasing methods. The resulting estimators are, in some cases, provably optimal over Hölder-type spaces, but both the estimators which are minimax-optimal and their analyses are crucially tied to properties of the underlying function space. In this paper we investigate the fundamental limits of structure-agnostic functional estimation, where relatively weak conditions are placed on the underlying nuisance functions. We show that there is a strong sense in which existing first-order methods are optimal. We achieve this goal by providing a formalization of the problem of functional estimation with black-box nuisance function estimates, and deriving minimax lower bounds for this problem.
Our results highlight some clear tradeoffs in functional estimation -- if we wish to remain agnostic to the underlying nuisance function spaces, impose only high-level rate conditions, and maintain compatibility with black-box nuisance estimators then first-order methods are optimal. When we have an understanding of the structure of the underlying nuisance functions then carefully constructed higher-order estimators can outperform first-order estimators.
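For a concrete instance of the first-order (one-step) correction discussed above, here is a sketch of the standard augmented inverse-probability-weighted (AIPW) estimator of the average treatment effect: a plug-in contrast of outcome regressions plus an inverse-propensity-weighted residual correction. The nuisance predictions `pi_hat`, `mu1_hat`, `mu0_hat` are assumed to come from any black-box regression, ideally fit on a separate fold (sample splitting); the function name `aipw_ate` is hypothetical:

```python
from typing import Sequence


def aipw_ate(
    y: Sequence[float],
    a: Sequence[int],
    pi_hat: Sequence[float],
    mu1_hat: Sequence[float],
    mu0_hat: Sequence[float],
) -> float:
    """One-step (AIPW) estimate of E[Y^1 - Y^0] from precomputed nuisance predictions.

    pi_hat:  estimated P(A=1 | X) for each unit (assumed strictly between 0 and 1)
    mu1_hat: estimated E[Y | X, A=1] for each unit
    mu0_hat: estimated E[Y | X, A=0] for each unit
    """
    n = len(y)
    total = 0.0
    for yi, ai, pi, m1, m0 in zip(y, a, pi_hat, mu1_hat, mu0_hat):
        plug_in = m1 - m0
        # First-order correction: inverse-propensity-weighted residuals.
        correction = ai * (yi - m1) / pi - (1 - ai) * (yi - m0) / (1 - pi)
        total += plug_in + correction
    return total / n
```

If the outcome regressions fit the observed outcomes exactly, the residual terms vanish and the estimate equals the plug-in; if they are set to zero, it reduces to inverse-probability weighting. Consistency when either nuisance is well estimated is the sense in which such estimators are called doubly robust.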

Submitted 6 May, 2023; originally announced May 2023.

Comments: 32 pages

arXiv:2304.13237

An Efficient Doubly-Robust Test for the Kernel Treatment Effect

Authors: Diego Martinez-Taboada, Aaditya Ramdas, Edward H. Kennedy

Abstract: The average treatment effect, which is the difference in expectation of the counterfactuals, is probably the most popular target effect in causal inference with binary treatments. However, treatments may have effects beyond the mean, for instance decreasing or increasing the variance. We propose a new kernel-based test for distributional effects of the treatment. It is, to the best of our knowledge, the first kernel-based, doubly-robust test with provably valid type-I error. Furthermore, our proposed algorithm is computationally efficient, avoiding the use of permutations.

Submitted 31 October, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2302.00092

Efficient Generalization and Transportation

Authors: Zhenghao Zeng, Edward H. Kennedy, Lisa M. Bodnar, Ashley I. Naimi

Abstract: When estimating causal effects, it is important to assess external validity, i.e., determine how useful a given study is to inform a practical question for a specific target population. One challenge is that the covariate distribution in the population underlying a study may be different from that in the target population. If some covariates are effect modifiers, the average treatment effect (ATE) may not generalize to the target population. To tackle this problem, we propose new methods to generalize or transport the ATE from a source population to a target population, in the case where the source and target populations have different sets of covariates. When the ATE in the target population is identified, we propose new doubly robust estimators and establish their rates of convergence and limiting distributions. Under regularity conditions, the doubly robust estimators provably achieve the efficiency bound and are locally asymptotic minimax optimal. A sensitivity analysis is provided when the identification assumptions fail. Simulation studies show the advantages of the proposed doubly robust estimator over simple plug-in estimators. Importantly, we also provide minimax lower bounds and higher-order estimators of the target functionals. The proposed methods are applied in transporting causal effects of dietary intake on adverse pregnancy outcomes from an observational study to the whole U.S. female population.

Submitted 20 March, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: 49 pages, 9 figures

arXiv:2301.12106

Covariate-assisted bounds on causal effects with instrumental variables

Authors: Alexander W. Levis, Matteo Bonvini, Zhenghao Zeng, Luke Keele, Edward H. Kennedy

Abstract: When an exposure of interest is confounded by unmeasured factors, an instrumental variable (IV) can be used to identify and estimate certain causal contrasts. Identification of the marginal average treatment effect (ATE) from IVs relies on strong untestable structural assumptions. When one is unwilling to assert such structure, IVs can nonetheless be used to construct bounds on the ATE. Famously, Balke and Pearl (1997) proved tight bounds on the ATE for a binary outcome, in a randomized trial with noncompliance and no covariate information. We demonstrate how these bounds remain useful in observational settings with baseline confounders of the IV, as well as randomized trials with measured baseline covariates. The resulting bounds on the ATE are non-smooth functionals, and thus standard nonparametric efficiency theory is not immediately applicable. To remedy this, we propose (1) under a novel margin condition, influence function-based estimators of the bounds that can attain parametric convergence rates when the nuisance functions are modeled flexibly, and (2) estimators of smooth approximations of these bounds. We propose extensions to continuous outcomes, explore finite sample properties in simulations, and illustrate the proposed estimators in an observational study targeting the effect of higher education on wages.

Submitted 29 September, 2023; v1 submitted 28 January, 2023; originally announced January 2023.

Comments: 42 pages, 2 figures

arXiv:2301.06199

Doubly Robust Counterfactual Classification

Authors: Kwangho Kim, Edward H. Kennedy, José R. Zubizarreta

Abstract: We study counterfactual classification as a new tool for decision-making under hypothetical (contrary to fact) scenarios. We propose a doubly-robust nonparametric estimator for a general counterfactual classifier, where we can incorporate flexible constraints by casting the classification problem as a nonlinear mathematical program involving counterfactuals. We go on to analyze the rates of convergence of the estimator and provide a closed-form expression for its asymptotic distribution. Our analysis shows that the proposed estimator is robust against nuisance model misspecification, and can attain fast $\sqrt{n}$ rates with tractable inference even when using nonparametric machine learning approaches. We study the empirical performance of our methods by simulation and apply them for recidivism risk prediction.

Submitted 15 January, 2023; originally announced January 2023.

Journal ref: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2212.03578

Nonparametric Estimation of Conditional Incremental Effects

Authors: Alec McClean, Zach Branson, Edward H. Kennedy

Abstract: Conditional effect estimation has great scientific and policy importance because interventions may impact subjects differently depending on their characteristics. Most research has focused on estimating the conditional average treatment effect (CATE). However, identification of the CATE requires all subjects have a non-zero probability of receiving treatment, or positivity, which may be unrealistic in practice. Instead, we propose conditional effects based on incremental propensity score interventions, which are stochastic interventions where the odds of treatment are multiplied by some factor. These effects do not require positivity for identification and can be better suited for modeling scenarios in which people cannot be forced into treatment. We develop a projection estimator and a flexible nonparametric estimator that can each estimate all the conditional effects we propose and derive model-agnostic error guarantees showing both estimators satisfy a form of double robustness. Further, we propose a summary of treatment effect heterogeneity and a test for any effect heterogeneity based on the variance of a conditional derivative effect and derive a nonparametric estimator that also satisfies a form of double robustness. Finally, we demonstrate our estimators by analyzing the effect of intensive care unit admission on mortality using a dataset from the (SPOT)light study.
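Under an incremental propensity score intervention, the odds of treatment are multiplied by a factor δ > 0. Solving odds(q) = δ · odds(π) for the new treatment probability q gives q = δπ / (δπ + 1 − π). A minimal sketch of this mapping; the names `shifted_propensity` and `odds` are illustrative, not from any package:

```python
def shifted_propensity(pi: float, delta: float) -> float:
    """Treatment probability after multiplying the odds of treatment by delta."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    if delta <= 0.0:
        raise ValueError("delta must be positive")
    # Denominator is at least min(delta, 1) > 0, so this never divides by zero.
    return delta * pi / (delta * pi + 1.0 - pi)


def odds(p: float) -> float:
    """Odds p / (1 - p) for a probability strictly below 1."""
    return p / (1.0 - p)


q = shifted_propensity(0.5, 2.0)
print(round(q, 4))                     # 0.6667
print(round(odds(q) / odds(0.5), 6))   # 2.0
print(shifted_propensity(0.0, 2.0))    # 0.0
```

Because q is 0 whenever π is 0 and 1 whenever π is 1, the intervention never asks a unit with a degenerate propensity to switch treatment status, which is why these effects do not require positivity.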

Submitted 24 April, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

arXiv:2210.08272

Heterogeneous interventional indirect effects with multiple mediators: non-parametric and semi-parametric approaches

Authors: Max Rubinstein, Zach Branson, Edward H. Kennedy

Abstract: We propose semi- and non-parametric methods to estimate conditional interventional effects in the setting of two discrete mediators whose causal ordering is unknown. Average interventional indirect effects have been shown to decompose an average treatment effect into a direct effect and interventional indirect effects that quantify effects of hypothetical interventions on mediator distributions. Yet these effects may be heterogeneous across the covariate distribution. We consider the problem of estimating these effects at particular points. First, we propose an influence-function based estimator of the projection of the conditional effects onto a working model, and show under some conditions that we can achieve root-n consistent and asymptotically normal estimates. Second, we propose a fully non-parametric approach to estimation and show the conditions under which this approach can achieve oracle rates of convergence. Finally, we propose a sensitivity analysis for the conditional effects in the presence of mediator-outcome confounding. We propose estimating bounds on the conditional effects using these same methods, and show that these results easily extend to allow for influence-function based estimates of the bounds on the average effects. We conclude by examining heterogeneous effects with respect to the effect of COVID-19 vaccinations on depression during February 2021.

Submitted 18 April, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2207.11825

Fast convergence rates for dose-response estimation

Authors: Matteo Bonvini, Edward H. Kennedy

Abstract: We consider the problem of estimating a dose-response curve, both globally and locally at a point. Continuous treatments arise often in practice, e.g. in the form of time spent on an operation, distance traveled to a location or dosage of a drug. Letting A denote a continuous treatment variable, the target of inference is the expected outcome if everyone in the population takes treatment level A=a. Under standard assumptions, the dose-response function takes the form of a partial mean. Building upon the recent literature on nonparametric regression with estimated outcomes, we study three different estimators. As a global method, we construct an empirical-risk-minimization-based estimator with an explicit characterization of second-order remainder terms. As a local method, we develop a two-stage, doubly-robust (DR) learner. Finally, we construct an $m$th-order estimator based on the theory of higher-order influence functions. Under certain conditions, this higher order estimator achieves the fastest rate of convergence that we are aware of for this problem. However, the other two approaches are easier to implement using off-the-shelf software, since they are formulated as two-stage regression tasks. For each estimator, we provide an upper bound on the mean-square error and investigate its finite-sample performance in a simulation. Finally, we describe a flexible, nonparametric method to perform sensitivity analysis to the no-unmeasured-confounding assumption when the treatment is continuous.

Submitted 24 July, 2022; originally announced July 2022.

arXiv:2207.09016

The role of the geometric mean in case-control studies

Authors: Amanda Coston, Edward H. Kennedy

Abstract: Historically used in settings where the outcome is rare or data collection is expensive, outcome-dependent sampling is relevant to many modern settings where data is readily available for a biased sample of the target population, such as public administrative data. Under outcome-dependent sampling, common effect measures such as the average risk difference and the average risk ratio are not identified, but the conditional odds ratio is. Aggregation of the conditional odds ratio is challenging since summary measures are generally not identified. Furthermore, the marginal odds ratio can be larger (or smaller) than all conditional odds ratios. This so-called non-collapsibility of the odds ratio is avoidable if we use an alternative aggregation to the standard arithmetic mean. We provide a new definition of collapsibility that makes this choice of aggregation method explicit, and we demonstrate that the odds ratio is collapsible under geometric aggregation. We describe how to partially identify, estimate, and do inference on the geometric odds ratio under outcome-dependent sampling. Our proposed estimator is based on the efficient influence function and therefore has doubly robust-style properties.
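The non-collapsibility described above can be checked numerically. The sketch below assumes, for illustration, that geometric aggregation means a weighted geometric mean of the conditional odds ratios, exp(Σ w log OR), with weights standing in for the covariate distribution; the two-stratum probabilities are hypothetical toy values with treatment independent of the stratum (no confounding):

```python
import math
from typing import Sequence


def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio comparing outcome probability p1 (treated) with p0 (control)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


def geometric_odds_ratio(ors: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted geometric mean of conditional odds ratios: exp(sum w * log OR / sum w)."""
    total = sum(weights)
    return math.exp(sum(w * math.log(o) for o, w in zip(ors, weights)) / total)


# Two equally likely strata; treatment independent of the stratum.
p1 = [0.8, 0.5]   # P(Y=1 | A=1, X=x)
p0 = [0.5, 0.2]   # P(Y=1 | A=0, X=x)
w = [0.5, 0.5]    # P(X=x)

conditional = [odds_ratio(a, b) for a, b in zip(p1, p0)]
marginal = odds_ratio(
    sum(wi * a for wi, a in zip(w, p1)),
    sum(wi * b for wi, b in zip(w, p0)),
)
print([round(c, 6) for c in conditional])              # [4.0, 4.0]
print(round(marginal, 3))                              # 3.449
print(round(geometric_odds_ratio(conditional, w), 6))  # 4.0
```

Here the marginal odds ratio (169/49, about 3.45) is smaller than every conditional odds ratio even without confounding, while the geometric aggregate returns the common value 4.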

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2203.06469

Semiparametric doubly robust targeted double machine learning: a review

Authors: Edward H. Kennedy

Abstract: In this review we cover the basics of efficient nonparametric parameter estimation (also called functional estimation), with a focus on parameters that arise in causal inference problems. We review both efficiency bounds (i.e., what is the best possible performance for estimating a given parameter?) and the analysis of particular estimators (i.e., what is this estimator's error, and does it attain the efficiency bound?) under weak assumptions. We emphasize minimax-style efficiency bounds, worked examples, and practical shortcuts for easing derivations. We gloss over most technical details, in the interest of highlighting important concepts and providing intuition for main ideas.

Submitted 25 January, 2023; v1 submitted 12 March, 2022; originally announced March 2022.

arXiv:2111.07191

drpop: Efficient and Doubly Robust Population Size Estimation in R

Authors: Manjari Das, Edward H. Kennedy

Abstract: This paper introduces the R package drpop to flexibly estimate total population size from incomplete lists. Total population estimation, also called capture-recapture, is an important problem in many biological and social sciences. A typical dataset consists of incomplete lists of individuals from the population of interest along with some covariate information. The goal is to estimate the number of unobserved individuals and equivalently, the total population size. drpop flexibly models heterogeneity using the covariate information, under the assumption that two lists are conditionally independent given covariates. This can be a much weaker assumption than full marginal independence often required by classical methods. Moreover, it can incorporate complex and high dimensional covariates, and does not require parametric models like other popular methods. In particular, our estimator is doubly robust and has fast convergence rates even under flexible non-parametric set-ups. drpop provides the user with the flexibility to choose the model for estimation of intermediate parameters and returns the estimated population size, confidence interval and some other related quantities. In this paper, we illustrate the applications of drpop in different scenarios and we also present some performance summaries.

Submitted 13 November, 2021; originally announced November 2021.

arXiv:2110.10532

Incremental causal effects: an introduction and review

Authors: Matteo Bonvini, Alec McClean, Zach Branson, Edward H. Kennedy

Abstract: In this chapter, we review the class of causal effects based on incremental propensity score interventions proposed by Kennedy [2019]. The aim of incremental propensity score interventions is to estimate the effect of increasing or decreasing subjects' odds of receiving treatment; this differs from the average treatment effect, where the aim is to estimate the effect of everyone deterministically receiving versus not receiving treatment. We first present incremental causal effects for the case when there is a single binary treatment, such that it can be compared to average treatment effects and thus shed light on key concepts. In particular, a benefit of incremental effects is that positivity - a common assumption in causal inference - is not needed to identify causal effects. Then we discuss the more general case where treatment is measured at multiple time points, where positivity is more likely to be violated and thus incremental effects can be especially useful. Throughout, we motivate incremental effects with real-world applications, present nonparametric estimators for these effects, and discuss their efficiency properties, while also briefly reviewing the role of influence functions in functional estimation. Finally, we show how to interpret and analyze results using these estimators in practice, and discuss extensions and future directions.
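For a single binary treatment, the incremental effect, i.e. the mean outcome when each subject's treatment odds are multiplied by δ, can be written as an inverse-probability-weighted mean with weight (δA + 1 − A) / (δπ(X) + 1 − π(X)), where π(X) = P(A=1 | X). The sketch below is a plain IPW version, assuming the propensity scores are known or already estimated; the doubly robust estimators reviewed in the chapter add an outcome-model correction that is omitted here, and `ipw_incremental_mean` is a hypothetical name:

```python
from typing import Sequence


def ipw_incremental_mean(
    y: Sequence[float], a: Sequence[int], pi: Sequence[float], delta: float
) -> float:
    """IPW estimate of the mean outcome when treatment odds are multiplied by delta.

    y:  observed outcomes
    a:  observed binary treatments (0/1)
    pi: propensity scores P(A=1 | X) for each unit (known or estimated)
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    total = 0.0
    for yi, ai, p in zip(y, a, pi):
        # Ratio of the shifted treatment probability to the observed one,
        # evaluated at the treatment the unit actually received.
        weight = (delta * ai + 1 - ai) / (delta * p + 1 - p)
        total += weight * yi
    return total / len(y)
```

With δ = 1 every weight equals 1, so the estimate is the observed mean outcome. The numerator is δ or 1 and the denominator lies between min(δ, 1) and max(δ, 1), so the weights stay bounded even when some propensities are near 0 or 1, unlike the 1/π weights used for the average treatment effect.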

Submitted 20 October, 2021; originally announced October 2021.

Comments: Matteo Bonvini and Alec McClean contributed equally

arXiv:2104.14091

Doubly robust capture-recapture methods for estimating population size

Authors: Manjari Das, Edward H. Kennedy, Nicholas P. Jewell

Abstract: Estimation of population size using incomplete lists (also called the capture-recapture problem) has a long history across many biological and social sciences. For example, human rights and other groups often construct partial and overlapping lists of victims of armed conflicts, with the hope of using this information to estimate the total number of victims. Earlier statistical methods for this setup either use potentially restrictive parametric assumptions, or else rely on typically suboptimal plug-in-type nonparametric estimators; however, both approaches can lead to substantial bias, the former via model misspecification and the latter via smoothing. Under an identifying assumption that two lists are conditionally independent given measured covariate information, we make several contributions. First, we derive the nonparametric efficiency bound for estimating the capture probability, which indicates the best possible performance of any estimator, and sheds light on the statistical limits of capture-recapture methods. Then we present a new estimator, and study its finite-sample properties, showing that it has a double robustness property new to capture-recapture, and that it is near-optimal in a non-asymptotic sense, under relatively mild nonparametric conditions. Next, we give a method for constructing confidence intervals for total population size from generic capture probability estimators, and prove non-asymptotic near-validity.
Finally, we study our methods in simulations, and apply them to estimate the number of killings and disappearances attributable to different groups in Peru during its internal armed conflict between 1980 and 2000.
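To see why conditional independence of two lists identifies the population size, note that among captured units with covariates x, if q1(x), q2(x) and q12(x) are the probabilities of appearing on list 1, on list 2, and on both, then q1(x)q2(x)/q12(x) equals the inverse of the probability 1/γ(x) that such a unit is captured at all. With a single discrete covariate and empirical frequencies, summing these inverse capture probabilities over captured units gives a stratified Lincoln-Petersen estimator. This plug-in sketch is only for intuition: it is not the doubly robust estimator proposed in the paper, and the record format and function name are assumptions:

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple


def stratified_lincoln_petersen(records: Iterable[Tuple[Hashable, int, int]]) -> float:
    """Plug-in population size estimate from two lists with a discrete covariate.

    Each record is (stratum, on_list1, on_list2) for one *captured* individual,
    so on_list1 + on_list2 >= 1. Within each stratum the plug-in estimate is
    n1 * n2 / m, where m counts individuals found on both lists.
    """
    n1 = defaultdict(int)
    n2 = defaultdict(int)
    m = defaultdict(int)
    for stratum, y1, y2 in records:
        if not (y1 or y2):
            raise ValueError("every record must appear on at least one list")
        n1[stratum] += y1
        n2[stratum] += y2
        m[stratum] += y1 * y2
    total = 0.0
    for stratum in n1:  # every observed stratum has a key in n1
        if m[stratum] == 0:
            raise ValueError(f"no overlap between lists in stratum {stratum!r}")
        total += n1[stratum] * n2[stratum] / m[stratum]
    return total


# Hypothetical toy data: stratum "a" has n1=50, n2=40, m=20; stratum "b" has n1=10, n2=10, m=5.
records = (
    [("a", 1, 1)] * 20 + [("a", 1, 0)] * 30 + [("a", 0, 1)] * 20
    + [("b", 1, 1)] * 5 + [("b", 1, 0)] * 5 + [("b", 0, 1)] * 5
)
print(stratified_lincoln_petersen(records))  # 120.0 (= 50*40/20 + 10*10/5)
```

Within a stratum of size n, q1·q2/q12 = (n1/n)(n2/n)/(m/n), and summing it over the n captured units gives n1·n2/m, the classical Lincoln-Petersen estimate for that stratum.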

Submitted 31 July, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: 20 pages, 7 figures

arXiv:2104.08300

Semiparametric Sensitivity Analysis: Unmeasured Confounding In Observational Studies

Authors: Daniel O. Scharfstein, Razieh Nabi, Edward H. Kennedy, Ming-Yueh Huang, Matteo Bonvini, Marcela Smid

Abstract: Establishing cause-effect relationships from observational data often relies on untestable assumptions. It is crucial to know whether, and to what extent, the conclusions drawn from non-experimental studies are robust to potential unmeasured confounding. In this paper, we focus on the average causal effect (ACE) as our target of inference. We generalize the sensitivity analysis approach developed by Robins et al. (2000), Franks et al. (2020) and Zhou and Yao (2023). We use semiparametric theory to derive the non-parametric efficient influence function of the ACE, for fixed sensitivity parameters. We use this influence function to construct a one-step bias-corrected estimator of the ACE. Our estimator depends on semiparametric models for the distribution of the observed data; importantly, these models do not impose any restrictions on the values of sensitivity analysis parameters. We establish sufficient conditions ensuring that our estimator has root-n asymptotics. We use our methodology to evaluate the causal effect of smoking during pregnancy on birth weight. We also evaluate the performance of the estimation procedure in a simulation study.

Submitted 3 November, 2023; v1 submitted 16 April, 2021; originally announced April 2021.

arXiv:2103.15281

Comment on "Statistical Modeling: The Two Cultures" by Leo Breiman

Authors: Matteo Bonvini, Alan Mishler, Edward H. Kennedy

Abstract: Motivated by Breiman's rousing 2001 paper on the "two cultures" in statistics, we consider the role that different modeling approaches play in causal inference. We discuss the relationship between model complexity and causal (mis)interpretation, the relative merits of plug-in versus targeted estimation, issues that arise in tuning flexible estimators of causal effects, and some outstanding cultural divisions in causal inference.

Submitted 28 March, 2021; originally announced March 2021.

arXiv:2103.06476

Time-uniform central limit theory and asymptotic confidence sequences

Authors: Ian Waudby-Smith, David Arbour, Ritwik Sinha, Edward H. Kennedy, Aaditya Ramdas

Abstract: Confidence intervals based on the central limit theorem (CLT) are a cornerstone of classical statistics. Despite being only asymptotically valid, they are ubiquitous because they permit statistical inference under weak assumptions and can often be applied to problems even when nonasymptotic inference is impossible. This paper introduces time-uniform analogues of such asymptotic confidence intervals, adding to the literature on confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time -- which provide valid inference at arbitrary stopping times and incur no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, enjoying finite-sample guarantees but not the aforementioned broad applicability of asymptotic confidence intervals. This work provides a definition for "asymptotic CSs" and a general recipe for deriving them. Asymptotic CSs forgo nonasymptotic validity for CLT-like versatility and (asymptotic) time-uniform guarantees. While the CLT approximates the distribution of a sample average by that of a Gaussian for a fixed sample size, we use strong invariance principles (stemming from the seminal 1960s work of Strassen) to uniformly approximate the entire sample average process by an implicit Gaussian process.
As an illustration, we derive asymptotic CSs for the average treatment effect in observational studies (for which nonasymptotic bounds are essentially impossible to derive even in the fixed-time regime) as well as randomized experiments, enabling causal inference in sequential environments. △ Less

Submitted 13 March, 2024; v1 submitted 11 March, 2021; originally announced March 2021.

Comments: 69 pages, 10 figures
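
To make the idea of a confidence sequence concrete, here is a minimal Python sketch for a running mean. It assumes a two-sided Gaussian-mixture (Robbins-style normal mixture) boundary with a plug-in running standard deviation; the function name `asymptotic_cs` and the mixing parameter `rho` are illustrative choices, not code from the paper, whose general recipe covers far more than this case.

```python
import math


def asymptotic_cs(xs, alpha=0.05, rho=1.0):
    """Running (lower, upper) bounds for the mean after each observation.

    Assumed boundary (hypothetical choice of width):
    sd_t * sqrt(2 (t rho^2 + 1) / (t^2 rho^2) * log(sqrt(t rho^2 + 1) / alpha)),
    where sd_t is the running (divide-by-t) standard deviation.
    """
    intervals = []
    total = 0.0
    total_sq = 0.0
    for t, x in enumerate(xs, start=1):
        total += x
        total_sq += x * x
        mean = total / t
        sd = math.sqrt(max(total_sq / t - mean * mean, 0.0))
        radius = sd * math.sqrt(
            2 * (t * rho**2 + 1) / (t**2 * rho**2)
            * math.log(math.sqrt(t * rho**2 + 1) / alpha)
        )
        intervals.append((mean - radius, mean + radius))
    return intervals


# Alternating 0/1 data: the bounds stay centred near 0.5 and shrink with t.
cs = asymptotic_cs([t % 2 for t in range(1000)])
print(cs[9], cs[999])
```

Because the boundary is meant to hold uniformly over time, one may inspect every interval in `cs` and stop whenever convenient; the price is an interval that is typically wider than the fixed-n CLT interval at any single t.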

arXiv:2103.01802 [pdf, other]

Median Optimal Treatment Regimes

Authors: Liu Leqi, Edward H. Kennedy

Abstract: Optimal treatment regimes are personalized policies for making a treatment decision based on subject characteristics, with the policy chosen to maximize some value. It is common to aim to maximize the mean outcome in the population, via a regime assigning treatment only to those whose mean outcome is higher under treatment versus control. However, the mean can be an unstable measure of centrality, resulting in imprecise statistical procedures, as well as non-robust decisions that can be overly influenced by a small fraction of subjects. In this work, we propose a new median optimal treatment regime that instead treats individuals whose conditional median is higher under treatment. This ensures that optimal decisions for individuals from the same group are not overly influenced either by (i) a small fraction of the group (unlike the mean criterion), or (ii) unrelated subjects from different groups (unlike marginal median/quantile criteria). We introduce a new measure of value, the Average Conditional Median Effect (ACME), which summarizes across-group median treatment outcomes of a policy, and which the median optimal treatment regime maximizes. After developing key motivating examples that distinguish median optimal treatment regimes from mean and marginal median optimal treatment regimes, we give a nonparametric efficiency bound for estimating the ACME of a policy, and propose a new doubly robust-style estimator that achieves the efficiency bound under weak conditions.
To construct the median optimal treatment regime, we introduce a new doubly robust-style estimator for the conditional median treatment effect. Finite-sample properties are explored via numerical simulations and the proposed algorithm is illustrated using data from a randomized clinical trial in patients with HIV.

Submitted 24 February, 2022; v1 submitted 2 March, 2021; originally announced March 2021.
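
The median regime and the ACME can be illustrated with a small plug-in sketch. It assumes a randomized trial with a single discrete grouping covariate, so the sample median of each (group, arm) cell stands in for the conditional median of the corresponding potential outcome; the function `median_regime_and_acme` and the toy records are hypothetical, and this is not the doubly robust estimator proposed in the paper.

```python
from collections import defaultdict
from statistics import median


def median_regime_and_acme(records):
    """Plug-in median regime and its ACME from a list of (group, treated, outcome).

    Assumes every group has outcomes observed in both arms.
    """
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for group, treated, y in records:
        outcomes[group][treated].append(y)
    n = len(records)
    policy, acme = {}, 0.0
    for group, arms in outcomes.items():
        m0, m1 = median(arms[0]), median(arms[1])
        # Treat only when the conditional median is higher under treatment.
        policy[group] = 1 if m1 > m0 else 0
        share = (len(arms[0]) + len(arms[1])) / n
        acme += share * (m1 if policy[group] else m0)
    return policy, acme


records = (
    [("a", 0, y) for y in (1, 2, 3)] + [("a", 1, y) for y in (2, 3, 100)]
    + [("b", 0, y) for y in (5, 5, 5)] + [("b", 1, y) for y in (0, 0, 60)]
)
policy, acme = median_regime_and_acme(records)
print(policy, acme)
```

In the toy data, group "b" has a higher treated mean (20 versus 5) only because of one outlying outcome, which is the kind of instability the median criterion is meant to avoid; the median regime leaves that group untreated.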

arXiv:2102.12034 [pdf, other]

Semiparametric counterfactual density estimation

Authors: Edward H. Kennedy, Sivaraman Balakrishnan, Larry Wasserman

Abstract: Causal effects are often characterized with averages, which can give an incomplete picture of the underlying counterfactual distributions. Here we consider estimating the entire counterfactual density and generic functionals thereof. We focus on two kinds of target parameters. The first is a density approximation, defined by a projection onto a finite-dimensional model using a generalized distance metric, which includes f-divergences as well as $L_p$ norms. The second is the distance between counterfactual densities, which can be used as a more nuanced effect measure than the mean difference, and as a tool for model selection. We study nonparametric efficiency bounds for these targets, giving results for smooth but otherwise generic models and distances. Importantly, we show how these bounds connect to means of particular non-trivial functions of counterfactuals, linking the problems of density and mean estimation. We go on to propose doubly robust-style estimators for the density approximations and distances, and study their rates of convergence, showing they can be optimally efficient in large nonparametric models. We also give analogous methods for model selection and aggregation, when many models may be available and of interest. Our results all hold for generic models and distances, but throughout we highlight what happens for particular choices, such as $L_2$ projections on linear models, and KL projections on exponential families.
Finally, we illustrate by estimating the density of CD4 count among patients with HIV had they all been treated with combination therapy versus zidovudine alone, as well as a density effect. Our results suggest combination therapy may have increased CD4 count most for high-risk patients. Our methods are implemented in the freely available R package npcausal on GitHub.

Submitted 23 February, 2021; originally announced February 2021.
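
Because each coefficient of an L2 projection onto an orthonormal basis is a counterfactual mean, E[b_j(Y^1)], the link between density and mean estimation can be sketched directly. The Python below assumes outcomes in [0, 1], a cosine basis, and known propensity scores, and uses plain inverse-probability weighting in place of the doubly robust estimators described in the abstract; all names are illustrative.

```python
import math


def cosine_basis(j, y):
    """Orthonormal cosine basis on [0, 1]."""
    return 1.0 if j == 0 else math.sqrt(2) * math.cos(j * math.pi * y)


def ipw_density_projection(ys, treated, propensity, n_basis=5):
    """IPW estimate of the L2 projection of the density of Y^1 onto a cosine basis.

    Each coefficient beta_j = E[b_j(Y^1)] is estimated as the sample mean of
    A * b_j(Y) / pi(X) (illustrative substitute for a doubly robust estimator).
    """
    n = len(ys)
    coefs = [
        sum(a / p * cosine_basis(j, y) for y, a, p in zip(ys, treated, propensity)) / n
        for j in range(n_basis)
    ]

    def density(y):
        return sum(c * cosine_basis(j, y) for j, c in enumerate(coefs))

    return coefs, density


# Everyone treated with propensity 1 and evenly spread outcomes:
# the projection should be close to the uniform density on [0, 1].
ys = [(i + 0.5) / 100 for i in range(100)]
coefs, density = ipw_density_projection(ys, [1] * 100, [1.0] * 100)
print([round(c, 6) for c in coefs], round(density(0.3), 6))
```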

arXiv:2011.12746 [pdf, ps, other]

Doubly Robust Adaptive LASSO for Effect Modifier Discovery

Authors: Asma Bahamyirou, Mireille E. Schnitzer, Edward H. Kennedy, Lucie Blais, Yi Yang

Abstract: Effect modification occurs when the effect of the treatment on an outcome differs according to the level of a third variable (the effect modifier, EM). A natural way to assess effect modification is through subgroup analysis or by including interaction terms between the treatment and the covariates in an outcome regression. The latter, however, does not target a parameter of a marginal structural model (MSM) unless the outcome model is correctly specified. Our aim is to develop a data-adaptive method to select effect-modifying variables in an MSM with a single time point exposure. A two-stage procedure is proposed. First, we estimate the conditional outcome expectation and propensity score and plug these into a doubly robust loss function. Second, we use the adaptive LASSO to select the EMs and estimate MSM coefficients. Post-selection inference is then used to obtain coverage on the selected EMs. Simulation studies are performed to verify the performance of the proposed methods.

Submitted 21 December, 2021; v1 submitted 25 November, 2020; originally announced November 2020.
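
A rough sketch of the two-stage idea follows, under stated assumptions: the doubly robust loss is taken to be squared error on the AIPW pseudo-outcome (one common construction, which may differ from the authors' loss), the MSM is linear in the candidate modifiers, nuisance functions are treated as known, and the adaptive LASSO is solved by a hand-written coordinate descent. Post-selection inference is not shown. Function names and the simulated data are hypothetical.

```python
import numpy as np


def dr_pseudo_outcome(y, a, pi, mu0, mu1):
    """AIPW pseudo-outcome; its conditional mean given X is E[Y^1 - Y^0 | X]."""
    return mu1 - mu0 + a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)


def weighted_lasso(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X**2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]  # partial residual without feature j
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive LASSO with OLS-based weights; column 0 is an unpenalized intercept."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    weights = 1.0 / (np.abs(b_ols) ** gamma + 1e-12)
    weights[0] = 0.0
    return weighted_lasso(X, y, lam, weights)


rng = np.random.default_rng(0)
n = 2000
V = rng.normal(size=(n, 3))                # candidate effect modifiers
a = rng.binomial(1, 0.5, size=n)           # randomized treatment, pi = 0.5
mu0 = V[:, 1]                              # oracle outcome regressions
mu1 = V[:, 1] + 1 + 2 * V[:, 0]            # only V[:, 0] modifies the effect
y = np.where(a == 1, mu1, mu0) + rng.normal(size=n)
phi = dr_pseudo_outcome(y, a, 0.5, mu0, mu1)
coef = adaptive_lasso(np.column_stack([np.ones(n), V]), phi, lam=0.05)
print(np.round(coef, 2))
```

The two non-modifiers are shrunk exactly to zero while the true modifier keeps a coefficient near 2, because the adaptive weights penalize small OLS coefficients heavily and large ones lightly.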

arXiv:2009.02841 [pdf, other]

doi 10.1145/3442188.3445902

Fairness in Risk Assessment Instruments: Post-Processing to Achieve Counterfactual Equalized Odds

Authors: Alan Mishler, Edward H. Kennedy, Alexandra Chouldechova

Abstract: In domains such as criminal justice, medicine, and social welfare, decision makers increasingly have access to algorithmic Risk Assessment Instruments (RAIs). RAIs estimate the risk of an adverse outcome such as recidivism or child neglect, potentially informing high-stakes decisions such as whether to release a defendant on bail or initiate a child welfare investigation. It is important to ensure that RAIs are fair, so that the benefits and harms of such decisions are equitably distributed. The most widely used algorithmic fairness criteria are formulated with respect to observable outcomes, such as whether a person actually recidivates, but these criteria are misleading when applied to RAIs. Since RAIs are intended to inform interventions that can reduce risk, the prediction itself affects the downstream outcome. Recent work has argued that fairness criteria for RAIs should instead utilize potential outcomes, i.e., the outcomes that would occur in the absence of an appropriate intervention. However, no methods currently exist to satisfy such fairness criteria. In this paper, we target one such criterion, counterfactual equalized odds. We develop a post-processed predictor that is estimated via doubly robust estimators, extending and adapting previous post-processing approaches to the counterfactual setting. We also provide doubly robust estimators of the risk and fairness properties of arbitrary fixed post-processed predictors. Our predictor converges to an optimal fair predictor at fast rates.
We illustrate properties of our method and show that it performs well on both simulated and real data.

Submitted 6 August, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

Comments: 19 pages, 7 figures

Journal ref: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. Pages 386-400
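
The counterfactual error rates that counterfactual equalized odds constrains can be estimated for a fixed binary predictor with a doubly robust pseudo-outcome for Y^0, the outcome under no intervention. This is a minimal sketch, assuming the predictor is a function of the same covariates used for the nuisance estimates `pi` = P(D = 1 | X) and `mu0` = E[Y | X, D = 0]; it only evaluates a predictor and does not implement the paper's post-processing step. Names are hypothetical.

```python
import numpy as np


def dr_y0(y, d, pi, mu0):
    """AIPW pseudo-outcome for Y^0, the outcome had no intervention been given (d = 0)."""
    return mu0 + (1 - d) / (1 - pi) * (y - mu0)


def counterfactual_error_rates(yhat, group, y, d, pi, mu0):
    """Per-group counterfactual (FPR, FNR) of a binary predictor with respect to Y^0.

    Counterfactual equalized odds asks that both rates match across groups.
    """
    phi = dr_y0(y, d, pi, mu0)
    rates = {}
    for g in np.unique(group):
        m = group == g
        fpr = np.sum(yhat[m] * (1 - phi[m])) / np.sum(1 - phi[m])
        fnr = np.sum((1 - yhat[m]) * phi[m]) / np.sum(phi[m])
        rates[g.item()] = (float(fpr), float(fnr))
    return rates


group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 0, 1, 1, 0, 0, 1, 1])
yhat = np.array([1, 0, 1, 1, 0, 0, 1, 0])
none = np.zeros(8)  # nobody intervened on, so pi = 0 as well
rates = counterfactual_error_rates(yhat, group, y, none, none, np.full(8, 0.5))
print(rates)
```

With nobody intervened on, the pseudo-outcome equals the observed outcome, so the sketch reduces to ordinary per-group false positive and false negative rates, a useful sanity check.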

arXiv:2007.12973 [pdf, ps, other]

Doubly Robust Nonparametric Instrumental Variable Estimators for Survival Outcomes

Authors: You** Lee, Edward H. Kennedy, Nandita Mitra

Abstract: Instrumental variable (IV) methods allow us to address unmeasured confounding in causal inference. However, most IV methods apply only to discrete or continuous outcomes, and very few exist for censored survival outcomes. In this work, we propose nonparametric estimators for the local average treatment effect on survival probabilities under both nonignorable and ignorable censoring. We provide an efficient influence function-based estimator and a simple estimation procedure when the IV is either binary or continuous. The proposed estimators possess double-robustness properties and can easily incorporate nonparametric estimation using machine learning tools. In simulation studies, we demonstrate the flexibility and efficiency of our proposed estimators under various plausible scenarios. We apply our method to the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial to estimate the causal effect of screening on survival probabilities and investigate the causal contrasts between the two interventions under different censoring assumptions.

Submitted 28 September, 2020; v1 submitted 25 July, 2020; originally announced July 2020.
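
In the simplest special case (binary instrument, no covariates, no censoring), the local average treatment effect on the survival probability P(T > t) reduces to a Wald ratio. The sketch below shows only that special case under the usual IV assumptions (monotonicity, exclusion, unconfounded instrument); the estimators in the paper additionally handle covariates, continuous instruments, and censoring. The function name and toy data are illustrative.

```python
def wald_survival_late(times, treated, instrument, t):
    """Wald-type LATE on P(T > t) for a binary instrument (no covariates, no censoring)."""

    def arm_mean(values, z_value):
        selected = [v for v, z in zip(values, instrument) if z == z_value]
        return sum(selected) / len(selected)

    survived = [1.0 if ti > t else 0.0 for ti in times]
    # Intention-to-treat effect on survival divided by the effect on treatment uptake.
    return (arm_mean(survived, 1) - arm_mean(survived, 0)) / (
        arm_mean(treated, 1) - arm_mean(treated, 0)
    )


late = wald_survival_late(
    times=[5, 5, 1, 1, 5, 1, 1, 1],
    treated=[1, 1, 1, 0, 1, 0, 0, 0],
    instrument=[1, 1, 1, 1, 0, 0, 0, 0],
    t=2,
)
print(late)
```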

arXiv:2006.16916 [pdf, other]

Counterfactual Predictions under Runtime Confounding

Authors: Amanda Coston, Edward H. Kennedy, Alexandra Chouldechova

Abstract: Algorithms are commonly used to predict outcomes under a particular decision or intervention, such as predicting whether an offender will succeed on parole if placed under minimal supervision. Generally, to learn such counterfactual prediction models from observational data on historical decisions and corresponding outcomes, one must measure all factors that jointly affect the outcomes and the decision taken. Motivated by decision support applications, we study the counterfactual prediction task in the setting where all relevant factors are captured in the historical data, but it is either undesirable or impermissible to use some such factors in the prediction model. We refer to this setting as runtime confounding. We propose a doubly-robust procedure for learning counterfactual prediction models in this setting. Our theoretical analysis and experimental results suggest that our method often outperforms competing approaches. We also present a validation procedure for evaluating the performance of counterfactual prediction methods.

Submitted 15 April, 2021; v1 submitted 30 June, 2020; originally announced June 2020.

Journal ref: Advances in Neural Information Processing Systems Vol 33, 2020. pp. 4150--4162
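
A minimal sketch of the runtime-confounding idea: nuisance estimates are built from the full training-time covariates X, a doubly robust pseudo-outcome is formed, and that pseudo-outcome is then regressed on the runtime covariates V only. Here the target is Y^0 under a hypothetical coding in which a = 0 is the decision of interest (say, minimal supervision), and the second-stage regression is just a mean within each level of a discrete V; both choices are assumptions for illustration.

```python
from collections import defaultdict


def dr_pseudo_outcome(y, a, pi, mu):
    """AIPW pseudo-outcome for Y^0 built from training-time covariates X.

    pi = P(A = 1 | X) and mu = E[Y | X, A = 0] are assumed nuisance estimates.
    """
    return mu + (1 - a) / (1 - pi) * (y - mu)


def fit_runtime_model(v, y, a, pi, mu):
    """Regress pseudo-outcomes on runtime covariates V (here: means within each V level)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for vi, yi, ai, pii, mui in zip(v, y, a, pi, mu):
        sums[vi] += dr_pseudo_outcome(yi, ai, pii, mui)
        counts[vi] += 1
    return {level: sums[level] / counts[level] for level in sums}


model = fit_runtime_model(
    v=["u", "u", "w"],
    y=[1.0, 5.0, 1.0],
    a=[0, 1, 0],
    pi=[0.5, 0.5, 0.2],
    mu=[0.8, 0.2, 0.6],
)
print(model)
```

The treated unit's observed outcome never enters its pseudo-outcome, which falls back on the outcome-model prediction; the confounders only appear through the training-time nuisances, so V alone is needed at prediction time.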

arXiv:2006.09613 [pdf, ps, other]

Discussion of "On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning"

Authors: Edward H. Kennedy, Sivaraman Balakrishnan, Larry A. Wasserman

Abstract: We congratulate the authors on their exciting paper, which introduces a novel idea for assessing the estimation bias in causal estimates. Doubly robust estimators are now part of the standard set of tools in causal inference, but a typical analysis stops with an estimate and a confidence interval. The authors give an approach for a unique type of model-checking that allows the user to check whether the bias is sufficiently small with respect to the standard error, which is generally required for confidence intervals to be reliable.

Submitted 16 June, 2020; originally announced June 2020.

arXiv:1912.02793 [pdf, other]

doi 10.1080/01621459.2020.1864382

Sensitivity analysis via the proportion of unmeasured confounding

Authors: Matteo Bonvini, Edward H Kennedy

Abstract: In observational studies, identification of average treatment effects (ATEs) is generally achieved by assuming that the correct set of confounders has been measured and properly included in the relevant models. Because this assumption is both strong and untestable, a sensitivity analysis should be performed. Common approaches include modeling the bias directly or varying the propensity scores to probe the effects of a potential unmeasured confounder. In this paper, we take a novel approach whereby the sensitivity parameter is the "proportion of unmeasured confounding": the proportion of units for whom the treatment is not as good as randomized even after conditioning on the observed covariates. We consider different assumptions on the probability of a unit being unconfounded. In each case, we derive sharp bounds on the average treatment effect as a function of the sensitivity parameter and propose nonparametric estimators that allow flexible covariate adjustment. We also introduce a one-number summary of a study's robustness to the number of confounded units. Finally, we explore finite-sample properties via simulation, and apply the methods to an observational database used to assess the effects of right heart catheterization.

Submitted 17 December, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: 41 pages, 5 figures

arXiv:1910.03531 [pdf, ps, other]

Causal Inference for Comprehensive Cohort Studies

Authors: Yi Lu, Daniel O. Scharfstein, Maria M. Brooks, Kevin Quach, Edward H. Kennedy

Abstract: In a comprehensive cohort study of two competing treatments (say, A and B), clinically eligible individuals are first asked to enroll in a randomized trial and, if they refuse, are then asked to enroll in a parallel observational study in which they can choose treatment according to their own preference. We consider estimation of two estimands: (1) comprehensive cohort causal effect -- the difference in mean potential outcomes had all patients in the comprehensive cohort received treatment A vs. treatment B and (2) randomized trial causal effect -- the difference in mean potential outcomes had all patients enrolled in the randomized trial received treatment A vs. treatment B. For each estimand, we consider inference under various sets of unconfoundedness assumptions and construct semiparametric efficient and robust estimators. These estimators depend on nuisance functions, which we estimate, for illustrative purposes, using generalized additive models. Using the theory of sample splitting, we establish the asymptotic properties of our proposed estimators. We also illustrate our methodology using data from the Bypass Angioplasty Revascularization Investigation (BARI) randomized trial and observational registry to evaluate the effect of percutaneous transluminal coronary balloon angioplasty versus coronary artery bypass grafting on 5-year mortality. To evaluate the finite sample performance of our estimators, we use the BARI dataset as the basis of a realistic simulation study.

Submitted 8 October, 2019; originally announced October 2019.

Comments: 34 pages, 1 figure, 3 tables

arXiv:1909.00066 [pdf, other]

Counterfactual Risk Assessments, Evaluation, and Fairness

Authors: Amanda Coston, Alan Mishler, Edward H. Kennedy, Alexandra Chouldechova

Abstract: Algorithmic risk assessments are increasingly used to help humans make decisions in high-stakes settings, such as medicine, criminal justice and education. In each of these cases, the purpose of the risk assessment tool is to inform actions, such as medical treatments or release conditions, often with the aim of reducing the likelihood of an adverse event such as hospital readmission or recidivism. Problematically, most tools are trained and evaluated on historical data in which the outcomes observed depend on the historical decision-making policy. These tools thus reflect risk under the historical policy, rather than under the different decision options that the tool is intended to inform. Even when tools are constructed to predict risk under a specific decision, they are often improperly evaluated as predictors of the target outcome. Focusing on the evaluation task, in this paper we define counterfactual analogues of common predictive performance and algorithmic fairness metrics that we argue are better suited for the decision-making context. We introduce a new method for estimating the proposed metrics using doubly robust estimation. We provide theoretical results that show that only under strong conditions can fairness according to the standard metric and the counterfactual metric simultaneously hold. Consequently, fairness-promoting methods that target parity in a standard fairness metric may (and, as we show empirically, do) induce greater imbalance in the counterfactual analogue.
We provide empirical comparisons on both synthetic data and a real-world child welfare dataset to demonstrate how the proposed method improves upon standard practice.

Submitted 10 January, 2020; v1 submitted 30 August, 2019; originally announced September 2019.

Comments: To appear in ACM FAT* 2020

arXiv:1907.04004 [pdf, other]

doi 10.1515/jci-2020-0031

Incremental Intervention Effects in Studies with Dropout and Many Timepoints

Authors: Kwangho Kim, Edward H. Kennedy, Ashley I. Naimi

Abstract: Modern longitudinal studies collect feature data at many timepoints, often of the same order as the sample size. Such studies are typically affected by dropout and positivity violations. We tackle these problems by generalizing effects of recent incremental interventions (which shift propensity scores rather than set treatment values deterministically) to accommodate multiple outcomes and subject dropout. We give an identifying expression for incremental intervention effects when dropout is conditionally ignorable (without requiring treatment positivity), and derive the nonparametric efficiency bound for estimating such effects. Then we present efficient nonparametric estimators, showing that they converge at fast parametric rates and yield uniform inferential guarantees, even when nuisance functions are estimated flexibly at slower rates. We also study the variance ratio of incremental intervention effects relative to more conventional deterministic effects in a novel infinite time horizon setting, where the number of timepoints can grow with sample size, and show that incremental intervention effects yield near-exponential gains in statistical precision in this setup. Finally, we conclude with simulations and apply our methods in a study of the effect of low-dose aspirin on pregnancy outcomes.

Submitted 25 November, 2021; v1 submitted 9 July, 2019; originally announced July 2019.

Comments: 52 pages

MSC Class: 62G05

Journal ref: Journal of Causal Inference, vol. 9, no. 1, 2021, pp. 302-344
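
Incremental interventions multiply the odds of treatment by a factor delta, so the shifted propensity is q = delta * pi / (delta * pi + 1 - pi). The Python below sketches that shift and a single-timepoint inverse-probability-weighted estimate of the mean outcome under it; it ignores dropout and multiple timepoints, which the paper handles with efficient estimators, and the function names are illustrative.

```python
def incremental_propensity(pi, delta):
    """Propensity after multiplying the odds of treatment by delta."""
    return delta * pi / (delta * pi + 1 - pi)


def ipw_incremental_mean(y, a, pi, delta):
    """Single-timepoint IPW estimate of E[Y] under the incremental intervention.

    The weight q/pi for treated units and (1 - q)/(1 - pi) for untreated units
    simplifies to (delta * a + 1 - a) / (delta * pi + 1 - pi).
    """
    weights = [(delta * ai + 1 - ai) / (delta * pii + 1 - pii) for ai, pii in zip(a, pi)]
    return sum(w * yi for w, yi in zip(weights, y)) / len(y)


q = incremental_propensity(0.5, 2.0)
print(q, incremental_propensity(0.0, 2.0), incremental_propensity(1.0, 2.0))
```

Note that pi = 0 or pi = 1 is mapped to itself and the weights stay bounded, which is why these effects do not require treatment positivity; with delta = 1 nothing is shifted and the estimate is the plain sample mean.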

arXiv:1811.01301 [pdf, other]

Instrumental Variable Methods using Dynamic Interventions

Authors: Jacqueline A Mauro, Edward H Kennedy, Daniel Nagin

Abstract: Recent work on dynamic interventions has greatly expanded the range of causal questions researchers can study while weakening identifying assumptions and yielding effects that are more practically relevant. However, most work in dynamic interventions to date has focused on settings where we directly alter some unconfounded treatment of interest. In policy analysis, decision makers rarely have this level of control over behaviors or access to experimental data. Instead, they are often faced with treatments they can affect only indirectly and whose effects must be learned from observational data. In this paper, we propose new estimands and estimators of causal effects based on dynamic interventions with instrumental variables. This method does not rely on parametric models and does not require an experiment. Instead, we estimate the effect of a dynamic intervention on the instrument. This robustness should reassure policy makers that these estimates can be used to effectively inform policy. We demonstrate the usefulness of this estimation strategy in a case study examining the effect of visitation on recidivism.

Submitted 8 July, 2019; v1 submitted 3 November, 2018; originally announced November 2018.

arXiv:1810.03260 [pdf, other]

Visually Communicating and Teaching Intuition for Influence Functions

Authors: Aaron Fisher, Edward H. Kennedy

Abstract: Estimators based on influence functions (IFs) have been shown to be effective in many settings, especially when combined with machine learning techniques. By focusing on estimating a specific target of interest (e.g., the average effect of a treatment), rather than on estimating the full underlying data generating distribution, IF-based estimators are often able to achieve asymptotically optimal mean-squared error. Still, many researchers find IF-based estimators to be opaque or overly technical, which makes their use less prevalent and their benefits less available. To help foster understanding and trust in IF-based estimators, we present tangible, visual illustrations of when and how IF-based estimators can outperform standard "plug-in" estimators. The figures we show are based on connections between IFs, gradients, linear approximations, and Newton-Raphson.

Submitted 27 October, 2019; v1 submitted 7 October, 2018; originally announced October 2018.

Comments: This manuscript version includes 2 additional supplemental figures to further aid intuition. In total: 4 figures, 36 pages (double spaced)
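
The Newton-Raphson connection can be seen in a textbook example that is not from the paper: for psi = integral of p(x)^2, the influence function is 2 * (p(x) - psi), and the one-step estimator adds the sample mean of the estimated influence function to the plug-in estimate. The sketch assumes data on [0, 1], a histogram density fit on one half of the sample, and the correction evaluated on the other half; all names are illustrative.

```python
import numpy as np


def one_step_integrated_sq_density(x, bins=10, seed=0):
    """Plug-in and one-step estimates of psi = integral of p(x)^2 for data on [0, 1]."""
    idx = np.random.default_rng(seed).permutation(len(x))
    train, held_out = x[idx[: len(x) // 2]], x[idx[len(x) // 2 :]]
    edges = np.linspace(0.0, 1.0, bins + 1)
    p_hat, _ = np.histogram(train, bins=edges, density=True)
    plug_in = float(np.sum(p_hat**2 * np.diff(edges)))
    # Bin index of each held-out point (a value of exactly 1.0 goes in the last bin).
    which = np.clip(np.searchsorted(edges, held_out, side="right") - 1, 0, bins - 1)
    # One-step: plug-in plus the mean estimated influence function 2 * (p_hat - psi_hat).
    one_step = plug_in + float(np.mean(2 * (p_hat[which] - plug_in)))
    return plug_in, one_step


x = np.random.default_rng(1).uniform(size=20_000)  # true psi = 1 for Uniform(0, 1)
plug_in, one_step = one_step_integrated_sq_density(x)
print(plug_in, one_step)
```

For uniform data the plug-in estimate is biased upward (by the Cauchy-Schwarz inequality it can never fall below 1 here), while the one-step correction removes most of that first-order bias.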

arXiv:1810.00767 [pdf, other]

A nonparametric projection-based estimator for the probability of causation, with application to water sanitation in Kenya

Authors: Maria Cuellar, Edward H. Kennedy

Abstract: Current estimation methods for the probability of causation (PC) make strong parametric assumptions or are inefficient. We derive a nonparametric influence-function-based estimator for a projection of PC, which allows for simple interpretation and valid inference by making weak structural assumptions. We apply our estimator to real data from an experiment in Kenya, which found, by estimating the average treatment effect, that protecting water springs reduces childhood disease. However, before scaling up this intervention, it is important to determine whether it was the exposure, and not something else, that caused the outcome. Indeed, we find that some children, who were exposed to a high concentration of bacteria in drinking water and had a diarrheal disease, would likely have contracted the disease absent the exposure since the estimated PC for an average child in this study is 0.12 with a 95% confidence interval of (0.11, 0.13). Our nonparametric method offers researchers a way to estimate PC, which is essential if one wishes to determine not only the average treatment effect, but also whether an exposure likely caused the observed outcome.

Submitted 30 October, 2019; v1 submitted 1 October, 2018; originally announced October 2018.

Comments: 24 pages, 6 figures
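
Under the classical assumptions that exposure is unconfounded (e.g., randomized) and monotone (it never prevents the outcome), the probability that an exposed case would not have had the outcome without exposure, P(Y^0 = 0 | A = 1, Y = 1), equals 1 - P(Y = 1 | A = 0) / P(Y = 1 | A = 1). The sketch below computes this marginal version from hypothetical counts; the paper instead estimates a projection of covariate-specific PC with an influence-function-based estimator.

```python
def probability_of_causation(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """PC = 1 - P(Y=1 | A=0) / P(Y=1 | A=1), i.e. 1 - 1/RR.

    Assumes unconfounded, monotone exposure; within covariate strata the same
    formula applies to the conditional risks.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return 1 - risk_unexposed / risk_exposed


pc = probability_of_causation(50, 100, 40, 100)  # hypothetical counts, risk ratio 1.25
print(pc)
```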

arXiv:1806.02935 [pdf, other]

Causal effects based on distributional distances

Authors: Kwangho Kim, Jisu Kim, Edward H. Kennedy

Abstract: In this paper we develop a framework for characterizing causal effects via distributional distances. In particular we define a causal effect in terms of the $L_1$ distance between different counterfactual outcome distributions, rather than the typical mean difference in outcome values. Comparing entire counterfactual outcome distributions can provide more nuanced and valuable measures for exploring causal effects beyond the average treatment effect. First, we propose a novel way to estimate counterfactual outcome densities, which is of independent interest. Then we develop an efficient estimator of our target causal effect. We go on to provide error bounds and asymptotic properties of the proposed estimator, along with bootstrap-based confidence intervals. Finally, we illustrate the methods via simulations and real data.

Submitted 26 February, 2021; v1 submitted 7 June, 2018; originally announced June 2018.

Comments: 46 pages
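
A crude plug-in version of the $L_1$ effect can be sketched with inverse-probability-weighted histograms of the two counterfactual outcome densities. This assumes known propensity scores and fixed bin edges, and it is not the density estimator or efficient estimator developed in the paper; function names and the toy data are illustrative.

```python
import numpy as np


def ipw_histogram_density(y, weights, edges):
    """Weighted histogram density, normalized to integrate to one (Hajek-style)."""
    counts, _ = np.histogram(y, bins=edges, weights=weights)
    return counts / (counts.sum() * np.diff(edges))


def l1_counterfactual_distance(y, a, pi, edges):
    """Estimate the L1 distance between the densities of Y^1 and Y^0.

    Uses inverse-probability-weighted histograms with pi = P(A = 1 | X); the
    distance ranges from 0 (identical densities) to 2 (disjoint supports).
    """
    treated, control = a == 1, a == 0
    p1 = ipw_histogram_density(y[treated], 1 / pi[treated], edges)
    p0 = ipw_histogram_density(y[control], 1 / (1 - pi[control]), edges)
    return float(np.sum(np.abs(p1 - p0) * np.diff(edges)))


y = np.array([0.1, 0.2, 0.7, 0.8])
a = np.array([1, 1, 0, 0])
pi = np.full(4, 0.5)
dist = l1_counterfactual_distance(y, a, pi, edges=np.array([0.0, 0.5, 1.0]))
print(dist)
```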

arXiv:1802.08952 [pdf, other]

Efficient nonparametric causal inference with missing exposure information

Authors: Edward H. Kennedy

Abstract: Missing exposure information is a very common feature of many observational studies. Here we study identifiability and efficient estimation of causal effects on vector outcomes, in such cases where treatment is unconfounded but partially missing. We consider a missing at random setting where missingness in treatment can depend not only on complex covariates, but also on post-treatment outcomes. We give a new identifying expression for average treatment effects in this setting, along with the efficient influence function for this parameter in a nonparametric model, which yields a nonparametric efficiency bound. We use this latter result to construct nonparametric estimators that are less sensitive to the curse of dimensionality than usual, e.g., by having faster rates of convergence than the complex nuisance estimators they rely on. Further we show that these estimators can be root-n consistent and asymptotically normal under weak nonparametric conditions, even when constructed using flexible machine learning. Finally we apply these results to the problem of causal inference with a partially missing instrumental variable.

Submitted 1 February, 2020; v1 submitted 24 February, 2018; originally announced February 2018.

arXiv:1801.03635 [pdf, other]

Sharp instruments for classifying compliers and generalizing causal effects

Authors: Edward H. Kennedy, Sivaraman Balakrishnan, Max G'Sell

Abstract: It is well-known that, without restricting treatment effect heterogeneity, instrumental variable (IV) methods only identify "local" effects among compliers, i.e., those subjects who take treatment only when encouraged by the IV. Local effects are controversial since they seem to only apply to an unidentified subgroup; this has led many to denounce these effects as having little policy relevance. However, we show that such pessimism is not always warranted: it is possible in some cases to accurately predict who compliers are, and obtain tight bounds on more generalizable effects in identifiable subgroups. We propose methods for doing so and study their estimation error and asymptotic properties, showing that these tasks can in theory be accomplished even with very weak IVs. We go on to introduce a new measure of IV quality called "sharpness", which reflects the variation in compliance explained by covariates, and captures how well one can identify compliers and obtain tight bounds on identifiable subgroup effects. We develop an estimator of sharpness, and show that it is asymptotically efficient under weak conditions. Finally we explore finite-sample properties via simulation, and apply the methods to study canvassing effects on voter turnout. We propose that sharpness should be presented alongside strength to assess IV quality.

Submitted 30 May, 2019; v1 submitted 11 January, 2018; originally announced January 2018.
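The idea of predicting who the compliers are, from the entry above, can be illustrated with the standard compliance score. Under the usual IV assumptions plus monotonicity (no defiers), the probability of being a complier given covariates equals P(A=1 | X, Z=1) - P(A=1 | X, Z=0). The sketch below is a minimal illustration under those assumptions, not the authors' estimator, and it does not compute their sharpness measure. The discrete covariate, the simulated complier shares, the helper name `compliance_scores`, and the 0.5 classification threshold are all hypothetical choices.

```python
import numpy as np

def compliance_scores(x, z, a):
    """Estimate P(complier | X = level) for each level of a discrete covariate.

    Assumes a binary instrument z, binary treatment a, and the usual IV
    conditions plus monotonicity (no defiers), under which
    P(complier | X) = P(A=1 | X, Z=1) - P(A=1 | X, Z=0).
    Each conditional probability is estimated by a within-stratum mean.
    """
    scores = {}
    for level in np.unique(x):
        in_level = x == level
        p1 = a[in_level & (z == 1)].mean()
        p0 = a[in_level & (z == 0)].mean()
        scores[int(level)] = p1 - p0
    return scores

# Hypothetical simulation: the complier share rises with the covariate level.
rng = np.random.default_rng(0)
n = 200_000
x = rng.integers(0, 3, size=n)               # covariate levels 0, 1, 2
p_complier = np.array([0.1, 0.4, 0.9])[x]
u = rng.random(n)
always_taker = u < 0.05                      # 5% always-takers at every level
complier = (u >= 0.05) & (u < 0.05 + p_complier)
z = rng.integers(0, 2, size=n)               # randomized binary instrument
a = np.where(always_taker, 1, np.where(complier, z, 0))

scores = compliance_scores(x, z, a)
score_by_unit = np.array([scores[k] for k in range(3)])[x]
predicted_complier = score_by_unit > 0.5     # classify by thresholding the score
accuracy = (predicted_complier == complier).mean()
print(scores, round(accuracy, 3))
```

With these settings the estimated scores should land near 0.1, 0.4, and 0.9, and thresholding at 0.5 classifies roughly 80% of units correctly. Units at the middle level are the hard case: classification is easiest when compliance varies strongly with covariates, which is loosely the intuition the abstract attaches to sharpness.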

arXiv:1711.07137

Challenges in Obtaining Valid Causal Effect Estimates with Machine Learning Algorithms

Authors: Ashley I Naimi, Alan E Mishler, Edward H Kennedy

Abstract: Unlike parametric regression, machine learning (ML) methods do not generally require precise knowledge of the true data generating mechanisms. As such, numerous authors have advocated for ML methods to estimate causal effects. Unfortunately, ML algorithms can perform worse than parametric regression. We demonstrate the performance of ML-based single- and double-robust estimators. We use 100 Monte Carlo samples with sample sizes of 200, 1200, and 5000 to investigate bias and confidence interval coverage under several scenarios. In a simple confounding scenario, confounders were related to the treatment and the outcome via parametric models. In a complex confounding scenario, the simple confounders were transformed to induce complicated nonlinear relationships. In the simple scenario, when ML algorithms were used, double-robust estimators were superior to single-robust estimators. In the complex scenario, single-robust estimators with ML algorithms were at least as biased as estimators using misspecified parametric models. Double-robust estimators were less biased, but coverage was well below nominal. The use of sample splitting, inclusion of confounder interactions, reliance on a richly specified ML algorithm, and use of doubly robust estimators was the only explored approach that yielded negligible bias and nominal coverage. Our results suggest that ML based singly robust methods should be avoided.

Submitted 14 May, 2020; v1 submitted 19 November, 2017; originally announced November 2017.

Comments: 21 pages, 2 figures, 1 table
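The combination of sample splitting and a doubly robust estimator described in the Naimi, Mishler, and Kennedy entry above can be illustrated with a cross-fitted augmented inverse probability weighting (AIPW) estimator of the average treatment effect. This is a minimal sketch, not the authors' simulation code: for brevity it uses correctly specified parametric nuisance models (Newton-Raphson logistic regression and per-arm least squares) where the paper uses ML algorithms, and the data-generating setup, the two folds, the 0.01 propensity clipping, and all function names are hypothetical choices.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression fitted by Newton-Raphson; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def aipw_crossfit(X, a, y, n_folds=2, seed=0):
    """Cross-fitted AIPW (doubly robust) estimate of the ATE and its SE.

    Nuisance models are fitted on the other folds and evaluated on the
    held-out fold (sample splitting); the resulting influence-function
    values are then averaged over all units.
    """
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    phi = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        pi = 1.0 / (1.0 + np.exp(-X[te] @ fit_logistic(X[tr], a[tr])))
        pi = np.clip(pi, 0.01, 0.99)          # guard against extreme weights
        treated, control = tr & (a == 1), tr & (a == 0)
        mu1 = X[te] @ fit_ols(X[treated], y[treated])
        mu0 = X[te] @ fit_ols(X[control], y[control])
        a_te, y_te = a[te], y[te]
        phi[te] = (mu1 - mu0
                   + a_te * (y_te - mu1) / pi
                   - (1 - a_te) * (y_te - mu0) / (1 - pi))
    return phi.mean(), phi.std(ddof=1) / np.sqrt(n)

# Hypothetical data: one confounder x, true average treatment effect = 2.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x)))
y = 1.0 + 2.0 * a + x + rng.normal(size=n)

est, se = aipw_crossfit(X, a, y)
print(f"ATE estimate {est:.3f}, 95% CI ({est - 1.96 * se:.3f}, {est + 1.96 * se:.3f})")
```

Replacing `fit_logistic` and `fit_ols` with flexible learners would leave the averaging step unchanged; fitting nuisances on one part of the data and evaluating them on the held-out part is the sample-splitting ingredient the abstract refers to.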
