-
Estimation of subsidiary performance metrics under optimal policies
Authors:
Zhaoqi Li,
Houssam Nassif,
Alex Luedtke
Abstract:
In policy learning, the goal is typically to optimize a primary performance metric, but other subsidiary metrics often also warrant attention. This paper presents two strategies for evaluating these subsidiary metrics under a policy that is optimal for the primary one. The first relies on a novel margin condition that facilitates Wald-type inference. Under this and other regularity conditions, we…
▽ More
In policy learning, the goal is typically to optimize a primary performance metric, but other subsidiary metrics often also warrant attention. This paper presents two strategies for evaluating these subsidiary metrics under a policy that is optimal for the primary one. The first relies on a novel margin condition that facilitates Wald-type inference. Under this and other regularity conditions, we show that the one-step corrected estimator is efficient. Despite the utility of this margin condition, it places strong restrictions on how the subsidiary metric behaves for nearly optimal policies, which may not hold in practice. We therefore introduce alternative, two-stage strategies that do not require a margin condition. The first stage constructs a set of candidate policies and the second builds a uniform confidence interval over this set. We provide numerical simulations to evaluate the performance of these methods in different scenarios.
△ Less
Submitted 8 January, 2024;
originally announced January 2024.
-
Adaptive debiased machine learning using data-driven model selection techniques
Authors:
Lars van der Laan,
Marco Carone,
Alex Luedtke,
Mark van der Laan
Abstract:
Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecif…
▽ More
Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecification. To address this problem, we propose Adaptive Debiased Machine Learning (ADML), a nonparametric framework that combines data-driven model selection and debiased machine learning techniques to construct asymptotically linear, adaptive, and superefficient estimators for pathwise differentiable functionals. By learning model structure directly from data, ADML avoids the bias introduced by model misspecification and remains free from the restrictions of parametric and semiparametric models. While they may exhibit irregular behavior for the target parameter in a nonparametric statistical model, we demonstrate that ADML estimators provides regular and locally uniformly valid inference for a projection-based oracle parameter. Importantly, this oracle parameter agrees with the original target parameter for distributions within an unknown but correctly specified oracle statistical submodel that is learned from the data. This finding implies that there is no penalty, in a local asymptotic sense, for conducting data-driven model selection compared to having prior knowledge of the oracle submodel and oracle parameter. To demonstrate the practical applicability of our theory, we provide a broad class of ADML estimators for estimating the average treatment effect in adaptive partially linear regression models.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
One-Step Estimation of Differentiable Hilbert-Valued Parameters
Authors:
Alex Luedtke,
Incheoul Chung
Abstract:
We present estimators for smooth Hilbert-valued parameters, where smoothness is characterized by a pathwise differentiability condition. When the parameter space is a reproducing kernel Hilbert space, we provide a means to obtain efficient, root-n rate estimators and corresponding confidence sets. These estimators correspond to generalizations of cross-fitted one-step estimators based on Hilbert-v…
▽ More
We present estimators for smooth Hilbert-valued parameters, where smoothness is characterized by a pathwise differentiability condition. When the parameter space is a reproducing kernel Hilbert space, we provide a means to obtain efficient, root-n rate estimators and corresponding confidence sets. These estimators correspond to generalizations of cross-fitted one-step estimators based on Hilbert-valued efficient influence functions. We give theoretical guarantees even when arbitrary estimators of nuisance functions are used, including those based on machine learning techniques. We show that these results naturally extend to Hilbert spaces that lack a reproducing kernel, as long as the parameter has an efficient influence function. However, we also uncover the unfortunate fact that, when there is no reproducing kernel, many interesting parameters fail to have an efficient influence function, even though they are pathwise differentiable. To handle these cases, we propose a regularized one-step estimator and associated confidence sets. We also show that pathwise differentiability, which is a central requirement of our approach, holds in many cases. Specifically, we provide multiple examples of pathwise differentiable parameters and develop corresponding estimators and confidence sets. Among these examples, four are particularly relevant to ongoing research by the causal inference community: the counterfactual density function, dose-response function, conditional average treatment effect function, and counterfactual kernel mean embedding.
△ Less
Submitted 26 September, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Efficient Estimation of the Maximal Association between Multiple Predictors and a Survival Outcome
Authors:
Tzu-Jung Huang,
Alex Luedtke,
Ian W. McKeague
Abstract:
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally-scalable in high-dimensions. Machine learning tools are commonly used to provide {\it predict…
▽ More
This paper develops a new approach to post-selection inference for screening high-dimensional predictors of survival outcomes. Post-selection inference for right-censored outcome data has been investigated in the literature, but much remains to be done to make the methods both reliable and computationally-scalable in high-dimensions. Machine learning tools are commonly used to provide {\it predictions} of survival outcomes, but the estimated effect of a selected predictor suffers from confirmation bias unless the selection is taken into account. The new approach involves construction of semi-parametrically efficient estimators of the linear association between the predictors and the survival outcome, which are used to build a test statistic for detecting the presence of an association between any of the predictors and the outcome. Further, a stabilization technique reminiscent of bagging allows a normal calibration for the resulting test statistic, which enables the construction of confidence intervals for the maximal association between predictors and the outcome and also greatly reduces computational cost. Theoretical results show that this testing procedure is valid even when the number of predictors grows superpolynomially with sample size, and our simulations support that this asymptotic guarantee is indicative the performance of the test at moderate sample sizes. The new approach is applied to the problem of identifying patterns in viral gene expression associated with the potency of an antiviral drug.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Improved inference for vaccine-induced immune responses via shape-constrained methods
Authors:
Nilanjana Laha,
Zoe Moodie,
Ying Huang,
Alex Luedtke
Abstract:
We study the performance of shape-constrained methods for evaluating immune response profiles from early-phase vaccine trials. The motivating problem for this work involves quantifying and comparing the IgG binding immune responses to the first and second variable loops (V1V2 region) arising in HVTN 097 and HVTN 100 HIV vaccine trials. We consider unimodal and log-concave shape-constrained methods…
▽ More
We study the performance of shape-constrained methods for evaluating immune response profiles from early-phase vaccine trials. The motivating problem for this work involves quantifying and comparing the IgG binding immune responses to the first and second variable loops (V1V2 region) arising in HVTN 097 and HVTN 100 HIV vaccine trials. We consider unimodal and log-concave shape-constrained methods to compare the immune profiles of the two vaccines, which is reasonable because the data support that the underlying densities of the immune responses could have these shapes. To this end, we develop novel shape-constrained tests of stochastic dominance and shape-constrained plug-in estimators of the Hellinger distance between two densities. Our techniques are either tuning parameter free, or rely on only one tuning parameter, but their performance is either better (the tests of stochastic dominance) or comparable with the nonparametric methods (the estimators of Hellinger distance). The minimal dependence on tuning parameters is especially desirable in clinical contexts where analyses must be prespecified and reproducible. Our methods are supported by theoretical results and simulation studies.
△ Less
Submitted 29 October, 2022; v1 submitted 24 July, 2021;
originally announced July 2021.
-
Adversarial Meta-Learning of Gamma-Minimax Estimators That Leverage Prior Knowledge
Authors:
Hongxiang Qiu,
Alex Luedtke
Abstract:
Bayes estimators are well known to provide a means to incorporate prior knowledge that can be expressed in terms of a single prior distribution. However, when this knowledge is too vague to express with a single prior, an alternative approach is needed. Gamma-minimax estimators provide such an approach. These estimators minimize the worst-case Bayes risk over a set $Γ$ of prior distributions that…
▽ More
Bayes estimators are well known to provide a means to incorporate prior knowledge that can be expressed in terms of a single prior distribution. However, when this knowledge is too vague to express with a single prior, an alternative approach is needed. Gamma-minimax estimators provide such an approach. These estimators minimize the worst-case Bayes risk over a set $Γ$ of prior distributions that are compatible with the available knowledge. Traditionally, Gamma-minimaxity is defined for parametric models. In this work, we define Gamma-minimax estimators for general models and propose adversarial meta-learning algorithms to compute them when the set of prior distributions is constrained by generalized moments. Accompanying convergence guarantees are also provided. We also introduce a neural network class that provides a rich, but finite-dimensional, class of estimators from which a Gamma-minimax estimator can be selected. We illustrate our method in two settings, namely entropy estimation and a prediction problem that arises in biodiversity studies.
△ Less
Submitted 31 August, 2023; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Sufficient and insufficient conditions for the stochastic convergence of Cesàro means
Authors:
Aurélien F. Bibaut,
Alex Luedtke,
Mark J. van der Laan
Abstract:
We study the stochastic convergence of the Cesàro mean of a sequence of random variables. These arise naturally in statistical problems that have a sequential component, where the sequence of random variables is typically derived from a sequence of estimators computed on data. We show that establishing a rate of convergence in probability for a sequence is not sufficient in general to establish a…
▽ More
We study the stochastic convergence of the Cesàro mean of a sequence of random variables. These arise naturally in statistical problems that have a sequential component, where the sequence of random variables is typically derived from a sequence of estimators computed on data. We show that establishing a rate of convergence in probability for a sequence is not sufficient in general to establish a rate in probability for its Cesàro mean. We also present several sets of conditions on the sequence of random variables that are sufficient to guarantee a rate of convergence for its Cesàro mean. We identify common settings in which these sets of conditions hold.
△ Less
Submitted 13 September, 2020;
originally announced September 2020.
-
Universal sieve-based strategies for efficient estimation using machine learning tools
Authors:
Hongxiang Qiu,
Alex Luedtke,
Marco Carone
Abstract:
Suppose that we wish to estimate a finite-dimensional summary of one or more function-valued features of an underlying data-generating mechanism under a nonparametric model. One approach to estimation is by plugging in flexible estimates of these features. Unfortunately, in general, such estimators may not be asymptotically efficient, which often makes these estimators difficult to use as a basis…
▽ More
Suppose that we wish to estimate a finite-dimensional summary of one or more function-valued features of an underlying data-generating mechanism under a nonparametric model. One approach to estimation is by plugging in flexible estimates of these features. Unfortunately, in general, such estimators may not be asymptotically efficient, which often makes these estimators difficult to use as a basis for inference. Though there are several existing methods to construct asymptotically efficient plug-in estimators, each such method either can only be derived using knowledge of efficiency theory or is only valid under stringent smoothness assumptions. Among existing methods, sieve estimators stand out as particularly convenient because efficiency theory is not required in their construction, their tuning parameters can be selected data adaptively, and they are universal in the sense that the same fits lead to efficient plug-in estimators for a rich class of estimands. Inspired by these desirable properties, we propose two novel universal approaches for estimating function-valued features that can be analyzed using sieve estimation theory. Compared to traditional sieve estimators, these approaches are valid under more general conditions on the smoothness of the function-valued features by utilizing flexible estimates that can be obtained, for example, using machine learning.
△ Less
Submitted 26 August, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
Faster Rates for Policy Learning
Authors:
Alexander Luedtke,
Antoine Chambaz
Abstract:
This article improves the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also giv…
▽ More
This article improves the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also give a result from the classification literature that shows that faster regret decay is possible via plug-in estimation provided a margin condition holds. Four examples are considered. In these examples, the regret is expressed in terms of either the mean value or the median value; the number of possible actions is either two or finitely many; and the sampling scheme is either independent and identically distributed or sequential, where the latter represents a contextual bandit sampling scheme.
△ Less
Submitted 21 April, 2017;
originally announced April 2017.
-
Toward computerized efficient estimation in infinite-dimensional models
Authors:
Marco Carone,
Alexander R. Luedtke,
Mark J. van der Laan
Abstract:
Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scie…
▽ More
Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scientific knowledge more appropriately, performing efficient inference in these models is generally challenging. The efficient influence function is a key analytic object from which the construction of asymptotically efficient estimators can potentially be streamlined. However, the theoretical derivation of the efficient influence function requires specialized knowledge and is often a difficult task, even for experts. In this paper, we propose and discuss a numerical procedure for approximating the efficient influence function. The approach generalizes the simple nonparametric procedures described recently by Frangakis et al. (2015) and Luedtke et al. (2015) to arbitrary models. We present theoretical results to support our proposal, and also illustrate the method in the context of two examples. The proposed approach is an important step toward automating efficient estimation in general statistical models, thereby rendering the use of realistic models in statistical analyses much more accessible.
△ Less
Submitted 30 August, 2016;
originally announced August 2016.
-
Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy
Authors:
Alexander R. Luedtke,
Mark J. van der Laan
Abstract:
We consider challenges that arise in the estimation of the mean outcome under an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition…
▽ More
We consider challenges that arise in the estimation of the mean outcome under an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular and asymptotically linear (RAL) estimator of the optimal value. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-$n$ rate confidence intervals for the optimal value even when the parameter is not pathwise differentiable. We provide conditions under which our estimator is RAL and asymptotically efficient when the mean outcome is pathwise differentiable. We also outline an extension of our approach to a multiple time point problem. All of our results are supported by simulations.
△ Less
Submitted 24 March, 2016;
originally announced March 2016.
-
An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions
Authors:
Alexander R. Luedtke,
Marco Carone,
Mark J. van der Laan
Abstract:
We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability liter…
▽ More
We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests can be applied and show that they perform well in a simulation study. As an important special case, our proposed tests can be used to determine whether an unknown function, such as the conditional average treatment effect, is equal to zero almost surely.
△ Less
Submitted 13 June, 2017; v1 submitted 14 October, 2015;
originally announced October 2015.