Search | arXiv e-print repository

Estimating conditional hazard functions and densities with the highly-adaptive lasso

Authors: Anders Munch, Thomas A. Gerds, Mark J. van der Laan, Helene C. W. Rytgaard

Abstract: We consider estimation of conditional hazard functions and densities over the class of multivariate càdlàg functions with uniformly bounded sectional variation norm when data are either fully observed or subject to right-censoring. We demonstrate that the empirical risk minimizer is either not well-defined or not consistent for estimation of conditional hazard functions and densities. Under a smoothness assumption about the data-generating distribution, a highly-adaptive lasso estimator based on a particular data-adaptive sieve achieves the same convergence rate as has been shown to hold for the empirical risk minimizer in settings where the latter is well-defined. We use this result to study a highly-adaptive lasso estimator of a conditional hazard function based on right-censored data. We also propose a new conditional density estimator and derive its convergence rate. Finally, we show that the result is also of interest in settings where the empirical risk minimizer is well-defined, because the highly-adaptive lasso depends on a much smaller number of basis functions than the empirical risk minimizer.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 36 pages, 14 figures

MSC Class: 62G05 (primary) 62N02 (secondary)
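The abstract above treats conditional hazards and conditional densities side by side. For a continuous time $T$ with covariates $x$, the two are linked by $f(t \mid x) = \lambda(t \mid x)\exp\{-\Lambda(t \mid x)\}$, where $\Lambda(t \mid x) = \int_0^t \lambda(s \mid x)\,ds$. A minimal sketch of this relation follows, assuming a piecewise-constant hazard for a single fixed covariate value. The function names are hypothetical, and this is not the highly-adaptive lasso estimator studied in the paper.

```python
import math
from bisect import bisect_right


def cumulative_hazard(t, breaks, rates):
    """Lambda(t) for a piecewise-constant hazard (hypothetical helper).

    breaks: increasing interval start points with breaks[0] == 0.0
    rates:  hazard on [breaks[j], breaks[j + 1]); the last rate extends to infinity
    """
    total = 0.0
    for j, rate in enumerate(rates):
        start = breaks[j]
        end = breaks[j + 1] if j + 1 < len(breaks) else math.inf
        if t <= start:
            break
        total += rate * (min(t, end) - start)  # area under the hazard on this piece
    return total


def hazard(t, breaks, rates):
    # Right-continuous: at a break point, use the rate of the interval it starts.
    return rates[bisect_right(breaks, t) - 1]


def survival(t, breaks, rates):
    return math.exp(-cumulative_hazard(t, breaks, rates))


def density(t, breaks, rates):
    # f(t) = lambda(t) * S(t)
    return hazard(t, breaks, rates) * survival(t, breaks, rates)


# Constant hazard 0.5: S(2) = exp(-1) and f(2) = 0.5 * exp(-1).
print(survival(2.0, [0.0], [0.5]), density(2.0, [0.0], [0.5]))
```

Modeling $\log \lambda(t \mid x)$ with a flexible estimator and then applying these identities is one way to move from a hazard fit to a density; the paper's specific construction may differ.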

arXiv:2309.16099

Nonparametric estimation of a covariate-adjusted counterfactual treatment regimen response curve

Authors: Ashkan Ertefaie, Luke Duttweiler, Brent A. Johnson, Mark J. van der Laan

Abstract: Flexible estimation of the mean outcome under a treatment regimen (i.e., value function) is the key step toward personalized medicine. We define our target parameter as a conditional value function given a set of baseline covariates, which we refer to as a stratum-based value function. We focus on a semiparametric class of decision rules and propose a sieve-based nonparametric covariate-adjusted regimen-response curve estimator within that class. Our work contributes in several ways. First, we propose an inverse probability weighted nonparametrically efficient estimator of the smoothed regimen-response curve function. We show that asymptotic linearity is achieved when the nuisance functions are undersmoothed sufficiently. Asymptotic and finite sample criteria for undersmoothing are proposed. Second, using Gaussian process theory, we propose simultaneous confidence intervals for the smoothed regimen-response curve function. Third, we provide consistency and convergence rate for the optimizer of the regimen-response curve estimator; this enables us to estimate an optimal semiparametric rule. The latter is important as the optimizer corresponds with the optimal dynamic treatment regimen. Some finite-sample properties are explored with simulations.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2307.12544

Adaptive debiased machine learning using data-driven model selection techniques

Authors: Lars van der Laan, Marco Carone, Alex Luedtke, Mark van der Laan

Abstract: Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecification. To address this problem, we propose Adaptive Debiased Machine Learning (ADML), a nonparametric framework that combines data-driven model selection and debiased machine learning techniques to construct asymptotically linear, adaptive, and superefficient estimators for pathwise differentiable functionals. By learning model structure directly from data, ADML avoids the bias introduced by model misspecification and remains free from the restrictions of parametric and semiparametric models. While they may exhibit irregular behavior for the target parameter in a nonparametric statistical model, we demonstrate that ADML estimators provide regular and locally uniformly valid inference for a projection-based oracle parameter. Importantly, this oracle parameter agrees with the original target parameter for distributions within an unknown but correctly specified oracle statistical submodel that is learned from the data. This finding implies that there is no penalty, in a local asymptotic sense, for conducting data-driven model selection compared to having prior knowledge of the oracle submodel and oracle parameter.
To demonstrate the practical applicability of our theory, we provide a broad class of ADML estimators for estimating the average treatment effect in adaptive partially linear regression models.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 32 pages + appendix

arXiv:2301.13354

Higher Order Spline Highly Adaptive Lasso Estimators of Functional Parameters: Pointwise Asymptotic Normality and Uniform Convergence Rates

Authors: Mark van der Laan

Abstract: We consider estimation of a functional of the data distribution based on i.i.d. observations. We assume the target function can be defined as the minimizer of the expectation of a loss function over a class of $d$-variate real valued cadlag functions that have finite sectional variation norm. For all $k=0,1,\ldots$, we define a $k$-th order smoothness class of functions as $d$-variate functions on the unit cube for which each of a sequentially defined $k$-th order Radon-Nikodym derivative w.r.t. Lebesgue measure is cadlag and of bounded variation. For a target function in this $k$-th order smoothness class we provide a representation of the target function as an infinite linear combination of tensor products of $\leq k$-th order spline basis functions indexed by a knot-point, where the lower (than $k$) order spline basis functions are used to represent the function at the $0$-edges. The $L_1$-norm of the coefficients represents the sum of the variation norms across all the $k$-th order derivatives, which is called the $k$-th order sectional variation norm of the target function. This generalizes the zero order spline representation of cadlag functions with bounded sectional variation norm to higher order smoothness classes. We use this $k$-th order spline representation of a function to define the $k$-th order spline sieve minimum loss estimator (MLE), Highly Adaptive Lasso (HAL) MLE, and Relax HAL-MLE.
For first and higher order smoothness classes, in this article we analyze these three classes of estimators and establish pointwise asymptotic normality and uniform convergence at dimension free rate $n^{-k^*/(2k^*+1)}$ up to a power of $\log n$ depending on the dimension, where $k^*=k+1$, assuming appropriate undersmoothing is used in selecting the $L_1$-norm. We also establish asymptotic linearity of plug-in estimators of pathwise differentiable features of the target function.

Submitted 30 January, 2023; originally announced January 2023.
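The spline representation described in the abstract above can be made concrete as a design matrix. A minimal sketch, assuming knots at the observed data points and, for every nonempty subset $s$ of coordinates, basis functions $\prod_{j \in s} (x_j - u_j)^k \mathbf{1}\{x_j \ge u_j\}/k!$; for $k=0$ these are the indicator basis functions of the zero order representation. The function name `hal_basis` is hypothetical. The sketch uses the same order $k$ in every coordinate (omitting the lower order terms the abstract describes for the $0$-edges), and the $L_1$-penalized fit over these columns is not shown.

```python
from itertools import combinations
from math import factorial


def hal_basis(X, k=0):
    """Return an n x (n * (2**d - 1)) design matrix of tensor-product spline bases.

    Knots are the observed rows of X. For each nonempty coordinate subset s and
    each knot u, the column is prod_{j in s} (x_j - u_j)**k * 1{x_j >= u_j} / k!.
    With k = 0, Python evaluates 0.0**0 as 1.0, so each column is an indicator.
    """
    n, d = len(X), len(X[0])
    # All nonempty subsets ("sections") of the d coordinates.
    sections = [s for r in range(1, d + 1) for s in combinations(range(d), r)]
    knots = [(s, X[i]) for s in sections for i in range(n)]

    def phi(x, s, u):
        value = 1.0
        for j in s:
            if x[j] < u[j]:
                return 0.0  # basis function is zero to the left of the knot
            value *= (x[j] - u[j]) ** k / factorial(k)
        return value

    return [[phi(x, s, u) for (s, u) in knots] for x in X]


X = [[0.1, 0.5], [0.4, 0.2], [0.9, 0.7]]
H0 = hal_basis(X, k=0)  # zero order: indicators
H1 = hal_basis(X, k=1)  # first order: truncated linear splines
print(len(H0), len(H0[0]))  # 3 rows, 3 * (2**2 - 1) = 9 columns
print(H0[0])  # → [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

In practice such a matrix would be passed to an $L_1$-penalized regression (for example scikit-learn's `Lasso`), with the $L_1$ bound chosen, per the abstract, with undersmoothing in mind; that step is outside this sketch.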

arXiv:2205.10697

Lassoed Tree Boosting

Authors: Alejandro Schuler, Yi Li, Mark van der Laan

Abstract: Gradient boosting performs exceptionally well in most prediction problems and scales well to large datasets. In this paper we prove that a "lassoed" gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ L2 convergence in the large nonparametric space of cadlag functions of bounded sectional variation. This rate is remarkable because it does not depend on the dimension, sparsity, or smoothness. We use simulation and real data to confirm our theory and demonstrate empirical performance and scalability on par with standard boosting. Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.

Submitted 8 December, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

arXiv:2110.12112

Why Machine Learning Cannot Ignore Maximum Likelihood Estimation

Authors: Mark J. van der Laan, Sherri Rose

Abstract: The growth of machine learning as a field has been accelerating with increasing interest and publications across fields, including statistics, but predominantly in computer science. How can we parse this vast literature for developments that exemplify the necessary rigor? How many of these manuscripts incorporate foundational theory to allow for statistical inference? Which advances have the greatest potential for impact in practice? One could posit many answers to these queries. Here, we assert that one essential idea is for machine learning to integrate maximum likelihood for estimation of functional parameters, such as prediction functions and conditional densities.

Submitted 22 October, 2021; originally announced October 2021.

Comments: 30 pages. Forthcoming as a chapter in the Handbook of Matching and Weighting in Causal Inference

arXiv:2106.01723

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Authors: Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan

Abstract: Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provide first-of-their-kind generalization guarantees and fast convergence rates. Our results are based on a new maximal inequality that carefully leverages the importance sampling structure to obtain rates with the right dependence on the exploration rate in the data. For regression, we provide fast rates that leverage the strong convexity of squared-error loss. For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero, as is the case for bandit-collected data. An empirical investigation validates our theory.

Submitted 3 June, 2021; originally announced June 2021.
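The importance sampling weighted ERM idea in the abstract above reweights each logged round by the inverse of the probability that the (possibly adaptive) logging policy assigned to the action actually taken. A minimal sketch for policy learning over a finite class of deterministic policies, assuming those logging probabilities were recorded. All names are hypothetical, and the paper's estimator and its guarantees (which depend on how exploration decays) are not reproduced here.

```python
def is_weighted_value(policy, contexts, actions, rewards, logging_probs):
    """Importance-sampling estimate of a deterministic policy's mean reward.

    logging_probs[t] is the probability the logging policy, which may have
    adapted to earlier rounds, gave to actions[t] in context contexts[t].
    """
    total = 0.0
    for x, a, y, g in zip(contexts, actions, rewards, logging_probs):
        if policy(x) == a:  # only rounds where the candidate agrees with the log
            total += y / g
    return total / len(rewards)


def is_weighted_erm(policies, contexts, actions, rewards, logging_probs):
    """Return the name of the policy with the largest estimated value
    (equivalently, the smallest empirical risk, taking risk = -value)."""
    return max(
        policies,
        key=lambda name: is_weighted_value(
            policies[name], contexts, actions, rewards, logging_probs
        ),
    )


policies = {
    "match": lambda x: x,  # choose the action equal to the binary context
    "flip": lambda x: 1 - x,
    "always_0": lambda x: 0,
}
contexts = [0, 1, 0, 1]
actions = [0, 1, 1, 0]
rewards = [1.0, 1.0, 0.0, 0.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
print(is_weighted_erm(policies, contexts, actions, rewards, logging_probs))  # → match
```

Dividing by small logging probabilities is what makes such estimates unstable when exploration decays, which is the regime the abstract's rate-optimal guarantees address.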

arXiv:2106.00418

Post-Contextual-Bandit Inference

Authors: Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan

Abstract: Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies. To support credible inference on novel interventions at the end of the study, nonetheless, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies. The adaptive nature of the data collected by contextual bandit algorithms, however, makes this difficult: standard estimators are no longer asymptotically normally distributed and classic confidence intervals fail to provide correct coverage. While this has been addressed in non-contextual settings by using stabilized estimators, the contextual setting poses unique challenges that we tackle for the first time in this paper. We propose the Contextual Adaptive Doubly Robust (CADR) estimator, the first estimator for policy value that is asymptotically normal under contextual adaptive data collection. The main technical challenge in constructing CADR is designing adaptive and consistent conditional standard deviation estimators for stabilization. Extensive numerical experiments using 57 OpenML datasets demonstrate that confidence intervals based on CADR uniquely provide correct coverage.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2105.05373

Estimation of population size based on capture recapture designs and evaluation of the estimation reliability

Authors: Yue You, Mark van der Laan, Philip Collender, Qu Cheng, Alan Hubbard, Nicholas P Jewell, Zhiyue Tom Hu, Robin Mejia, Justin Remais

Abstract: We propose a modern method to estimate population size based on capture-recapture designs of K samples. The observed data is formulated as a sample of n i.i.d. K-dimensional vectors of binary indicators, where the k-th component of each vector indicates whether the subject was caught in the k-th sample, such that only subjects with nonzero capture vectors are observed. The target quantity is the unconditional probability of the vector being nonzero across both observed and unobserved subjects. We cover models assuming a single constraint (identification assumption) on the K-dimensional distribution such that the target quantity is identified and the statistical model is unrestricted. We present solutions for linear and non-linear constraints commonly assumed to identify capture-recapture models, including no K-way interaction in linear and log-linear models, independence or conditional independence. We demonstrate that the choice of constraint has a dramatic impact on the value of the estimand, showing that it is crucial that the constraint is known to hold by design. For the commonly assumed constraint of no K-way interaction in a log-linear model, the statistical target parameter is only defined when each of the $2^K - 1$ observable capture patterns is present, and therefore suffers from the curse of dimensionality. We propose a targeted MLE based on an undersmoothed lasso model to smooth across the cells while targeting the fit towards the single-valued target parameter of interest.
For each identification assumption, we provide simulated inference and confidence intervals to assess the performance of the estimator under correct and incorrect identifying assumptions. We apply the proposed method, alongside existing estimators, to estimate prevalence of a parasitic infection using multi-source surveillance data from a region in southwestern China, under the four identification assumptions.

Submitted 11 May, 2021; originally announced May 2021.
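For the log-linear "no K-way interaction" constraint mentioned in the abstract above, the plug-in value of the target quantity has a closed form. Write $p_a$ for the probability of capture pattern $a \in \{0,1\}^K$, $\psi = 1 - p_0$, and $q_a = p_a/\psi$ for the observable patterns $a \neq 0$. The constraint $\sum_a (-1)^{K-|a|} \log p_a = 0$ then gives $(1-\psi)/\psi = \exp\{(-1)^{K+1} \sum_{a \neq 0} (-1)^{K-|a|} \log q_a\}$, which needs $q_a > 0$ for every observable pattern, consistent with the abstract's remark. For $K=2$ this reduces to the Lincoln-Petersen estimator. The sketch below is a naive empirical plug-in under these assumptions, not the undersmoothed-lasso targeted MLE the paper proposes; the function names are hypothetical.

```python
import math
from itertools import product


def psi_no_kway_interaction(counts, K):
    """Plug-in estimate of P(capture vector != 0) assuming no K-way interaction
    in a log-linear model. counts maps each nonzero 0/1 K-tuple to its count."""
    n = sum(counts.values())
    s = 0.0
    for pattern in product((0, 1), repeat=K):
        if not any(pattern):
            continue  # the all-zero pattern is never observed
        c = counts.get(pattern, 0)
        if c == 0:
            raise ValueError(f"pattern {pattern} unobserved: target parameter undefined")
        sign = (-1) ** (K - sum(pattern))
        s += sign * math.log(c / n)  # log q_a with empirical q_a = c / n
    odds_unseen = math.exp((-1) ** (K + 1) * s)  # (1 - psi) / psi
    return 1.0 / (1.0 + odds_unseen)


def population_size(counts, K):
    """Estimated population size: observed count divided by estimated psi."""
    return sum(counts.values()) / psi_no_kway_interaction(counts, K)


# K = 2: 60 caught in sample 1, 50 in sample 2, 20 in both.
counts = {(1, 0): 40, (0, 1): 30, (1, 1): 20}
print(round(population_size(counts, 2), 6))  # → 150.0, the Lincoln-Petersen value 60 * 50 / 20
```

The `ValueError` branch reflects the abstract's point that this parameter is undefined when any of the $2^K - 1$ observable patterns is empty, which becomes likely as $K$ grows.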

arXiv:2105.02088

Continuous-time targeted minimum loss-based estimation of intervention-specific mean outcomes

Authors: Helene C. Rytgaard, Thomas A. Gerds, Mark J. van der Laan

Abstract: This paper studies the generalization of the targeted minimum loss-based estimation (TMLE) framework to estimation of effects of time-varying interventions in settings where interventions, covariates, and outcome can all happen at subject-specific time-points on an arbitrarily fine time-scale. TMLE is a general template for constructing asymptotically linear substitution estimators for smooth low-dimensional parameters in infinite-dimensional models. Existing longitudinal TMLE methods are developed for data where observations are made on a discrete time-grid. We consider a continuous-time counting process model where intensity measures track the monitoring of subjects, and focus on a low-dimensional target parameter defined as the intervention-specific mean outcome at the end of follow-up. To construct our TMLE algorithm for the given statistical estimation problem we derive an expression for the efficient influence curve and represent the target parameter as a functional of intensities and conditional expectations. The high-dimensional nuisance parameters of our model are estimated and updated in an iterative manner according to separate targeting steps for the involved intensities and conditional expectations. The resulting estimator solves the efficient influence curve equation. We state a general efficiency theorem and describe a highly adaptive lasso estimator for nuisance parameters that allows us to establish asymptotic linearity and efficiency of our estimator under minimal conditions on the underlying statistical model.

Submitted 5 May, 2021; originally announced May 2021.

Comments: 27 pages (excluding supplementary material), 1 figure

arXiv:2102.00102

Adaptive Sequential Design for a Single Time-Series

Authors: Ivana Malenica, Aurelien Bibaut, Mark J. van der Laan

Abstract: The current work is motivated by the need for robust statistical methods for precision medicine; as such, we address the need for statistical methods that provide actionable inference for a single unit at any point in time. We aim to learn an optimal, unknown choice of the controlled components of the design in order to optimize the expected outcome; with that, we adapt the randomization mechanism for future time-point experiments based on the data collected on the individual over time. Our results demonstrate that one can learn the optimal rule based on a single sample, and thereby adjust the design at any point t with valid inference for the mean target parameter. This work provides several contributions to the field of statistical precision medicine. First, we define a general class of averages of conditional causal parameters defined by the current context for the single unit time-series data. We define a nonparametric model for the probability distribution of the time-series under few assumptions, and aim to fully utilize the sequential randomization in the estimation procedure via the double robust structure of the efficient influence curve of the proposed target parameter. We present multiple exploration-exploitation strategies for assigning treatment, and methods for estimating the optimal rule. Lastly, we present the study of the data-adaptive inference on the mean under the optimal treatment rule, where the target parameter adapts over time in response to the observed context of the individual.
Our target parameter is pathwise differentiable with an efficient influence function that is doubly robust, which makes it easier to estimate than previously proposed variations. We characterize the limit distribution of our estimator under a Donsker condition expressed in terms of a notion of bracketing entropy adapted to martingale settings.

Submitted 1 July, 2021; v1 submitted 29 January, 2021; originally announced February 2021.

Comments: arXiv admin note: text overlap with arXiv:1809.00734

arXiv:2101.07380

Sequential causal inference in a single world of connected units

Authors: Aurelien Bibaut, Maya Petersen, Nikos Vlassis, Maria Dimakopoulou, Mark van der Laan

Abstract: We consider adaptive designs for a trial involving N individuals that we follow along T time steps. We allow for the variables of one individual to depend on its past and on the past of other individuals. Our goal is to learn a mean outcome, averaged across the N individuals, that we would observe, if we started from some given initial state, and we carried out a given sequence of counterfactual interventions for $τ$ time steps. We show how to identify a statistical parameter that equals this mean counterfactual outcome, and how to perform inference for this parameter, while adaptively learning an oracle design defined as a parameter of the true data generating distribution. Oracle designs of interest include the design that maximizes the efficiency for a statistical parameter of interest, or designs that mix the optimal treatment rule with a certain exploration distribution. We also show how to design adaptive stopping rules for sequential hypothesis testing. This setting presents unique technical challenges. Unlike in usual statistical settings where the data consists of several independent observations, here, due to network and temporal dependence, the data reduces to one single observation with dependent components. In particular, this precludes the use of sample splitting techniques. We therefore had to develop a new equicontinuity result and guarantees for estimators fitted on dependent data. We were motivated to work on this problem by the following two questions.
(1) In the context of a sequential adaptive trial with K treatment arms, how to design a procedure to identify in as few rounds as possible the treatment arm with best final outcome? (2) In the context of sequential randomized disease testing at the scale of a city, how to estimate and infer the value of an optimal testing and isolation strategy?

Submitted 18 January, 2021; originally announced January 2021.

arXiv:2101.06290

Higher Order Targeted Maximum Likelihood Estimation

Authors: Mark van der Laan, Zeyi Wang, Lars van der Laan

Abstract: Asymptotic efficiency of targeted maximum likelihood estimators (TMLE) of target features of the data distribution relies on a second-order remainder being asymptotically negligible. In previous work we proposed a nonparametric MLE termed Highly Adaptive Lasso (HAL) which parametrizes the relevant functional of the data distribution in terms of a multivariate real valued cadlag function that is assumed to have finite variation norm. We showed that the HAL-MLE converges in Kullback-Leibler dissimilarity at rate $n^{-1/3}$ up to $\log n$ factors. Therefore, by using HAL as initial density estimator in the TMLE, the resulting HAL-TMLE is an asymptotically efficient estimator only assuming that the relevant nuisance functions of the data density are cadlag and have finite variation norm. However, in finite samples, the second order remainder can dominate the sampling distribution so that inference based on asymptotic normality would be anti-conservative. In this article we propose a new higher order TMLE, generalizing the regular first order TMLE. We prove that it satisfies an exact linear expansion, in terms of efficient influence functions of sequentially defined higher order fluctuations of the target parameter, with a remainder that is a $(k+1)$-th order remainder. As a consequence, this $k$-th order TMLE allows statistical inference only relying on the $(k+1)$-th order remainder being negligible.
We also provide a rationale for why the higher order TMLE will be superior to the first order TMLE: it (iteratively) locally minimizes the exact finite sample remainder of the first order TMLE. The second order TMLE is demonstrated for nonparametric estimation of the integrated squared density and for the treatment specific mean outcome. We also provide an initial simulation study for the second order TMLE of the treatment specific mean confirming the theoretical analysis.

Submitted 30 June, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

arXiv:2009.06203

doi 10.1093/biostatistics/kxac002

Nonparametric causal mediation analysis for stochastic interventional (in)direct effects

Authors: Nima S. Hejazi, Kara E. Rudolph, Mark J. van der Laan, Iván Díaz

Abstract: Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoretical study of an (in)direct effect decomposition of the population intervention effect, defined by stochastic interventions jointly applied to the treatment and mediators. In contrast to existing proposals, our causal effects can be evaluated regardless of whether a treatment is categorical or continuous and remain well-defined even in the presence of intermediate confounders affected by treatment. Our (in)direct effects are identifiable without a restrictive assumption on cross-world counterfactual independencies, allowing for substantive conclusions drawn from them to be validated in randomized controlled trials. Beyond the novel effects introduced, we provide a careful study of nonparametric efficiency theory relevant for the construction of flexible, multiply robust estimators of our (in)direct effects, while avoiding undue restrictions induced by assuming parametric models of nuisance parameter functionals. To complement our nonparametric estimation strategy, we introduce inferential techniques for constructing confidence intervals and hypothesis tests, and discuss open source software implementing the proposed methodology.

Submitted 11 January, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

Journal ref: Biostatistics, 2022

arXiv:2009.05974

Sufficient and insufficient conditions for the stochastic convergence of Cesàro means

Authors: Aurélien F. Bibaut, Alex Luedtke, Mark J. van der Laan

Abstract: We study the stochastic convergence of the Cesàro mean of a sequence of random variables. These arise naturally in statistical problems that have a sequential component, where the sequence of random variables is typically derived from a sequence of estimators computed on data. We show that establishing a rate of convergence in probability for a sequence is not sufficient in general to establish a rate in probability for its Cesàro mean. We also present several sets of conditions on the sequence of random variables that are sufficient to guarantee a rate of convergence for its Cesàro mean. We identify common settings in which these sets of conditions hold.

Submitted 13 September, 2020; originally announced September 2020.
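As a small deterministic illustration of the objects in the abstract above (not the paper's stochastic counterexample), the Cesàro mean of $x_1, x_2, \ldots$ is $\bar x_n = n^{-1}\sum_{i=1}^n x_i$. For $x_i = 1/i$ the sequence itself is $O(1/n)$, while $\bar x_n = H_n/n \approx (\log n)/n$, where $H_n$ is the $n$-th harmonic number, so even here the rate of the sequence does not carry over to its Cesàro mean. The function name is hypothetical.

```python
def cesaro_means(xs):
    """Running Cesàro means: the n-th entry is (x_1 + ... + x_n) / n."""
    means, total = [], 0.0
    for n, x in enumerate(xs, start=1):
        total += x
        means.append(total / n)
    return means


print(cesaro_means([1, 2, 3, 4]))  # → [1.0, 1.5, 2.0, 2.5]

# x_i = 1/i: n * x_n stays at 1, but n * (Cesàro mean) = H_n grows like log n.
N = 10_000
means = cesaro_means([1.0 / i for i in range(1, N + 1)])
for n in (10, 100, 1_000, 10_000):
    print(n, n * means[n - 1])
```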

arXiv:2005.11303

Nonparametric inverse probability weighted estimators based on the highly adaptive lasso

Authors: Ashkan Ertefaie, Nima S. Hejazi, Mark J. van der Laan

Abstract: Inverse probability weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo-population in which selection biases are eliminated. Despite their ease of use, these estimators require the correct specification of a model for the weighting mechanism, are known to be inefficient, and suffer from the curse of dimensionality. We propose a class of nonparametric inverse probability weighted estimators in which the weighting mechanism is estimated via undersmoothing of the highly adaptive lasso, a nonparametric regression function proven to converge at $n^{-1/3}$-rate to the true weighting mechanism. We demonstrate that our estimators are asymptotically linear with variance converging to the nonparametric efficiency bound. Unlike doubly robust estimators, our procedures require neither derivation of the efficient influence function nor specification of the conditional outcome model. Our theoretical developments have broad implications for the construction of efficient inverse probability weighted estimators in large statistical models and a variety of problem settings. We assess the practical performance of our estimators in simulation studies and demonstrate use of our proposed methodology with data from a large-scale epidemiologic study.

Submitted 3 July, 2021; v1 submitted 22 May, 2020; originally announced May 2020.
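For orientation only, here is a minimal Python sketch of a plain inverse probability weighted (Horvitz-Thompson style) estimate of the treatment-specific mean $E[Y(1)]$, assuming the propensity scores $P(A = 1 \mid W)$ have already been estimated and are bounded away from zero. The function name `ipw_treated_mean` is hypothetical, and the sketch does not implement the undersmoothed highly adaptive lasso estimator of the weighting mechanism proposed in the paper.

```python
from typing import Sequence

def ipw_treated_mean(a: Sequence[int], y: Sequence[float], g: Sequence[float]) -> float:
    """Horvitz-Thompson IPW estimate of E[Y(1)] (illustrative sketch).

    a: binary treatment indicators (1 = treated)
    y: observed outcomes
    g: estimated propensity scores P(A = 1 | W_i), assumed strictly positive
    """
    if not (len(a) == len(y) == len(g)):
        raise ValueError("a, y and g must have the same length")
    n = len(a)
    # Each treated unit is up-weighted by 1 / g_i, which builds a pseudo-population
    # in which treatment assignment no longer depends on the covariates.
    return sum(ai * yi / gi for ai, yi, gi in zip(a, y, g)) / n
```

For example, `ipw_treated_mean([1, 0, 1, 1], [2.0, 1.0, 4.0, 3.0], [0.5, 0.5, 0.8, 0.6])` averages the weighted terms 2.0/0.5, 4.0/0.8 and 3.0/0.6 over all four units, giving approximately 3.5.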

arXiv:1908.05607

Efficient Estimation of Pathwise Differentiable Target Parameters with the Undersmoothed Highly Adaptive Lasso

Authors: Mark J. van der Laan, David Benkeser, Weixin Cai

Abstract: We consider estimation of a functional parameter of a realistically modeled data distribution based on observing independent and identically distributed observations. We define an $m$-th order Spline Highly Adaptive Lasso Minimum Loss Estimator (Spline HAL-MLE) of a functional parameter that is defined by minimizing the empirical risk function over an $m$-th order smoothness class of functions. We show that this $m$-th order smoothness class consists of all functions that can be represented as an infinitesimal linear combination of tensor products of $\leq m$-th order spline-basis functions, and involves assuming $m$ derivatives in each coordinate. By selecting $m$ with cross-validation, we obtain a Spline-HAL-MLE that is able to adapt to the underlying unknown smoothness of the true function, while guaranteeing a rate of convergence faster than $n^{-1/4}$, as long as the true function is cadlag (right-continuous with left-hand limits) and has finite sectional variation norm. The $m=0$-smoothness class consists of all cadlag functions with finite sectional variation norm and corresponds to the original HAL-MLE defined in van der Laan (2015). In this article, we establish that this Spline-HAL-MLE yields an asymptotically efficient estimator of any smooth feature of the functional parameter under an easily verifiable global undersmoothing condition.
A sufficient condition for this undersmoothing condition is that the minimum of the empirical mean of the selected basis functions is smaller than a constant times $n^{-1/2}$; this condition is not parameter-specific and forces the $L_1$-norm bound in the lasso to be selected large enough to include sparsely supported basis functions. We demonstrate our general result for the $m=0$-HAL-MLE of the average treatment effect and of the integral of the square of the data density. We also present simulations for these two examples confirming the theory.

Submitted 2 July, 2021; v1 submitted 14 August, 2019; originally announced August 2019.
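To make the zero-order ($m=0$) HAL basis concrete, the following Python sketch enumerates indicator basis functions $\phi_{s,i}(x) = \prod_{j \in s} 1\{x_j \geq x_{i,j}\}$, one for each nonempty subset $s$ of the $d$ coordinates and each knot point $x_i$. It rests on stated assumptions: knots are placed at the observed data points, duplicate basis functions are not removed, and the lasso fit itself is omitted. In a fitted function $\beta_0 + \sum_{s,i} \beta_{s,i}\phi_{s,i}$, the sum $\sum_{s,i} |\beta_{s,i}|$ is the quantity that plays the role of the sectional variation norm constrained by the lasso. The names `hal_basis` and `evaluate_basis` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# A basis function is a (subset of coordinates, knot values on that subset) pair.
Basis = Tuple[Tuple[int, ...], Tuple[float, ...]]

def hal_basis(knots: Sequence[Sequence[float]]) -> List[Basis]:
    """Enumerate zero-order HAL indicator basis functions (illustrative sketch).

    One basis function per nonempty coordinate subset s and per knot row,
    giving n * (2**d - 1) functions for n knots in d dimensions.
    """
    d = len(knots[0])
    basis: List[Basis] = []
    for size in range(1, d + 1):
        for subset in combinations(range(d), size):
            for row in knots:
                basis.append((subset, tuple(row[j] for j in subset)))
    return basis

def evaluate_basis(basis: Sequence[Basis], x: Sequence[float]) -> List[int]:
    """Evaluate each basis function at x: product over j in s of 1{x_j >= knot_j}."""
    return [int(all(x[j] >= c for j, c in zip(subset, knot))) for subset, knot in basis]
```

With knots `[[0.2, 0.5], [0.7, 0.1]]`, `hal_basis` returns 2 * (2**2 - 1) = 6 functions, and evaluating them at `x = [0.5, 0.5]` gives `[1, 0, 1, 1, 1, 0]`. Stacking such rows for every observation yields the design matrix that would be passed to an $L_1$-penalized regression.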

arXiv:1907.09244

Fast rates for empirical risk minimization over càdlàg functions with bounded sectional variation norm

Authors: Aurélien F. Bibaut, Mark J. van der Laan

Abstract: Empirical risk minimization over classes of functions that are bounded for some version of the variation norm has a long history, starting with Total Variation Denoising (Rudin et al., 1992), and has been considered by several recent articles, in particular Fang et al. (2019) and van der Laan (2015). In this article, we consider empirical risk minimization over the class $\mathcal{F}_d$ of càdlàg functions over $[0,1]^d$ with bounded sectional variation norm (also called Hardy-Krause variation). We show how a certain representation of functions in $\mathcal{F}_d$ allows us to bound the bracketing entropy of sieves of $\mathcal{F}_d$, and therefore to derive rates of convergence in nonparametric function estimation. Specifically, for sieves whose growth is controlled by some rate $a_n$, we show that the empirical risk minimizer has rate of convergence $O_P(n^{-1/3} (\log n)^{2(d-1)/3} a_n)$. Remarkably, the dimension only affects the rate in $n$ through the logarithmic factor, making this method especially appropriate for high-dimensional problems. In particular, we show that in the case of nonparametric regression over sieves of càdlàg functions with bounded sectional variation norm, this upper bound on the rate of convergence holds for least-squares estimators in the random-design setting with sub-exponential errors.

Submitted 23 August, 2019; v1 submitted 22 July, 2019; originally announced July 2019.
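As a small worked example of the rate in this abstract, the following Python sketch evaluates $n^{-1/3} (\log n)^{2(d-1)/3} a_n$ for a given sample size $n$ and dimension $d$, assuming for illustration a constant sieve factor $a_n = 1$ and ignoring the constants hidden in $O_P$. The function name `hal_rate` is hypothetical. The printed ratios show that the dimension enters only through the exponent of the logarithmic factor.

```python
import math

def hal_rate(n: int, d: int, a_n: float = 1.0) -> float:
    """Evaluate n^(-1/3) * (log n)^(2(d-1)/3) * a_n, ignoring O_P constants."""
    if n < 2 or d < 1:
        raise ValueError("need n >= 2 and d >= 1")
    return n ** (-1.0 / 3.0) * math.log(n) ** (2.0 * (d - 1) / 3.0) * a_n

# The ratio to the one-dimensional rate is exactly the logarithmic factor.
for d in (1, 2, 5):
    print(d, hal_rate(10**6, d), hal_rate(10**6, d) / hal_rate(10**6, 1))
```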

arXiv:1905.10299

Nonparametric Bootstrap Inference for the Targeted Highly Adaptive LASSO Estimator

Authors: Weixin Cai, Mark van der Laan

Abstract: The Highly-Adaptive-LASSO Targeted Minimum Loss Estimator (HAL-TMLE) is an efficient plug-in estimator of a pathwise differentiable parameter in a statistical model that at a minimum (and possibly only) assumes that the sectional variation norm of the true nuisance functional parameters (i.e., the relevant part of the data distribution) is finite. It relies on an initial estimator (HAL-MLE) of the nuisance functional parameters, obtained by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm of the candidate functions is bounded by a constant, where this constant can be selected with cross-validation. In this article, we establish that the nonparametric bootstrap for the HAL-TMLE, fixing the value of the sectional variation norm at a value larger than or equal to the cross-validation selector, provides a consistent method for estimating the normal limit distribution of the HAL-TMLE. In order to optimize the finite sample coverage of the nonparametric bootstrap confidence intervals, we propose a selection method for this sectional variation norm that is based on running the nonparametric bootstrap for all values of the sectional variation norm larger than the one selected by cross-validation, and subsequently determining a value at which the width of the resulting confidence intervals reaches a plateau.
We demonstrate our method for 1) nonparametric estimation of the average treatment effect based on observing on each unit a covariate vector, binary treatment, and outcome, and for 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution. We also present simulation results for these two examples demonstrating the excellent finite sample coverage of bootstrap-based confidence intervals.

Submitted 7 February, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1708.09502
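The nonparametric bootstrap step described in this abstract can be illustrated with a generic percentile bootstrap. The Python sketch below resamples the data with replacement, recomputes a user-supplied estimator on each resample, and reads off empirical quantiles. It is a simplified stand-in, not the paper's procedure: it does not fit a HAL-TMLE, does not fix the sectional variation norm, and does not implement the plateau-based selection of that norm. The name `percentile_bootstrap_ci` is hypothetical.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

def percentile_bootstrap_ci(
    data: Sequence[float],
    estimator: Callable[[List[float]], float],
    n_boot: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for estimator(data) (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    replicates = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_boot)
    )
    k = int(round((1 - level) / 2 * n_boot))  # number of replicates cut from each tail
    return replicates[k], replicates[n_boot - 1 - k]

lower, upper = percentile_bootstrap_ci([float(i) for i in range(100)], statistics.fmean)
print(lower, upper)
```

Swapping `statistics.fmean` for another plug-in estimator leaves the resampling logic unchanged, which is what makes the bootstrap attractive when the estimator's finite-sample distribution is hard to derive.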

arXiv:1811.03745

A Fundamental Measure of Treatment Effect Heterogeneity

Authors: Jonathan Levy, Mark van der Laan, Alan Hubbard, Romain Pirracchio

Abstract: We offer a non-parametric plug-in estimator for an important measure of treatment effect variability and provide minimum conditions under which the estimator is asymptotically efficient. The stratum-specific treatment effect function, or so-called blip function, is the average treatment effect for a randomly drawn stratum of confounders. The mean of the blip function is the average treatment effect (ATE), whereas the variance of the blip function (VTE), the main subject of this paper, measures overall clinical effect heterogeneity, perhaps providing a strong impetus to refine treatment based on the confounders. VTE is also an important measure for assessing reliability of the treatment for an individual. The CV-TMLE provides simultaneous plug-in estimates and inference for both ATE and VTE, guaranteeing asymptotic efficiency under one fewer condition than TMLE. This condition is difficult to guarantee a priori, particularly when using the highly adaptive machine learning that we need to employ in order to eliminate bias. Even when this condition fails, CV-TMLE sampling distributions remain normal, which is not guaranteed for TMLE, and have lower mean squared error than their TMLE counterparts. In addition to verifying the theoretical properties of TMLE and CV-TMLE through simulations, we point out some of the challenges in estimating VTE, which lacks double robustness and might be unavoidably biased if the true VTE is small and the sample size insufficient. We provide an application of the estimator to a data set for treatment of acute trauma patients.

Submitted 23 December, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

Comments: Presented at JSM 2018
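As a simple illustration of the two summaries of the blip function named in this abstract, the Python sketch below computes naive plug-in estimates of the ATE (the mean of the blip) and the VTE (its variance) from already-estimated outcome regressions $\bar{Q}(1, W_i)$ and $\bar{Q}(0, W_i)$. It is only a plug-in average over the sample, not the TMLE or CV-TMLE studied in the paper, and the name `blip_summaries` is hypothetical.

```python
import statistics
from typing import Sequence, Tuple

def blip_summaries(q1: Sequence[float], q0: Sequence[float]) -> Tuple[float, float]:
    """Naive plug-in ATE and VTE from estimated outcome regressions (illustrative sketch).

    q1[i] and q0[i] are estimates of E[Y | A = 1, W_i] and E[Y | A = 0, W_i].
    """
    blip = [a - b for a, b in zip(q1, q0)]  # stratum-specific treatment effects
    ate = statistics.fmean(blip)            # mean of the blip function
    vte = statistics.pvariance(blip)        # variance of the blip function
    return ate, vte
```

For example, `blip_summaries([3.0, 4.0, 5.0, 6.0], [2.0, 2.0, 2.0, 2.0])` has blip values 1, 2, 3 and 4 and returns `(2.5, 1.25)`.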

arXiv:1810.09022

Correcting an estimator of a multivariate monotone function with isotonic regression

Authors: Ted Westling, Mark van der Laan, Marco Carone

Abstract: In many problems, a sensible estimator of a possibly multivariate monotone function may itself fail to be monotone. We study the correction of such an estimator obtained via projection onto the space of functions monotone over a finite grid in the domain. We demonstrate that this corrected estimator has no worse supremal estimation error than the initial estimator, and that analogously corrected confidence bands contain the true function whenever the initial bands do, at no loss to average or maximal band width. Additionally, we demonstrate that the corrected estimator is uniformly asymptotically equivalent to the initial estimator provided that the initial estimator satisfies a stochastic equicontinuity condition and that the true function is Lipschitz and strictly monotone. We provide simple sufficient conditions for our stochastic equicontinuity condition in the important special case that the initial estimator is uniformly asymptotically linear, and illustrate the use of these results for estimation of a G-computed distribution function. Our stochastic equicontinuity condition is weaker than standard uniform stochastic equicontinuity, which has been required for alternative correction procedures. Crucially, this allows us to apply our results to the bivariate correction of the local linear estimator of a conditional distribution function known to be monotone in its conditioning argument. Our experiments suggest that the projection step can yield significant practical improvements in performance for both the estimator and confidence band.

Submitted 4 September, 2019; v1 submitted 21 October, 2018; originally announced October 2018.
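For the univariate case ($d = 1$) with equal weights, the projection onto functions that are monotone over a finite grid is ordinary isotonic regression, which the pool-adjacent-violators algorithm computes. The Python sketch below applies it to an initial (possibly non-monotone) estimate evaluated on an ordered grid; it does not cover the multivariate grids treated in the paper, and the name `isotonic_project` is hypothetical.

```python
from typing import List, Sequence

def isotonic_project(values: Sequence[float]) -> List[float]:
    """Least-squares projection of grid values onto nondecreasing sequences (PAVA)."""
    sums: List[float] = []   # running sum of each pooled block
    counts: List[int] = []   # number of grid points in each block
    for v in values:
        sums.append(float(v))
        counts.append(1)
        # Pool adjacent blocks while their means violate monotonicity.
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    projected: List[float] = []
    for s, c in zip(sums, counts):
        projected.extend([s / c] * c)  # every point in a block takes the block mean
    return projected
```

For example, `isotonic_project([1.0, 3.0, 2.0, 4.0])` pools the violating pair into its mean and returns `[1.0, 2.5, 2.5, 4.0]`, while an already monotone input is returned unchanged.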

arXiv:1810.03030

Robust variance estimation and inference for causal effect estimation

Authors: Linh Tran, Maya Petersen, Joshua Schwab, Mark J van der Laan

Abstract: We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time-dependent outcome. Previous studies have shown that estimating the variance of asymptotically linear estimators using empirical influence functions in this setting results in anti-conservative estimates as positivity violations grow in magnitude, leading to poor coverage and uncontrolled Type I errors. In this paper, we present two alternative approaches to estimating the variance of these estimators: (i) a robust approach which directly targets the variance of the influence function as a counterfactual mean outcome, and (ii) a non-parametric bootstrap-based approach that is theoretically valid and lowers the computational cost, thereby increasing the feasibility in non-parametric settings using complex machine learning algorithms. The performance of these approaches is compared to that of the empirical influence function in simulations across different levels of positivity violations and treatment effect sizes.

Submitted 6 October, 2018; originally announced October 2018.

Comments: 20 pages, 8 figures

arXiv:1809.00734

Robust Estimation of Data-Dependent Causal Effects based on Observing a Single Time-Series

Authors: Mark J. van der Laan, Ivana Malenica

Abstract: Consider the case in which one observes a single time-series, where at each time t one observes a data record O(t) involving treatment nodes A(t), possible covariates L(t) and an outcome node Y(t). The data record at time t carries information about a (potentially causal) effect of the treatment A(t) on the outcome Y(t), in the context defined by a fixed-dimensional summary measure Co(t). We are concerned with defining causal effects that can be consistently estimated, with valid inference, for sequentially randomized experiments without further assumptions. More generally, we consider the case when the (possibly causal) effects can be estimated in a double robust manner, analogous to double robust estimation of effects in the i.i.d. causal inference literature. We propose a general class of averages of conditional (context-specific) causal parameters that can be estimated in a double robust manner, thereby fully utilizing the sequential randomization. We propose a targeted maximum likelihood estimator (TMLE) of these causal parameters, and present a general theorem establishing the asymptotic consistency and normality of the TMLE. We extend our general framework to a number of typically studied causal target parameters, including a sequentially adaptive design within a single unit that learns the optimal treatment rule for the unit over time. Our work opens up robust statistical inference for causal questions based on observing a single time-series on a particular unit.

Submitted 3 September, 2018; originally announced September 2018.

arXiv:1804.00102

Collaborative targeted inference from continuously indexed nuisance parameter estimators

Authors: Cheng Ju, Antoine Chambaz, Mark J. van der Laan

Abstract: We wish to infer the value of a parameter at a law from which we sample independent observations. The parameter is smooth and we can define two variation-independent features of the law, its $Q$- and $G$-components, such that estimating them consistently at a fast enough product of rates allows one to build a confidence interval (CI) with a given asymptotic level from a plain targeted minimum loss estimator (TMLE). Suppose that the above product is not fast enough and the algorithm for the $G$-component is fine-tuned by a real-valued $h$. A plain TMLE with an $h$ chosen by cross-validation would typically not yield a CI. We construct a collaborative TMLE (C-TMLE) and show under mild conditions that, if there exists an oracle $h$ that makes a bulky remainder term asymptotically Gaussian, then the C-TMLE yields a CI. We illustrate our findings with inference for the average treatment effect. We conduct a simulation study where the $G$-component is estimated by the LASSO and $h$ is the bound on the coefficients' norms. It sheds light on small-sample properties in the presence of low- to high-dimensional baseline covariates and possible positivity violations.

Submitted 5 April, 2018; v1 submitted 30 March, 2018; originally announced April 2018.

Comments: 38 pages

arXiv:1709.06256

Uniform Consistency of the Highly Adaptive Lasso Estimator of Infinite Dimensional Parameters

Authors: Mark J. van der Laan, Aurélien F. Bibaut

Abstract: Consider the case in which we observe $n$ independent and identically distributed copies of a random variable with a probability distribution known to be an element of a specified statistical model. We are interested in estimating an infinite-dimensional target parameter that minimizes the expectation of a specified loss function. In \cite{generally_efficient_TMLE} we defined an estimator that minimizes the empirical risk over all multivariate real-valued cadlag functions with variation norm bounded by some constant $M$ in the parameter space, and selects $M$ with cross-validation. We referred to this estimator as the Highly-Adaptive-Lasso estimator because the constraint can be formulated as a bound $M$ on the sum of the absolute values of the coefficients of a linear combination of a very large number of basis functions. Specifically, when the target parameter is a conditional mean, it can be implemented with the standard LASSO regression estimator. In \cite{generally_efficient_TMLE} we proved that the HAL-estimator is consistent w.r.t. the (quadratic) loss-based dissimilarity at a rate faster than $n^{-1/2}$ (i.e., faster than $n^{-1/4}$ w.r.t. a norm), even when the parameter space is completely nonparametric. The only assumption required for this rate is that the true parameter function has a finite variation norm. The loss-based dissimilarity is often equivalent to the square of an $L^2(P_0)$-type norm. In this article, we establish that under some weak continuity condition, the HAL-estimator is also uniformly consistent.

Submitted 19 September, 2017; originally announced September 2017.

arXiv:1708.09502

Finite Sample Inference for Targeted Learning

Authors: Mark van der Laan

Abstract: The Highly-Adaptive-Lasso (HAL)-TMLE is an efficient estimator of a pathwise differentiable parameter in a statistical model that at a minimum (and possibly only) assumes that the sectional variation norm of the true nuisance parameters is finite. It relies on an initial estimator (HAL-MLE) of the nuisance parameters, obtained by minimizing the empirical risk over the parameter space under the constraint that the sectional variation norm is bounded by a constant, where this constant can be selected with cross-validation. In the formulation of the HAL-MLE, this sectional variation norm corresponds to the sum of the absolute values of the coefficients for an indicator basis. Due to its reliance on machine learning, statistical inference for the TMLE has been based on its normal limit distribution, thereby potentially ignoring a large second-order remainder in finite samples. In this article, we present four methods for constructing a finite-sample 0.95-confidence interval that use the nonparametric bootstrap to estimate the finite-sample distribution of the HAL-TMLE or a conservative distribution dominating the true finite-sample distribution. We prove that this bootstrap approach consistently estimates the optimal normal limit distribution, while its approximation error is driven by the performance of the bootstrap for a well-behaved empirical process.
We demonstrate our general inferential methods for 1) nonparametric estimation of the average treatment effect based on observing on each unit a covariate vector, binary treatment, and outcome, and for 2) nonparametric estimation of the integral of the square of the multivariate density of the data distribution.

Submitted 30 August, 2017; originally announced August 2017.

arXiv:1706.07408

Data-adaptive smoothing for optimal-rate estimation of possibly non-regular parameters

Authors: Aurelien F. Bibaut, Mark J. van der Laan

Abstract: We consider nonparametric inference of finite-dimensional, potentially non-pathwise differentiable target parameters. In a nonparametric model, examples of parameters that are never pathwise differentiable include probability density functions at a point and regression functions at a point. In causal inference, under appropriate causal assumptions, mean counterfactual outcomes can be pathwise differentiable or not, depending on the degree to which the positivity assumption holds. In this paper, given a potentially non-pathwise differentiable target parameter, we introduce a family of approximating parameters that are pathwise differentiable. This family is indexed by a scalar. In kernel regression or density estimation, for instance, a natural choice for such a family is obtained by kernel smoothing and is indexed by the smoothing level. For the counterfactual mean outcome, a possible approximating family is obtained through truncation of the propensity score, and the truncation level then plays the role of the index. We propose a method to data-adaptively select the index in the family, so as to optimize mean squared error. We prove an asymptotic normality result, which allows us to derive confidence intervals. Under some conditions, our estimator achieves an optimal mean squared error convergence rate. The confidence intervals are data-adaptive and have almost optimal width.
A simulation study demonstrates the practical performance of our estimators for inference on a causal dose-response curve at a given treatment dose.

Submitted 12 July, 2017; v1 submitted 22 June, 2017; originally announced June 2017.
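The truncation-indexed family mentioned for the counterfactual mean can be sketched as an inverse probability weighted estimate of $E[Y(1)]$ in which each propensity score is replaced by $\max(g(W_i), \delta)$, so that $\delta$ plays the role of the scalar index. This Python sketch assumes the propensity scores are given (and strictly positive when $\delta = 0$), uses a simple IPW form rather than the estimators analyzed in the paper, and does not implement the data-adaptive choice of $\delta$; the name `truncated_ipw_mean` is hypothetical.

```python
from typing import Sequence

def truncated_ipw_mean(
    a: Sequence[int], y: Sequence[float], g: Sequence[float], delta: float
) -> float:
    """IPW estimate of E[Y(1)] with propensity scores truncated below at delta (sketch)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    total = 0.0
    for ai, yi, gi in zip(a, y, g):
        g_trunc = max(gi, delta)  # truncation caps each weight at 1 / delta
        total += ai * yi / g_trunc
    return total / len(a)

# Larger delta shrinks the largest weights, typically trading variance for bias.
for delta in (0.0, 0.05, 0.1, 0.2):
    print(delta, truncated_ipw_mean([1, 1, 0], [1.0, 1.0, 0.0], [0.02, 0.5, 0.5], delta))
```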

arXiv:1705.08527

Causal inference for social network data

Authors: Elizabeth L. Ogburn, Oleg Sofrygin, Ivan Diaz, Mark J. van der Laan

Abstract: We describe semiparametric estimation and inference for causal effects using observational data from a single social network. Our asymptotic results are the first to allow for dependence of each observation on a growing number of other units as sample size increases. In addition, while previous methods have implicitly permitted only one of two possible sources of dependence among social network observations, we allow for both dependence due to transmission of information across network ties and for dependence due to latent similarities among nodes sharing ties. We propose new causal effects that are specifically of interest in social network settings, such as interventions on network ties and network structure. We use our methods to reanalyze an influential and controversial study that estimated causal peer effects of obesity using social network data from the Framingham Heart Study; after accounting for network structure we find no evidence for causal peer effects.

Submitted 1 June, 2022; v1 submitted 23 May, 2017; originally announced May 2017.

arXiv:1608.08717

Toward computerized efficient estimation in infinite-dimensional models

Authors: Marco Carone, Alexander R. Luedtke, Mark J. van der Laan

Abstract: Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scientific knowledge more appropriately, performing efficient inference in these models is generally challenging. The efficient influence function is a key analytic object from which the construction of asymptotically efficient estimators can potentially be streamlined. However, the theoretical derivation of the efficient influence function requires specialized knowledge and is often a difficult task, even for experts. In this paper, we propose and discuss a numerical procedure for approximating the efficient influence function. The approach generalizes the simple nonparametric procedures described recently by Frangakis et al. (2015) and Luedtke et al. (2015) to arbitrary models. We present theoretical results to support our proposal, and also illustrate the method in the context of two examples. The proposed approach is an important step toward automating efficient estimation in general statistical models, thereby rendering the use of realistic models in statistical analyses much more accessible.

Submitted 30 August, 2016; originally announced August 2016.
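In the nonparametric case, one simple numerical approximation of an influence function is a finite-difference Gateaux derivative: perturb the distribution $P$ toward a point mass at $x$ and compute $\{\Psi((1-\epsilon)P + \epsilon\delta_x) - \Psi(P)\}/\epsilon$ for small $\epsilon$. The Python sketch below does this for discrete distributions. It illustrates only this basic nonparametric idea, not the paper's generalization to arbitrary models, and the names `numerical_influence`, `mean_functional` and `variance_functional` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A discrete distribution: (support points, probabilities summing to one).
Dist = Tuple[Sequence[float], Sequence[float]]

def mean_functional(dist: Dist) -> float:
    xs, ps = dist
    return sum(p * x for x, p in zip(xs, ps))

def variance_functional(dist: Dist) -> float:
    xs, ps = dist
    m = mean_functional(dist)
    return sum(p * (x - m) ** 2 for x, p in zip(xs, ps))

def numerical_influence(
    psi: Callable[[Dist], float], dist: Dist, x: float, eps: float = 1e-6
) -> float:
    """Finite-difference Gateaux derivative of psi at dist toward a point mass at x."""
    xs, ps = dist
    # Mixture (1 - eps) * P + eps * delta_x, represented by appending x with mass eps.
    xs_eps: List[float] = list(xs) + [x]
    ps_eps: List[float] = [(1.0 - eps) * p for p in ps] + [eps]
    return (psi((xs_eps, ps_eps)) - psi(dist)) / eps
```

For the uniform distribution on {1, 2, 3} (mean 2, variance 2/3), `numerical_influence(variance_functional, dist, 5.0)` is close to the analytic influence value $(5 - 2)^2 - 2/3 \approx 8.33$, and `mean_functional` gives approximately $5 - 2 = 3$; the finite-difference error shrinks proportionally to `eps`.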

arXiv:1603.07573

doi 10.1214/15-AOS1384

Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy

Authors: Alexander R. Luedtke, Mark J. van der Laan

Abstract: We consider challenges that arise in the estimation of the mean outcome under an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular and asymptotically linear (RAL) estimator of the optimal value. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-$n$ rate confidence intervals for the optimal value even when the parameter is not pathwise differentiable. We provide conditions under which our estimator is RAL and asymptotically efficient when the mean outcome is pathwise differentiable. We also outline an extension of our approach to a multiple time point problem. All of our results are supported by simulations.

Submitted 24 March, 2016; originally announced March 2016.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1384 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1384

Journal ref: Annals of Statistics 2016, Vol. 44, No. 2, 713-742

arXiv:1511.08369

Second-Order Inference for the Mean of a Variable Missing at Random

Authors: Iván Díaz, Marco Carone, Mark J. van der Laan

Abstract: We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss-based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first- and second-order estimators to the convergence rate of the initial estimators of the outcome regression and missingness score. In our simulation, the second-order TMLE improved the coverage probability of a confidence interval by up to 85%. In addition, we present a first-order estimator inspired by a second-order expansion of the parameter functional. This estimator only requires one-dimensional smoothing, whereas implementation of the second-order TMLE generally requires kernel smoothing on the covariate space. The proposed first-order estimator is expected to have improved finite-sample performance compared to existing first-order estimators. In our simulations, the proposed first-order estimator improved the coverage probability by up to 90%.
We provide an illustration of our methods using a publicly available dataset to determine the effect of an anticoagulant on health outcomes of patients undergoing percutaneous coronary intervention. We provide R code implementing the proposed estimator.

Submitted 26 November, 2015; originally announced November 2015.
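For readers unfamiliar with the first-order doubly robust baseline that the Díaz, Carone and van der Laan abstract improves upon, here is a minimal sketch of the standard augmented inverse-probability-weighted (AIPW) estimator of E[Y] when Y is missing at random given X. It is not the authors' second-order TMLE. The outcome regression `qbar` and the missingness score `g` are assumed to be already-fitted functions supplied by the caller, and all names here are hypothetical.

```python
from typing import Callable, Optional, Sequence


def aipw_mean(
    x: Sequence[float],
    r: Sequence[int],
    y: Sequence[Optional[float]],
    qbar: Callable[[float], float],
    g: Callable[[float], float],
) -> float:
    """AIPW estimate of E[Y] when Y is missing at random given X.

    r[i] == 1 means y[i] is observed; y[i] may be None when r[i] == 0.
    qbar(x) is an (assumed) fit of E[Y | R=1, X=x].
    g(x) is an (assumed) fit of P(R=1 | X=x), bounded away from 0.
    """
    n = len(x)
    total = 0.0
    for xi, ri, yi in zip(x, r, y):
        q = qbar(xi)
        # Plug-in term: predicted outcome, used for every unit.
        term = q
        if ri == 1:
            # Inverse-weighted residual correction, only for observed outcomes.
            term += (yi - q) / g(xi)
        total += term
    return total / n
```

With a correct `qbar` the residual correction vanishes and the estimate is the average prediction; with a misspecified `qbar` but a correct `g`, the weighted residual term removes the bias in expectation, which is the double robustness property the abstract refers to.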

arXiv:1510.04195 [pdf, other]

An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions

Authors: Alexander R. Luedtke, Marco Carone, Mark J. van der Laan

Abstract: We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We develop these tests, which generalize the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability literature. Despite their complex derivation, the associated test statistics can be expressed rather simply as U-statistics. We study the asymptotic behavior of the proposed tests under the null hypothesis and under both fixed and local alternatives. We provide examples to which our tests can be applied and show that they perform well in a simulation study. As an important special case, our proposed tests can be used to determine whether an unknown function, such as the conditional average treatment effect, is equal to zero almost surely.

Submitted 13 June, 2017; v1 submitted 14 October, 2015; originally announced October 2015.

MSC Class: 62G10
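The Luedtke, Carone and van der Laan tests generalize the maximum mean discrepancy (MMD) of Gretton et al. [2006], and their statistics are U-statistics. As a hedged illustration of that starting point only, the sketch below computes the classical unbiased MMD² U-statistic for paired values f(O_i) and g(O_i). The Gaussian kernel and its bandwidth are assumptions made here; the sketch treats the function values as known and omits the authors' corrections for estimating the functions and their null calibration.

```python
import math
from typing import Sequence


def gaussian_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on real numbers; the bandwidth is an assumed tuning choice."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_u_paired(
    f_vals: Sequence[float], g_vals: Sequence[float], bandwidth: float = 1.0
) -> float:
    """Unbiased MMD^2 U-statistic for paired values f(O_i), g(O_i), i = 1..n."""
    n = len(f_vals)
    if n != len(g_vals) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # U-statistic: exclude diagonal pairs
            total += (
                gaussian_kernel(f_vals[i], f_vals[j], bandwidth)
                + gaussian_kernel(g_vals[i], g_vals[j], bandwidth)
                - gaussian_kernel(f_vals[i], g_vals[j], bandwidth)
                - gaussian_kernel(g_vals[i], f_vals[j], bandwidth)
            )
    return total / (n * (n - 1))
```

To mimic the special case in the abstract (is an unknown function such as the conditional average treatment effect zero almost surely?), one could pass estimated function values as `f_vals` and a list of zeros as `g_vals`; choosing a rejection threshold is outside this sketch.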

arXiv:0705.1270 [pdf, ps, other]

doi 10.1214/07-EJS050

Causal inference in longitudinal studies with history-restricted marginal structural models

Authors: Romain Neugebauer, Mark J. van der Laan, Marshall M. Joffe, Ira B. Tager

Abstract: A new class of Marginal Structural Models (MSMs), History-Restricted MSMs (HRMSMs), was recently introduced for longitudinal data for the purpose of defining causal parameters which may often be better suited for public health research, or at least more practicable, than MSMs \cite{joffe,feldman}. HRMSMs allow investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, MSMs represent the causal effect of interest based on a treatment history defined by the treatments assigned between the start of the study and the collection of the outcome. In this article, we lay out the formal statistical framework behind HRMSMs. Beyond allowing a more flexible causal analysis, HRMSMs improve computational tractability and mitigate statistical power concerns when designing longitudinal studies. We also develop three consistent estimators of HRMSM parameters under sufficient model assumptions: the Inverse Probability of Treatment Weighted (IPTW), G-computation and Double Robust (DR) estimators. In addition, we show that the assumptions commonly adopted for identification and consistent estimation of MSM parameters (existence of counterfactuals, consistency, time-ordering and sequential randomization assumptions) also lead to identification and consistent estimation of HRMSM parameters.

Submitted 9 May, 2007; originally announced May 2007.

Comments: Published at http://dx.doi.org/10.1214/07-EJS050 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-EJS-EJS_2007_50

Journal ref: Electronic Journal of Statistics 2007, Vol. 1, 119-154
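The Neugebauer et al. abstract lists an IPTW estimator among its three estimators. The sketch below shows only the generic inverse-probability-of-treatment weighting idea for a saturated model: a weighted mean outcome among subjects whose observed binary treatment history over a user-specified window matches a given regime. The per-time treatment probabilities are assumed known and passed in, the normalized (Hajek) weighting is a choice made here, and the function name is hypothetical; this is not the paper's HRMSM estimator.

```python
from typing import Sequence


def iptw_regime_mean(
    treatments: Sequence[Sequence[int]],
    propensities: Sequence[Sequence[float]],
    outcomes: Sequence[float],
    regime: Sequence[int],
) -> float:
    """Normalized IPTW estimate of the mean outcome had everyone followed `regime`.

    treatments[i][t]: observed binary treatment of subject i at time t in the window.
    propensities[i][t]: assumed-known P(A_t = 1 | past) for subject i at time t.
    """
    num = 0.0
    den = 0.0
    for a_hist, p_hist, y in zip(treatments, propensities, outcomes):
        if list(a_hist) != list(regime):
            continue  # only subjects whose observed history matches the regime contribute
        w = 1.0
        for a_t, p_t in zip(a_hist, p_hist):
            # Divide by the probability of the treatment actually received at time t.
            w /= p_t if a_t == 1 else (1.0 - p_t)
        num += w * y
        den += w
    if den == 0.0:
        raise ValueError("no subject followed the requested regime")
    return num / den
```

Restricting `regime` and the histories to a short window is meant only to echo the idea of a user-specified, shorter exposure history; the paper's estimators target HRMSM regression parameters rather than a single regime mean.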
