Showing 1–33 of 33 results for author: van der Laan, M

  1. arXiv:2404.11083  [pdf, other]

    math.ST

    Estimating conditional hazard functions and densities with the highly-adaptive lasso

    Authors: Anders Munch, Thomas A. Gerds, Mark J. van der Laan, Helene C. W. Rytgaard

    Abstract: We consider estimation of conditional hazard functions and densities over the class of multivariate càdlàg functions with uniformly bounded sectional variation norm when data are either fully observed or subject to right-censoring. We demonstrate that the empirical risk minimizer is either not well-defined or not consistent for estimation of conditional hazard functions and densities. Under a smoo…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 36 pages, 14 figures

    MSC Class: 62G05 (primary) 62N02 (secondary)

  2. arXiv:2309.16099  [pdf, other]

    math.ST stat.ME stat.ML

    Nonparametric estimation of a covariate-adjusted counterfactual treatment regimen response curve

    Authors: Ashkan Ertefaie, Luke Duttweiler, Brent A. Johnson, Mark J. van der Laan

    Abstract: Flexible estimation of the mean outcome under a treatment regimen (i.e., value function) is the key step toward personalized medicine. We define our target parameter as a conditional value function given a set of baseline covariates, which we refer to as a stratum-based value function. We focus on a semiparametric class of decision rules and propose a sieve-based nonparametric covariate adjusted regi…

    Submitted 27 September, 2023; originally announced September 2023.

  3. arXiv:2307.12544  [pdf, other]

    stat.ME math.ST stat.ML

    Adaptive debiased machine learning using data-driven model selection techniques

    Authors: Lars van der Laan, Marco Carone, Alex Luedtke, Mark van der Laan

    Abstract: Debiased machine learning estimators for nonparametric inference of smooth functionals of the data-generating distribution can suffer from excessive variability and instability. For this reason, practitioners may resort to simpler models based on parametric or semiparametric assumptions. However, such simplifying assumptions may fail to hold, and estimates may then be biased due to model misspecif…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 32 pages + appendix

  4. arXiv:2301.13354  [pdf, ps, other]

    math.ST

    Higher Order Spline Highly Adaptive Lasso Estimators of Functional Parameters: Pointwise Asymptotic Normality and Uniform Convergence Rates

    Authors: Mark van der Laan

    Abstract: We consider estimation of a functional of the data distribution based on i.i.d. observations. We assume the target function can be defined as the minimizer of the expectation of a loss function over a class of $d$-variate real valued cadlag functions that have finite sectional variation norm. For all $k=0,1,\ldots$, we define a $k$-th order smoothness class of functions as $d$-variate functions on…

    Submitted 30 January, 2023; originally announced January 2023.

  5. arXiv:2205.10697  [pdf, other]

    stat.ML cs.LG math.ST

    Lassoed Tree Boosting

    Authors: Alejandro Schuler, Yi Li, Mark van der Laan

    Abstract: Gradient boosting performs exceptionally in most prediction problems and scales well to large datasets. In this paper we prove that a ``lassoed'' gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ L2 convergence in the large nonparametric space of cadlag functions of bounded sectional variation. This rate is remarkable because it does not depend on the dimension, s…

    Submitted 8 December, 2023; v1 submitted 21 May, 2022; originally announced May 2022.

  6. arXiv:2110.12112  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Why Machine Learning Cannot Ignore Maximum Likelihood Estimation

    Authors: Mark J. van der Laan, Sherri Rose

    Abstract: The growth of machine learning as a field has been accelerating with increasing interest and publications across fields, including statistics, but predominantly in computer science. How can we parse this vast literature for developments that exemplify the necessary rigor? How many of these manuscripts incorporate foundational theory to allow for statistical inference? Which advances have the great…

    Submitted 22 October, 2021; originally announced October 2021.

    Comments: 30 pages. Forthcoming as a chapter in the Handbook of Matching and Weighting in Causal Inference

  7. arXiv:2106.01723  [pdf, other]

    stat.ML cs.LG math.ST

    Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

    Authors: Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan

    Abstract: Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimiz…

    Submitted 3 June, 2021; originally announced June 2021.

  8. arXiv:2106.00418  [pdf, other]

    stat.ML cs.LG math.ST

    Post-Contextual-Bandit Inference

    Authors: Aurélien Bibaut, Antoine Chambaz, Maria Dimakopoulou, Nathan Kallus, Mark van der Laan

    Abstract: Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking because they can both improve outcomes for study participants and increase the chance of identifying good or even best policies. To support credible inference on novel interventions at the end of the study, nonetheless, we still want to construct valid confidence intervals on…

    Submitted 1 June, 2021; originally announced June 2021.

  9. arXiv:2105.05373  [pdf, other]

    math.ST stat.ME stat.ML

    Estimation of population size based on capture recapture designs and evaluation of the estimation reliability

    Authors: Yue You, Mark van der Laan, Philip Collender, Qu Cheng, Alan Hubbard, Nicholas P Jewell, Zhiyue Tom Hu, Robin Mejia, Justin Remais

    Abstract: We propose a modern method to estimate population size based on capture-recapture designs of K samples. The observed data is formulated as a sample of n i.i.d. K-dimensional vectors of binary indicators, where the k-th component of each vector indicates the subject being caught by the k-th sample, such that only subjects with nonzero capture vectors are observed. The target quantity is the uncondi…

    Submitted 11 May, 2021; originally announced May 2021.

  10. arXiv:2105.02088  [pdf, other]

    math.ST stat.ME

    Continuous-time targeted minimum loss-based estimation of intervention-specific mean outcomes

    Authors: Helene C. Rytgaard, Thomas A. Gerds, Mark J. van der Laan

    Abstract: This paper studies the generalization of the targeted minimum loss-based estimation (TMLE) framework to estimation of effects of time-varying interventions in settings where both interventions, covariates, and outcome can happen at subject-specific time-points on an arbitrarily fine time-scale. TMLE is a general template for constructing asymptotically linear substitution estimators for smooth low…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 27 pages (excluding supplementary material), 1 figure

  11. arXiv:2102.00102  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Adaptive Sequential Design for a Single Time-Series

    Authors: Ivana Malenica, Aurelien Bibaut, Mark J. van der Laan

    Abstract: The current work is motivated by the need for robust statistical methods for precision medicine; as such, we address the need for statistical methods that provide actionable inference for a single unit at any point in time. We aim to learn an optimal, unknown choice of the controlled components of the design in order to optimize the expected outcome; with that, we adapt the randomization mechanism…

    Submitted 1 July, 2021; v1 submitted 29 January, 2021; originally announced February 2021.

    Comments: arXiv admin note: text overlap with arXiv:1809.00734

  12. arXiv:2101.07380  [pdf, ps, other]

    math.ST

    Sequential causal inference in a single world of connected units

    Authors: Aurelien Bibaut, Maya Petersen, Nikos Vlassis, Maria Dimakopoulou, Mark van der Laan

    Abstract: We consider adaptive designs for a trial involving N individuals that we follow along T time steps. We allow for the variables of one individual to depend on its past and on the past of other individuals. Our goal is to learn a mean outcome, averaged across the N individuals, that we would observe, if we started from some given initial state, and we carried out a given sequence of counterfactual i…

    Submitted 18 January, 2021; originally announced January 2021.

  13. arXiv:2101.06290  [pdf, other]

    math.ST

    Higher Order Targeted Maximum Likelihood Estimation

    Authors: Mark van der Laan, Zeyi Wang, Lars van der Laan

    Abstract: Asymptotic efficiency of targeted maximum likelihood estimators (TMLE) of target features of the data distribution relies on a second order remainder being asymptotically negligible. In previous work we proposed a nonparametric MLE termed Highly Adaptive Lasso (HAL) which parametrizes the relevant functional of the data distribution in terms of a multivariate real valued cadlag function that is…

    Submitted 30 June, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

  14. Nonparametric causal mediation analysis for stochastic interventional (in)direct effects

    Authors: Nima S. Hejazi, Kara E. Rudolph, Mark J. van der Laan, Iván Díaz

    Abstract: Causal mediation analysis has historically been limited in two important ways: (i) a focus has traditionally been placed on binary treatments and static interventions, and (ii) direct and indirect effect decompositions have been pursued that are only identifiable in the absence of intermediate confounders affected by treatment. We present a theoretical study of an (in)direct effect decomposition o…

    Submitted 11 January, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

    Journal ref: Biostatistics, 2022

  15. arXiv:2009.05974  [pdf, ps, other]

    math.ST

    Sufficient and insufficient conditions for the stochastic convergence of Cesàro means

    Authors: Aurélien F. Bibaut, Alex Luedtke, Mark J. van der Laan

    Abstract: We study the stochastic convergence of the Cesàro mean of a sequence of random variables. These arise naturally in statistical problems that have a sequential component, where the sequence of random variables is typically derived from a sequence of estimators computed on data. We show that establishing a rate of convergence in probability for a sequence is not sufficient in general to establish a…

    Submitted 13 September, 2020; originally announced September 2020.

  16. arXiv:2005.11303  [pdf, other]

    stat.ME math.ST stat.ML

    Nonparametric inverse probability weighted estimators based on the highly adaptive lasso

    Authors: Ashkan Ertefaie, Nima S. Hejazi, Mark J. van der Laan

    Abstract: Inverse probability weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudo-population in which selection biases are eliminated. Despite their ease of use, these estimators require the correct s…

    Submitted 3 July, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

  17. arXiv:1908.05607  [pdf, other]

    math.ST stat.ME

    Efficient Estimation of Pathwise Differentiable Target Parameters with the Undersmoothed Highly Adaptive Lasso

    Authors: Mark J. van der Laan, David Benkeser, Weixin Cai

    Abstract: We consider estimation of a functional parameter of a realistically modeled data distribution based on observing independent and identically distributed observations. We define an $m$-th order Spline Highly Adaptive Lasso Minimum Loss Estimator (Spline HAL-MLE) of a functional parameter that is defined by minimizing the empirical risk function over an $m$-th order smoothness class of functions. We…

    Submitted 2 July, 2021; v1 submitted 14 August, 2019; originally announced August 2019.

  18. arXiv:1907.09244  [pdf, ps, other]

    math.ST

    Fast rates for empirical risk minimization over càdlàg functions with bounded sectional variation norm

    Authors: Aurélien F. Bibaut, Mark J. van der Laan

    Abstract: Empirical risk minimization over classes of functions that are bounded for some version of the variation norm has a long history, starting with Total Variation Denoising (Rudin et al., 1992), and has been considered by several recent articles, in particular Fang et al., 2019 and van der Laan, 2015. In this article, we consider empirical risk minimization over the class $\mathcal{F}_d$ of càdlàg funct…

    Submitted 23 August, 2019; v1 submitted 22 July, 2019; originally announced July 2019.

  19. arXiv:1905.10299  [pdf, other]

    math.ST stat.ME

    Nonparametric Bootstrap Inference for the Targeted Highly Adaptive LASSO Estimator

    Authors: Weixin Cai, Mark van der Laan

    Abstract: The Highly-Adaptive-LASSO Targeted Minimum Loss Estimator (HAL-TMLE) is an efficient plug-in estimator of a pathwise differentiable parameter in a statistical model that at minimal (and possibly only) assumes that the sectional variation norm of the true nuisance functional parameters (i.e., the relevant part of data distribution) are finite. It relies on an initial estimator (HAL-MLE) of the nuis…

    Submitted 7 February, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1708.09502

  20. arXiv:1811.03745  [pdf, other]

    stat.ME math.ST stat.AP

    A Fundamental Measure of Treatment Effect Heterogeneity

    Authors: Jonathan Levy, Mark van der Laan, Alan Hubbard, Romain Pirracchio

    Abstract: We offer a non-parametric plug-in estimator for an important measure of treatment effect variability and provide minimum conditions under which the estimator is asymptotically efficient. The stratum specific treatment effect function or so-called blip function, is the average treatment effect for a randomly drawn stratum of confounders. The mean of the blip function is the average treatment effect…

    Submitted 23 December, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Presented at JSM 2018

  21. arXiv:1810.09022  [pdf, other]

    math.ST stat.ME

    Correcting an estimator of a multivariate monotone function with isotonic regression

    Authors: Ted Westling, Mark van der Laan, Marco Carone

    Abstract: In many problems, a sensible estimator of a possibly multivariate monotone function may itself fail to be monotone. We study the correction of such an estimator obtained via projection onto the space of functions monotone over a finite grid in the domain. We demonstrate that this corrected estimator has no worse supremal estimation error than the initial estimator, and that analogously corrected c…

    Submitted 4 September, 2019; v1 submitted 21 October, 2018; originally announced October 2018.

  22. arXiv:1810.03030  [pdf, other]

    math.ST stat.ME

    Robust variance estimation and inference for causal effect estimation

    Authors: Linh Tran, Maya Petersen, Joshua Schwab, Mark J van der Laan

    Abstract: We consider a longitudinal data structure consisting of baseline covariates, time-varying treatment variables, intermediate time-dependent covariates, and a possibly time dependent outcome. Previous studies have shown that estimating the variance of asymptotically linear estimators using empirical influence functions in this setting results in anti-conservative estimates with increasing magnitudes…

    Submitted 6 October, 2018; originally announced October 2018.

    Comments: 20 pages, 8 figures

  23. arXiv:1809.00734  [pdf, other]

    math.ST cs.LG stat.AP stat.ME stat.ML

    Robust Estimation of Data-Dependent Causal Effects based on Observing a Single Time-Series

    Authors: Mark J. van der Laan, Ivana Malenica

    Abstract: Consider the case that one observes a single time-series, where at each time t one observes a data record O(t) involving treatment nodes A(t), possible covariates L(t) and an outcome node Y(t). The data record at time t carries information for a (potentially causal) effect of the treatment A(t) on the outcome Y(t), in the context defined by a fixed dimensional summary measure Co(t). We are concer…

    Submitted 3 September, 2018; originally announced September 2018.

  24. arXiv:1804.00102  [pdf, other]

    stat.ME math.ST stat.ML

    Collaborative targeted inference from continuously indexed nuisance parameter estimators

    Authors: Cheng Ju, Antoine Chambaz, Mark J. van der Laan

    Abstract: We wish to infer the value of a parameter at a law from which we sample independent observations. The parameter is smooth and we can define two variation-independent features of the law, its $Q$- and $G$-components, such that estimating them consistently at a fast enough product of rates allows to build a confidence interval (CI) with a given asymptotic level from a plain targeted minimum loss est…

    Submitted 5 April, 2018; v1 submitted 30 March, 2018; originally announced April 2018.

    Comments: 38 pages

  25. arXiv:1709.06256  [pdf, ps, other]

    math.ST

    Uniform Consistency of the Highly Adaptive Lasso Estimator of Infinite Dimensional Parameters

    Authors: Mark J. van der Laan, Aurélien F. Bibaut

    Abstract: Consider the case that we observe $n$ independent and identically distributed copies of a random variable with a probability distribution known to be an element of a specified statistical model. We are interested in estimating an infinite dimensional target parameter that minimizes the expectation of a specified loss function. In \cite{generally_efficient_TMLE} we defined an estimator that minimiz…

    Submitted 19 September, 2017; originally announced September 2017.

  26. arXiv:1708.09502  [pdf, ps, other]

    math.ST

    Finite Sample Inference for Targeted Learning

    Authors: Mark van der Laan

    Abstract: The Highly-Adaptive-Lasso(HAL)-TMLE is an efficient estimator of a pathwise differentiable parameter in a statistical model that at minimal (and possibly only) assumes that the sectional variation norm of the true nuisance parameters are finite. It relies on an initial estimator (HAL-MLE) of the nuisance parameters by minimizing the empirical risk over the parameter space under the constraint that…

    Submitted 30 August, 2017; originally announced August 2017.

  27. arXiv:1706.07408  [pdf, other]

    math.ST stat.ME

    Data-adaptive smoothing for optimal-rate estimation of possibly non-regular parameters

    Authors: Aurelien F. Bibaut, Mark J. van der Laan

    Abstract: We consider nonparametric inference of finite dimensional, potentially non-pathwise differentiable target parameters. In a nonparametric model, some examples of such parameters that are always non-pathwise differentiable target parameters include probability density functions at a point, or regression functions at a point. In causal inference, under appropriate causal assumptions, mean counterfact…

    Submitted 12 July, 2017; v1 submitted 22 June, 2017; originally announced June 2017.

  28. arXiv:1705.08527  [pdf, other]

    stat.ME math.ST

    Causal inference for social network data

    Authors: Elizabeth L. Ogburn, Oleg Sofrygin, Ivan Diaz, Mark J. van der Laan

    Abstract: We describe semiparametric estimation and inference for causal effects using observational data from a single social network. Our asymptotic results are the first to allow for dependence of each observation on a growing number of other units as sample size increases. In addition, while previous methods have implicitly permitted only one of two possible sources of dependence among social network ob…

    Submitted 1 June, 2022; v1 submitted 23 May, 2017; originally announced May 2017.

  29. arXiv:1608.08717  [pdf, other]

    math.ST

    Toward computerized efficient estimation in infinite-dimensional models

    Authors: Marco Carone, Alexander R. Luedtke, Mark J. van der Laan

    Abstract: Despite the risk of misspecification they are tied to, parametric models continue to be used in statistical practice because they are accessible to all. In particular, efficient estimation procedures in parametric models are simple to describe and implement. Unfortunately, the same cannot be said of semiparametric and nonparametric models. While the latter often reflect the level of available scie…

    Submitted 30 August, 2016; originally announced August 2016.

  30. Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy

    Authors: Alexander R. Luedtke, Mark J. van der Laan

    Abstract: We consider challenges that arise in the estimation of the mean outcome under an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition…

    Submitted 24 March, 2016; originally announced March 2016.

    Comments: Published at http://dx.doi.org/10.1214/15-AOS1384 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1384

    Journal ref: Annals of Statistics 2016, Vol. 44, No. 2, 713-742

  31. arXiv:1511.08369  [pdf, other]

    math.ST

    Second-Order Inference for the Mean of a Variable Missing at Random

    Authors: Iván Díaz, Marco Carone, Mark J. van der Laan

    Abstract: We present a second-order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second-order expansion of the parameter functional, in addition to the first-order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates neces…

    Submitted 26 November, 2015; originally announced November 2015.

  32. arXiv:1510.04195  [pdf, other]

    math.ST stat.ML

    An Omnibus Nonparametric Test of Equality in Distribution for Unknown Functions

    Authors: Alexander R. Luedtke, Marco Carone, Mark J. van der Laan

    Abstract: We present a novel family of nonparametric omnibus tests of the hypothesis that two unknown but estimable functions are equal in distribution when applied to the observed data structure. We developed these tests, which represent a generalization of the maximum mean discrepancy tests described in Gretton et al. [2006], using recent developments from the higher-order pathwise differentiability liter…

    Submitted 13 June, 2017; v1 submitted 14 October, 2015; originally announced October 2015.

    MSC Class: 62G10

  33. arXiv:0705.1270  [pdf, ps, other]

    math.ST stat.ME

    Causal inference in longitudinal studies with history-restricted marginal structural models

    Authors: Romain Neugebauer, Mark J. van der Laan, Marshall M. Joffe, Ira B. Tager

    Abstract: A new class of Marginal Structural Models (MSMs), History-Restricted MSMs (HRMSMs), was recently introduced for longitudinal data for the purpose of defining causal parameters which may often be better suited for public health research or at least more practicable than MSMs \cite{joffe,feldman}. HRMSMs allow investigators to analyze the causal effect of a treatment on an outcome based on a fixed,…

    Submitted 9 May, 2007; originally announced May 2007.

    Comments: Published at http://dx.doi.org/10.1214/07-EJS050 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-EJS-EJS_2007_50

    Journal ref: Electronic Journal of Statistics 2007, Vol. 1, 119-154