-
Adaptation using spatially distributed Gaussian Processes
Authors:
Botond Szabo,
Amine Hadji,
Aad van der Vaart
Abstract:
We consider the accuracy of an approximate posterior distribution in nonparametric regression problems by combining posterior distributions computed on subsets of the data defined by the locations of the independent variables. We show that this approximate posterior retains the rate of recovery of the full data posterior distribution, where the rate of recovery adapts to the smoothness of the true regression function. As particular examples we consider Gaussian process priors based on integrated Brownian motion and the Matérn kernel augmented with a prior on the length scale. Besides theoretical guarantees we present a numerical study of the methods both on synthetic and real world data. We also propose a new aggregation technique, which numerically outperforms previous approaches.
Submitted 21 December, 2023;
originally announced December 2023.
-
Optimal testing using combined test statistics across independent studies
Authors:
Botond Szabó,
Aad van der Vaart,
Lasse Vuursteen,
Harry van Zanten
Abstract:
Combining test statistics from independent trials or experiments is a popular method of meta-analysis. However, there is very limited theoretical understanding of the power of the combined test, especially in high-dimensional models considering composite hypotheses tests. We derive a mathematical framework to study standard meta-analysis testing approaches in the context of the many normal means model, which serves as the platform to investigate more complex models.
We introduce a natural and mild restriction on the meta-level combination functions of the local trials. This allows us to mathematically quantify the cost of compressing $m$ trials into real-valued test statistics and combining these. We then derive minimax lower and matching upper bounds for the separation rates of standard combination methods for e.g. p-values and e-values, quantifying the loss relative to using the full, pooled data. We observe an elbow effect, revealing that in certain cases combining the locally optimal tests in each trial results in a sub-optimal meta-analysis method and develop approaches to achieve the global optima. We also explore the possible gains of allowing limited coordination between the trial designs. Our results connect meta-analysis with bandwidth-constrained distributed inference and build on recent information theoretic developments in the latter field.
Submitted 30 October, 2023;
originally announced October 2023.
-
Adaptive and Efficient Isotonic Estimation in Wicksell's Problem
Authors:
Francesco Gili,
Geurt Jongbloed,
Aad van der Vaart
Abstract:
We consider nonparametric estimation in Wicksell's problem which has relevant applications in astronomy for estimating the distribution of the positions of the stars in a galaxy given projected stellar positions and in material sciences to determine the 3D microstructure of a material, using its 2D cross sections. In the classical setting, we study the isotonized version of the plug-in estimator (IIE) for the underlying cdf $F$ of the spheres' squared radii. This estimator is fully automatic, in the sense that it does not rely on tuning parameters, and we show it is adaptive to local smoothness properties of the distribution function $F$ to be estimated. Moreover, we prove a local asymptotic minimax lower bound in this non-standard setting, with $\sqrt{\log{n}/n}$-asymptotics and where the functional $F$ to be estimated is not regular. Combined, our results prove that the isotonic estimator (IIE) is an adaptive, easy-to-compute, and efficient estimator for estimating the underlying distribution function $F$.
Submitted 18 December, 2023; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Misspecified Bernstein-Von Mises theorem for hierarchical models
Authors:
Geerten Koers,
Botond Szabó,
Aad van der Vaart
Abstract:
We derive a Bernstein-von Mises theorem in the context of misspecified, non-i.i.d., hierarchical models parametrized by a finite-dimensional parameter of interest. We apply our results to hierarchical models containing non-linear operators, including the squared integral operator, and PDE-constrained inverse problems. More specifically, we consider the elliptic, time-independent Schrödinger equation with parametric boundary condition and general parabolic PDEs with parametric potential and boundary constraints. Our theoretical results are complemented with numerical analysis on synthetic data sets, considering both the squared integral operator and the Schrödinger equation.
Submitted 15 August, 2023;
originally announced August 2023.
-
Bayesian sensitivity analysis for a missing data model
Authors:
Bart Eggen,
Stéphanie L. van der Pas,
Aad W. van der Vaart
Abstract:
In causal inference, sensitivity analysis is important to assess the robustness of study conclusions to key assumptions. We perform sensitivity analysis of the assumption that missing outcomes are missing completely at random. We follow a Bayesian approach, which is nonparametric for the outcome distribution and can be combined with an informative prior on the sensitivity parameter. We give insight into the posterior and provide theoretical guarantees in the form of Bernstein-von Mises theorems for estimating the mean outcome. We study different parametrisations of the model involving Dirichlet process priors on the distribution of the outcome and on the distribution of the outcome conditional on the subject being treated. We show that these parametrisations incorporate a prior on the sensitivity parameter in different ways and discuss the relative merits. We also present a simulation study, showing the performance of the methods in finite sample scenarios.
Submitted 11 May, 2023;
originally announced May 2023.
-
Asymptotics of Caliper Matching Estimators for Average Treatment Effects
Authors:
Máté Kormos,
Stéphanie van der Pas,
Aad van der Vaart
Abstract:
Caliper matching is used to estimate causal effects of a binary treatment from observational data by comparing matched treated and control units. Units are matched when their propensity scores, the conditional probability of receiving treatment given pretreatment covariates, are within a certain distance called caliper. So far, theoretical results on caliper matching are lacking, leaving practitioners with ad-hoc caliper choices and inference procedures. We bridge this gap by proposing a caliper that balances the quality and the number of matches. We prove that the resulting estimator of the average treatment effect, and average treatment effect on the treated, is asymptotically unbiased and normal at parametric rate. We describe the conditions under which semiparametric efficiency is obtainable, and show that when the parametric propensity score is estimated, the variance is increased for both estimands. Finally, we construct asymptotic confidence intervals for the two estimands.
Submitted 17 April, 2023;
originally announced April 2023.
-
Bayesian tolerance regions with an application to linear mixed models
Authors:
X. Gregory Chen,
Aad van der Vaart
Abstract:
We review and contrast frequentist and Bayesian definitions of tolerance regions. We give conditions under which for large samples a Bayesian region also has frequentist validity, and study the latter for smaller samples in a simulation study. We discuss a computational strategy for computing a Bayesian two-sided tolerance interval for a Gaussian future variable, and apply this to the case of possibly unbalanced linear mixed models. We illustrate the method on a quality control experiment from the pharmaceutical industry.
Submitted 18 September, 2022;
originally announced September 2022.
-
Empirical and Full Bayes estimation of the type of a Pitman-Yor process
Authors:
S. E. M. P. Franssen,
A. W. van der Vaart
Abstract:
The Pitman-Yor process is a random discrete probability distribution of which the atoms can be used to model the relative abundance of species. The process is indexed by a type parameter $σ$, which controls the number of different species in a finite sample from a realization of the distribution. A random sample of size $n$ from the Pitman-Yor process of type $σ>0$ will contain of the order $n^σ$ distinct values ("species"). In this paper we consider the estimation of the type parameter by both empirical Bayes and full Bayes methods. We derive the asymptotic normality of the empirical Bayes estimator and a Bernstein-von Mises theorem for the full Bayes posterior, in the frequentist setup that the observations are a random sample from a given true distribution. We also consider the estimation of the second parameter of the Pitman-Yor process, the prior precision. We apply our results to derive the limit behaviour of the likelihood ratio in a setting of forensic statistics.
Submitted 30 August, 2022;
originally announced August 2022.
-
Statistical Inference in Parametric Preferential Attachment Trees
Authors:
Fengnan Gao,
Aad van der Vaart
Abstract:
The preferential attachment (PA) model is a popular way of modeling dynamic social networks, such as collaboration networks. Assuming that the PA function takes a parametric form, we propose and study the maximum likelihood estimator of the parameter. Using a supercritical continuous-time branching process framework, we prove the almost sure consistency and asymptotic normality of this estimator. We also provide an estimator that only depends on the final snapshot of the network and prove its consistency, and its asymptotic normality under general conditions. We compare the performance of the estimators to a nonparametric estimator in a small simulation study.
Submitted 16 August, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Semi-supervised empirical Bayes group-regularized factor regression
Authors:
Magnus M. Münch,
Mark A. van de Wiel,
Aad W. van der Vaart,
Carel F. W. Peeters
Abstract:
The features in high dimensional biomedical prediction problems are often well described with lower dimensional manifolds. An example is genes that are organised in smaller functional networks. The outcome can then be described with the factor regression model. A benefit of the factor model is that it allows for straightforward inclusion of unlabeled observations in the estimation of the model, i.e., semi-supervised learning. In addition, the high dimensional features in biomedical prediction problems are often well characterised. Examples are genes, for which annotation is available, and metabolites with $p$-values from a previous study available. In this paper, the extra information on the features is included in the prior model for the features. The extra information is weighted and included in the estimation through empirical Bayes, with variational approximations to speed up the computation. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.
Submitted 6 April, 2021;
originally announced April 2021.
-
The Bernstein-von Mises theorem for the Pitman-Yor process of nonnegative type
Authors:
S. E. M. P. Franssen,
A. W. van der Vaart
Abstract:
The Pitman-Yor process is a random probability distribution, that can be used as a prior distribution in a nonparametric Bayesian analysis. The process is of species sampling type and generates discrete distributions, which yield of the order $n^σ$ different values ("species") in a random sample of size $n$, if the type $σ$ is positive. Thus this type parameter can be set to target true distributions of various levels of discreteness, making the Pitman-Yor process an interesting prior in this case. It was previously shown that the resulting posterior distribution is consistent if and only if the true distribution of the data is discrete. In this paper we derive the distributional limit of the posterior distribution, in the form of a (corrected) Bernstein-von Mises theorem, which previously was known only in the continuous, inconsistent case. It turns out that the Pitman-Yor posterior distribution has good behaviour if the true distribution of the data is discrete with atoms that decrease not too slowly. Credible sets derived from the posterior distribution provide valid frequentist confidence sets in this case. For a general discrete distribution, the posterior distribution, although consistent, may contain a bias which does not converge to zero at the $\sqrt{n}$ rate and invalidates posterior inference. We propose a bias correction that solves this problem. We also consider the effect of estimating the type parameter from the data, both by empirical Bayes and full Bayes methods. In a small simulation study we illustrate that without bias correction the coverage of credible sets can be arbitrarily low, also for some discrete distributions.
Submitted 9 December, 2021; v1 submitted 11 February, 2021;
originally announced February 2021.
-
On the Bernstein-von Mises theorem for the Dirichlet process
Authors:
Kolyan Ray,
Aad van der Vaart
Abstract:
We establish that Laplace transforms of the posterior Dirichlet process converge to those of the limiting Brownian bridge process in a neighbourhood about zero, uniformly over Glivenko-Cantelli function classes. For real-valued random variables and functions of bounded variation, we strengthen this result to hold for all real numbers. This last result is proved via an explicit strong approximation coupling inequality.
Submitted 25 March, 2021; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Incorporating prior information and borrowing information in high-dimensional sparse regression using the horseshoe and variational Bayes
Authors:
Gino B. Kpogbezan,
Mark A. van de Wiel,
Wessel N. van Wieringen,
Aad W. van der Vaart
Abstract:
We introduce a sparse high-dimensional regression approach that can incorporate prior information on the regression parameters and can borrow information across a set of similar datasets. Prior information may for instance come from previous studies or genomic databases, and information borrowed across a set of genes or genomic networks. The approach is based on prior modelling of the regression parameters using the horseshoe prior, with a prior on the sparsity index that depends on external information. Multiple datasets are integrated by applying an empirical Bayes strategy on hyperparameters. For computational efficiency we approximate the posterior distribution using a variational Bayes method. The proposed framework is useful for analysing large-scale data sets with complex dependence structures. We illustrate this by applications to the reconstruction of gene regulatory networks and to eQTL mapping.
Submitted 29 January, 2019;
originally announced January 2019.
-
Semiparametric Bayesian causal inference
Authors:
Kolyan Ray,
Aad van der Vaart
Abstract:
We develop a semiparametric Bayesian approach for estimating the mean response in a missing data model with binary outcomes and a nonparametrically modelled propensity score. Equivalently we estimate the causal effect of a treatment, correcting nonparametrically for confounding. We show that standard Gaussian process priors satisfy a semiparametric Bernstein-von Mises theorem under smoothness conditions. We further propose a novel propensity score-dependent prior that provides efficient inference under strictly weaker conditions. We also show that it is theoretically preferable to model the covariate distribution with a Dirichlet process or Bayesian bootstrap, rather than modelling the covariate density using a Gaussian process prior.
Submitted 10 October, 2019; v1 submitted 13 August, 2018;
originally announced August 2018.
-
Adaptive group-regularized logistic elastic net regression
Authors:
Magnus M. Münch,
Carel F. W. Peeters,
Aad W. van der Vaart,
Mark A. van de Wiel
Abstract:
In high-dimensional data settings, additional information on the features is often available. Examples of such external information in omics research are: (a) p-values from a previous study, (b) a summary of prior information, and (c) omics annotation. The inclusion of this information in the analysis may enhance classification performance and feature selection, but is not straightforward in the standard regression setting. As a solution to this problem, we propose a group-regularized (logistic) elastic net regression method, where each penalty parameter corresponds to a group of features based on the external information. The method, termed gren, makes use of the Bayesian formulation of logistic elastic net regression to estimate both the model and penalty parameters in an approximate empirical-variational Bayes framework. Simulations and an application to a colon cancer microRNA study show that, if the partitioning of the features is informative, classification performance and feature selection are indeed enhanced.
Submitted 1 May, 2018;
originally announced May 2018.
-
Bayesian inverse problems with partial observations
Authors:
Shota Gugushvili,
Aad van der Vaart,
Dong Yan
Abstract:
We study a nonparametric Bayesian approach to linear inverse problems under discrete observations. We use the discrete Fourier transform to convert our model into a truncated Gaussian sequence model, that is closely related to the classical Gaussian sequence model. Upon placing the truncated series prior on the unknown parameter, we show that as the number of observations $n\rightarrow\infty,$ the corresponding posterior distribution contracts around the true parameter at a rate depending on the smoothness of the true parameter and the prior, and the ill-posedness degree of the problem. Correct combinations of these values lead to optimal posterior contraction rates (up to logarithmic factors). Similarly, the frequentist coverage of Bayesian credible sets is shown to be dependent on a combination of smoothness of the true parameter and the prior, and the ill-posedness of the problem. Oversmoothing priors lead to zero coverage, while undersmoothing priors produce highly conservative results. Finally, we illustrate our theoretical results by numerical examples.
Submitted 18 October, 2018; v1 submitted 25 February, 2018;
originally announced February 2018.
-
Bayesian linear inverse problems in regularity scales
Authors:
Shota Gugushvili,
Aad van der Vaart,
Dong Yan
Abstract:
We obtain rates of contraction of posterior distributions in inverse problems defined by scales of smoothness classes. We derive abstract results for general priors, with contraction rates determined by Galerkin approximation. The rate depends on the amount of prior concentration near the true function and the prior mass of functions with inferior Galerkin approximation. We apply the general result to non-conjugate series priors, showing that these priors give near optimal and adaptive recovery in some generality, Gaussian priors, and mixtures of Gaussian priors, where the latter are also shown to be near optimal and adaptive. The proofs are based on general testing and approximation arguments, without explicit calculations on the posterior distribution. We are thus not restricted to priors based on the singular value decomposition of the operator. We illustrate the results with examples of inverse problems resulting from differential equations.
Submitted 3 December, 2019; v1 submitted 25 February, 2018;
originally announced February 2018.
-
The Bayes Lepski's Method and Credible Bands through Volume of Tubular Neighborhoods
Authors:
William Weimin Yoo,
Aad W. van der Vaart
Abstract:
For a general class of priors based on random series basis expansion, we develop the Bayes Lepski's method to estimate an unknown regression function. In this approach, the series truncation point is determined based on a stopping rule that balances the posterior mean bias and the posterior standard deviation. Equipped with this mechanism, we present a method to construct adaptive Bayesian credible bands, where this statistical task is reformulated into a problem in geometry, and the band's radius is computed based on finding the volume of certain tubular neighborhood embedded on a unit sphere. We consider two special cases involving B-splines and wavelets, and discuss some interesting consequences such as the uncertainty principle and self-similarity. Lastly, we show how to program the Bayes Lepski stopping rule on a computer, and numerical simulations in conjunction with our theoretical investigations concur that this is a promising Bayesian uncertainty quantification procedure.
Submitted 18 November, 2017;
originally announced November 2017.
-
Consistent Estimation in General Sublinear Preferential Attachment Trees
Authors:
Fengnan Gao,
Aad van der Vaart,
Rui Castro,
Remco van der Hofstad
Abstract:
We propose an empirical estimator of the preferential attachment function $f$ in the setting of general preferential attachment trees. Using a supercritical continuous-time branching process framework, we prove the almost sure consistency of the proposed estimator. We perform simulations to study the empirical properties of our estimators.
Submitted 23 June, 2017;
originally announced June 2017.
-
Adaptive posterior contraction rates for the horseshoe
Authors:
Stéphanie van der Pas,
Botond Szabó,
Aad van der Vaart
Abstract:
We investigate the frequentist properties of Bayesian procedures for estimation based on the horseshoe prior in the sparse multivariate normal means model. Previous theoretical results assumed that the sparsity level, that is, the number of signals, was known. We drop this assumption and characterize the behavior of the maximum marginal likelihood estimator (MMLE) of a key parameter of the horseshoe prior. We prove that the MMLE is an effective estimator of the sparsity level, in the sense that it leads to (near) minimax optimal estimation of the underlying mean vector generating the data. Besides this empirical Bayes procedure, we consider the hierarchical Bayes method of putting a prior on the unknown sparsity level as well. We show that both Bayesian techniques lead to rate-adaptive optimal posterior contraction, which implies that the horseshoe posterior is a good candidate for generating rate-adaptive credible sets.
Submitted 13 February, 2017;
originally announced February 2017.
-
Bayesian Community Detection
Authors:
Stéphanie van der Pas,
Aad van der Vaart
Abstract:
We introduce a Bayesian estimator of the underlying class structure in the stochastic block model, when the number of classes is known. The estimator is the posterior mode corresponding to a Dirichlet prior on the class proportions, a generalized Bernoulli prior on the class labels, and a beta prior on the edge probabilities. We show that this estimator is strongly consistent when the expected degree is at least of order $\log^2{n}$, where $n$ is the number of nodes in the network.
Submitted 15 August, 2016;
originally announced August 2016.
-
Uncertainty quantification for the horseshoe
Authors:
Stéphanie van der Pas,
Botond Szabó,
Aad van der Vaart
Abstract:
We investigate the credible sets and marginal credible intervals resulting from the horseshoe prior in the sparse multivariate normal means model. We do so in an adaptive setting without assuming knowledge of the sparsity level (number of signals). We consider both the hierarchical Bayes method of putting a prior on the unknown sparsity level and the empirical Bayes method with the sparsity level estimated by maximum marginal likelihood. We show that credible balls and marginal credible intervals have good frequentist coverage and optimal size if the sparsity level of the prior is set correctly. By general theory honest confidence sets cannot adapt in size to an unknown sparsity level. Accordingly the hierarchical and empirical Bayes credible sets based on the horseshoe prior are not honest over the full parameter space. We show that this is due to over-shrinkage for certain parameters and characterise the set of parameters for which credible balls and marginal credible intervals do give correct uncertainty quantification. In particular we show that the fraction of false discoveries by the marginal Bayesian procedure is controlled by a correct choice of cut-off.
Submitted 13 February, 2017; v1 submitted 7 July, 2016;
originally announced July 2016.
-
An empirical Bayes approach to network recovery using external knowledge
Authors:
Gino B. Kpogbezan,
Aad W. van der Vaart,
Wessel N. van Wieringen,
Gwenaël G. R. Leday,
Mark A. van de Wiel
Abstract:
Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks such knowledge may come for instance from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simultaneous Equation Model, we develop an appealing empirical Bayes procedure which automatically assesses the relevance of the used prior knowledge. We use a variational Bayes method to approximate the posterior densities and compare its accuracy with that of a Gibbs sampling strategy. Our method is computationally fast, and can outperform known competitors. In a simulation study we show that accurate prior data can greatly improve the reconstruction of the network, but need not harm the reconstruction if wrong. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In particular, the edges of the recovered network have superior reproducibility (compared to that of competitors) over resampled versions of the data.
Submitted 24 May, 2016;
originally announced May 2016.
-
On the Asymptotic Normality of Estimating the Affine Preferential Attachment Network Models with Random Initial Degrees
Authors:
Fengnan Gao,
Aad van der Vaart
Abstract:
We consider the estimation of the affine parameter (and power-law exponent) in the preferential attachment model with random initial degrees. We derive the likelihood, and show that the maximum likelihood estimator (MLE) is asymptotically normal and efficient. We also propose a quasi-maximum-likelihood estimator (QMLE) to overcome the MLE's dependence on the history of the initial degrees. To demonstrate the power of our idea, we present numerical simulations.
Submitted 8 March, 2017; v1 submitted 8 March, 2016;
originally announced March 2016.
-
Technical Report: Higher Order Influence Functions and Minimax Estimation of Nonlinear Functionals
Authors:
James Robins,
Lingling Li,
Eric Tchetgen Tchetgen,
Aad van der Vaart
Abstract:
Robins et al. (2008) published a theory of higher order influence functions for inference in semi- and non-parametric models. This paper is a comprehensive manuscript from which Robins et al. (2008) was drawn. The current paper includes many results and proofs that were not included in Robins et al. (2008) due to space limitations. Particular results contained in the present paper that were not reported in Robins et al. (2008) include the following. Given a set of functionals and their corresponding higher order influence functions, we show how to derive the higher order influence function of their product. We apply this result to obtain higher order influence functions and associated estimators for the mean of a response Y subject to monotone missingness under missing at random. These results also apply to estimating the causal effect of a time dependent treatment on an outcome Y in the presence of time-varying confounding. Finally, we include an appendix that contains proofs for all theorems that were stated without proof in Robins et al. (2008). The initial part of the paper is closely related to Robins et al. (2008); the latter parts differ.
Submitted 6 January, 2016;
originally announced January 2016.
-
Asymptotic Normality of Quadratic Estimators
Authors:
James Robins,
Lingling Li,
Eric Tchetgen Tchetgen,
Aad van der Vaart
Abstract:
We prove conditional asymptotic normality of a class of quadratic U-statistics that are dominated by their degenerate second order part and have kernels that change with the number of observations. These statistics arise in the construction of estimators in high-dimensional semi- and non-parametric models, and in the construction of nonparametric confidence sets. This is illustrated by estimation of the integral of a square of a density or regression function, and estimation of the mean response with missing data. We show that estimators are asymptotically normal even in the case that the rate is slower than the square root of the number of observations.
Submitted 7 December, 2015;
originally announced December 2015.
-
Higher Order Estimating Equations for High-dimensional Models
Authors:
James Robins,
Lingling Li,
Rajarshi Mukherjee,
Eric Tchetgen Tchetgen,
Aad van der Vaart
Abstract:
We introduce a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on estimating equations that are $U$-statistics in the observations. The $U$-statistics are based on higher order influence functions that extend ordinary linear influence functions of the parameter of interest, and represent higher derivatives of this parameter. For parameters for which the representation cannot be perfect the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than $\sqrt n$-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at $\sqrt n$-rate, but we also consider efficient $\sqrt n$-estimation using novel nonlinear estimators. The general approach is applied in detail to the example of estimating a mean response when the response is not always observed.
Submitted 13 July, 2023; v1 submitted 7 December, 2015;
originally announced December 2015.
-
Gene network reconstruction using global-local shrinkage priors
Authors:
Gwenaël G. R. Leday,
Mathisca C. M. de Gunst,
Gino B. Kpogbezan,
Aad W. Van der Vaart,
Wessel N. Van Wieringen,
Mark A. Van de Wiel
Abstract:
Reconstructing a gene network from high-throughput molecular data is often a challenging task, as the number of parameters to estimate is easily much larger than the sample size. A conventional remedy is to regularize or penalize the model likelihood. In network models, this is often done locally in the neighbourhood of each node or gene. However, estimation of the many regularization parameters is often difficult and can result in large statistical uncertainties. In this paper we propose to combine local regularization with global shrinkage of the regularization parameters to borrow strength between genes and improve inference. We employ a simple Bayesian model with non-sparse, conjugate priors to facilitate the use of fast variational approximations to posteriors. We discuss empirical Bayes estimation of hyper-parameters of the priors, and propose a novel approach to rank-based posterior thresholding. Using extensive model- and data-based simulations, we demonstrate that the proposed inference strategy outperforms popular (sparse) methods, yields more stable edges, and is more reproducible.
Submitted 13 October, 2015;
originally announced October 2015.
-
Rejoinder to discussions of "Frequentist coverage of adaptive nonparametric Bayesian credible sets"
Authors:
Botond Szabó,
A. W. van der Vaart,
J. H. van Zanten
Abstract:
Rejoinder of "Frequentist coverage of adaptive nonparametric Bayesian credible sets" by Szabó, van der Vaart and van Zanten [arXiv:1310.4489v5].
Submitted 7 September, 2015;
originally announced September 2015.
-
Posterior contraction rates for deconvolution of Dirichlet-Laplace mixtures
Authors:
Fengnan Gao,
Aad van der Vaart
Abstract:
We study nonparametric Bayesian inference with location mixtures of the Laplace density and a Dirichlet process prior on the mixing distribution. We derive a contraction rate of the corresponding posterior distribution, both for the mixing distribution relative to the Wasserstein metric and for the mixed density relative to the Hellinger and $L_q$ metrics.
Submitted 26 January, 2016; v1 submitted 27 July, 2015;
originally announced July 2015.
-
A General Framework for Bayes Structured Linear Models
Authors:
Chao Gao,
Aad W. van der Vaart,
Harrison H. Zhou
Abstract:
High dimensional statistics deals with the challenge of extracting structured information from complex model settings. Compared with the growing number of frequentist methodologies, there are rather few theoretically optimal Bayes methods that can deal with very general high dimensional models. In contrast, Bayes methods have been extensively studied in various nonparametric settings and rate optimal posterior contraction results have been established. This paper provides a unified approach to both Bayes high dimensional statistics and Bayes nonparametrics in a general framework of structured linear models. With the proposed two-step model selection prior, we prove a general theorem of posterior contraction under an abstract setting. The main theorem can be used to derive new results on optimal posterior contraction under many complex model settings including the stochastic block model, graphon estimation and dictionary learning. It can also be used to re-derive optimal posterior contraction for problems such as sparse linear regression and nonparametric aggregation, which improve upon previous Bayes results for these problems. The key to the success lies in the proposed two-step prior distribution. The prior on the parameters is an elliptical Laplace distribution that is capable of modeling signals with large magnitude, and the prior on the models involves an important correction factor that compensates for the effect of the normalizing constant of the elliptical Laplace distribution.
Submitted 19 August, 2018; v1 submitted 6 June, 2015;
originally announced June 2015.
-
Adaptive Bayesian credible sets in regression with a Gaussian process prior
Authors:
Suzanne Sniekers,
Aad van der Vaart
Abstract:
We investigate two empirical Bayes methods and a hierarchical Bayes method for adapting the scale of a Gaussian process prior in a nonparametric regression model. We show that all methods lead to a posterior contraction rate that adapts to the smoothness of the true regression function. Furthermore, we show that the corresponding credible sets cover the true regression function whenever this function satisfies a certain extrapolation condition. This condition depends on the specific method, but is implied by a condition of self-similarity. The latter condition is shown to be satisfied with probability one under the prior distribution.
Submitted 29 April, 2015;
originally announced April 2015.
-
Higher Order Tangent Spaces and Influence Functions
Authors:
Aad van der Vaart
Abstract:
We review higher order tangent spaces and influence functions and their use to construct minimax efficient estimators for parameters in high-dimensional semiparametric models.
Submitted 3 February, 2015;
originally announced February 2015.
-
The Horseshoe Estimator: Posterior Concentration around Nearly Black Vectors
Authors:
S. L. van der Pas,
B. J. K. Kleijn,
A. W. van der Vaart
Abstract:
We consider the horseshoe estimator due to Carvalho, Polson and Scott (2010) for the multivariate normal mean model in the situation that the mean vector is sparse in the nearly black sense. We assume the frequentist framework where the data is generated according to a fixed mean vector. We show that if the number of nonzero parameters of the mean vector is known, the horseshoe estimator attains the minimax $\ell_2$ risk, possibly up to a multiplicative constant. We provide conditions under which the horseshoe estimator combined with an empirical Bayes estimate of the number of nonzero means still yields the minimax risk. We furthermore prove an upper bound on the rate of contraction of the posterior distribution around the horseshoe estimator, and a lower bound on the posterior variance. These bounds indicate that the posterior distribution of the horseshoe prior may be more informative than that of other one-component priors, including the Lasso.
Submitted 15 December, 2014; v1 submitted 1 April, 2014;
originally announced April 2014.
-
Bayesian linear regression with sparse priors
Authors:
Ismaël Castillo,
Johannes Schmidt-Hieber,
Aad van der Vaart
Abstract:
We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and employed in the construction and study of credible sets for uncertainty quantification.
Submitted 14 October, 2015; v1 submitted 4 March, 2014;
originally announced March 2014.
-
Modeling association between DNA copy number and gene expression with constrained piecewise linear regression splines
Authors:
Gwenaël G. R. Leday,
Aad W. van der Vaart,
Wessel N. van Wieringen,
Mark A. van de Wiel
Abstract:
DNA copy number and mRNA expression are widely used data types in cancer studies, which provide more insight combined than separately. Whereas in existing literature the form of the relationship between these two types of markers is fixed a priori, in this paper we model their association. We employ piecewise linear regression splines (PLRS), which combine good interpretation with sufficient flexibility to identify any plausible type of relationship. The specification of the model leads to estimation and model selection in a constrained, nonstandard setting. We provide methodology for testing the effect of DNA on mRNA and choosing the appropriate model. Furthermore, we present a novel approach to obtain reliable confidence bands for constrained PLRS, which incorporates model uncertainty. The procedures are applied to colorectal and breast cancer data. Common assumptions are found to be potentially misleading for biologically relevant genes. More flexible models may bring more insight into the interaction between the two markers.
Submitted 6 December, 2013;
originally announced December 2013.
-
Honest Bayesian confidence sets for the L2-norm
Authors:
Botond Szabo,
Aad van der Vaart,
Harry van Zanten
Abstract:
We investigate the problem of constructing Bayesian credible sets that are honest and adaptive for the L2-loss over a scale of Sobolev classes with regularity ranging over [D, 2D], for some given D, in the context of the signal-in-white-noise model. We consider a scale of prior distributions indexed by a regularity hyper-parameter and choose the hyper-parameter by both a marginal likelihood empirical Bayes method and a hierarchical Bayes method. Next we consider a ball centered around the corresponding posterior mean with prescribed posterior probability. We show by theory and examples that both the empirical Bayes and the hierarchical Bayes credible sets give misleading, overconfident uncertainty quantification for certain oddly behaving truths. Then we construct a new empirical Bayes method based on risk estimation, which provides the correct uncertainty quantification and optimal size.
Submitted 23 April, 2014; v1 submitted 29 November, 2013;
originally announced November 2013.
-
Frequentist coverage of adaptive nonparametric Bayesian credible sets
Authors:
Botond Szabó,
A. W. van der Vaart,
J. H. van Zanten
Abstract:
We investigate the frequentist coverage of Bayesian credible sets in a nonparametric setting. We consider a scale of priors of varying regularity and choose the regularity by an empirical Bayes method. Next we consider a central set of prescribed posterior probability in the posterior distribution of the chosen regularity. We show that such an adaptive Bayes credible set gives correct uncertainty quantification of "polished tail" parameters, in the sense of high probability of coverage of such parameters. On the negative side, we show by theory and example that adaptation of the prior necessarily leads to gross and haphazard uncertainty quantification for some true parameters that are still within the hyperrectangle regularity scale.
Submitted 4 September, 2015; v1 submitted 16 October, 2013;
originally announced October 2013.
-
Needles and Straw in a Haystack: Posterior concentration for possibly sparse sequences
Authors:
Ismaël Castillo,
Aad van der Vaart
Abstract:
We consider full Bayesian inference in the multivariate normal mean model in the situation that the mean vector is sparse. The prior distribution on the vector of means is constructed hierarchically by first choosing a collection of nonzero means and next a prior on the nonzero values. We consider the posterior distribution in the frequentist set-up that the observations are generated according to a fixed mean vector, and are interested in the posterior distribution of the number of nonzero components and the contraction of the posterior distribution to the true mean vector. We find various combinations of priors on the number of nonzero coefficients and on these coefficients that give desirable performance. We also find priors that give suboptimal convergence, for instance, Gaussian priors on the nonzero coefficients. We illustrate the results by simulations.
Submitted 6 November, 2012;
originally announced November 2012.
-
Bayes procedures for adaptive inference in inverse problems for the white noise model
Authors:
B. T. Knapik,
B. T. Szabó,
A. W. van der Vaart,
J. H. van Zanten
Abstract:
We study empirical and hierarchical Bayes approaches to the problem of estimating an infinite-dimensional parameter in mildly ill-posed inverse problems. We consider a class of prior distributions indexed by a hyperparameter that quantifies regularity. We prove that both methods we consider succeed in automatically selecting this parameter optimally, resulting in optimal convergence rates for truths with Sobolev or analytic "smoothness", without using knowledge about this regularity. Both methods are illustrated by simulation examples.
Submitted 29 May, 2013; v1 submitted 17 September, 2012;
originally announced September 2012.
-
Bayesian recovery of the initial condition for the heat equation
Authors:
B. T. Knapik,
A. W. van der Vaart,
J. H. van Zanten
Abstract:
We study a Bayesian approach to recovering the initial condition for the heat equation from noisy observations of the solution at a later time. We consider a class of prior distributions indexed by a parameter quantifying "smoothness" and show that the corresponding posterior distributions contract around the true parameter at a rate that depends on the smoothness of the true initial condition and the smoothness and scale of the prior. Correct combinations of these characteristics lead to the optimal minimax rate. One type of priors leads to a rate-adaptive Bayesian procedure. The frequentist coverage of credible sets is shown to depend on the combination of the prior and true parameter as well, with smoother priors leading to zero coverage and rougher priors to (extremely) conservative results. In the latter case credible sets are much larger than frequentist confidence sets, in that the ratio of diameters diverges to infinity. The results are numerically illustrated by a simulated data example.
Submitted 1 March, 2013; v1 submitted 24 November, 2011;
originally announced November 2011.
-
Bayesian inverse problems with Gaussian priors
Authors:
B. T. Knapik,
A. W. van der Vaart,
J. H. van Zanten
Abstract:
The posterior distribution in a nonparametric inverse problem is shown to contract to the true parameter at a rate that depends on the smoothness of the parameter, and the smoothness and scale of the prior. Correct combinations of these characteristics lead to the minimax rate. The frequentist coverage of credible sets is shown to depend on the combination of prior and true parameter, with smoother priors leading to zero coverage and rougher priors to conservative coverage. In the latter case credible sets are of the correct order of magnitude. The results are numerically illustrated by the problem of recovering a function from observation of a noisy version of its primitive.
Submitted 23 February, 2012; v1 submitted 14 March, 2011;
originally announced March 2011.
-
A local maximal inequality under uniform entropy
Authors:
Aad van der Vaart,
Jon A. Wellner
Abstract:
We derive an upper bound for the mean of the supremum of the empirical process indexed by a class of functions that are known to have variance bounded by a small constant $δ$. The bound is expressed in the uniform entropy integral of the class at $δ$. The bound yields a rate of convergence of minimum contrast estimators when applied to the modulus of continuity of the contrast functions.
Submitted 26 December, 2010;
originally announced December 2010.
-
Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth
Authors:
A. W. van der Vaart,
J. H. van Zanten
Abstract:
We consider nonparametric Bayesian estimation using a rescaled smooth Gaussian field as a prior for a multidimensional function. The rescaling is achieved using a Gamma variable and the procedure can be viewed as choosing an inverse Gamma bandwidth. The procedure is studied from a frequentist perspective in three statistical settings involving replicated observations (density estimation, regression and classification). We prove that the resulting posterior distribution shrinks to the distribution that generates the data at a speed which is minimax-optimal up to a logarithmic factor, whatever the regularity level of the data-generating distribution. Thus the hierarchical Bayesian procedure, with a fixed prior, is shown to be fully adaptive.
Submitted 25 August, 2009;
originally announced August 2009.
-
Rates of contraction of posterior distributions based on Gaussian process priors
Authors:
A. W. van der Vaart,
J. H. van Zanten
Abstract:
We derive rates of contraction of posterior distributions on nonparametric or semiparametric models based on Gaussian processes. The rate of contraction is shown to depend on the position of the true parameter relative to the reproducing kernel Hilbert space of the Gaussian process and the small ball probabilities of the Gaussian process. We determine these quantities for a range of examples of Gaussian priors and in several statistical settings. For instance, we consider the rate of contraction of the posterior distribution based on sampling from a smooth density model when the prior models the log density as a (fractionally integrated) Brownian motion. We also consider regression with Gaussian errors and smooth classification under a logistic or probit link function combined with various priors.
Submitted 18 June, 2008;
originally announced June 2008.
-
Reproducing kernel Hilbert spaces of Gaussian priors
Authors:
A. W. van der Vaart,
J. H. van Zanten
Abstract:
We review definitions and properties of reproducing kernel Hilbert spaces attached to Gaussian variables and processes, with a view to applications in nonparametric Bayesian statistics using Gaussian priors. The rate of contraction of posterior distributions based on Gaussian priors can be described through a concentration function that is expressed in the reproducing kernel Hilbert space. Absolute continuity of Gaussian measures and concentration inequalities play an important role in understanding and deriving this result. Series expansions of Gaussian variables and transformations of their reproducing kernel Hilbert spaces under linear maps are useful tools to compute the concentration function.
Submitted 21 May, 2008;
originally announced May 2008.
-
Higher order influence functions and minimax estimation of nonlinear functionals
Authors:
James Robins,
Lingling Li,
Eric Tchetgen,
Aad van der Vaart
Abstract:
We present a theory of point and interval estimation for nonlinear functionals in parametric, semi-, and non-parametric models based on higher order influence functions (Robins (2004), Section 9; Li et al. (2004), Tchetgen et al. (2006), Robins et al. (2007)). Higher order influence functions are higher order U-statistics. Our theory extends the first order semiparametric theory of Bickel et al. (1993) and van der Vaart (1991) by incorporating the theory of higher order scores considered by Pfanzagl (1990), Small and McLeish (1994) and Lindsay and Waterman (1996). The theory reproduces many previous results, produces new non-$\sqrt{n}$ results, and opens up the ability to perform optimal non-$\sqrt{n}$ inference in complex high dimensional models. We present novel rate-optimal point and interval estimators for various functionals of central importance to biostatistics in settings in which estimation at the expected $\sqrt{n}$ rate is not possible, owing to the curse of dimensionality. We also show that our higher order influence functions have a multi-robustness property that extends the double robustness property of first order influence functions described by Robins and Rotnitzky (2001) and van der Laan and Robins (2003).
Submitted 20 May, 2008;
originally announced May 2008.
-
Nonparametric Bayesian model selection and averaging
Authors:
Subhashis Ghosal,
Jüri Lember,
Aad van der Vaart
Abstract:
We consider nonparametric Bayesian estimation of a probability density $p$ based on a random sample of size $n$ from this density using a hierarchical prior. The prior consists, for instance, of prior weights on the regularity of the unknown density combined with priors that are appropriate given that the density has this regularity. More generally, the hierarchy consists of prior weights on an abstract model index and a prior on a density model for each model index. We present a general theorem on the rate of contraction of the resulting posterior distribution as $n\to \infty$, which gives conditions under which the rate of contraction is the one attached to the model that best approximates the true density of the observations. This shows that, for instance, the posterior distribution can adapt to the smoothness of the underlying density. We also study the posterior distribution of the model index, and find that under the same conditions the posterior distribution gives negligible weight to models that are bigger than the optimal one, and thus selects the optimal model or smaller models that also approximate the true density well. We apply these results to log spline density models, where we show that the prior weights on the regularity index interact with the priors on the models, making the exact rates depend in a complicated way on the priors, but also that the rate is fairly robust to specification of the prior weights.
Submitted 1 February, 2008;
originally announced February 2008.
-
Bayesian inference with rescaled Gaussian process priors
Authors:
Aad van der Vaart,
Harry van Zanten
Abstract:
We use rescaled Gaussian processes as prior models for functional parameters in nonparametric statistical models. We show how the rate of contraction of the posterior distributions depends on the scaling factor. In particular, we exhibit rescaled Gaussian process priors yielding posteriors that contract around the true parameter at optimal convergence rates. To derive our results we establish bounds on small deviation probabilities for smooth stationary Gaussian processes.
Submitted 19 October, 2007;
originally announced October 2007.
-
Empirical processes indexed by estimated functions
Authors:
Aad W. van der Vaart,
Jon A. Wellner
Abstract:
We consider the convergence of empirical processes indexed by functions that depend on an estimated parameter $η$ and give several alternative conditions under which the "estimated parameter" $η_n$ can be replaced by its natural limit $η_0$ uniformly in some other indexing set $Θ$. In particular we reconsider some examples treated by Ghoudi and Remillard [Asymptotic Methods in Probability and Statistics (1998) 171--197, Fields Inst. Commun. 44 (2004) 381--406]. We recast their examples in terms of empirical process theory, and provide an alternative general view which should be of wide applicability.
Submitted 7 September, 2007;
originally announced September 2007.