-
A variational Bayes approach to debiased inference for low-dimensional parameters in high-dimensional linear regression
Authors:
Ismaël Castillo,
Alice L'Huillier,
Kolyan Ray,
Luke Travis
Abstract:
We propose a scalable variational Bayes method for statistical inference for a single or low-dimensional subset of the coordinates of a high-dimensional parameter in sparse linear regression. Our approach relies on assigning a mean-field approximation to the nuisance coordinates and carefully modelling the conditional distribution of the target given the nuisance. This requires only a preprocessing step and preserves the computational advantages of mean-field variational Bayes, while ensuring accurate and reliable inference for the target parameter, including for uncertainty quantification. We investigate the numerical performance of our algorithm, showing that it performs competitively with existing methods. We further establish accompanying theoretical guarantees for estimation and uncertainty quantification in the form of a Bernstein--von Mises theorem.
Submitted 18 June, 2024;
originally announced June 2024.
-
Posterior and variational inference for deep neural networks with heavy-tailed weights
Authors:
Ismaël Castillo,
Paul Egels
Abstract:
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random. Following a recent idea of Agapiou and Castillo (2023), who show that heavy-tailed prior distributions achieve automatic adaptation to smoothness, we introduce a simple Bayesian deep learning prior based on heavy-tailed weights and ReLU activation. We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates, simultaneously adaptive to both intrinsic dimension and smoothness of the underlying function, in a variety of contexts including nonparametric regression, geometric data and Besov spaces. While most works so far need a form of model selection built-in within the prior distribution, a key aspect of our approach is that it does not require to sample hyperparameters to learn the architecture of the network. We also provide variational Bayes counterparts of the results, that show that mean-field variational approximations still benefit from near-optimal theoretical support.
Submitted 5 June, 2024;
originally announced June 2024.
-
Deep Horseshoe Gaussian Processes
Authors:
Ismaël Castillo,
Thibault Randrianarisoa
Abstract:
Deep Gaussian processes have recently been proposed as natural objects to fit, similarly to deep neural networks, possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process Deep-HGP, a new simple prior based on deep Gaussian processes with a squared-exponential kernel, that in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated tempered posterior distribution recovers the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates are simultaneously adaptive to both the smoothness of the regression function and to its structure in terms of compositions. The dependence of the rates in terms of dimension are explicit, allowing in particular for input spaces of dimension increasing with the number of observations.
Submitted 4 March, 2024;
originally announced March 2024.
-
Bayesian nonparametric statistics, St-Flour lecture notes
Authors:
Ismaël Castillo
Abstract:
These are lecture notes of the 51st Saint-Flour summer school, July 2023, on the topic of Bayesian nonparametric statistics.
Submitted 26 February, 2024;
originally announced February 2024.
-
Heavy-tailed Bayesian nonparametric adaptation
Authors:
Sergios Agapiou,
Ismaël Castillo
Abstract:
We propose a new Bayesian strategy for adaptation to smoothness in nonparametric models based on heavy tailed series priors. We illustrate it in a variety of settings, showing in particular that the corresponding Bayesian posterior distributions achieve adaptive rates of contraction in the minimax sense (up to logarithmic factors) without the need to sample hyperparameters. Unlike many existing procedures, where a form of direct model (or estimator) selection is performed, the method can be seen as performing a soft selection through the prior tail. In Gaussian regression, such heavy tailed priors are shown to lead to (near-)optimal simultaneous adaptation both in the $L^2$- and $L^\infty$-sense. Results are also derived for linear inverse problems, for anisotropic Besov classes, and for certain losses in more general models through the use of tempered posterior distributions. We present numerical simulations corroborating the theory.
Submitted 29 May, 2024; v1 submitted 9 August, 2023;
originally announced August 2023.
-
Semiparametric inference using fractional posteriors
Authors:
Alice L'Huillier,
Luke Travis,
Ismaël Castillo,
Kolyan Ray
Abstract:
We establish a general Bernstein--von Mises theorem for approximately linear semiparametric functionals of fractional posterior distributions based on nonparametric priors. This is illustrated in a number of nonparametric settings and for different classes of prior distributions, including Gaussian process priors. We show that fractional posterior credible sets can provide reliable semiparametric uncertainty quantification, but have inflated size. To remedy this, we further propose a \textit{shifted-and-rescaled} fractional posterior set that is an efficient confidence set having optimal size under regularity conditions. As part of our proofs, we also refine existing contraction rate results for fractional posteriors by sharpening the dependence of the rate on the fractional exponent.
Submitted 6 February, 2024; v1 submitted 19 January, 2023;
originally announced January 2023.
-
Bayesian Multiscale Analysis of the Cox Model
Authors:
Bo Y. -C. Ning,
Ismaël Castillo
Abstract:
Piecewise constant priors are routinely used in the Bayesian Cox proportional hazards model for survival analysis. Despite its popularity, large sample properties of this Bayesian method are not yet well understood. This work provides a unified theory for posterior distributions in this setting, not requiring the priors to be conjugate. We first derive contraction rate results for wide classes of histogram priors on the unknown hazard function and prove asymptotic normality of linear functionals of the posterior hazard in the form of Bernstein--von Mises theorems. Second, using recently developed multiscale techniques, we derive functional limiting results for the cumulative hazard and survival function. Frequentist coverage properties of Bayesian credible sets are investigated: we prove that certain easily computable credible bands for the survival function are optimal frequentist confidence bands. We conduct simulation studies that confirm these predictions, with an excellent behavior particularly in finite samples. Our results suggest that the Bayesian approach can provide an easy solution to obtain both the coefficients estimate and the credible bands for survival function in practice.
Submitted 14 June, 2023; v1 submitted 25 May, 2022;
originally announced May 2022.
-
Optional Pólya trees: posterior rates and uncertainty quantification
Authors:
Ismaël Castillo,
Thibault Randrianarisoa
Abstract:
We consider statistical inference in the density estimation model using a tree-based Bayesian approach, with Optional Pólya trees as prior distribution. We derive near-optimal convergence rates for corresponding posterior distributions with respect to the supremum norm. For broad classes of Hölder-smooth densities, we show that the method automatically adapts to the unknown Hölder regularity parameter. We consider the question of uncertainty quantification by providing mathematical guarantees for credible sets from the obtained posterior distributions, leading to near-optimal uncertainty quantification for the density function, as well as related functionals such as the cumulative distribution function. The results are illustrated through a brief simulation study.
Submitted 11 October, 2021;
originally announced October 2021.
-
Sharp multiple testing boundary for sparse sequences
Authors:
Kweku Abraham,
Ismael Castillo,
Etienne Roquain
Abstract:
This work investigates multiple testing by considering minimax separation rates in the sparse sequence model, when the testing risk is measured as the sum FDR+FNR (False Discovery Rate plus False Negative Rate). First using the popular beta-min separation condition, with all nonzero signals separated from $0$ by at least some amount, we determine the sharp minimax testing risk asymptotically and thereby explicitly describe the transition from "achievable multiple testing with vanishing risk" to "impossible multiple testing". Adaptive multiple testing procedures achieving the corresponding optimal boundary are provided: the Benjamini--Hochberg procedure with a properly tuned level, and an empirical Bayes $\ell$-value (`local FDR') procedure. We prove that the FDR and FNR make non-symmetric contributions to the testing risk for most optimal procedures, the FNR part being dominant at the boundary. The multiple testing hardness is then investigated for classes of arbitrary sparse signals. A number of extensions, including results for classification losses and convergence rates in the case of large signals, are also investigated.
Submitted 30 August, 2023; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Empirical Bayes cumulative $\ell$-value multiple testing procedure for sparse sequences
Authors:
Kweku Abraham,
Ismael Castillo,
Etienne Roquain
Abstract:
In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate for the first time its behaviour from the frequentist point of view. Given a spike-and-slab prior on the high-dimensional sparse unknown parameter, one can easily compute posterior probabilities of coming from the spike, which correspond to the well known local-fdr values, also called $\ell$-values. The spike-and-slab weight parameter is calibrated in an empirical Bayes fashion, using marginal maximum likelihood. The multiple testing procedure under study, called here the cumulative $\ell$-value procedure, ranks coordinates according to their empirical $\ell$-values and thresholds so that the cumulative ranked sum does not exceed a user-specified level $t$.
We validate the use of this method from the multiple testing perspective: for alternatives of appropriately large signal strength, the false discovery rate (FDR) of the procedure is shown to converge to the target level $t$, while its false negative rate (FNR) goes to $0$. We complement this study by providing convergence rates for the method. Additionally, we prove that the $q$-value multiple testing procedure shares similar convergence rates in this model.
Submitted 28 March, 2022; v1 submitted 1 February, 2021;
originally announced February 2021.
-
Bayesian inference in high-dimensional models
Authors:
Sayantan Banerjee,
Ismaël Castillo,
Subhashis Ghosal
Abstract:
Models with dimension more than the available sample size are now commonly used in various applications. A sensible inference is possible using a lower-dimensional structure. In regression problems with a large number of predictors, the model is often assumed to be sparse, with only a few predictors active. Interdependence between a large number of variables is succinctly described by a graphical model, where variables are represented by nodes on a graph and an edge between two nodes is used to indicate their conditional dependence given other variables. Many procedures for making inferences in the high-dimensional setting, typically using penalty functions to induce sparsity in the solution obtained by minimizing a loss function, were developed. Bayesian methods have been proposed for such problems more recently, where the prior takes care of the sparsity structure. These methods have the natural ability to also automatically quantify the uncertainty of the inference through the posterior distribution. Theoretical studies of Bayesian procedures in high-dimension have been carried out recently. Questions that arise are, whether the posterior distribution contracts near the true value of the parameter at the minimax optimal rate, whether the correct lower-dimensional structure is discovered with high posterior probability, and whether a credible region has adequate frequentist coverage. In this paper, we review these properties of Bayesian and related methods for several high-dimensional models such as many normal means problem, linear regression, generalized linear models, Gaussian and non-Gaussian graphical models. Effective computational approaches are also discussed.
Submitted 12 January, 2021;
originally announced January 2021.
-
Multiple Testing in Nonparametric Hidden Markov Models: An Empirical Bayes Approach
Authors:
Kweku Abraham,
Ismael Castillo,
Elisabeth Gassiat
Abstract:
Given a nonparametric Hidden Markov Model (HMM) with two states, the question of constructing efficient multiple testing procedures is considered, treating one of the states as an unknown null hypothesis. A procedure is introduced, based on nonparametric empirical Bayes ideas, that controls the False Discovery Rate (FDR) at a user--specified level. Guarantees on power are also provided, in the form of a control of the true positive rate. One of the key steps in the construction requires supremum--norm convergence of preliminary estimators of the emission densities of the HMM. We provide the existence of such estimators, with convergence at the optimal minimax rate, for the case of a HMM with $J\ge 2$ states, which is of independent interest.
Submitted 11 January, 2021;
originally announced January 2021.
-
Finding the Sequence of Largest Small n-Polygons by Numerical Optimization
Authors:
János D. Pintér,
Frank J. Kampas,
Ignacio Castillo
Abstract:
LSP(n), the largest small polygon with n vertices, is the polygon of unit diameter that has maximal area A(n). It is known that for all odd values $n \geq 3$, LSP(n) is the regular n-polygon; however, this statement is not valid for even values of n. Finding the polygon LSP(n) and A(n) for even values $n \geq 6$ has been a long-standing challenge. In this work, we develop high-precision numerical solution estimates of A(n) for even values $n \geq 4$, using the Mathematica model development environment and the IPOPT local nonlinear optimization solver engine. First, we present a revised (tightened) LSP model that greatly assists the efficient solution of the model-class considered. This is followed by numerical results for an illustrative sequence of even values of n, up to $n \leq 1000$. Our results are in close agreement with, or surpass, the best results reported in all earlier studies. Most of these earlier works addressed special cases up to $n \leq 20$, while others obtained numerical optimization results for a range of values from $6 \leq n \leq 100$. For completeness, we also calculate numerically optimized results for a selection of odd values of n, up to $n \leq 999$: these results can be compared to the corresponding theoretical (exact) values. The results obtained are used to provide regression model-based estimates of the optimal area sequence {A(n)}, for all even and odd values n of interest, thereby essentially solving the entire LSP model-class numerically, with demonstrably high precision.
Submitted 4 January, 2021;
originally announced January 2021.
-
Multiscale Bayesian Survival Analysis
Authors:
Ismaël Castillo,
Stéphanie van der Pas
Abstract:
We consider Bayesian nonparametric inference in the right-censoring survival model, where modeling is made at the level of the hazard rate. We derive posterior limiting distributions for linear functionals of the hazard, and then for `many' functionals simultaneously in appropriate multiscale spaces. As an application, we derive Bernstein-von Mises theorems for the cumulative hazard and survival functions, which lead to asymptotically efficient confidence bands for these quantities. Further, we show optimal posterior contraction rates for the hazard in terms of the supremum norm. In medical studies, a popular approach is to model hazards a priori as random histograms with possibly dependent heights. This and more general classes of arbitrarily smooth prior distributions are considered as applications of our theory. A sampler is provided for possibly dependent histogram posteriors. Its finite sample properties are investigated on both simulated and real data experiments.
Submitted 31 May, 2021; v1 submitted 6 May, 2020;
originally announced May 2020.
-
Spike and Slab Pólya tree posterior distributions: adaptive inference
Authors:
Ismaël Castillo,
Romain Mismer
Abstract:
In the density estimation model, the question of adaptive inference using Pólya tree-type prior distributions is considered. A class of prior densities having a tree structure, called spike-and-slab Pólya trees, is introduced. For this class, two types of results are obtained: first, the Bayesian posterior distribution is shown to converge at the minimax rate for the supremum norm in an adaptive way, for any Hölder regularity of the true density between $0$ and $1$, thereby providing adaptive counterparts to the results for classical Pólya trees in Castillo (2017). Second, the question of uncertainty quantification is considered. An adaptive nonparametric Bernstein-von Mises theorem is derived. Next, it is shown that, under a self-similarity condition on the true density, certain credible sets from the posterior distribution are adaptive confidence bands, having prescribed coverage level and with a diameter shrinking at optimal rate in the minimax sense.
Submitted 17 September, 2020; v1 submitted 27 November, 2019;
originally announced November 2019.
-
Uncertainty Quantification for Bayesian CART
Authors:
Ismael Castillo,
Veronika Rockova
Abstract:
This work affords new insights into Bayesian CART in the context of structured wavelet shrinkage. The main thrust is to develop a formal inferential framework for Bayesian tree-based regression. We reframe Bayesian CART as a g-type prior which departs from the typical wavelet product priors by harnessing correlation induced by the tree topology. The practically used Bayesian CART priors are shown to attain adaptive near rate-minimax posterior concentration in the supremum norm in regression models. For the fundamental goal of uncertainty quantification, we construct adaptive confidence bands for the regression function with uniform coverage under self-similarity. In addition, we show that tree-posteriors enable optimal inference in the form of efficient confidence sets for smooth functionals of the regression function.
Submitted 24 May, 2021; v1 submitted 16 October, 2019;
originally announced October 2019.
-
Packing Ovals in Optimized Regular Polygons
Authors:
Frank J. Kampas,
Janos D. Pinter,
Ignacio Castillo
Abstract:
We present a model development framework and numerical solution approach to the general problem-class of packing convex objects into optimized convex containers. Specifically, here we discuss the problem of packing ovals (egg-shaped objects, defined here as generalized ellipses) into optimized regular polygons in $\mathbb{R}^2$. Our solution strategy is based on the use of embedded Lagrange multipliers, followed by nonlinear (global-local) optimization. The numerical results are attained using randomized starting solutions refined by a single call to a local optimization solver. We obtain credible, tight packings for packing 4 to 10 ovals into regular polygons with 3 to 10 sides in all (224) test problems presented here, and for other similarly difficult packing problems.
Submitted 21 January, 2019;
originally announced January 2019.
-
On spike and slab empirical Bayes multiple testing
Authors:
Ismael Castillo,
Etienne Roquain
Abstract:
This paper explores a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control. In the Gaussian sequence model, this work shows that empirical Bayes-calibrated spike and slab posterior distributions allow a correct FDR control under sparsity. Doing so, it offers a frequentist theoretical validation of empirical Bayes methods in the context of multiple testing. Our theoretical results are illustrated with numerical experiments.
Submitted 15 June, 2019; v1 submitted 29 August, 2018;
originally announced August 2018.
-
Spike and slab empirical Bayes sparse credible sets
Authors:
Ismael Castillo,
Botond Szabo
Abstract:
In the sparse normal means model, coverage of adaptive Bayesian posterior credible sets associated to spike and slab prior distributions is considered. The key sparsity hyperparameter is calibrated via marginal maximum likelihood empirical Bayes. First, adaptive posterior contraction rates are derived with respect to $d_q$-type distances for $q\leq 2$. Next, under a type of so-called excessive-bias conditions, credible sets are constructed that have coverage of the true parameter at prescribed $1-\alpha$ confidence level and at the same time are of optimal diameter. We also prove that the previous conditions cannot be significantly weakened from the minimax perspective.
Submitted 2 February, 2019; v1 submitted 23 August, 2018;
originally announced August 2018.
-
Empirical Bayes analysis of spike and slab posterior distributions
Authors:
Ismaël Castillo,
Romain Mismer
Abstract:
In the sparse normal means model, convergence of the Bayesian posterior distribution associated to spike and slab prior distributions is considered. The key sparsity hyperparameter is calibrated via marginal maximum likelihood empirical Bayes. The plug-in posterior squared-$L^2$ norm is shown to converge at the minimax rate for the euclidean norm for appropriate choices of spike and slab distributions. Possible choices include standard spike and slab with heavy tailed slab, and the spike and slab LASSO of Rocková and George with heavy tailed slab. Surprisingly, the popular Laplace slab is shown to lead to a suboptimal rate for the full empirical Bayes posterior. This provides a striking example where convergence of aspects of the empirical Bayes posterior does not entail convergence of the full empirical Bayes posterior itself.
Submitted 16 October, 2018; v1 submitted 5 January, 2018;
originally announced January 2018.
-
Uniform estimation in stochastic block models is slow
Authors:
Ismaël Castillo,
Peter Orbanz
Abstract:
We explicitly quantify the empirically observed phenomenon that estimation under a stochastic block model (SBM) is hard if the model contains classes that are similar. More precisely, we consider estimation of certain functionals of random graphs generated by a SBM. The SBM may or may not be sparse, and the number of classes may be fixed or grow with the number of vertices. Minimax lower and upper bounds of estimation along specific submodels are derived. The results are nonasymptotic and imply that uniform estimation of a single connectivity parameter is much slower than the expected asymptotic pointwise rate. Specifically, the uniform quadratic rate does not scale as the number of edges, but only as the number of vertices. The lower bounds are local around any possible SBM. An analogous result is derived for functionals of a class of smooth graphons.
Submitted 26 April, 2022; v1 submitted 9 March, 2017;
originally announced March 2017.
-
Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets"
Authors:
Ismaël Castillo
Abstract:
Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets" by Szabó, van der Vaart and van Zanten [arXiv:1310.4489v5].
Submitted 7 September, 2015;
originally announced September 2015.
-
Bayesian linear regression with sparse priors
Authors:
Ismaël Castillo,
Johannes Schmidt-Hieber,
Aad van der Vaart
Abstract:
We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and used to construct and study credible sets for uncertainty quantification.
Submitted 14 October, 2015; v1 submitted 4 March, 2014;
originally announced March 2014.
-
On the Bernstein-von Mises phenomenon for nonparametric Bayes procedures
Authors:
Ismaël Castillo,
Richard Nickl
Abstract:
We continue the investigation of Bernstein-von Mises theorems for nonparametric Bayes procedures from [Ann. Statist. 41 (2013) 1999-2028]. We introduce multiscale spaces on which nonparametric priors and posteriors are naturally defined, and prove Bernstein-von Mises theorems for a variety of priors in the setting of Gaussian nonparametric regression and in the i.i.d. sampling model. From these results we deduce several applications where posterior-based inference coincides with efficient frequentist procedures, including Donsker- and Kolmogorov-Smirnov theorems for the random posterior cumulative distribution functions. We also show that multiscale posterior credible bands for the regression or density function are optimal frequentist confidence bands.
Submitted 2 October, 2014; v1 submitted 9 October, 2013;
originally announced October 2013.
-
A Bernstein-von Mises theorem for smooth functionals in semiparametric models
Authors:
Ismaël Castillo,
Judith Rousseau
Abstract:
A Bernstein-von Mises theorem is derived for general semiparametric functionals. The result is applied to a variety of semiparametric problems in i.i.d. and non-i.i.d. situations. In particular, new tools are developed to handle semiparametric bias, notably for nonlinear functionals and in cases where regularity is possibly low. Examples include the squared $L^2$-norm in Gaussian white noise, nonlinear functionals in density estimation, as well as functionals in autoregressive models. For density estimation, a systematic study of BvM results for two important classes of priors is provided, namely random histograms and Gaussian process priors.
Submitted 17 November, 2015; v1 submitted 20 May, 2013;
originally announced May 2013.
-
On Bayesian supremum norm contraction rates
Authors:
Ismaël Castillo
Abstract:
Building on ideas from Castillo and Nickl [Ann. Statist. 41 (2013) 1999-2028], a method is provided to study nonparametric Bayesian posterior convergence rates when "strong" measures of distances, such as the sup-norm, are considered. In particular, we show that likelihood methods can achieve optimal minimax sup-norm rates in density estimation on the unit interval. The introduced methodology is used to prove that commonly used families of prior distributions on densities, namely log-density priors and dyadic random density histograms, can indeed achieve optimal sup-norm rates of convergence. New results are also derived in the Gaussian white noise model as a further illustration of the presented techniques.
Submitted 14 October, 2014; v1 submitted 5 April, 2013;
originally announced April 2013.
-
Needles and Straw in a Haystack: Posterior concentration for possibly sparse sequences
Authors:
Ismaël Castillo,
Aad van der Vaart
Abstract:
We consider full Bayesian inference in the multivariate normal mean model in the situation that the mean vector is sparse. The prior distribution on the vector of means is constructed hierarchically by first choosing a collection of nonzero means and next a prior on the nonzero values. We consider the posterior distribution in the frequentist set-up that the observations are generated according to a fixed mean vector, and are interested in the posterior distribution of the number of nonzero components and the contraction of the posterior distribution to the true mean vector. We find various combinations of priors on the number of nonzero coefficients and on these coefficients that give desirable performance. We also find priors that give suboptimal convergence, for instance, Gaussian priors on the nonzero coefficients. We illustrate the results by simulations.
Submitted 6 November, 2012;
originally announced November 2012.
-
Nonparametric Bernstein-von Mises theorems in Gaussian white noise
Authors:
Ismaël Castillo,
Richard Nickl
Abstract:
Bernstein-von Mises theorems for nonparametric Bayes priors in the Gaussian white noise model are proved. It is demonstrated how such results justify Bayes methods as efficient frequentist inference procedures in a variety of concrete nonparametric problems. In particular, Bayesian credible sets are constructed that have asymptotically exact $1-α$ frequentist coverage level and whose $L^2$-diameter shrinks at the minimax rate of convergence (within logarithmic factors) over Hölder balls. Other applications include general classes of linear and nonlinear functionals and credible bands for auto-convolutions. The assumptions cover nonconjugate product priors defined on general orthonormal bases of $L^2$ satisfying weak conditions.
Submitted 31 October, 2013; v1 submitted 19 August, 2012;
originally announced August 2012.
-
Thomas Bayes' walk on manifolds
Authors:
Ismael Castillo,
Gerard Kerkyacharian,
Dominique Picard
Abstract:
Convergence of the Bayes posterior measure is considered in canonical statistical settings where observations sit on a geometrical object such as a compact manifold, or more generally on a compact metric space verifying some conditions. A natural geometric prior based on randomly rescaled solutions of the heat equation is considered. Upper and lower bound posterior contraction rates are derived.
Submitted 3 June, 2012;
originally announced June 2012.
-
A weighted message-passing algorithm to estimate volume-related properties of random polytopes
Authors:
Francesc Font-Clos,
Francesco Alessandro Massucci,
Isaac Pérez Castillo
Abstract:
In this letter, we introduce a novel message-passing algorithm for a class of problems which can be mathematically understood as estimating volume-related properties of random polytopes. Unlike the usual approach, which approximates the real-valued cavity marginal distributions by a few parameters, we propose a weighted message-passing algorithm to deal with the entire function. Several ways of implementing our approach are discussed, and numerical results for random polytopes are compared with results using the Hit-and-Run algorithm.
Submitted 21 November, 2011;
originally announced November 2011.
-
Estimation of the distribution of random shifts deformation
Authors:
Ismael Castillo,
Jean-Michel Loubes
Abstract:
Consider discrete values of functions shifted by unobserved translation effects, which are independent realizations of a random variable with unknown distribution $μ$, modeling the variability in the response of each individual. Our aim is to construct a nonparametric estimator of the density of these random translation deformations using semiparametric preliminary estimates of the shifts. Building on results of Dalalyan et al. (2006), semiparametric estimators are obtained in our discrete framework and their performance studied. From these estimates we construct a nonparametric estimator of the target density. Both rates of convergence and an algorithm to construct the estimator are provided.
Submitted 17 December, 2008;
originally announced December 2008.
-
Lower bounds for posterior rates with Gaussian process priors
Authors:
Ismaël Castillo
Abstract:
Upper bounds for rates of convergence of posterior distributions associated to Gaussian process priors are obtained by van der Vaart and van Zanten in [14] and expressed in terms of a concentration function involving the Reproducing Kernel Hilbert Space of the Gaussian prior. Here lower-bound counterparts are obtained. As a corollary, we obtain the precise rate of convergence of posteriors for Gaussian priors in various settings. Additionally, we extend the upper-bound results of [14] about Riemann-Liouville priors to a continuous family of parameters.
Submitted 22 December, 2008; v1 submitted 17 July, 2008;
originally announced July 2008.
-
Semi-parametric second-order efficient estimation of the period of a signal
Authors:
I. Castillo
Abstract:
This paper is concerned with the estimation of the period of an unknown periodic function in Gaussian white noise. A class of estimators of the period is constructed by means of a penalized maximum likelihood method. A second-order asymptotic expansion of the risk of these estimators is obtained. Moreover, the minimax problem for the second-order term is studied and an estimator of the preceding class is shown to be second-order efficient.
Submitted 26 November, 2007;
originally announced November 2007.