Search | arXiv e-print repository

A variational Bayes approach to debiased inference for low-dimensional parameters in high-dimensional linear regression

Authors: Ismaël Castillo, Alice L'Huillier, Kolyan Ray, Luke Travis

Abstract: We propose a scalable variational Bayes method for statistical inference for a single or low-dimensional subset of the coordinates of a high-dimensional parameter in sparse linear regression. Our approach relies on assigning a mean-field approximation to the nuisance coordinates and carefully modelling the conditional distribution of the target given the nuisance. This requires only a preprocessing step and preserves the computational advantages of mean-field variational Bayes, while ensuring accurate and reliable inference for the target parameter, including for uncertainty quantification. We investigate the numerical performance of our algorithm, showing that it performs competitively with existing methods. We further establish accompanying theoretical guarantees for estimation and uncertainty quantification in the form of a Bernstein--von Mises theorem.

Submitted 18 June, 2024; originally announced June 2024.

Comments: 46 pages, 5 figures

MSC Class: 62

arXiv:2406.03369 [pdf, ps, other]

Posterior and variational inference for deep neural networks with heavy-tailed weights

Authors: Ismaël Castillo, Paul Egels

Abstract: We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random. Following a recent idea of Agapiou and Castillo (2023), who show that heavy-tailed prior distributions achieve automatic adaptation to smoothness, we introduce a simple Bayesian deep learning prior based on heavy-tailed weights and ReLU activation. We show that the corresponding posterior distribution achieves near-optimal minimax contraction rates, simultaneously adaptive to both intrinsic dimension and smoothness of the underlying function, in a variety of contexts including nonparametric regression, geometric data and Besov spaces. While most works so far need a form of model selection built-in within the prior distribution, a key aspect of our approach is that it does not require to sample hyperparameters to learn the architecture of the network. We also provide variational Bayes counterparts of the results, that show that mean-field variational approximations still benefit from near-optimal theoretical support.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 41 pages

arXiv:2403.01737 [pdf, other]

Deep Horseshoe Gaussian Processes

Authors: Ismaël Castillo, Thibault Randrianarisoa

Abstract: Deep Gaussian processes have recently been proposed as natural objects to fit, similarly to deep neural networks, possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process Deep-HGP, a new simple prior based on deep Gaussian processes with a squared-exponential kernel, that in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated tempered posterior distribution recovers the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates are simultaneously adaptive to both the smoothness of the regression function and to its structure in terms of compositions. The dependence of the rates in terms of dimension are explicit, allowing in particular for input spaces of dimension increasing with the number of observations.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 46 pages (20-page supplement included), one figure

MSC Class: 62G20

arXiv:2402.16422 [pdf, other]

Bayesian nonparametric statistics, St-Flour lecture notes

Authors: Ismaël Castillo

Abstract: These are lecture notes of the 51st Saint-Flour summer school, July 2023, on the topic of Bayesian nonparametric statistics

Submitted 26 February, 2024; originally announced February 2024.

Comments: 186 pages

arXiv:2308.04916 [pdf, other]

Heavy-tailed Bayesian nonparametric adaptation

Authors: Sergios Agapiou, Ismaël Castillo

Abstract: We propose a new Bayesian strategy for adaptation to smoothness in nonparametric models based on heavy tailed series priors. We illustrate it in a variety of settings, showing in particular that the corresponding Bayesian posterior distributions achieve adaptive rates of contraction in the minimax sense (up to logarithmic factors) without the need to sample hyperparameters. Unlike many existing procedures, where a form of direct model (or estimator) selection is performed, the method can be seen as performing a soft selection through the prior tail. In Gaussian regression, such heavy tailed priors are shown to lead to (near-)optimal simultaneous adaptation both in the $L^2$- and $L^\infty$-sense. Results are also derived for linear inverse problems, for anisotropic Besov classes, and for certain losses in more general models through the use of tempered posterior distributions. We present numerical simulations corroborating the theory.

Submitted 29 May, 2024; v1 submitted 9 August, 2023; originally announced August 2023.

MSC Class: 62G05; 62G20

arXiv:2301.08158 [pdf, other]

Semiparametric inference using fractional posteriors

Authors: Alice L'Huillier, Luke Travis, Ismaël Castillo, Kolyan Ray

Abstract: We establish a general Bernstein--von Mises theorem for approximately linear semiparametric functionals of fractional posterior distributions based on nonparametric priors. This is illustrated in a number of nonparametric settings and for different classes of prior distributions, including Gaussian process priors. We show that fractional posterior credible sets can provide reliable semiparametric uncertainty quantification, but have inflated size. To remedy this, we further propose a \textit{shifted-and-rescaled} fractional posterior set that is an efficient confidence set having optimal size under regularity conditions. As part of our proofs, we also refine existing contraction rate results for fractional posteriors by sharpening the dependence of the rate on the fractional exponent.

Submitted 6 February, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: 61 pages, 2 figures

MSC Class: 62G

arXiv:2205.12489 [pdf, other]

Bayesian Multiscale Analysis of the Cox Model

Authors: Bo Y. -C. Ning, Ismaël Castillo

Abstract: Piecewise constant priors are routinely used in the Bayesian Cox proportional hazards model for survival analysis. Despite its popularity, large sample properties of this Bayesian method are not yet well understood. This work provides a unified theory for posterior distributions in this setting, not requiring the priors to be conjugate. We first derive contraction rate results for wide classes of histogram priors on the unknown hazard function and prove asymptotic normality of linear functionals of the posterior hazard in the form of Bernstein--von Mises theorems. Second, using recently developed multiscale techniques, we derive functional limiting results for the cumulative hazard and survival function. Frequentist coverage properties of Bayesian credible sets are investigated: we prove that certain easily computable credible bands for the survival function are optimal frequentist confidence bands. We conduct simulation studies that confirm these predictions, with an excellent behavior particularly in finite samples. Our results suggest that the Bayesian approach can provide an easy solution to obtain both the coefficients estimate and the credible bands for survival function in practice.

Submitted 14 June, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: 84 pages, 6 figures, 2 tables

arXiv:2110.05265 [pdf, other]

Optional Pólya trees: posterior rates and uncertainty quantification

Authors: Ismaël Castillo, Thibault Randrianarisoa

Abstract: We consider statistical inference in the density estimation model using a tree-based Bayesian approach, with Optional Pólya trees as prior distribution. We derive near-optimal convergence rates for corresponding posterior distributions with respect to the supremum norm. For broad classes of Hölder-smooth densities, we show that the method automatically adapts to the unknown Hölder regularity parameter. We consider the question of uncertainty quantification by providing mathematical guarantees for credible sets from the obtained posterior distributions, leading to near-optimal uncertainty quantification for the density function, as well as related functionals such as the cumulative distribution function. The results are illustrated through a brief simulation study.

Submitted 11 October, 2021; originally announced October 2021.

Comments: 27 pages with 5 figures/tables + a 13-page appendix; submitted to SIAM/ASA Journal on Uncertainty Quantification

MSC Class: 62G; ACM Class: G.3

arXiv:2109.13601 [pdf, other]

Sharp multiple testing boundary for sparse sequences

Authors: Kweku Abraham, Ismael Castillo, Etienne Roquain

Abstract: This work investigates multiple testing by considering minimax separation rates in the sparse sequence model, when the testing risk is measured as the sum FDR+FNR (False Discovery Rate plus False Negative Rate). First using the popular beta-min separation condition, with all nonzero signals separated from $0$ by at least some amount, we determine the sharp minimax testing risk asymptotically and thereby explicitly describe the transition from "achievable multiple testing with vanishing risk" to "impossible multiple testing". Adaptive multiple testing procedures achieving the corresponding optimal boundary are provided: the Benjamini--Hochberg procedure with a properly tuned level, and an empirical Bayes $\ell$-value (`local FDR') procedure. We prove that the FDR and FNR make non-symmetric contributions to the testing risk for most optimal procedures, the FNR part being dominant at the boundary. The multiple testing hardness is then investigated for classes of arbitrary sparse signals. A number of extensions, including results for classification losses and convergence rates in the case of large signals, are also investigated.

Submitted 30 August, 2023; v1 submitted 28 September, 2021; originally announced September 2021.

Comments: Revision extending the noise models permitted and allowing for "strong signal" settings. 33 pages (main body) or 86 (including supplement). 4 figures

MSC Class: 62G10 (primary); 62C20 (secondary)

arXiv:2102.00929 [pdf, ps, other]

Empirical Bayes cumulative $\ell$-value multiple testing procedure for sparse sequences

Authors: Kweku Abraham, Ismael Castillo, Etienne Roquain

Abstract: In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate for the first time its behaviour from the frequentist point of view. Given a spike-and-slab prior on the high-dimensional sparse unknown parameter, one can easily compute posterior probabilities of coming from the spike, which correspond to the well known local-fdr values, also called $\ell$-values. The spike-and-slab weight parameter is calibrated in an empirical Bayes fashion, using marginal maximum likelihood. The multiple testing procedure under study, called here the cumulative $\ell$-value procedure, ranks coordinates according to their empirical $\ell$-values and thresholds so that the cumulative ranked sum does not exceed a user-specified level $t$. We validate the use of this method from the multiple testing perspective: for alternatives of appropriately large signal strength, the false discovery rate (FDR) of the procedure is shown to converge to the target level $t$, while its false negative rate (FNR) goes to $0$. We complement this study by providing convergence rates for the method. Additionally, we prove that the $q$-value multiple testing procedure shares similar convergence rates in this model.

Submitted 28 March, 2022; v1 submitted 1 February, 2021; originally announced February 2021.

MSC Class: 62G10 (Primary) 62C12 (Secondary)

arXiv:2101.04491 [pdf, ps, other]

Bayesian inference in high-dimensional models

Authors: Sayantan Banerjee, Ismaël Castillo, Subhashis Ghosal

Abstract: Models with dimension more than the available sample size are now commonly used in various applications. A sensible inference is possible using a lower-dimensional structure. In regression problems with a large number of predictors, the model is often assumed to be sparse, with only a few predictors active. Interdependence between a large number of variables is succinctly described by a graphical model, where variables are represented by nodes on a graph and an edge between two nodes is used to indicate their conditional dependence given other variables. Many procedures for making inferences in the high-dimensional setting, typically using penalty functions to induce sparsity in the solution obtained by minimizing a loss function, were developed. Bayesian methods have been proposed for such problems more recently, where the prior takes care of the sparsity structure. These methods have the natural ability to also automatically quantify the uncertainty of the inference through the posterior distribution. Theoretical studies of Bayesian procedures in high-dimension have been carried out recently. Questions that arise are, whether the posterior distribution contracts near the true value of the parameter at the minimax optimal rate, whether the correct lower-dimensional structure is discovered with high posterior probability, and whether a credible region has adequate frequentist coverage.
In this paper, we review these properties of Bayesian and related methods for several high-dimensional models such as many normal means problem, linear regression, generalized linear models, Gaussian and non-Gaussian graphical models. Effective computational approaches are also discussed.

Submitted 12 January, 2021; originally announced January 2021.

Comments: Review chapter, 42 pages

arXiv:2101.03838 [pdf, ps, other]

Multiple Testing in Nonparametric Hidden Markov Models: An Empirical Bayes Approach

Authors: Kweku Abraham, Ismael Castillo, Elisabeth Gassiat

Abstract: Given a nonparametric Hidden Markov Model (HMM) with two states, the question of constructing efficient multiple testing procedures is considered, treating one of the states as an unknown null hypothesis. A procedure is introduced, based on nonparametric empirical Bayes ideas, that controls the False Discovery Rate (FDR) at a user--specified level. Guarantees on power are also provided, in the form of a control of the true positive rate. One of the key steps in the construction requires supremum--norm convergence of preliminary estimators of the emission densities of the HMM. We provide the existence of such estimators, with convergence at the optimal minimax rate, for the case of a HMM with $J\ge 2$ states, which is of independent interest.

Submitted 11 January, 2021; originally announced January 2021.

MSC Class: 62G10 (primary); 62M05 (secondary)

arXiv:2101.01263 [pdf]

Finding the Sequence of Largest Small n-Polygons by Numerical Optimization

Authors: János D. Pintér, Frank J. Kampas, Ignacio Castillo

Abstract: LSP(n), the largest small polygon with n vertices, is the polygon of unit diameter that has maximal area A(n). It is known that for all odd values $n \geq 3$, LSP(n) is the regular n-polygon; however, this statement is not valid for even values of n. Finding the polygon LSP(n) and A(n) for even values $n \geq 6$ has been a long-standing challenge. In this work, we develop high-precision numerical solution estimates of A(n) for even values $n \geq 4$, using the Mathematica model development environment and the IPOPT local nonlinear optimization solver engine. First, we present a revised (tightened) LSP model that greatly assists the efficient solution of the model-class considered. This is followed by numerical results for an illustrative sequence of even values of n, up to $n \leq 1000$. Our results are in close agreement with, or surpass, the best results reported in all earlier studies. Most of these earlier works addressed special cases up to $n \leq 20$, while others obtained numerical optimization results for a range of values from $6 \leq n \leq 100$. For completeness, we also calculate numerically optimized results for a selection of odd values of n, up to $n \leq 999$: these results can be compared to the corresponding theoretical (exact) values. The results obtained are used to provide regression model-based estimates of the optimal area sequence {A(n)}, for all even and odd values n of interest, thereby essentially solving the entire LSP model-class numerically, with demonstrably high precision.

Submitted 4 January, 2021; originally announced January 2021.

arXiv:2005.02889 [pdf, other]

Multiscale Bayesian Survival Analysis

Authors: Ismaël Castillo, Stéphanie van der Pas

Abstract: We consider Bayesian nonparametric inference in the right-censoring survival model, where modeling is made at the level of the hazard rate. We derive posterior limiting distributions for linear functionals of the hazard, and then for `many' functionals simultaneously in appropriate multiscale spaces. As an application, we derive Bernstein-von Mises theorems for the cumulative hazard and survival functions, which lead to asymptotically efficient confidence bands for these quantities. Further, we show optimal posterior contraction rates for the hazard in terms of the supremum norm. In medical studies, a popular approach is to model hazards a priori as random histograms with possibly dependent heights. This and more general classes of arbitrarily smooth prior distributions are considered as applications of our theory. A sampler is provided for possibly dependent histogram posteriors. Its finite sample properties are investigated on both simulated and real data experiments.

Submitted 31 May, 2021; v1 submitted 6 May, 2020; originally announced May 2020.

MSC Class: 62G15 (Primary); 62G20 (Secondary)

arXiv:1911.12106 [pdf, ps, other]

Spike and Slab Pólya tree posterior distributions: adaptive inference

Authors: Ismaël Castillo, Romain Mismer

Abstract: In the density estimation model, the question of adaptive inference using Pólya tree-type prior distributions is considered. A class of prior densities having a tree structure, called spike-and-slab Pólya trees, is introduced. For this class, two types of results are obtained: first, the Bayesian posterior distribution is shown to converge at the minimax rate for the supremum norm in an adaptive way, for any Hölder regularity of the true density between $0$ and $1$, thereby providing adaptive counterparts to the results for classical Pólya trees in Castillo (2017). Second, the question of uncertainty quantification is considered. An adaptive nonparametric Bernstein-von Mises theorem is derived. Next, it is shown that, under a self-similarity condition on the true density, certain credible sets from the posterior distribution are adaptive confidence bands, having prescribed coverage level and with a diameter shrinking at optimal rate in the minimax sense.

Submitted 17 September, 2020; v1 submitted 27 November, 2019; originally announced November 2019.

Comments: 40 pages

arXiv:1910.07635 [pdf, other]

Uncertainty Quantification for Bayesian CART

Authors: Ismael Castillo, Veronika Rockova

Abstract: This work affords new insights into Bayesian CART in the context of structured wavelet shrinkage. The main thrust is to develop a formal inferential framework for Bayesian tree-based regression. We reframe Bayesian CART as a g-type prior which departs from the typical wavelet product priors by harnessing correlation induced by the tree topology. The practically used Bayesian CART priors are shown to attain adaptive near rate-minimax posterior concentration in the supremum norm in regression models. For the fundamental goal of uncertainty quantification, we construct adaptive confidence bands for the regression function with uniform coverage under self-similarity. In addition, we show that tree-posteriors enable optimal inference in the form of efficient confidence sets for smooth functionals of the regression function.

Submitted 24 May, 2021; v1 submitted 16 October, 2019; originally announced October 2019.

arXiv:1901.07056 [pdf]

Packing Ovals in Optimized Regular Polygons

Authors: Frank J. Kampas, Janos D. Pinter, Ignacio Castillo

Abstract: We present a model development framework and numerical solution approach to the general problem-class of packing convex objects into optimized convex containers. Specifically, here we discuss the problem of packing ovals (egg-shaped objects, defined here as generalized ellipses) into optimized regular polygons in $\mathbb{R}^2$. Our solution strategy is based on the use of embedded Lagrange multipliers, followed by nonlinear (global-local) optimization. The numerical results are attained using randomized starting solutions refined by a single call to a local optimization solver. We obtain credible, tight packings for packing 4 to 10 ovals into regular polygons with 3 to 10 sides in all (224) test problems presented here, and for other similarly difficult packing problems.

Submitted 21 January, 2019; originally announced January 2019.

Comments: Submitted for publication November 2018

arXiv:1808.09748 [pdf, other]

On spike and slab empirical Bayes multiple testing

Authors: Ismael Castillo, Etienne Roquain

Abstract: This paper explores a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control. In the Gaussian sequence model, this work shows that empirical Bayes-calibrated spike and slab posterior distributions allow a correct FDR control under sparsity. Doing so, it offers a frequentist theoretical validation of empirical Bayes methods in the context of multiple testing. Our theoretical results are illustrated with numerical experiments.

Submitted 15 June, 2019; v1 submitted 29 August, 2018; originally announced August 2018.

Comments: 83 pages, 7 figures

arXiv:1808.07721 [pdf, ps, other]

Spike and slab empirical Bayes sparse credible sets

Authors: Ismael Castillo, Botond Szabo

Abstract: In the sparse normal means model, coverage of adaptive Bayesian posterior credible sets associated to spike and slab prior distributions is considered. The key sparsity hyperparameter is calibrated via marginal maximum likelihood empirical Bayes. First, adaptive posterior contraction rates are derived with respect to $d_q$--type--distances for $q\leq 2$. Next, under a type of so-called excessive-bias conditions, credible sets are constructed that have coverage of the true parameter at prescribed $1-α$ confidence level and at the same time are of optimal diameter. We also prove that the previous conditions cannot be significantly weakened from the minimax perspective.

Submitted 2 February, 2019; v1 submitted 23 August, 2018; originally announced August 2018.

Comments: 45 pages

MSC Class: 62G20

arXiv:1801.01696 [pdf, ps, other]

Empirical Bayes analysis of spike and slab posterior distributions

Authors: Ismaël Castillo, Romain Mismer

Abstract: In the sparse normal means model, convergence of the Bayesian posterior distribution associated to spike and slab prior distributions is considered. The key sparsity hyperparameter is calibrated via marginal maximum likelihood empirical Bayes. The plug-in posterior squared-$L^2$ norm is shown to converge at the minimax rate for the euclidean norm for appropriate choices of spike and slab distributions. Possible choices include standard spike and slab with heavy tailed slab, and the spike and slab LASSO of Rocková and George with heavy tailed slab. Surprisingly, the popular Laplace slab is shown to lead to a suboptimal rate for the full empirical Bayes posterior. This provides a striking example where convergence of aspects of the empirical Bayes posterior does not entail convergence of the full empirical Bayes posterior itself.

Submitted 16 October, 2018; v1 submitted 5 January, 2018; originally announced January 2018.

Comments: 37 pages

MSC Class: 62G20

arXiv:1703.03412 [pdf, other]

Uniform estimation in stochastic block models is slow

Authors: Ismaël Castillo, Peter Orbanz

Abstract: We explicitly quantify the empirically observed phenomenon that estimation under a stochastic block model (SBM) is hard if the model contains classes that are similar. More precisely, we consider estimation of certain functionals of random graphs generated by a SBM. The SBM may or may not be sparse, and the number of classes may be fixed or grow with the number of vertices. Minimax lower and upper bounds of estimation along specific submodels are derived. The results are nonasymptotic and imply that uniform estimation of a single connectivity parameter is much slower than the expected asymptotic pointwise rate. Specifically, the uniform quadratic rate does not scale as the number of edges, but only as the number of vertices. The lower bounds are local around any possible SBM. An analogous result is derived for functionals of a class of smooth graphons.

Submitted 26 April, 2022; v1 submitted 9 March, 2017; originally announced March 2017.

arXiv:1509.01900 [pdf, ps, other]

doi 10.1214/15-AOS1270B

Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets"

Authors: Ismaël Castillo

Abstract: Discussion of "Frequentist coverage of adaptive nonparametric Bayesian credible sets" by Szabó, van der Vaart and van Zanten [arXiv:1310.4489v5].

Submitted 7 September, 2015; originally announced September 2015.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1270B in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1270B

Journal ref: Annals of Statistics 2015, Vol. 43, No. 4, 1437-1443

arXiv:1403.0735 [pdf, ps, other]

doi 10.1214/15-AOS1334

Bayesian linear regression with sparse priors

Authors: Ismaël Castillo, Johannes Schmidt-Hieber, Aad van der Vaart

Abstract: We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and employed to the construction and study of credible sets for uncertainty quantification.

Submitted 14 October, 2015; v1 submitted 4 March, 2014; originally announced March 2014.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1334 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1334

Journal ref: Annals of Statistics 2015, Vol. 43, No. 5, 1986-2018

arXiv:1310.2484 [pdf, ps, other]

doi 10.1214/14-AOS1246

On the Bernstein-von Mises phenomenon for nonparametric Bayes procedures

Authors: Ismaël Castillo, Richard Nickl

Abstract: We continue the investigation of Bernstein-von Mises theorems for nonparametric Bayes procedures from [Ann. Statist. 41 (2013) 1999-2028]. We introduce multiscale spaces on which nonparametric priors and posteriors are naturally defined, and prove Bernstein-von Mises theorems for a variety of priors in the setting of Gaussian nonparametric regression and in the i.i.d. sampling model. From these results we deduce several applications where posterior-based inference coincides with efficient frequentist procedures, including Donsker- and Kolmogorov-Smirnov theorems for the random posterior cumulative distribution functions. We also show that multiscale posterior credible bands for the regression or density function are optimal frequentist confidence bands.

Submitted 2 October, 2014; v1 submitted 9 October, 2013; originally announced October 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AOS1246 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1246

Journal ref: Annals of Statistics 2014, Vol. 42, No. 5, 1941-1969

arXiv:1305.4482 [pdf, ps, other]

doi 10.1214/15-AOS1336

A Bernstein-von Mises theorem for smooth functionals in semiparametric models

Authors: Ismaël Castillo, Judith Rousseau

Abstract: A Bernstein-von Mises theorem is derived for general semiparametric functionals. The result is applied to a variety of semiparametric problems in i.i.d. and non-i.i.d. situations. In particular, new tools are developed to handle semiparametric bias, in particular for nonlinear functionals and in cases where regularity is possibly low. Examples include the squared $L^2$-norm in Gaussian white noise, nonlinear functionals in density estimation, as well as functionals in autoregressive models. For density estimation, a systematic study of BvM results for two important classes of priors is provided, namely random histograms and Gaussian process priors.

Submitted 17 November, 2015; v1 submitted 20 May, 2013; originally announced May 2013.

Comments: Published at http://dx.doi.org/10.1214/15-AOS1336 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1336

Journal ref: Annals of Statistics 2015, Vol. 43, No. 6, 2353-2383

arXiv:1304.1761 [pdf, ps, other]

doi 10.1214/14-AOS1253

On Bayesian supremum norm contraction rates

Authors: Ismaël Castillo

Abstract: Building on ideas from Castillo and Nickl [Ann. Statist. 41 (2013) 1999-2028], a method is provided to study nonparametric Bayesian posterior convergence rates when "strong" measures of distances, such as the sup-norm, are considered. In particular, we show that likelihood methods can achieve optimal minimax sup-norm rates in density estimation on the unit interval. The introduced methodology is used to prove that commonly used families of prior distributions on densities, namely log-density priors and dyadic random density histograms, can indeed achieve optimal sup-norm rates of convergence. New results are also derived in the Gaussian white noise model as a further illustration of the presented techniques.

Submitted 14 October, 2014; v1 submitted 5 April, 2013; originally announced April 2013.

Comments: Published at http://dx.doi.org/10.1214/14-AOS1253 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1253

Journal ref: Annals of Statistics 2014, Vol. 42, No. 5, 2058-2091

arXiv:1211.1197 [pdf, ps, other]

doi 10.1214/12-AOS1029

Needles and Straw in a Haystack: Posterior concentration for possibly sparse sequences

Authors: Ismaël Castillo, Aad van der Vaart

Abstract: We consider full Bayesian inference in the multivariate normal mean model in the situation that the mean vector is sparse. The prior distribution on the vector of means is constructed hierarchically by first choosing a collection of nonzero means and next a prior on the nonzero values. We consider the posterior distribution in the frequentist set-up that the observations are generated according to a fixed mean vector, and are interested in the posterior distribution of the number of nonzero components and the contraction of the posterior distribution to the true mean vector. We find various combinations of priors on the number of nonzero coefficients and on these coefficients that give desirable performance. We also find priors that give suboptimal convergence, for instance, Gaussian priors on the nonzero coefficients. We illustrate the results by simulations.

Submitted 6 November, 2012; originally announced November 2012.

Comments: Published at http://dx.doi.org/10.1214/12-AOS1029 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1029

Journal ref: Annals of Statistics 2012, Vol. 40, No. 4, 2069-2101

arXiv:1208.3862 [pdf, ps, other]

doi 10.1214/13-AOS1133

Nonparametric Bernstein-von Mises theorems in Gaussian white noise

Authors: Ismaël Castillo, Richard Nickl

Abstract: Bernstein-von Mises theorems for nonparametric Bayes priors in the Gaussian white noise model are proved. It is demonstrated how such results justify Bayes methods as efficient frequentist inference procedures in a variety of concrete nonparametric problems. Particularly Bayesian credible sets are constructed that have asymptotically exact $1-α$ frequentist coverage level and whose $L^2$-diameter shrinks at the minimax rate of convergence (within logarithmic factors) over Hölder balls. Other applications include general classes of linear and nonlinear functionals and credible bands for auto-convolutions. The assumptions cover nonconjugate product priors defined on general orthonormal bases of $L^2$ satisfying weak conditions.

Submitted 31 October, 2013; v1 submitted 19 August, 2012; originally announced August 2012.

Comments: Published at http://dx.doi.org/10.1214/13-AOS1133 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS1133

Journal ref: Annals of Statistics 2013, Vol. 41, No. 4, 1999-2028

arXiv:1206.0459 [pdf, ps, other]

Thomas Bayes' walk on manifolds

Authors: Ismael Castillo, Gerard Kerkyacharian, Dominique Picard

Abstract: Convergence of the Bayes posterior measure is considered in canonical statistical settings where observations sit on a geometrical object such as a compact manifold, or more generally on a compact metric space verifying some conditions. A natural geometric prior based on randomly rescaled solutions of the heat equation is considered. Upper and lower bound posterior contraction rates are derived.

Submitted 3 June, 2012; originally announced June 2012.

MSC Class: 62G05; 62G20

arXiv:1111.4841 [pdf, ps, other]

A weighted message-passing algorithm to estimate volume-related properties of random polytopes

Authors: Francesc Font-Clos, Francesco Alessandro Massucci, Isaac Pérez Castillo

Abstract: In this letter, we introduce a novel message-passing algorithm for a class of problems which can be mathematically understood as estimating volume-related properties of random polytopes. Unlike the usual approach consisting in approximating the real-valued cavity marginal distributions by a few parameters, we propose a weighted message-passing algorithm to deal with the entire function. Various alternatives of how to implement our approach are discussed and numerical results for random polytopes are compared with results using the Hit-and-Run algorithm.

Submitted 21 November, 2011; originally announced November 2011.

Comments: 4 pages, 3 figures

arXiv:0812.3253 [pdf, ps, other]

Estimation of the distribution of random shifts deformation

Authors: Ismael Castillo, Jean-Michel Loubes

Abstract: Consider discrete values of functions shifted by unobserved translation effects, which are independent realizations of a random variable with unknown distribution $μ$, modeling the variability in the response of each individual. Our aim is to construct a nonparametric estimator of the density of these random translation deformations using semiparametric preliminary estimates of the shifts. Building on results of Dalalyan et al. (2006), semiparametric estimators are obtained in our discrete framework and their performance studied. From these estimates we construct a nonparametric estimator of the target density. Both rates of convergence and an algorithm to construct the estimator are provided.

Submitted 17 December, 2008; originally announced December 2008.

MSC Class: 62G05; 62G20

arXiv:0807.2734 [pdf, ps, other]

doi 10.1214/08-EJS273

Lower bounds for posterior rates with Gaussian process priors

Authors: Ismaël Castillo

Abstract: Upper bounds for rates of convergence of posterior distributions associated to Gaussian process priors are obtained by van der Vaart and van Zanten in [14] and expressed in terms of a concentration function involving the Reproducing Kernel Hilbert Space of the Gaussian prior. Here lower-bound counterparts are obtained. As a corollary, we obtain the precise rate of convergence of posteriors for Gaussian priors in various settings. Additionally, we extend the upper-bound results of [14] about Riemann-Liouville priors to a continuous family of parameters.

Submitted 22 December, 2008; v1 submitted 17 July, 2008; originally announced July 2008.

Comments: Published at http://dx.doi.org/10.1214/08-EJS273 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-EJS-EJS_2008_273

MSC Class: 62G05; 62G20 (Primary)

Journal ref: Electronic Journal of Statistics 2008, Vol. 2, 1281-1299

arXiv:0711.3955 [pdf, ps, other]

doi 10.3150/07-BEJ5077

Semi-parametric second-order efficient estimation of the period of a signal

Authors: I. Castillo

Abstract: This paper is concerned with the estimation of the period of an unknown periodic function in Gaussian white noise. A class of estimators of the period is constructed by means of a penalized maximum likelihood method. A second-order asymptotic expansion of the risk of these estimators is obtained. Moreover, the minimax problem for the second-order term is studied and an estimator of the preceding class is shown to be second order efficient.

Submitted 26 November, 2007; originally announced November 2007.

Comments: Published at http://dx.doi.org/10.3150/07-BEJ5077 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

Report number: IMS-BEJ-BEJ5077

Journal ref: Bernoulli 2007, Vol. 13, No. 4, 910-932

Showing 1–33 of 33 results for author: Castillo, I