-
Nonparametric regression on random geometric graphs sampled from submanifolds
Authors:
Paul Rosa,
Judith Rousseau
Abstract:
We consider the nonparametric regression problem when the covariates are located on an unknown smooth compact submanifold of a Euclidean space. By defining a random geometric graph structure over the covariates, we analyze the asymptotic frequentist behaviour of the posterior distribution arising from Bayesian priors designed through random basis expansion in the graph Laplacian eigenbasis. Under a Hölder smoothness assumption on the regression function and the density of the covariates over the submanifold, we prove that the posterior contraction rates of such methods are minimax optimal (up to logarithmic factors) for any positive smoothness index.
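The sketch below illustrates only the graph construction, not the prior or the paper's choice of connectivity radius. It samples covariates on the unit circle, a one-dimensional submanifold of $\mathbb{R}^2$. It then joins points whose ambient distance is below a hypothetical radius `eps` and forms the unnormalised Laplacian $L = D - W$. The circle, the radius and the unnormalised (rather than normalised) Laplacian are all assumptions made for illustration.

```python
import math
import random

def sample_circle(n, rng):
    """Draw n points uniformly on the unit circle, a 1-d submanifold of R^2."""
    points = []
    for _ in range(n):
        t = rng.uniform(0.0, 2.0 * math.pi)
        points.append((math.cos(t), math.sin(t)))
    return points

def rgg_laplacian(points, eps):
    """Unnormalised Laplacian L = D - W of the eps-neighbourhood graph.

    Two points are joined when their ambient Euclidean distance is below eps;
    no knowledge of the manifold itself is used.
    """
    n = len(points)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < eps:
                lap[i][j] = lap[j][i] = -1.0
                lap[i][i] += 1.0
                lap[j][j] += 1.0
    return lap

rng = random.Random(0)
pts = sample_circle(200, rng)
L = rgg_laplacian(pts, eps=0.2)  # hypothetical radius, typically taken to shrink as n grows
```

A random basis expansion would use the eigenvectors of `L` with the smallest eigenvalues, computed for example with `numpy.linalg.eigh`. That step is omitted here to keep the sketch dependency-free.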
Submitted 31 May, 2024;
originally announced May 2024.
-
Asymptotics of approximate Bayesian computation when summary statistics converge at heterogeneous rates
Authors:
Caroline Lawless,
Christian P. Robert,
Judith Rousseau,
Robin J. Ryder
Abstract:
We consider the asymptotic properties of Approximate Bayesian Computation (ABC) for the realistic case of summary statistics with heterogeneous rates of convergence. We allow some statistics to converge faster than the ABC tolerance, other statistics to converge slower, and cover the case where some statistics do not converge at all. We give conditions for the ABC posterior to converge, and provide an explicit representation of the shape of the ABC posterior distribution in our general setting; in particular, we show how the shape of the posterior depends on the number of slow statistics. We then quantify the gain brought by the local linear post-processing step.
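The sketch below shows accept/reject ABC followed by a local linear post-processing step, in the simplest possible setting. It uses a single summary statistic (the sample mean of $N(\theta, 1)$ data) with a uniform prior, a uniform acceptance kernel and unweighted least squares. It does not reproduce the heterogeneous-rate setting of the paper, and every modelling choice in it is an assumption.

```python
import random
import statistics

def abc_rejection(s_obs, n_sims, eps, n_data, rng):
    """Accept/reject ABC for the mean of N(theta, 1) data; summary = sample mean."""
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-10.0, 10.0)           # draw from a U(-10, 10) prior
        s = rng.gauss(theta, 1.0 / n_data ** 0.5)  # sample mean of n_data draws, simulated directly
        if abs(s - s_obs) < eps:                   # keep draws within tolerance eps
            accepted.append((theta, s))
    return accepted

def local_linear_adjust(accepted, s_obs):
    """Regression adjustment theta_i - beta * (s_i - s_obs), beta by least squares."""
    thetas = [t for t, _ in accepted]
    summaries = [s for _, s in accepted]
    t_bar = statistics.fmean(thetas)
    s_bar = statistics.fmean(summaries)
    cov = sum((t - t_bar) * (s - s_bar) for t, s in accepted)
    var = sum((s - s_bar) ** 2 for s in summaries)
    beta = cov / var
    return [t - beta * (s - s_obs) for t, s in accepted]

rng = random.Random(1)
acc = abc_rejection(s_obs=1.0, n_sims=20000, eps=0.1, n_data=50, rng=rng)
adjusted = local_linear_adjust(acc, s_obs=1.0)
```

The adjustment moves each accepted draw along the fitted regression line to where its summary would equal the observed one. This is what lets the tolerance `eps` be larger than the rate at which the summary concentrates.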
Submitted 16 November, 2023;
originally announced November 2023.
-
Posterior Contraction Rates for Matérn Gaussian Processes on Riemannian Manifolds
Authors:
Paul Rosa,
Viacheslav Borovitskiy,
Alexander Terenin,
Judith Rousseau
Abstract:
Gaussian processes are used in many machine learning applications that rely on uncertainty quantification. Recently, computational tools for working with these models in geometric settings, such as when inputs lie on a Riemannian manifold, have been developed. This raises the question: can these intrinsic models be shown theoretically to lead to better performance, compared to simply embedding all relevant quantities into $\mathbb{R}^d$ and using the restriction of an ordinary Euclidean Gaussian process? To study this, we prove optimal contraction rates for intrinsic Matérn Gaussian processes defined on compact Riemannian manifolds. We also prove analogous rates for extrinsic processes using trace and extension theorems between manifold and ambient Sobolev spaces: somewhat surprisingly, the rates obtained turn out to coincide with those of the intrinsic processes, provided that their smoothness parameters are matched appropriately. We illustrate these rates empirically on a number of examples, which, mirroring prior work, show that intrinsic processes can achieve better performance in practice. Therefore, our work shows that finer-grained analyses are needed to distinguish between different levels of data-efficiency of geometric Gaussian processes, particularly in settings which involve small data set sizes and non-asymptotic behavior.
Submitted 29 October, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
-
Semiparametric posterior corrections
Authors:
Andrew Yiu,
Edwin Fong,
Chris Holmes,
Judith Rousseau
Abstract:
We present a new approach to semiparametric inference using corrected posterior distributions. The method allows us to leverage the adaptivity, regularization and predictive power of nonparametric Bayesian procedures to estimate low-dimensional functionals of interest without being restricted by the holistic Bayesian formalism. Starting from a conventional nonparametric posterior, we target the functional of interest by transforming the entire distribution with a Bayesian bootstrap correction. We provide conditions for the resulting $\textit{one-step posterior}$ to possess calibrated frequentist properties and specialize the results for several canonical examples: the integrated squared density, the mean of a missing-at-random outcome, and the average causal treatment effect on the treated. The procedure is computationally attractive, requiring only a simple, efficient post-processing step that can be attached to any posterior sampling algorithm. Using the ACIC 2016 causal data analysis competition, we illustrate that our approach can outperform the existing state-of-the-art through the propagation of Bayesian uncertainty.
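The sketch below gives the general shape of a one-step correction for the simplest functional, the mean $\psi(P) = E_P[Y]$, whose efficient influence function is $y - \psi(P)$. Each draw $\psi^{(b)}$ from an initial posterior is shifted by a Bayesian-bootstrap-weighted average of the influence function: $\psi^{(b)} + \sum_i w_i^{(b)}(y_i - \psi^{(b)})$, with Dirichlet$(1,\dots,1)$ weights. This is not a reproduction of the paper's procedure. The badly biased "initial posterior" is a hypothetical placeholder, and for this functional the corrected draw reduces to the Bayesian bootstrap mean $\sum_i w_i y_i$.

```python
import random

def dirichlet_ones(n, rng):
    """Bayesian bootstrap weights: Dirichlet(1, ..., 1) via normalised Exp(1) draws."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]

def one_step_mean(initial_draws, y, rng):
    """Correct each draw of the mean with a weighted average of its influence function."""
    corrected = []
    for psi in initial_draws:
        w = dirichlet_ones(len(y), rng)
        correction = sum(wi * (yi - psi) for wi, yi in zip(w, y))
        corrected.append(psi + correction)
    return corrected

rng = random.Random(2)
y = [rng.gauss(3.0, 1.0) for _ in range(100)]
# Hypothetical initial posterior draws, deliberately centred near 0 instead of 3.
initial = [rng.gauss(0.0, 0.1) for _ in range(500)]
corrected = one_step_mean(initial, y, rng)
```

Because the weights sum to one, the initial bias cancels exactly for this functional. For the functionals listed in the abstract the influence function also depends on nuisance parts of the posterior draw, so the cancellation is only approximate.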
Submitted 20 June, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Scalable and adaptive variational Bayes methods for Hawkes processes
Authors:
Deborah Sulem,
Vincent Rivoirard,
Judith Rousseau
Abstract:
Hawkes processes are often applied to model dependence and interaction phenomena in multivariate event data sets, such as neuronal spike trains, social interactions, and financial transactions. In the nonparametric setting, learning the temporal dependence structure of Hawkes processes is generally computationally expensive, all the more so with Bayesian estimation methods. In particular, for generalised nonlinear Hawkes processes, the Markov chain Monte Carlo (MCMC) methods applied to compute the doubly intractable posterior distribution do not scale to high-dimensional processes in practice. Recently, efficient algorithms targeting a mean-field variational approximation of the posterior distribution have been proposed. In this work, we first unify existing variational Bayes approaches under a general nonparametric inference framework, and analyse the asymptotic properties of these methods under easily verifiable conditions on the prior, the variational class, and the nonlinear model. Secondly, we propose a novel sparsity-inducing procedure and derive an adaptive mean-field variational algorithm for the popular sigmoid Hawkes processes. Our algorithm is parallelisable and therefore computationally efficient in high-dimensional settings. Through an extensive set of numerical simulations, we also demonstrate that our procedure is able to adapt to the dimensionality of the parameter of the Hawkes process, and is partially robust to some types of model misspecification.
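The sketch below evaluates one common parametrisation of the multivariate sigmoid Hawkes process, which is assumed here:

$\lambda_k(t) = \theta_k\,\sigma\big(\nu_k + \sum_l \sum_{t^l_i < t} h_{lk}(t - t^l_i)\big)$

Here $\sigma$ is the logistic function. The truncated-exponential interaction functions and all numeric values are hypothetical. Signed coefficients allow both excitation and inhibition.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_hawkes_intensity(t, k, events, theta, nu, coef, decay, support):
    """Conditional intensity of component k at time t.

    events[l] holds the event times of component l, and coef[l][k] is the
    signed strength of the effect l -> k (positive excites, negative inhibits).
    Each interaction function is the hypothetical truncated exponential
    h_lk(s) = coef[l][k] * exp(-decay * s) for 0 < s <= support.
    """
    linear = nu[k]
    for l, times in enumerate(events):
        for ti in times:
            lag = t - ti
            if 0.0 < lag <= support:
                linear += coef[l][k] * math.exp(-decay * lag)
    return theta[k] * sigmoid(linear)

events = [[0.5, 1.0], [0.8]]
theta, nu = [2.0, 2.0], [0.0, 0.0]
coef = [[0.5, 1.0], [-1.0, 0.3]]
lam_early = sigmoid_hawkes_intensity(0.2, 0, events, theta, nu, coef, 1.0, 1.0)
lam_late = sigmoid_hawkes_intensity(1.2, 0, events, theta, nu, coef, 1.0, 1.0)
```

The sigmoid link keeps each intensity in $(0, \theta_k)$ even when inhibitory terms dominate.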
Submitted 31 August, 2023; v1 submitted 1 December, 2022;
originally announced December 2022.
-
A flexible, random histogram kernel for discrete-time Hawkes processes
Authors:
Raiha Browning,
Judith Rousseau,
Kerrie Mengersen
Abstract:
Hawkes processes are self-exciting stochastic processes used to describe phenomena in which past events increase the probability of future events. This work presents a flexible approach for modelling a variant of these, namely discrete-time Hawkes processes. Most standard models of Hawkes processes rely on a parametric form for the function describing the influence of past events, referred to as the triggering kernel. Such a form is likely to be insufficient to capture the true excitation pattern, particularly for complex data. By utilising trans-dimensional Markov chain Monte Carlo inference techniques, our proposed model for the triggering kernel can take the form of any step function, affording significantly more flexibility than a parametric form. We first demonstrate the utility of the proposed model through a comprehensive simulation study. This includes univariate scenarios and multivariate scenarios with multiple interacting Hawkes processes. We then apply the proposed model to several case studies: the interaction between two countries during the early to middle stages of the COVID-19 pandemic, taking Italy and France as an example; the interaction of terrorist activity between two countries in close spatial proximity, Indonesia and the Philippines; and terrorist activity within three regions of the Philippines.
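The sketch below evaluates a discrete-time intensity with a step-function triggering kernel. It assumes the common form $\lambda_t = \mu + \sum_{s<t} y_s\,h(t-s)$ for event counts $y_t$. The breakpoints and heights are hypothetical. In the model above, their number and positions are what the trans-dimensional MCMC would explore.

```python
from bisect import bisect_right

def step_kernel(lag, breakpoints, heights):
    """heights[j] for breakpoints[j] <= lag < breakpoints[j + 1]; zero outside."""
    if lag < breakpoints[0] or lag >= breakpoints[-1]:
        return 0.0
    return heights[bisect_right(breakpoints, lag) - 1]

def intensity(t, counts, mu, breakpoints, heights):
    """lambda_t = mu + sum_{s < t} counts[s] * h(t - s), with a step kernel h."""
    return mu + sum(counts[s] * step_kernel(t - s, breakpoints, heights) for s in range(t))

counts = [2, 0, 1, 3]
# Lags 1-2 have height 0.4 and lags 3-5 have height 0.1, so the intensity at t = 4
# is 0.5 + 2*0.1 + 1*0.4 + 3*0.4 = 2.3 (up to floating-point rounding).
lam = intensity(4, counts, mu=0.5, breakpoints=[1, 3, 6], heights=[0.4, 0.1])
```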
Submitted 4 August, 2022;
originally announced August 2022.
-
Evidence estimation in finite and infinite mixture models and applications
Authors:
Adrien Hairault,
Christian P. Robert,
Judith Rousseau
Abstract:
Estimating the model evidence, or marginal likelihood of the data, is a notoriously difficult task for finite and infinite mixture models. We reexamine here different Monte Carlo techniques advocated in the recent literature, as well as novel approaches based on Geyer's (1994) reverse logistic regression technique, Chib's (1995) algorithm, and Sequential Monte Carlo (SMC). Applications are numerous. In particular, testing for the number of components in a finite mixture model, or against the fit of a finite mixture model for a given dataset, has long been and still is an issue of much interest, albeit one still lacking a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric 'strongly identifiable' Dirichlet Process Mixture (DPM) model.
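Chib's approach rests on the identity $\log m(y) = \log f(y \mid \theta^*) + \log \pi(\theta^*) - \log \pi(\theta^* \mid y)$, which holds for any $\theta^*$. The sketch below checks it on a conjugate model, $y_i \sim N(\theta, 1)$ with $\theta \sim N(0, \tau^2)$, where the posterior ordinate is available in closed form. This model is an assumption chosen for verifiability. In mixture models the ordinate $\pi(\theta^* \mid y)$ must itself be estimated, which is where the difficulty discussed above arises.

```python
import math

def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def chib_log_evidence(y, tau2):
    """log f(y | t*) + log pi(t*) - log pi(t* | y), evaluated at t* = posterior mean."""
    n = len(y)
    post_var = tau2 / (1.0 + n * tau2)   # posterior precision is n + 1 / tau2
    post_mean = post_var * sum(y)
    t_star = post_mean
    log_lik = sum(log_normal_pdf(yi, t_star, 1.0) for yi in y)
    log_prior = log_normal_pdf(t_star, 0.0, tau2)
    log_post = log_normal_pdf(t_star, post_mean, post_var)
    return log_lik + log_prior - log_post

def exact_log_evidence(y, tau2):
    """Closed form: marginally y ~ N(0, I + tau2 * 1 1^T)."""
    n = len(y)
    s, ss = sum(y), sum(v * v for v in y)
    quad = ss - tau2 * s * s / (1.0 + n * tau2)   # y^T (I + tau2 1 1^T)^{-1} y
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n * tau2) - 0.5 * quad

y = [0.3, -1.2, 2.5, 0.7]
chib, exact = chib_log_evidence(y, 4.0), exact_log_evidence(y, 4.0)
```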
Submitted 11 May, 2022;
originally announced May 2022.
-
Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement
Authors:
Cian Naik,
Judith Rousseau,
Trevor Campbell
Abstract:
Bayesian coresets approximate a posterior distribution by building a small weighted subset of the data points. Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the user to specify a low-cost approximation to the full posterior. We propose a Bayesian coreset construction algorithm that first selects a uniformly random subset of data, and then optimizes the weights using a novel quasi-Newton method. Our algorithm is a simple-to-implement, black-box method that does not require the user to specify a low-cost posterior approximation. It is the first to come with a general high-probability bound on the KL divergence of the output coreset posterior. Experiments demonstrate that our method provides significant improvements in coreset quality over alternatives with comparable construction times, with far less storage cost and user input required.
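The sketch below covers only the first stage described above, the uniformly random subset. Drawing $M$ of $N$ points and giving each weight $N/M$ makes the coreset log-likelihood an unbiased estimate of the full-data log-likelihood at every parameter value. The quasi-Newton weight refinement that follows is not reproduced. The unit-variance Gaussian location model and the helper names are illustrative assumptions.

```python
import math
import random

def log_lik(theta, x):
    """Gaussian location log-likelihood of one observation (unit variance)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - theta) ** 2

def uniform_coreset(data, m, rng):
    """Pick m points uniformly without replacement, each weighted by N / m."""
    idx = rng.sample(range(len(data)), m)
    w = len(data) / m
    return [(data[i], w) for i in idx]

def full_log_lik(theta, data):
    return sum(log_lik(theta, x) for x in data)

def coreset_log_lik(theta, coreset):
    return sum(w * log_lik(theta, x) for x, w in coreset)

rng = random.Random(3)
data = [rng.gauss(1.0, 1.0) for _ in range(10000)]
core = uniform_coreset(data, 200, rng)
```

Uniform weights give unbiasedness but a high variance. Reducing that variance by optimising the weights is the role of the refinement step.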
Submitted 15 January, 2023; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Efficient Bayesian estimation and use of cut posterior in semiparametric hidden Markov models
Authors:
Daniel Moss,
Judith Rousseau
Abstract:
We consider the problem of estimation in Hidden Markov models with finite state space and nonparametric emission distributions. Efficient estimators for the transition matrix are exhibited, and a semiparametric Bernstein-von Mises result is deduced. Following from this, we propose a modular approach using the cut posterior to jointly estimate the transition matrix and the emission densities. We derive a general theorem on contraction rates for this approach. We then show how this result may be applied to obtain a contraction rate result for the emission densities in our setting; a key intermediate step is an inversion inequality relating $L^1$ distance between the marginal densities to $L^1$ distance between the emissions. Finally, a contraction result for the smoothing probabilities is shown, which avoids the common approach of sample splitting. Simulations are provided which demonstrate both the theory and the ease of its implementation.
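The sketch below illustrates the idea of a cut posterior outside the HMM setting, using two conjugate Gaussian modules chosen purely for illustration. First, $\theta$ is drawn from its posterior given module-1 data $z$ only. Then $\phi$ is drawn from its conditional posterior given $\theta$ and module-2 data $y$, so information in $y$ never feeds back into $\theta$. In the abstract above the analogous modules are the transition matrix and the emission densities. The model $z_i \sim N(\theta, 1)$, $y_j \sim N(\theta + \phi, 1)$ with $N(0, \tau^2)$ priors is an assumption.

```python
import math
import random

def normal_posterior(obs, tau2):
    """Posterior mean and variance of m for obs_i ~ N(m, 1), m ~ N(0, tau2)."""
    v = tau2 / (1.0 + len(obs) * tau2)
    return v * sum(obs), v

def cut_posterior_draws(z, y, tau2, n_draws, rng):
    draws = []
    m1, v1 = normal_posterior(z, tau2)
    for _ in range(n_draws):
        theta = rng.gauss(m1, math.sqrt(v1))                      # module 1: uses z only
        m2, v2 = normal_posterior([yj - theta for yj in y], tau2)
        phi = rng.gauss(m2, math.sqrt(v2))                        # module 2: given theta and y
        draws.append((theta, phi))
    return draws

rng = random.Random(4)
z = [rng.gauss(1.0, 1.0) for _ in range(50)]   # hypothetical data with theta = 1
y = [rng.gauss(3.0, 1.0) for _ in range(50)]   # hypothetical data with phi = 2
draws = cut_posterior_draws(z, y, 10.0, 2000, rng)
```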
Submitted 8 March, 2023; v1 submitted 11 March, 2022;
originally announced March 2022.
-
Stable ResNet
Authors:
Soufiane Hayou,
Eugenio Clerico,
Bobby He,
George Deligiannidis,
Arnaud Doucet,
Judith Rousseau
Abstract:
Deep ResNet architectures have achieved state-of-the-art performance on many tasks. While they solve the problem of vanishing gradients, they might suffer from exploding gradients as the depth becomes large (Yang et al. 2017). Moreover, recent results have shown that ResNet might lose expressivity as the depth goes to infinity (Yang et al. 2017, Hayou et al. 2019). To resolve these issues, we introduce a new class of ResNet architectures, called Stable ResNet, that have the property of stabilizing the gradient while ensuring expressivity in the infinite depth limit.
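The sketch below gives a rough numerical illustration of how the forward signal grows with depth in a plain ResNet, and of how scaling the residual branches controls that growth. It assumes residual blocks of the form $y_l = y_{l-1} + \lambda_l W_l\,\mathrm{ReLU}(y_{l-1})$, with Gaussian weights of variance $2/n$ and a uniform scale $\lambda_l = 1/\sqrt{L}$, so that $\sum_l \lambda_l^2$ stays bounded as the depth $L$ grows. This parametrisation is an assumption, not necessarily the exact form used in the paper.

```python
import math
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def forward_norm(depth, width, scale, rng):
    """Output norm of a random ReLU residual network whose branches are multiplied by `scale`."""
    y = [rng.gauss(0.0, 1.0) for _ in range(width)]
    std = math.sqrt(2.0 / width)  # He-style weight variance (assumption)
    for _ in range(depth):
        w = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        branch = matvec(w, relu(y))
        y = [a + scale * b for a, b in zip(y, branch)]
    return math.sqrt(sum(a * a for a in y))

depth, width = 50, 16
# Same seed, so both networks share the input and weights and differ only in scaling.
plain = forward_norm(depth, width, 1.0, random.Random(5))
stable = forward_norm(depth, width, 1.0 / math.sqrt(depth), random.Random(5))
```

Without scaling, the squared norm roughly doubles at each block. With $\lambda_l = 1/\sqrt{L}$ it grows by a factor of about $(1 + 1/L)^L \approx e$ overall.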
Submitted 18 March, 2021; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Sparse Networks with Core-Periphery Structure
Authors:
Cian Naik,
François Caron,
Judith Rousseau
Abstract:
We propose a statistical model for graphs with a core-periphery structure. To do this we define a precise notion of what it means for a graph to have this structure, based on the sparsity properties of the subgraphs of core and periphery nodes. We present a class of sparse graphs with such properties, and provide methods to simulate from this class, and to perform posterior inference. We demonstrate that our model can detect core-periphery structure in simulated and real-world networks.
Submitted 21 October, 2019;
originally announced October 2019.
-
Exact Convergence Rates of the Neural Tangent Kernel in the Large Depth Limit
Authors:
Soufiane Hayou,
Arnaud Doucet,
Judith Rousseau
Abstract:
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent in parameter space is related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK). Lee et al. (2019) built on this result by establishing that the output of a neural network trained using gradient descent can be approximated by a linear model when the network width is large. Indeed, under regularity conditions, the NTK converges to a time-independent kernel in the infinite-width limit. This regime is often called the NTK regime. In parallel, recent works on signal propagation (Poole et al., 2016; Schoenholz et al., 2017; Hayou et al., 2019a) studied the impact of the initialization and the activation function on signal propagation in deep neural networks. In this paper, we connect these two theories by quantifying the impact of the initialization and the activation function on the NTK when the network depth becomes large. In particular, we provide a comprehensive analysis of the convergence rates of the NTK regime to the infinite depth regime.
Submitted 25 May, 2022; v1 submitted 31 May, 2019;
originally announced May 2019.
-
On the Impact of the Activation Function on Deep Neural Networks Training
Authors:
Soufiane Hayou,
Arnaud Doucet,
Judith Rousseau
Abstract:
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the `Edge of Chaos' can lead to good performance. While Schoenholz et al. (2017) discuss trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate the training and improve the performance.
Submitted 26 May, 2019; v1 submitted 18 February, 2019;
originally announced February 2019.
-
On the Selection of Initialization and Activation Function for Deep Neural Networks
Authors:
Soufiane Hayou,
Arnaud Doucet,
Judith Rousseau
Abstract:
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information of the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Schoenholz et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the `edge of chaos' can lead to good performance. We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, information indeed propagates deeper for an initialization at the edge of chaos. By further extending this analysis, we identify a class of activation functions that improve information propagation compared with ReLU-like functions. This class includes the Swish activation, $φ_{swish}(x) = x \cdot \text{sigmoid}(x)$, used in Hendrycks & Gimpel (2016), Elfwing et al. (2017) and Ramachandran et al. (2017). This provides a theoretical grounding for the excellent empirical performance of $φ_{swish}$ observed in these contributions. We complement those previous results by illustrating the benefit of using a random initialization on the edge of chaos in this context.
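The edge-of-chaos behaviour for ReLU can be illustrated with the standard correlation map of an infinitely wide ReLU layer at $(\sigma_b, \sigma_w) = (0, \sqrt{2})$:

$c_{l+1} = f(c_l)$, with $f(c) = \frac{1}{\pi}\big(\sqrt{1-c^2} + c(\pi - \arccos c)\big)$

This is the normalised degree-1 arc-cosine kernel, included as background rather than as a result of the paper. Iterating the map shows the correlation between two inputs approaching 1 only polynomially in depth, so the inputs stay distinguishable across many layers.

```python
import math

def relu_correlation_map(c):
    """Correlation between two inputs after one infinitely wide ReLU layer at (0, sqrt(2))."""
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return (math.sqrt(1.0 - c * c) + c * (math.pi - math.acos(c))) / math.pi

def correlation_trajectory(c0, depth):
    """Iterate the correlation map for `depth` layers, starting from correlation c0."""
    cs = [c0]
    for _ in range(depth):
        cs.append(min(1.0, relu_correlation_map(cs[-1])))
    return cs

traj = correlation_trajectory(0.0, 100)
```

Near $c = 1$ the map behaves like $1 - \varepsilon \mapsto 1 - \varepsilon + \tfrac{2\sqrt{2}}{3\pi}\varepsilon^{3/2}$. This makes $1 - c_l$ decay like $l^{-2}$ rather than geometrically.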
Submitted 7 October, 2018; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Model Misspecification in ABC: Consequences and Diagnostics
Authors:
David T. Frazier,
Christian P. Robert,
Judith Rousseau
Abstract:
We analyze the behavior of approximate Bayesian computation (ABC) when the model generating the simulated data differs from the actual data generating process; i.e., when the data simulator in ABC is misspecified. We demonstrate both theoretically and in simple, but practically relevant, examples that when the model is misspecified different versions of ABC can yield substantially different results. Our theoretical results demonstrate that even though the model is misspecified, under regularity conditions, the accept/reject ABC approach concentrates posterior mass on an appropriately defined pseudo-true parameter value. However, under model misspecification the ABC posterior does not yield credible sets with valid frequentist coverage and has non-standard asymptotic behavior. In addition, we examine the theoretical behavior of the popular local regression adjustment to ABC under model misspecification and demonstrate that this approach concentrates posterior mass on a completely different pseudo-true value than accept/reject ABC. Using our theoretical results, we suggest two approaches to diagnose model misspecification in ABC. All theoretical results and diagnostics are illustrated in a simple running example.
Submitted 9 July, 2019; v1 submitted 6 August, 2017;
originally announced August 2017.
-
Some comments about A Bayesian criterion for singular models by M. Drton and M. Plummer
Authors:
Christian P. Robert,
Judith Rousseau
Abstract:
These are written comments about the Read Paper A Bayesian criterion for singular models by M. Drton and M. Plummer, read to the Royal Statistical Society on October 5, 2016. The discussion was delivered by Judith Rousseau.
Submitted 8 October, 2016;
originally announced October 2016.
-
Some comments about "Penalising model component complexity" by Simpson et al. (2017)
Authors:
Christian P. Robert,
Judith Rousseau
Abstract:
This note discusses the paper "Penalising model component complexity" by Simpson et al. (2017). While we acknowledge the highly novel approach to prior construction and commend the authors for setting new all-encompassing principles for Bayesian modelling, and while we perceive the potential connection with other branches of the literature, we remain uncertain as to what extent the principles exposed in the paper can be developed outside specific models, given their lack of precision. The very notions of model component, base model and overfitting prior are, for instance, conceptual rather than mathematical, and we thus fear the concept of penalised complexity may not go further than extending first-guess priors into larger families, thus failing to establish reference priors on a novel sound ground.
Submitted 22 September, 2016;
originally announced September 2016.
-
Asymptotic Properties of Approximate Bayesian Computation
Authors:
David T. Frazier,
Gael M. Martin,
Christian P. Robert,
Judith Rousseau
Abstract:
Approximate Bayesian computation allows for statistical analysis in models with intractable likelihoods. In this paper we consider the asymptotic behaviour of the posterior distribution obtained by this method. We give general results on the rate at which the posterior distribution concentrates on sets containing the true parameter, its limiting shape, and the asymptotic distribution of the posterior mean. These results hold under given rates for the tolerance used within the method, mild regularity conditions on the summary statistics, and a condition linked to identification of the true parameters. Implications for practitioners are discussed.
Submitted 8 May, 2018; v1 submitted 23 July, 2016;
originally announced July 2016.
-
Bayesian Nonparametrics for Sparse Dynamic Networks
Authors:
Cian Naik,
Francois Caron,
Judith Rousseau,
Yee Whye Teh,
Konstantina Palla
Abstract:
In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks. A positive parameter is associated with each node of a network, which models the sociability of that node. Sociabilities are assumed to evolve over time, and are modelled via a dynamic point process model. The model is able to capture long-term evolution of the sociabilities. Moreover, it yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying generalised gamma process. We provide some theoretical insights into the model and apply it to three datasets: a simulated network, a network of hyperlinks between communities on Reddit, and a network of co-occurrences of words in Reuters news articles after the September 11th attacks.
Submitted 14 April, 2022; v1 submitted 6 July, 2016;
originally announced July 2016.
-
Some comments about James Watson's and Chris Holmes' "Approximate Models and Robust Decisions": Nonparametric Bayesian clay for robust decision bricks
Authors:
Christian P. Robert,
Judith Rousseau
Abstract:
This note discusses Watson and Holmes (2016) and their proposals towards more robust Bayesian decisions. While we acknowledge and commend the authors for setting new and all-encompassing principles of Bayesian robustness, and we appreciate the strong anchoring of those within a decision-theoretic referential, we remain uncertain as to what extent such principles can be applied outside binary decisions. We also wonder at the ultimate relevance of Kullback-Leibler neighbourhoods to characterise robustness and favour extensions along non-parametric axes.
Submitted 9 April, 2016; v1 submitted 30 March, 2016;
originally announced March 2016.
-
Overfitting hidden Markov models with an unknown number of states
Authors:
Zoé van Havre,
Judith Rousseau,
Nicole White,
Kerrie Mengersen
Abstract:
This paper presents new theory and methodology for the Bayesian estimation of overfitted hidden Markov models with finite state space. The goal is then to achieve posterior emptying of extra states. A prior configuration is constructed which favours configurations where the hidden Markov chain remains ergodic although it empties out some of the states. Asymptotic posterior convergence rates are proven theoretically, and demonstrated with a large sample simulation. The problem of overfitted HMMs is then considered in the context of smaller sample sizes, and due to computational and mixing issues two alternative prior structures are studied: one commonly used in practice, and a mixture of the two priors. The Prior Parallel Tempering approach of van Havre (2015) is also extended to HMMs to allow MCMC estimation of the complex posterior space. A replicate simulation study and an in-depth exploration are performed to compare the three priors, with hyperparameters chosen according to the asymptotic constraints, alongside less informative alternatives.
Submitted 8 February, 2016;
originally announced February 2016.
-
Clustering action potential spikes: Insights on the use of overfitted finite mixture models and Dirichlet process mixture models
Authors:
Zoé van Havre,
Nicole White,
Judith Rousseau,
Kerrie Mengersen
Abstract:
The modelling of action potentials from extracellular recordings, or spike sorting, is a rich area of neuroscience research in which latent variable models are often used. Two such models, Overfitted Finite Mixture models (OFMs) and Dirichlet Process Mixture models (DPMs), are considered to provide insights for unsupervised clustering of complex, multivariate medical data when the number of clusters is unknown. OFMs and DPMs are structured in a similar hierarchical fashion, but they are based on different philosophies with different underlying assumptions. This study investigates how these differences affect a real spike-sorting study, for the estimation of multivariate Gaussian location-scale mixture models in the presence of common difficulties arising from complex medical data. The results provide insights allowing future analysts to choose an approach suited to the situation and goal of the research problem at hand.
Submitted 4 February, 2016;
originally announced February 2016.
-
Overfitting Bayesian Mixture Models with an Unknown Number of Components
Authors:
Zoe van Havre,
Nicole White,
Judith Rousseau,
Kerrie Mengersen
Abstract:
This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via the Zmix algorithm. Zmix provides a bridge between multidimensional samplers and test-based estimation methods, whereby priors are chosen to encourage extra groups to have weights approaching zero. MCMC sampling is made possible by the implementation of prior parallel tempering, an extension of parallel tempering. Zmix can accurately estimate the number of components, posterior parameter estimates and allocation probabilities given a sufficiently large sample size. The results reflect uncertainty in the final model and report the range of possible candidate models and their respective estimated probabilities from a single run. Label switching is resolved with a computationally lightweight method, Zswitch, developed for overfitted mixtures by exploiting the intuitiveness of allocation-based relabelling algorithms and the precision of label-invariant loss functions. Four simulation studies are included to illustrate Zmix and Zswitch, as well as three case studies from the literature. All methods are available as part of the R package Zmix, which can currently be applied to univariate Gaussian mixture models.
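The effect of a prior that pushes extra groups towards zero weight can be illustrated on the weights alone. The sketch below is not the Zmix algorithm: it assumes a symmetric Dirichlet(alpha) prior on the weights of a mixture with ten components and a hypothetical threshold of 0.01 below which a component is treated as empty, and compares the average number of non-negligible components under a small and a moderate alpha.

```python
import random


def symmetric_dirichlet(alpha, k, rng):
    """Draw Dirichlet(alpha, ..., alpha) weights by normalising Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [g / total for g in draws]


def mean_nonempty_components(alpha, k=10, n_draws=2000, threshold=0.01, seed=1):
    """Average number of prior weights above a (hypothetical) emptiness threshold."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_draws):
        weights = symmetric_dirichlet(alpha, k, rng)
        count += sum(w > threshold for w in weights)
    return count / n_draws


# A small alpha concentrates prior mass on a few components; alpha = 1 spreads it out.
for alpha in (0.05, 1.0):
    print(alpha, mean_nonempty_components(alpha))
```

With the small alpha, most of the ten available components typically receive negligible prior weight, which is the kind of behaviour an overfitting prior is designed to exploit.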
Submitted 24 August, 2015; v1 submitted 18 February, 2015;
originally announced February 2015.
-
Testing hypotheses via a mixture estimation model
Authors:
Kaniav Kamary,
Kerrie Mengersen,
Christian P. Robert,
Judith Rousseau
Abstract:
We consider a novel paradigm for Bayesian testing of hypotheses and Bayesian model comparison. Our alternative to the traditional construction of posterior probabilities that a given hypothesis is true or that the data originate from a specific model is to consider the models under comparison as components of a mixture model. We therefore replace the original testing problem with an estimation problem that focuses on the probability weight of a given model within a mixture model. We analyze the sensitivity of the resulting posterior distribution of the weights to various prior modeling choices on the weights. We stress that a major appeal of this novel perspective is that generic improper priors are acceptable, without putting convergence in jeopardy. Among other features, this allows for a resolution of the Lindley-Jeffreys paradox. When using a reference Beta B(a,a) prior on the mixture weights, we note that the sensitivity of the posterior estimates of the weights to the choice of a vanishes as the sample size increases, and we advocate the default choice a=0.5, derived from Rousseau and Mengersen (2011). Another feature of this easily implemented alternative to the classical Bayesian solution is that the speeds of convergence of the posterior mean of the weight and of the corresponding posterior probability are quite similar.
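For two fully specified models f0 and f1 (no unknown parameters, a simplification relative to the paper), the mixture alpha*f0 + (1-alpha)*f1 with a Beta(a, a) prior on alpha admits a simple data-augmentation Gibbs sampler: draw a latent allocation for each observation given alpha, then draw alpha from its conjugate Beta full conditional. The sketch below assumes the toy models N(0, 1) and N(1, 1); the function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_test_gibbs(data, f0, f1, a=0.5, iters=1000, burn=200, seed=0):
    """Gibbs sampler for alpha in alpha*f0 + (1 - alpha)*f1 under a Beta(a, a) prior.

    Returns the posterior mean of alpha, the weight of model 0.
    """
    rng = random.Random(seed)
    alpha, total = 0.5, 0.0
    for t in range(iters):
        # Allocation step: z_i = 0 with probability alpha*f0 / (alpha*f0 + (1-alpha)*f1).
        n0 = 0
        for x in data:
            p0 = alpha * f0(x)
            p1 = (1.0 - alpha) * f1(x)
            if rng.random() < p0 / (p0 + p1):
                n0 += 1
        # Conjugate update: alpha | z ~ Beta(a + n0, a + n - n0).
        alpha = rng.betavariate(a + n0, a + len(data) - n0)
        if t >= burn:
            total += alpha
    return total / (iters - burn)


rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # generated under model 0
post_mean_alpha = mixture_test_gibbs(
    data, lambda x: normal_pdf(x, 0.0), lambda x: normal_pdf(x, 1.0)
)
print(post_mean_alpha)
```

Since the data are drawn from f0, the posterior mean of alpha should sit close to 1; rerunning with other values of a gives a rough feel for the sensitivity to the prior.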
Submitted 31 December, 2018; v1 submitted 5 December, 2014;
originally announced December 2014.
-
Posterior concentration rates for counting processes with Aalen multiplicative intensities
Authors:
Sophie Donnet,
Vincent Rivoirard,
Judith Rousseau,
Catia Scricciolo
Abstract:
We provide general conditions to derive posterior concentration rates for Aalen counting processes. The conditions are designed to resemble those proposed in the literature for the problem of density estimation, for instance in Ghosal et al. (2000), so that existing results on density estimation can be adapted to the present setting. We apply the general theorem to some prior models including Dirichlet process mixtures of uniform densities to estimate monotone non-increasing intensities and log-splines.
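Mixtures of uniform densities suit monotone non-increasing intensities because each U(0, θ) density θ^{-1} 1{0 ≤ t ≤ θ} is non-increasing in t, so any mixture of them is too. The sketch below illustrates this property of the prior's support, not the paper's estimation procedure: it draws a truncated stick-breaking approximation of a Dirichlet process with a hypothetical Uniform(0, T] base measure on θ, scales the mixture by a hypothetical total mass, and evaluates the resulting intensity on a grid.

```python
import random


def stick_breaking(concentration, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process; the last atom takes the remaining mass."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def uniform_mixture_intensity(t, scale, weights, thetas):
    """scale * sum_k w_k * (1 / theta_k) * 1{0 <= t <= theta_k}: non-increasing in t."""
    return scale * sum(w / th for w, th in zip(weights, thetas) if 0.0 <= t <= th)


rng = random.Random(7)
T = 1.0  # hypothetical observation window
weights = stick_breaking(concentration=2.0, n_atoms=50, rng=rng)
thetas = [T * (1.0 - rng.random()) for _ in weights]  # Uniform(0, T], so theta is never 0
grid = [i * T / 100 for i in range(101)]
values = [uniform_mixture_intensity(t, 5.0, weights, thetas) for t in grid]
print(values[0], values[50], values[100])
```

Monotonicity holds for every draw, not just this seed, since it is a property of each mixture kernel.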
Submitted 22 July, 2014;
originally announced July 2014.
-
Posterior concentration rates for empirical Bayes procedures, with applications to Dirichlet Process mixtures
Authors:
Sophie Donnet,
Vincent Rivoirard,
Judith Rousseau,
Catia Scricciolo
Abstract:
In this paper we provide general conditions to check on the model and the prior to derive posterior concentration rates for data-dependent priors (or empirical Bayes approaches). We aim at providing conditions that are close to the conditions provided in the seminal paper by Ghosal and van der Vaart (2007a). We then apply the general theorem to two different settings: the estimation of a density using Dirichlet process mixtures of Gaussian random variables with base measure depending on some empirical quantities and the estimation of the intensity of a counting process under the Aalen model. A simulation study for inhomogeneous Poisson processes also illustrates our results. In the former case we also derive some results on the estimation of the mixing density and on the deconvolution problem. In the latter, we provide a general theorem on posterior concentration rates for counting processes with Aalen multiplicative intensity with priors not depending on the data.
Submitted 17 June, 2014;
originally announced June 2014.
-
Bayesian matrix completion: prior specification
Authors:
Pierre Alquier,
Vincent Cottet,
Nicolas Chopin,
Judith Rousseau
Abstract:
Low-rank matrix estimation from incomplete measurements has recently received increased attention due to the emergence of several challenging applications, such as recommender systems; see in particular the famous Netflix challenge. While the behaviour of algorithms based on nuclear norm minimization is now well understood, an as yet unexplored avenue of research is the behaviour of Bayesian algorithms in this context. In this paper, we briefly review the priors used in the Bayesian literature for matrix completion. A standard approach is to assign an inverse gamma prior to the singular values of a certain singular value decomposition of the matrix of interest; this prior is conjugate. However, we show that two other types of priors (again for the singular values) may be conjugate for this model: a gamma prior and a discrete prior. Conjugacy is very convenient, as it makes it possible to implement either Gibbs sampling or Variational Bayes. Interestingly enough, the maximum a posteriori estimators for these different priors are related to nuclear norm minimization problems. We also compare all these priors on simulated datasets, and on the classical MovieLens and Netflix datasets.
Submitted 22 October, 2014; v1 submitted 5 June, 2014;
originally announced June 2014.
-
Using informative priors in the estimation of mixtures over time with application to aerosol particle size distributions
Authors:
Darren Wraith,
Kerrie Mengersen,
Clair Alston,
Judith Rousseau,
Tareq Hussein
Abstract:
The issue of using informative priors for estimation of mixtures at multiple time points is examined. Several different informative priors and an independent prior are compared using samples of actual and simulated aerosol particle size distribution (PSD) data. Measurements of aerosol PSDs refer to the concentration of aerosol particles in terms of their size, which is typically multimodal in nature and collected at frequent time intervals. The use of informative priors is found to better identify component parameters at each time point and more clearly establish patterns in the parameters over time. Some caveats to this finding are discussed.
Submitted 16 April, 2014;
originally announced April 2014.
-
Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity
Authors:
Julyan Arbel,
Kerrie Mengersen,
Judith Rousseau
Abstract:
We introduce a dependent Bayesian nonparametric model for the probabilistic modeling of membership of subgroups in a community based on partially replicated data. The focus here is on species-by-site data, i.e. community data where observations at different sites are classified in distinct species. Our aim is to study the impact of additional covariates, for instance environmental variables, on the data structure, and in particular on the community diversity. To that end, we introduce dependence a priori across the covariates, and show that it improves posterior inference. We use a dependent version of the Griffiths-Engen-McCloskey distribution defined via the stick-breaking construction. This distribution is obtained by transforming a Gaussian process whose covariance function controls the desired dependence. The resulting posterior distribution is sampled by Markov chain Monte Carlo. We illustrate the application of our model to a soil microbial dataset acquired across a hydrocarbon contamination gradient at the site of a fuel spill in Antarctica. This method allows for inference on a number of quantities of interest in ecotoxicology, such as diversity or effective concentrations, and is broadly applicable to the general problem of the response of communities to environmental variables.
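A covariate-dependent stick-breaking construction of this kind can be sketched as follows. The transform here is an assumption for illustration and may differ from the one used in the paper: each stick variable is obtained by passing a Gaussian-process value through the standard normal CDF and then through the Beta(1, M) inverse CDF, so that at any single covariate the weights are (up to a tiny numerical jitter) GEM(M), while a hypothetical squared-exponential covariance with a hypothetical length-scale controls how similar the weights are at nearby covariates.

```python
import math
import random


def sq_exp_kernel(x, y, length_scale):
    """Squared-exponential covariance (a hypothetical choice of kernel)."""
    return math.exp(-0.5 * ((x - y) / length_scale) ** 2)


def cholesky(K, jitter=1e-10):
    """Lower-triangular Cholesky factor of K + jitter * I."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(s + jitter)
            else:
                L[i][j] = s / L[j][j]
    return L


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dependent_gem(covariates, mass, n_sticks, length_scale=1.0, seed=0):
    """Truncated covariate-dependent stick-breaking weights.

    Returns weights[j][i], the j-th weight at covariates[i]. Each stick uses an
    independent GP draw, mapped Gaussian -> Uniform -> Beta(1, mass) through the
    inverse CDF 1 - (1 - u) ** (1 / mass).
    """
    rng = random.Random(seed)
    n = len(covariates)
    K = [[sq_exp_kernel(a, b, length_scale) for b in covariates] for a in covariates]
    L = cholesky(K)
    remaining = [1.0] * n
    weights = []
    for _ in range(n_sticks):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]
        v = [1.0 - (1.0 - std_normal_cdf(zi)) ** (1.0 / mass) for zi in z]
        weights.append([remaining[i] * v[i] for i in range(n)])
        remaining = [remaining[i] * (1.0 - v[i]) for i in range(n)]
    return weights


w = dependent_gem([0.0, 0.5, 3.0], mass=2.0, n_sticks=30, seed=1)
for j in range(3):
    print(j, [round(x, 3) for x in w[j]])
```

Covariates 0.0 and 0.5 are strongly correlated under this kernel, so their leading weights tend to be close, while those at 3.0 are nearly independent of them.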
Submitted 27 July, 2015; v1 submitted 13 February, 2014;
originally announced February 2014.
-
Computational aspects of Bayesian spectral density estimation
Authors:
Nicolas Chopin,
Judith Rousseau,
Brunero Liseo
Abstract:
Gaussian time-series models are often specified through their spectral density. Such models present several computational challenges, in particular because of the non-sparse nature of the covariance matrix. We derive a fast approximation of the likelihood for such models. We propose to sample from the approximate posterior (that is, the prior times the approximate likelihood), and then to recover the exact posterior through importance sampling. We show that the variance of the importance sampling weights vanishes as the sample size goes to infinity. We explain why the approximate posterior may typically be multi-modal, and we derive a Sequential Monte Carlo sampler based on an annealing sequence in order to sample from that target distribution. Performance of the overall approach is evaluated on simulated and real datasets. In addition, for one real-world dataset, we provide some numerical evidence that a Bayesian approach to semi-parametric estimation of the spectral density may provide more reasonable results than its frequentist counterparts.
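The correction step (sample from the approximate posterior, then reweight towards the exact one) is self-normalised importance sampling. The sketch below uses toy one-dimensional Gaussians as stand-ins for the approximate and exact posteriors; these densities and the function names are hypothetical, not the spectral-density likelihoods of the paper. It also reports the effective sample size 1 / Σ w_i², a common diagnostic of the variance of the weights.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def importance_correct(samples, log_exact, log_approx):
    """Self-normalised weights proportional to exact(x_i) / approx(x_i).

    Both log densities may be unnormalised, since constants cancel after
    normalisation. Returns the normalised weights and the effective sample size.
    """
    log_w = [log_exact(x) - log_approx(x) for x in samples]
    m = max(log_w)  # subtract the maximum before exponentiating, for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    ess = 1.0 / sum(wi * wi for wi in w)
    return w, ess


rng = random.Random(11)
# Toy stand-ins: approximate posterior N(0, 1.1^2), exact posterior N(0.05, 1).
samples = [rng.gauss(0.0, 1.1) for _ in range(20000)]
weights, ess = importance_correct(
    samples,
    log_exact=lambda x: normal_logpdf(x, 0.05, 1.0),
    log_approx=lambda x: normal_logpdf(x, 0.0, 1.1),
)
posterior_mean = sum(w * x for w, x in zip(weights, samples))
print(posterior_mean, ess)
```

When the approximation is close to the exact posterior, as here, the effective sample size stays close to the number of draws.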
Submitted 19 November, 2012;
originally announced November 2012.
-
Relevant statistics for Bayesian model choice
Authors:
J. -M. Marin,
N. Pillai,
C. P. Robert,
J. Rousseau
Abstract:
The choice of the summary statistics used in Bayesian inference, and in particular in ABC algorithms, has a bearing on the validation of the resulting inference. Those statistics are nonetheless customarily used in ABC algorithms without consistency checks. We derive necessary and sufficient conditions on summary statistics for the corresponding Bayes factor to be convergent, namely to asymptotically select the true model. Those conditions, which amount to requiring that the expectations of the summary statistics differ asymptotically under the two models, are quite natural and can be exploited in ABC settings to infer whether or not a choice of summary statistics is appropriate, via a Monte Carlo validation.
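The condition can be seen on a toy problem. Take the hypothetical pair of models N(0, 1) and a Laplace distribution scaled to unit variance (scale 1/√2): the summary mean(x²) has expectation 1 under both, whereas mean(|x|) has expectation √(2/π) ≈ 0.798 under the normal and 1/√2 ≈ 0.707 under the Laplace. The sketch below is a basic rejection ABC for model choice, with a uniform prior on the two models and a tolerance that keeps the closest fraction of simulations. The models, sample sizes and names are assumptions for illustration, not the paper's Monte Carlo validation procedure.

```python
import math
import random


def simulate(model, n, rng):
    """Model 0: N(0, 1). Model 1: Laplace with scale 1/sqrt(2), i.e. unit variance."""
    if model == 0:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = 1.0 / math.sqrt(2.0)
    # The difference of two Exp(1) draws is Laplace(0, 1).
    return [b * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(n)]


def mean_sq(x):
    return sum(v * v for v in x) / len(x)


def mean_abs(x):
    return sum(abs(v) for v in x) / len(x)


def abc_model_choice(observed, summary, n_sims=1000, keep=0.1, seed=0):
    """Rejection ABC estimate of P(model 0 | summary of the observed data)."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    sims = []
    for _ in range(n_sims):
        m = rng.randint(0, 1)  # uniform prior on the two models
        s_sim = summary(simulate(m, len(observed), rng))
        sims.append((abs(s_sim - s_obs), m))
    sims.sort()
    accepted = sims[: max(1, int(keep * n_sims))]  # keep the closest simulations
    return sum(1 for _, m in accepted if m == 0) / len(accepted)


observed = simulate(0, 1000, random.Random(42))  # data generated under model 0
p_abs = abc_model_choice(observed, mean_abs)
print(p_abs)
```

By the condition stated above, the estimate based on `mean_abs` should favour the true model increasingly as the sample size grows, whereas `abc_model_choice(observed, mean_sq)` is not expected to, since that summary has the same expectation under both models.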
Submitted 22 August, 2013; v1 submitted 21 October, 2011;
originally announced October 2011.
-
Inherent Difficulties of Non-Bayesian Likelihood-based Inference, as Revealed by an Examination of a Recent Book by Aitkin
Authors:
Andrew Gelman,
Christian P. Robert,
Judith Rousseau
Abstract:
For many decades, statisticians have made attempts to prepare the Bayesian omelette without breaking the Bayesian eggs; that is, to obtain probabilistic likelihood-based inferences without relying on informative prior distributions. A recent example is Murray Aitkin's book, Statistical Inference, which presents an approach to statistical hypothesis testing based on comparisons of posterior distributions of likelihoods under competing models. Aitkin develops and illustrates his method using some simple examples of inference from iid data and two-way tests of independence. In this note we analyze some consequences of the inferential paradigm adopted therein, discussing why the approach is incompatible with a Bayesian perspective and why we do not find it relevant for applied work.
Submitted 6 November, 2011; v1 submitted 10 December, 2010;
originally announced December 2010.
-
Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian process
Authors:
Judith Rousseau,
Nicolas Chopin,
Brunero Liseo
Abstract:
A stationary Gaussian process is said to be long-range dependent (resp., anti-persistent) if its spectral density $f(λ)$ can be written as $f(λ)=|λ|^{-2d}g(|λ|)$, where $0<d<1/2$ (resp., $-1/2<d<0$), and $g$ is continuous and positive. We propose a novel Bayesian nonparametric approach for the estimation of the spectral density of such processes. We prove posterior consistency for both $d$ and $g$, under appropriate conditions on the prior distribution. We establish the rate of convergence for a general class of priors and apply our results to the family of fractionally exponential priors. Our approach is based on the true likelihood and does not resort to Whittle's approximation.
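A standard concrete instance of the decomposition $f(λ)=|λ|^{-2d}g(|λ|)$ (not one singled out in the abstract) is fractionally integrated noise, ARFIMA(0, d, 0), whose spectral density is $f(λ)=\frac{σ^2}{2π}|2\sin(λ/2)|^{-2d}$ on $[-π, π]$. Then $g(λ)=\frac{σ^2}{2π}\big(λ/(2\sin(λ/2))\big)^{2d}$, which is continuous and positive on $[0, π]$ with $g(0)=σ^2/(2π)$. The sketch below evaluates both pieces; the function names are hypothetical.

```python
import math


def arfima_spectral_density(lam, d, sigma2=1.0):
    """Spectral density of ARFIMA(0, d, 0) for 0 < |lam| <= pi."""
    return sigma2 / (2.0 * math.pi) * abs(2.0 * math.sin(lam / 2.0)) ** (-2.0 * d)


def g_part(lam, d, sigma2=1.0):
    """g(|lam|) = f(lam) * |lam|^(2d), extended continuously to lam = 0."""
    lam = abs(lam)
    if lam == 0.0:
        return sigma2 / (2.0 * math.pi)
    return sigma2 / (2.0 * math.pi) * (lam / (2.0 * math.sin(lam / 2.0))) ** (2.0 * d)


for d in (0.3, -0.3):  # long-range dependent, then anti-persistent
    print(d, [round(arfima_spectral_density(lam, d), 4) for lam in (0.001, 0.1, 1.0)])
```

For 0 < d < 1/2 the density diverges at the origin, while for -1/2 < d < 0 it vanishes there; in both cases all of the singular behaviour is carried by $|λ|^{-2d}$.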
Submitted 23 July, 2012; v1 submitted 22 July, 2010;
originally announced July 2010.
-
Bayesian Inference
Authors:
Christian P. Robert,
Jean-Michel Marin,
Judith Rousseau
Abstract:
This chapter provides an overview of Bayesian inference, mostly emphasising that it is a universal method for summarising uncertainty and making estimates and predictions using probability statements conditional on observed data and an assumed model (Gelman 2008). The Bayesian perspective is thus applicable to all aspects of statistical inference, while being open to the incorporation of information resulting from earlier experiments and from expert opinions. We provide here the basic elements of Bayesian analysis when considered for standard models, referring to Marin and Robert (2007) and to Robert (2007) for book-length entries. In the following, we refrain from embarking upon philosophical discussions about the nature of knowledge (see, e.g., Robert 2007, Chapter 10), opting instead for a mathematically sound presentation of an eminently practical statistical methodology. We indeed believe that the most convincing arguments for adopting a Bayesian version of data analyses lie in the versatility of this tool and in the large range of existing applications, rather than in polemical arguments.
Submitted 10 February, 2010;
originally announced February 2010.
-
On Bayesian Data Analysis
Authors:
Christian P. Robert,
Judith Rousseau
Abstract:
This introduction to Bayesian statistics presents the main concepts as well as the principal reasons advocated in favour of Bayesian modelling. We cover the various approaches to prior determination as well as the basic asymptotic arguments in favour of using Bayes estimators. The testing aspects of Bayesian inference are also examined in detail.
Submitted 9 February, 2010; v1 submitted 26 January, 2010;
originally announced January 2010.
-
Rejoinder: Harold Jeffreys's Theory of Probability Revisited
Authors:
Christian P. Robert,
Nicolas Chopin,
Judith Rousseau
Abstract:
We are grateful to all discussants of our re-visitation for their strong support in our enterprise and for their overall agreement with our perspective. Further discussions with them and other leading statisticians showed that the legacy of Theory of Probability is alive and lasting. [arXiv:0804.3173]
Submitted 18 January, 2010; v1 submitted 5 September, 2009;
originally announced September 2009.