-
Empirical Bayes in Bayesian learning: understanding a common practice
Authors:
Stefano Rizzelli,
Judith Rousseau,
Sonia Petrone
Abstract:
In applications of Bayesian procedures, even when the prior law is carefully specified, it may be delicate to elicit the prior hyperparameters, so it is often tempting to fix them from the data, usually by their maximum marginal likelihood estimates (MMLE), obtaining a so-called empirical Bayes posterior distribution. Although questionable, this is a common practice, yet its theoretical properties seem to be available mostly on a case-by-case basis. In this paper we provide general properties for parametric models. First, we study the limit behavior of the MMLE and prove results in quite general settings, while also conceptualizing the frequentist context as an unexplored case of maximum likelihood estimation under model misspecification. We cover both identifiable models, illustrating applications to sparse regression, and non-identifiable models, specifically overfitted mixture models. Finally, we prove higher-order merging results. In regular cases, the empirical Bayes posterior is shown to be a fast approximation to the Bayesian posterior distribution of the researcher who, within the given class of priors, has the most information about the true model's parameters. This approximation is faster than classic Bernstein-von Mises results. Given the class of priors, our work provides formal content to common beliefs about this popular practice.
Submitted 29 February, 2024;
originally announced February 2024.
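A minimal sketch of the MMLE plug-in step described in this abstract, under an assumed toy model that is not the paper's general setting: the conjugate normal-means model x_i | θ_i ~ N(θ_i, 1), θ_i ~ N(0, τ²). Marginally x_i ~ N(0, 1 + τ²), so the MMLE of the hyperparameter τ² has the closed form max(0, mean(x²) − 1); plugging it into the conjugate posterior gives the empirical Bayes posterior. The function names are hypothetical.

```python
import numpy as np


def mmle_tau2(x):
    """Maximum marginal likelihood estimate of tau^2 in the normal-means model.

    Marginally x_i ~ N(0, 1 + tau^2), so the marginal likelihood is maximized
    at mean(x^2) - 1, truncated at zero because tau^2 cannot be negative.
    """
    x = np.asarray(x, dtype=float)
    return max(0.0, float(np.mean(x ** 2)) - 1.0)


def eb_posterior(x, tau2):
    """Conjugate posterior N(mean, var) of each theta_i given x_i and tau^2."""
    x = np.asarray(x, dtype=float)
    shrink = tau2 / (1.0 + tau2)  # shrinkage factor toward the prior mean 0
    return shrink * x, np.full_like(x, shrink)


rng = np.random.default_rng(0)
theta = rng.normal(0.0, 2.0, size=5000)  # hypothetical truth with tau^2 = 4
x = theta + rng.normal(size=theta.size)
tau2_hat = mmle_tau2(x)
post_mean, post_var = eb_posterior(x, tau2_hat)
print(round(tau2_hat, 2))  # close to 4 for large samples
```

The empirical Bayes posterior is then the ordinary conjugate posterior evaluated at the data-driven hyperparameter `tau2_hat`.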
-
Exchangeability, prediction and predictive modeling in Bayesian statistics
Authors:
Sandra Fortini,
Sonia Petrone
Abstract:
Prediction is a central problem in Statistics, and there is currently a renewed interest in the so-called predictive approach in Bayesian statistics. What is the latter about? One has to return to foundational concepts, which we do in this paper, moving from the role of exchangeability and reviewing forms of partial exchangeability for more structured data, with the aim of discussing their use and implications in Bayesian statistics. We highlight the underlying concept that, in Bayesian statistics, a predictive rule is meant as a learning rule: how one converts past information into information about future events. This concept has implications for the use of exchangeability and pervades all statistical problems, including inference. It applies to classic contexts and to less explored situations, such as the use of predictive algorithms that can be read as Bayesian learning rules. The paper offers a historical overview, but also includes a few new results, presents some recent developments and poses some open questions.
Submitted 15 February, 2024;
originally announced February 2024.
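To make the idea of a predictive rule as a learning rule concrete, here is a small sketch using the textbook Beta(a, b)-Bernoulli rule (equivalently a two-color Pólya urn), chosen for illustration and not taken from the paper. Each one-step predictive updates P(X_{n+1} = 1) from past data; chaining them gives the joint law, and the check below confirms that this joint law is invariant under permutations of the observations, i.e. exchangeable.

```python
from fractions import Fraction
from itertools import permutations


def predictive_one(history, a, b):
    """P(X_{n+1} = 1 | x_1..x_n) for the Beta(a, b)-Bernoulli predictive rule."""
    return (a + sum(history)) / (a + b + len(history))


def joint_probability(seq, a, b):
    """Probability of a 0/1 sequence obtained by chaining one-step predictives."""
    prob = Fraction(1)
    for n, x in enumerate(seq):
        p1 = predictive_one(seq[:n], a, b)
        prob *= p1 if x == 1 else 1 - p1
    return prob


a, b = Fraction(1), Fraction(2)  # hypothetical prior parameters
seq = (1, 0, 1, 1, 0)
probs = {joint_probability(p, a, b) for p in permutations(seq)}
print(probs)  # a single value: the joint law is exchangeable
```

Exact `Fraction` arithmetic is used so that equality across permutations is tested exactly rather than up to floating-point error.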
-
Bayesian dependent mixture models: A predictive comparison and survey
Authors:
Sara Wade,
Vanda Inacio,
Sonia Petrone
Abstract:
For exchangeable data, mixture models are an extremely useful tool for density estimation due to their attractive balance between smoothness and flexibility. When additional covariate information is present, mixture models can be extended for flexible regression by modeling the mixture parameters, namely the weights and atoms, as functions of the covariates. These types of models are interpretable and highly flexible, allowing not only the mean but the whole density of the response to change with the covariates, which is also known as density regression. This article reviews Bayesian covariate-dependent mixture models and highlights which data types can be accommodated by the different models, along with the methodological and applied areas where they have been used. In addition to being highly flexible, these models are also numerous; we focus on nonparametric constructions and broadly organize them into three categories: 1) joint models of the responses and covariates, 2) conditional models with single weights and covariate-dependent atoms, and 3) conditional models with covariate-dependent weights. The variety of models available in the literature raises the question of how to choose among them for the application at hand. We attempt to shed light on this question through a careful analysis of the predictive equations for the conditional mean and density function, as well as predictive comparisons in three simulated data examples.
Submitted 30 July, 2023;
originally announced July 2023.
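As a rough illustration of category 3 (covariate-dependent weights), the sketch below evaluates a conditional density f(y | x) = Σ_k w_k(x) N(y; β_k0 + β_k1 x, σ_k²) for fixed parameter values. The finite number of components, softmax weights, linear atoms and all parameter values are assumptions made for illustration; the reviewed models are mostly nonparametric and place priors on these quantities.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def conditional_density(y, x, alpha, beta, sigma):
    """f(y | x) = sum_k w_k(x) N(y; beta[k,0] + beta[k,1] x, sigma[k]^2).

    The weights w_k(x) = softmax(alpha[:,0] + alpha[:,1] x) depend on the
    covariate, so the whole shape of f(. | x), not only its mean, changes with x.
    """
    w = softmax(alpha[:, 0] + alpha[:, 1] * x)      # shape (K,)
    means = beta[:, 0] + beta[:, 1] * x             # shape (K,)
    y = np.asarray(y, dtype=float)[..., None]       # broadcast over components
    dens = np.exp(-0.5 * ((y - means) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return (dens * w).sum(axis=-1)


alpha = np.array([[0.0, 2.0], [0.0, -2.0]])   # hypothetical weight parameters
beta = np.array([[1.0, 0.5], [-1.0, 0.0]])    # hypothetical atom parameters
sigma = np.array([0.5, 0.3])
grid = np.linspace(-6, 6, 4001)
for x in (-1.0, 0.0, 1.0):
    f = conditional_density(grid, x, alpha, beta, sigma)
    print(x, round(float(np.sum(f) * (grid[1] - grid[0])), 4))  # about 1.0
```

For negative x most weight goes to the component centred at -1, for positive x to the one centred near 1, so the conditional density moves from one mode to the other as the covariate changes.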
-
Bayesian Time-Varying Tensor Vector Autoregressive Models for Dynamic Effective Connectivity
Authors:
Wei Zhang,
Ivor Cribben,
Sonia Petrone,
Michele Guindani
Abstract:
In contemporary neuroscience, a key area of interest is dynamic effective connectivity, which is crucial for understanding the dynamic interactions and causal relationships between different brain regions. Dynamic effective connectivity can provide insights into how brain network interactions are altered in neurological disorders such as dyslexia. Time-varying vector autoregressive (TV-VAR) models have been employed to draw inferences for this purpose. However, their significant computational requirements pose challenges, since the number of parameters to be estimated increases quadratically with the number of time series. In this paper, we propose a computationally efficient Bayesian time-varying VAR approach. To deal with high-dimensional time series, the proposed framework employs a tensor decomposition for the VAR coefficient matrices at different lags. Dynamically varying connectivity patterns are captured by assuming that at any given time only a subset of components in the tensor decomposition is active. Latent binary time series select the active components at each time via an innovative and parsimonious Ising model in the time domain. Furthermore, we propose sparsity-inducing priors to achieve global-local shrinkage of the VAR coefficients, automatically determine the rank of the tensor decomposition and guide the selection of the lags of the autoregression. We demonstrate the performance of our model formulation via simulation studies and data from a real fMRI study involving a book-reading experiment.
Submitted 28 May, 2024; v1 submitted 26 June, 2021;
originally announced June 2021.
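The parameter savings of a tensor decomposition can be sketched with a generic rank-R CP (PARAFAC) form, assumed here for illustration and not necessarily the paper's exact parametrization: the N × N × P coefficient tensor is A[i, j, l] = Σ_r g_r b1[i, r] b2[j, r] b3[l, r], where a 0/1 vector g switches components on or off at a given time. The Ising prior, shrinkage priors and the way the active set is inferred are not reproduced; the active vector below is set by hand.

```python
import numpy as np


def cp_coefficients(b1, b2, b3, active=None):
    """Build VAR coefficients A[i, j, l] = sum_r g_r * b1[i, r] * b2[j, r] * b3[l, r].

    b1, b2 have shape (N, R), b3 has shape (P, R); `active` is an optional 0/1
    vector of length R switching components on or off (all on by default).
    """
    R = b1.shape[1]
    g = np.ones(R) if active is None else np.asarray(active, dtype=float)
    return np.einsum("r,ir,jr,lr->ijl", g, b1, b2, b3)


def var_mean(A, past):
    """One-step VAR mean: sum over lags l of A[:, :, l] @ y_{t-1-l}.

    `past` has shape (P, N), with past[0] the most recent observation.
    """
    return np.einsum("ijl,lj->i", A, past)


N, P, R = 20, 3, 4
rng = np.random.default_rng(1)
b1, b2, b3 = rng.normal(size=(N, R)), rng.normal(size=(N, R)), rng.normal(size=(P, R))
A_t = cp_coefficients(b1, b2, b3, active=[1, 0, 1, 0])  # only components 0 and 2 active
print(A_t.shape, N * N * P, R * (2 * N + P))  # (20, 20, 3) 1200 172
```

With N = 20 series and P = 3 lags the unrestricted model has N²P = 1200 coefficients, while the rank-4 CP form has R(2N + P) = 172, and the gap widens quadratically as N grows.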
-
Infinite-color randomly reinforced urns with dominant colors
Authors:
Hristo Sariev,
Sandra Fortini,
Sonia Petrone
Abstract:
We define and prove limit results for a class of dominant Pólya sequences, which are randomly reinforced urn processes with color-specific random weights and an unbounded number of possible colors. Under fairly mild assumptions on the expected reinforcement, we show that the predictive and the empirical distributions converge almost surely (a.s.) in total variation to the same random probability measure $\tilde{P}$; moreover, $\tilde{P}(\mathcal{D})=1$ a.s., where $\mathcal{D}$ denotes the set of dominant colors for which the expected reinforcement is maximum. In the general case, the predictive probabilities and the empirical frequencies of any $\delta$-neighborhood of $\mathcal{D}$ converge a.s. to one. That is, although non-dominant colors continue to be regularly observed, their distance to $\mathcal{D}$ converges in probability to zero. We refine the above results with rates of convergence. We further hint at potential applications of dominant Pólya sequences in randomized clinical trials and species sampling, and use our central limit results for Bayesian inference.
Submitted 11 December, 2023; v1 submitted 8 June, 2021;
originally announced June 2021.
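A toy simulation of the dominance phenomenon, simplified to a finite two-color urn (the paper allows infinitely many colors). Each draw picks a color with probability proportional to its current weight and adds a random, color-specific reinforcement. The uniform reinforcement laws are assumptions: color 0 has expected reinforcement 2 and color 1 has expected reinforcement 1, so color 0 plays the role of the dominant set and its predictive probability should approach one.

```python
import numpy as np


def simulate_rru(n_steps, init_weights, reinforce, rng):
    """Simulate a finite-color randomly reinforced urn.

    At each step a color is drawn with probability proportional to the current
    weights, and its weight is increased by a random amount reinforce[c](rng).
    Returns the final predictive probabilities and the color counts.
    """
    w = np.array(init_weights, dtype=float)
    counts = np.zeros(len(w), dtype=int)
    for _ in range(n_steps):
        c = rng.choice(len(w), p=w / w.sum())
        w[c] += reinforce[c](rng)
        counts[c] += 1
    return w / w.sum(), counts


# Hypothetical reinforcement laws: color 0 is dominant (mean 2 vs mean 1).
reinforce = [
    lambda rng: rng.uniform(1.5, 2.5),
    lambda rng: rng.uniform(0.5, 1.5),
]
rng = np.random.default_rng(42)
pred, counts = simulate_rru(20000, [1.0, 1.0], reinforce, rng)
print(pred.round(3), counts)
```

The non-dominant color is still drawn from time to time, but its share of the urn shrinks, in line with the behavior the abstract describes.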
-
Enriched Pitman-Yor processes
Authors:
Tommaso Rigon,
Bruno Scarpa,
Sonia Petrone
Abstract:
In Bayesian nonparametrics there exists a rich variety of discrete priors, including the Dirichlet process and its generalizations, which are nowadays well-established tools. Despite the remarkable advances, few proposals are tailored for modeling observations lying on product spaces, such as $\mathbb{R}^p$. Indeed, for multivariate random measures, most available priors lack flexibility and do not allow for separate partition structures among the spaces. We introduce a discrete nonparametric prior, termed the enriched Pitman-Yor process (EPY), aimed at addressing these issues. Theoretical properties of this novel prior are extensively investigated. We discuss its formal link with the enriched Dirichlet process and normalized random measures, we describe a stick-breaking representation and we obtain closed-form expressions for the posterior law and the involved urn schemes. Second, we show that several existing approaches, including Dirichlet processes with a spike and slab base measure and mixture of mixtures models, implicitly rely on special cases of the EPY, which therefore constitutes a unified probabilistic framework for many Bayesian nonparametric priors. Interestingly, our unifying formulation allows us to naturally extend these models while preserving their analytical tractability. As an illustration, we employ the EPY for a species sampling problem in ecology and for functional clustering in an e-commerce application.
Submitted 25 February, 2022; v1 submitted 26 March, 2020;
originally announced March 2020.
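For orientation, the sketch below draws truncated weights from the ordinary Pitman-Yor stick-breaking construction, V_k ~ Beta(1 − σ, θ + kσ) and π_k = V_k Π_{j<k}(1 − V_j), which the enriched process generalizes. It does not implement the enriched construction itself; the truncation level, parameter values and standard normal base measure are assumptions.

```python
import numpy as np


def pitman_yor_weights(sigma, theta, n_atoms, rng):
    """Truncated stick-breaking weights of a Pitman-Yor process PY(sigma, theta).

    V_k ~ Beta(1 - sigma, theta + k * sigma), k = 1, 2, ..., and
    pi_k = V_k * prod_{j<k} (1 - V_j). Requires 0 <= sigma < 1, theta > -sigma.
    """
    k = np.arange(1, n_atoms + 1)
    v = rng.beta(1.0 - sigma, theta + k * sigma)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining


rng = np.random.default_rng(0)
pi = pitman_yor_weights(sigma=0.25, theta=1.0, n_atoms=500, rng=rng)
atoms = rng.normal(size=pi.size)  # hypothetical base measure: N(0, 1)
print(round(float(pi.sum()), 4))  # close to 1; missing mass is due to truncation
```

Setting `sigma=0` recovers the Dirichlet process stick-breaking with Beta(1, θ) sticks.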
-
Modeling a Hidden Dynamical System Using Energy Minimization and Kernel Density Estimates
Authors:
Trevor K. Karn,
Steven Petrone,
Christopher Griffin
Abstract:
In this paper we develop a kernel density estimation (KDE) approach to modeling and forecasting recurrent trajectories on a compact manifold. For the purposes of this paper, a trajectory is a sequence of coordinates in a phase space defined by an underlying hidden dynamical system. Our work is inspired by earlier work on the use of KDE to detect shipping anomalies using high-density, high-quality automated information system (AIS) data, as well as by our own earlier work in trajectory modeling. We focus specifically on the sparse, noisy trajectory reconstruction problem, in which the data are (i) sparsely sampled and (ii) subject to an imperfect observer that introduces noise. Under certain regularity assumptions, we show that the constructed estimator minimizes a specific energy function defined over the trajectory as the number of samples grows.
Submitted 30 July, 2019; v1 submitted 8 April, 2019;
originally announced April 2019.
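A bare-bones version of the KDE ingredient, assuming a plain isotropic Gaussian kernel in the plane rather than the paper's manifold setting, and omitting the energy-minimization step entirely. The noisy, sparse samples of a unit circle stand in for a hypothetical recurrent trajectory; points on the trajectory should receive much higher estimated density than points off it.

```python
import numpy as np


def gaussian_kde_2d(samples, query, bandwidth):
    """Gaussian kernel density estimate at query points.

    samples: (n, 2) observed phase-space coordinates; query: (m, 2) points.
    Returns (m,) density values using an isotropic Gaussian kernel.
    """
    diff = query[:, None, :] - samples[None, :, :]          # (m, n, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)                      # (m, n)
    kernel = np.exp(-0.5 * sq_dist / bandwidth ** 2) / (2 * np.pi * bandwidth ** 2)
    return kernel.mean(axis=1)


# Hypothetical recurrent trajectory: noisy, sparse samples of a unit circle.
rng = np.random.default_rng(3)
t = rng.uniform(0, 2 * np.pi, size=200)
samples = np.column_stack((np.cos(t), np.sin(t))) + rng.normal(scale=0.05, size=(200, 2))
query = np.array([[1.0, 0.0], [0.0, 0.0]])  # on the trajectory vs. off it
print(gaussian_kde_2d(samples, query, bandwidth=0.1))
```

The bandwidth of 0.1 is an arbitrary choice here; in practice it trades off smoothing of observation noise against blurring of the trajectory.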
-
Quasi-Bayes properties of a recursive procedure for mixtures
Authors:
Sandra Fortini,
Sonia Petrone
Abstract:
Bayesian methods are attractive and often optimal, yet nowadays the pressure for fast computations, especially with streaming data and online learning, brings renewed interest in faster, although possibly sub-optimal, solutions. To what extent these algorithms may approximate a Bayesian solution is a problem of interest, not always solved. Against this background, in this paper we revisit a sequential procedure proposed by Smith and Makov (1978) for unsupervised learning and classification in finite mixtures, and developed by M. Newton and Zhang (1999) for nonparametric mixtures. Newton's algorithm is simple, fast, and theoretically intriguing. Although originally proposed as an approximation of the Bayesian solution, its quasi-Bayes properties remain unclear. We propose a novel methodological approach. We regard the algorithm as a probabilistic learning rule that implicitly defines an underlying probabilistic model, and we find this model. We can then prove that it is, asymptotically, a Bayesian, exchangeable mixture model. Moreover, while the algorithm only offers a point estimate, our approach allows us to obtain an asymptotic posterior distribution and asymptotic credible intervals for the mixing distribution. Our results also provide practical hints for tuning the algorithm and obtaining desirable properties, as we illustrate in a simulation study. Beyond mixture models, our study suggests a theoretical framework that may be of interest for recursive quasi-Bayes methods in other settings.
Submitted 27 February, 2019;
originally announced February 2019.
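A sketch of Newton's recursive update for a Gaussian location mixture, discretized on a grid. The grid, the uniform starting guess, the weight sequence a_n = (n + 1)^(−γ) with γ = 0.67, and the simulated data are all assumptions; the paper's tuning recommendations and its asymptotic credible intervals are not reproduced.

```python
import numpy as np


def newton_recursion(x, grid, g0, gamma=0.67):
    """Newton's recursive estimate of a mixing density on a grid.

    Kernel: x | theta ~ N(theta, 1). For each new observation,
        g_n = (1 - a_n) g_{n-1} + a_n k(x_n | .) g_{n-1} / int k(x_n | t) g_{n-1}(t) dt,
    with weights a_n = (n + 1)^(-gamma). Here g is a probability vector on `grid`.
    """
    g = np.asarray(g0, dtype=float)
    g = g / g.sum()
    for n, xn in enumerate(x, start=1):
        a_n = (n + 1.0) ** (-gamma)
        lik = np.exp(-0.5 * (xn - grid) ** 2)      # kernel up to a constant
        post = lik * g                              # one-step "posterior" on the grid
        g = (1.0 - a_n) * g + a_n * post / post.sum()
    return g


rng = np.random.default_rng(0)
# Hypothetical data: true mixing distribution puts mass 1/2 at -2 and at +2.
theta = rng.choice([-2.0, 2.0], size=2000)
x = theta + rng.normal(size=theta.size)
grid = np.linspace(-5, 5, 201)
g = newton_recursion(x, grid, np.ones_like(grid))
near_modes = (np.abs(grid) > 1) & (np.abs(grid) < 3)
print(round(float(g[near_modes].sum()), 2))  # most mass lies near -2 and +2
```

Each update is a convex combination of two probability vectors, so the estimate stays a valid distribution, and a single pass over the data suffices, which is what makes the procedure attractive for streaming data. Because the update depends on the order of the observations, the result is not a function of the data alone, which is one reason its relation to a Bayesian posterior is subtle.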
-
A closed-form filter for binary time series
Authors:
Augusto Fasano,
Giovanni Rebaudo,
Daniele Durante,
Sonia Petrone
Abstract:
Non-Gaussian state-space models arise in several applications, and within this framework the binary time series setting provides a relevant example. However, unlike Gaussian state-space models, where filtering, predictive and smoothing distributions are available in closed form, binary state-space models require approximations or sequential Monte Carlo strategies for inference and prediction. This is due to the apparent absence of conjugacy between the Gaussian states and the likelihood induced by the observation equation for the binary data. In this article we prove that the filtering, predictive and smoothing distributions in dynamic probit models with Gaussian state variables are, in fact, available in closed form and belong to a class of unified skew-normals (SUN) whose parameters can be updated recursively in time via analytical expressions. The key functionals of these distributions are also, in principle, available, but their calculation requires the evaluation of multivariate Gaussian cumulative distribution functions. Leveraging SUN properties, we address this issue via novel Monte Carlo methods based on independent samples from the smoothing distribution, which can easily be adapted to the filtering and predictive cases, thus improving state-of-the-art approximate and sequential Monte Carlo inference in small-to-moderate dimensional studies. Novel sequential Monte Carlo procedures that exploit the SUN properties are also developed to deal with online inference in high dimensions. Performance gains over competitors are outlined in a financial application.
Submitted 18 May, 2021; v1 submitted 19 February, 2019;
originally announced February 2019.
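For context, the sketch below is the kind of generic sequential Monte Carlo baseline the abstract refers to, not the paper's closed-form SUN filter: a bootstrap particle filter for a univariate dynamic probit model. The random-walk state equation, the N(0, 1) initial state, the state variance q = 0.1, the particle count and the binary series are assumptions.

```python
import math
import numpy as np


def norm_cdf(z):
    """Standard normal CDF, vectorized via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


def bootstrap_filter(y, q=0.1, n_particles=2000, rng=None):
    """Bootstrap particle filter for a univariate dynamic probit model.

    State: theta_t = theta_{t-1} + N(0, q), theta_0 ~ N(0, 1).
    Observation: y_t | theta_t ~ Bernoulli(Phi(theta_t)).
    Returns the filtered means E[theta_t | y_1..y_t].
    """
    rng = np.random.default_rng() if rng is None else rng
    particles = rng.normal(size=n_particles)
    means = []
    for yt in y:
        particles = particles + rng.normal(scale=math.sqrt(q), size=n_particles)  # propagate
        p1 = norm_cdf(particles)
        w = p1 if yt == 1 else 1.0 - p1                        # probit likelihood weights
        w = w / w.sum()
        means.append(float(np.sum(w * particles)))
        idx = rng.choice(n_particles, size=n_particles, p=w)   # multinomial resampling
        particles = particles[idx]
    return np.array(means)


rng = np.random.default_rng(0)
y = np.array([1] * 30 + [0] * 30)   # hypothetical binary series
means = bootstrap_filter(y, rng=rng)
print(means[29].round(2), means[-1].round(2))  # positive after the 1s, then pulled down
```

Its output is only a Monte Carlo approximation whose accuracy depends on the number of particles, which is the limitation that exact SUN filtering distributions avoid.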
-
On a notion of partially conditionally identically distributed sequences
Authors:
Sandra Fortini,
Sonia Petrone,
Polina Sporysheva
Abstract:
A notion of conditionally identically distributed (c.i.d.) sequences has been studied as a form of stochastic dependence that is weaker than exchangeability but equivalent to it for stationary sequences. In this article we extend this notion to families of sequences. Paralleling the extension from exchangeability to partial exchangeability in the sense of de Finetti, we propose a notion of partially c.i.d. dependence that is equivalent to partial exchangeability for stationary processes. Partially c.i.d. families of sequences preserve attractive limit properties of partial exchangeability, and are asymptotically partially exchangeable. Moreover, we provide strong laws of large numbers and two central limit theorems. Our focus is on the asymptotic agreement of predictions and empirical means, which lies at the foundations of Bayesian statistics. Natural examples are interacting randomly reinforced processes satisfying certain conditions on the reinforcement.
Submitted 6 March, 2017; v1 submitted 1 August, 2016;
originally announced August 2016.
-
Predictive Characterization of Mixtures of Markov Chains
Authors:
Sandra Fortini,
Sonia Petrone
Abstract:
Predictive constructions are a powerful way of characterizing the probability law of stochastic processes with certain forms of invariance, such as exchangeability or Markov exchangeability. When de Finetti-like representation theorems are available, the predictive characterization implicitly defines the prior distribution, starting from assumptions on the observables; moreover, it often helps in designing efficient computational strategies. In this paper we give necessary and sufficient conditions on the sequence of predictive distributions such that they characterize a Markov exchangeable probability law for a discrete-valued process X. Under recurrence, Markov exchangeable processes are mixtures of Markov chains. Thus, our results help to check when a predictive scheme characterizes a prior for Bayesian inference on the unknown transition matrix of a Markov chain. Our predictive conditions are in some sense minimal sufficient conditions for Markov exchangeability; we also provide predictive conditions for recurrence. We illustrate their application in relevant examples from the literature and in novel constructions.
Submitted 13 November, 2015; v1 submitted 20 June, 2014;
originally announced June 2014.
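A small check of Markov exchangeability using a well-known predictive scheme chosen for illustration (not one of the paper's novel constructions): independent Dirichlet priors on the rows of the transition matrix, equivalently a Pólya urn attached to each state. The next-state predictive depends on the current state and on the observed transition counts, and two paths with the same initial state and the same transition counts receive the same probability. The prior parameters `alpha` and the two paths are hypothetical.

```python
from fractions import Fraction


def markov_predictive(history, alpha):
    """P(X_{n+1} = j | history) for a row-wise Dirichlet (Polya-urn-per-state) rule.

    Given the current state i = history[-1], the prediction is
    (alpha[i][j] + n_ij) / (sum_k alpha[i][k] + n_i), where n_ij counts the
    observed transitions i -> j.
    """
    i = history[-1]
    k_states = len(alpha)
    counts = [0] * k_states
    for prev, nxt in zip(history, history[1:]):
        if prev == i:
            counts[nxt] += 1
    total = sum(alpha[i]) + sum(counts)
    return [(alpha[i][j] + counts[j]) / total for j in range(k_states)]


def path_probability(seq, alpha):
    """Probability of seq[1:] given the initial state seq[0]."""
    prob = Fraction(1)
    for n in range(1, len(seq)):
        prob *= markov_predictive(seq[:n], alpha)[seq[n]]
    return prob


alpha = [[Fraction(1), Fraction(1)], [Fraction(2), Fraction(1)]]
a = (0, 0, 1, 0, 1, 1)   # transitions: 0->0 once, 0->1 twice, 1->0 once, 1->1 once
b = (0, 1, 0, 0, 1, 1)   # same initial state and same transition counts
print(path_probability(a, alpha) == path_probability(b, alpha))  # True
```

Equality here reflects invariance under rearrangements that preserve the initial state and the transition counts, which is the defining property of Markov exchangeability.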
-
Bayes and empirical Bayes: do they merge?
Authors:
Sonia Petrone,
Judith Rousseau,
Catia Scricciolo
Abstract:
Bayesian inference is attractive for its coherence and good frequentist properties. However, it is a common experience that eliciting an honest prior may be difficult and, in practice, people often take an empirical Bayes approach, plugging empirical estimates of the prior hyperparameters into the posterior distribution. Even if this is not rigorously justified, the underlying idea is that, when the sample size is large, empirical Bayes leads to inferential answers "similar" to those of a fully Bayesian analysis. Yet, precise mathematical results seem to be missing. In this work, we give a more rigorous justification in terms of merging of Bayes and empirical Bayes posterior distributions. We consider two notions of merging: Bayesian weak merging and frequentist merging in total variation. Since weak merging is related to consistency, we provide sufficient conditions for consistency of empirical Bayes posteriors. Also, we show that, under regularity conditions, the empirical Bayes procedure asymptotically selects the value of the hyperparameter for which the prior most favors the "truth". Examples include empirical Bayes density estimation with Dirichlet process mixtures.
Submitted 6 April, 2012;
originally announced April 2012.
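A toy illustration of merging in total variation, under an assumed conjugate model much simpler than the paper's examples: X_i | θ ~ N(θ, 1) and θ | μ ~ N(μ, 1). Since the sample mean satisfies x̄ ~ N(μ, 1 + 1/n) marginally, the MMLE is μ̂ = x̄. The empirical Bayes posterior is compared with a hierarchical Bayes posterior that uses a hypothetical hyperprior μ ~ N(0, s2), for a fixed hypothetical sample mean. In this toy case μ̂ = x̄ tends to the true θ, the hyperparameter value at which the prior density of the truth is largest.

```python
import numpy as np


def normal_pdf(t, mean, var):
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)


def posteriors(xbar, n, s2=1.0):
    """(mean, var) of the empirical Bayes and hierarchical Bayes posteriors of theta.

    Model: X_i | theta ~ N(theta, 1), theta | mu ~ N(mu, 1).
    Empirical Bayes: mu_hat = xbar maximizes the marginal likelihood, since
    marginally xbar ~ N(mu, 1 + 1/n).
    Hierarchical Bayes: mu ~ N(0, s2), hence theta ~ N(0, 1 + s2) a priori.
    """
    eb = (xbar, 1.0 / (n + 1.0))
    prec = n + 1.0 / (1.0 + s2)
    hb = (n * xbar / prec, 1.0 / prec)
    return eb, hb


def tv_distance(p, q, num=200001):
    """Total variation distance between two normals by numerical integration."""
    sd = np.sqrt(max(p[1], q[1]))
    t = np.linspace(min(p[0], q[0]) - 12 * sd, max(p[0], q[0]) + 12 * sd, num)
    diff = np.abs(normal_pdf(t, *p) - normal_pdf(t, *q))
    return 0.5 * float(np.sum(diff) * (t[1] - t[0]))


xbar = 0.7   # hypothetical observed sample mean
tvs = {}
for n in (10, 100, 1000, 10000):
    eb, hb = posteriors(xbar, n)
    tvs[n] = tv_distance(eb, hb)
    print(n, round(tvs[n], 4))
```

The distance shrinks roughly like 1/√n, because the two posterior means differ by O(1/n) while both posterior standard deviations are of order 1/√n.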