-
Detection of an Arbitrary Number of Communities in a Block Spin Ising Model
Authors:
Miguel Ballesteros,
Ramsés H. Mena,
José Luis Pérez,
Gabor Toth
Abstract:
We study the problem of community detection in a general version of the block spin Ising model featuring M groups, a model inspired by the Curie-Weiss model of ferromagnetism in statistical mechanics. We solve the general problem of identifying any number of groups with any possible coupling constants. Up to now, the problem was only solved for the specific situation with two groups of identical size and identical interactions. Our results can be applied to the most realistic situations, in which there are many groups of different sizes and different interactions. In addition, we give an explicit algorithm that permits the reconstruction of the structure of the model from a sample of observations based on the comparison of empirical correlations of the spin variables, thus unveiling easy applications of the model to real-world voting data and communities in biology.
Submitted 29 November, 2023;
originally announced November 2023.
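The abstract above mentions reconstructing the group structure by comparing empirical correlations of the spin variables. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes that within-group correlations exceed a user-chosen `threshold` while between-group correlations fall below it. The function names and the `threshold` parameter are hypothetical.

```python
def empirical_correlations(samples):
    """Average the products x_i * x_j over all observed spin configurations."""
    n = len(samples)
    d = len(samples[0])
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(d)]
            for i in range(d)]


def group_by_correlation(corr, threshold):
    """Merge spins whose empirical correlation exceeds `threshold`.

    Assumption (not from the paper): a single cut-off separates
    within-group from between-group correlations.
    """
    d = len(corr)
    parent = list(range(d))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(d):
        for j in range(i + 1, d):
            if corr[i][j] > threshold:
                parent[find(j)] = find(i)

    # Relabel the union-find roots as consecutive group ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(d)]


# Toy data: spins 0 and 1 always agree, spins 2 and 3 always agree.
samples = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [-1, -1, 1, 1],
    [-1, -1, -1, -1],
]
labels = group_by_correlation(empirical_correlations(samples), threshold=0.5)
```

On this toy sample the two pairs of perfectly aligned spins end up in separate groups; with real data the choice of cut-off would have to be justified by the model.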
-
On a Divergence-Based Prior Analysis of Stick-Breaking Processes
Authors:
José A. Perusquía,
Mario Diaz,
Ramsés H. Mena
Abstract:
The nonparametric view of Bayesian inference has transformed statistics and many of its applications. The canonical Dirichlet process and other more general families of nonparametric priors have served as a gateway to solve frontier uncertainty quantification problems of large, or infinite, nature. This success has been largely due to the available constructions and representations of such distributions; the two most useful constructions are the one based on normalization of homogeneous completely random measures and the one based on stick-breaking processes. Hence, understanding their distributional features and how different random probability measures compare among themselves is a key ingredient for their proper application. In this paper, we analyse the discrepancy among some nonparametric priors employed in the literature. Initially, we compute the mean and variance of the random Kullback-Leibler divergence between the Dirichlet process and the geometric process. Subsequently, we extend our analysis to encompass a broader class of exchangeable stick-breaking processes, which includes the Dirichlet and geometric processes as extreme cases. Our results establish quantitative conditions under which all the aforementioned priors are close in total variation distance. In such instances, Occam's razor favours the simpler process.
Submitted 12 October, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
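As a rough illustration of the two priors compared in the abstract above, the sketch below builds truncated stick-breaking weights. It relies on the standard constructions: independent Beta(1, theta) length variables for the Dirichlet process, and a single repeated length variable for the geometric process. The function names and the truncation level `k` are hypothetical, and nothing here reproduces the paper's divergence computations.

```python
import random


def stick_breaking_weights(lengths):
    """Turn length variables v_1, v_2, ... into weights w_j = v_j * prod_{i<j} (1 - v_i)."""
    weights, remaining = [], 1.0
    for v in lengths:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def dirichlet_lengths(theta, k, rng):
    """Independent Beta(1, theta) lengths (Dirichlet process, truncated at k)."""
    return [rng.betavariate(1.0, theta) for _ in range(k)]


def geometric_lengths(v, k):
    """One shared length repeated k times (geometric process, truncated at k)."""
    return [v] * k


rng = random.Random(0)
dp_weights = stick_breaking_weights(dirichlet_lengths(2.0, 50, rng))
geo_weights = stick_breaking_weights(geometric_lengths(0.5, 3))
```

With a shared length the weights are decreasing by construction, whereas the Dirichlet process weights are only stochastically ordered.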
-
Clustering constrained on linear networks
Authors:
Asael Fabian Martínez,
Somnath Chaudhuri,
Carlos Díaz-Avalos,
Pablo Juan,
Jorge Mateu,
Ramsés H. Mena
Abstract:
An unsupervised classification method for point events occurring on a network of lines is proposed. The idea relies on the distributional flexibility and practicality of random partition models to discover the clustering structure featuring observations from a particular phenomenon taking place on a given set of edges. By incorporating the spatial effect in the random partition distribution, induced by a Dirichlet process, one is able to control the distance between edges and events, thus leading to an appealing clustering method. A Gibbs sampler algorithm is proposed and evaluated with a sensitivity analysis. The proposal is motivated and illustrated by the analysis of crime and violence patterns in Mexico City.
Submitted 19 July, 2022;
originally announced July 2022.
-
COVID-19 Clinical footprint to infer about mortality
Authors:
Carlos E. Rodríguez,
Ramsés H. Mena
Abstract:
Information on 1.6 million patients identified as SARS-CoV-2 positive in Mexico is used to understand the relationship between comorbidities, symptoms, hospitalizations and deaths due to the COVID-19 disease. Using the presence or absence of these latter variables, a clinical footprint for each patient is created. The risk, expected mortality and the prediction of death outcomes, among other relevant quantities, are obtained and analyzed by means of a multivariate Bernoulli distribution. The proposal considers all possible footprint combinations, resulting in a robust model suitable for Bayesian inference.
Submitted 14 April, 2021;
originally announced April 2021.
-
Asymptotic behavior of the number of distinct values in a sample from the geometric stick-breaking process
Authors:
Pierpaolo De Blasi,
Ramsés H. Mena,
Igor Prünster
Abstract:
Discrete random probability measures are a key ingredient of Bayesian nonparametric inferential procedures. A sample generates ties with positive probability and a fundamental object of both theoretical and applied interest is the corresponding random number of distinct values. The growth rate can be determined from the rate of decay of the small frequencies implying that, when the decreasingly ordered frequencies admit a tractable form, the asymptotics of the number of distinct values can be conveniently assessed. We focus on the geometric stick-breaking process and we investigate the effect of the choice of the distribution for the success probability on the asymptotic behavior of the number of distinct values. We show that a whole range of logarithmic behaviors are obtained by appropriately tuning the prior. We also derive a two-term expansion and illustrate its use in a comparison with a larger family of discrete random probability measures having an additional parameter given by the scale of the negative binomial distribution.
Submitted 19 January, 2021;
originally announced January 2021.
-
Stick-breaking processes with exchangeable length variables
Authors:
María F. Gil-Leyva,
Ramsés H. Mena
Abstract:
Our object of study is the general class of stick-breaking processes with exchangeable length variables. These generalize well-known Bayesian non-parametric priors in an unexplored direction. We give conditions to assure the respective species sampling process is proper and the corresponding prior has full support. For a rich sub-class we explain how, by tuning a single $[0,1]$-valued parameter, the stochastic ordering of the weights can be modulated, and Dirichlet and Geometric priors can be recovered. A general formula for the distribution of the latent allocation variables is derived and an MCMC algorithm is proposed for density estimation purposes.
Submitted 18 July, 2021; v1 submitted 10 August, 2020;
originally announced August 2020.
-
A Copula-based Fully Bayesian Nonparametric Evaluation of Cardiovascular Risk Markers in the Mexico City Diabetes Study
Authors:
Claudia Wehrhahn,
Ruth Fuentes-García,
Ramsés H. Mena,
Fabrizio Leisen,
Maria Elena González-Villalpando,
Clicerio González-Villalpando
Abstract:
Cardiovascular disease is the leading cause of death worldwide, and several studies have been carried out to understand and explore cardiovascular risk markers in normoglycemic and diabetic populations. In this work, we explore the association structure between hyperglycemic markers and cardiovascular risk markers controlled by triglycerides, body mass index, age and gender, for the normoglycemic population in The Mexico City Diabetes Study. Understanding the association structure could contribute to the assessment of additional cardiovascular risk markers in this low-income urban population with a high prevalence of classic cardiovascular risk biomarkers. The association structure is measured by conditional Kendall's tau, defined through conditional copula functions. The latter are in turn modeled under a fully Bayesian nonparametric approach, which allows the complete shape of the copula function to vary for different values of the controlled covariates.
Submitted 20 August, 2021; v1 submitted 22 July, 2020;
originally announced July 2020.
-
Using posterior predictive distributions to analyse epidemic models: COVID-19 in Mexico City
Authors:
Ramsés H. Mena,
Jorge X. Velasco-Hernandez,
Natalia B. Mantilla-Beniers,
Gabriel A. Carranco-Sapiéns,
Luis Benet,
Denis Boyer,
Isaac Pérez Castillo
Abstract:
Epidemiological models contain a set of parameters that must be adjusted based on available observations. Once a model has been calibrated, it can be used as a forecasting tool to make predictions and to evaluate contingency plans. It is customary to employ only point estimators for such predictions. However, some models may fit the same data reasonably well for a broad range of parameter values, and this flexibility means that predictions stemming from such models will vary widely, depending on the particular parameter values employed within the range that give a good fit. When data are poor or incomplete, model uncertainty widens further. A way to circumvent this problem is to use Bayesian statistics to incorporate observations and use the full range of parameter estimates contained in the posterior distribution to adjust for uncertainties in model predictions. Specifically, given the epidemiological model and a probability distribution for observations, we use the posterior distribution of model parameters to generate all possible epidemiological curves via the posterior predictive distribution. From the envelope of all curves one can extract the worst-case scenario and study the impact of implementing contingency plans according to this assessment. We apply this approach to the potential evolution of COVID-19 in Mexico City and assess whether contingency plans are being successful and whether the epidemiological curve has flattened.
Submitted 15 May, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
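The envelope idea described in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' model: it uses a basic discrete-time SIR recursion in population fractions, takes a hypothetical list of posterior parameter draws as given, and propagates only parameter uncertainty (a full posterior predictive simulation would also draw observation noise). All function and parameter names are hypothetical.

```python
def sir_curve(beta, gamma, i0, days):
    """Infected fraction over time for a simple discrete-time SIR model."""
    s, i = 1.0 - i0, i0
    infected = [i]
    for _ in range(days):
        new_infections = beta * s * i
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append(i)
    return infected


def predictive_envelope(parameter_draws, i0, days):
    """Pointwise lower and upper bounds over the curves of all parameter draws."""
    curves = [sir_curve(beta, gamma, i0, days) for beta, gamma in parameter_draws]
    lower = [min(values) for values in zip(*curves)]
    upper = [max(values) for values in zip(*curves)]
    return lower, upper


# Hypothetical posterior draws of (beta, gamma); in practice these would come from MCMC.
draws = [(0.3, 0.1), (0.5, 0.1)]
lower, upper = predictive_envelope(draws, i0=0.01, days=30)
```

The `upper` curve plays the role of the worst-case scenario mentioned in the abstract, taken pointwise in time rather than as a single draw.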
-
Beta-Binomial stick-breaking non-parametric prior
Authors:
María F. Gil-Leyva,
Ramsés H. Mena,
Theodoros Nicoleris
Abstract:
A new class of nonparametric prior distributions, termed Beta-Binomial stick-breaking process, is proposed. By allowing the underlying length random variables to be dependent through a Beta marginals Markov chain, an appealing discrete random probability measure arises. The chain's dependence parameter controls the ordering of the stick-breaking weights, and thus tunes the model's label-switching ability. Also, by tuning this parameter, the resulting class contains the Dirichlet process and the Geometric process priors as particular cases, which is of interest for fast convergence of MCMC implementations. Some properties of the model are discussed and a density estimation algorithm is proposed and tested with simulated datasets.
Submitted 10 August, 2020; v1 submitted 19 August, 2019;
originally announced August 2019.
-
Modeling failures times with dependent renewal type models via exchangeability
Authors:
Arrigo Coen,
Luis Gutiérrez,
Ramsés H. Mena
Abstract:
Failure times of machinery cannot always be assumed independent and identically distributed, e.g. if after repairs the machinery is not restored to a same-as-new condition. Framed within the renewal processes approach, a generalization that considers exchangeable inter-arrival times is presented. The resulting model provides a more realistic approach to capture the dependence among events occurring at random times, while retaining much of the tractability of the classical renewal process. Extensions of some classical results and special cases of renewal functions are analyzed, in particular the one corresponding to an exchangeable sequence driven by a Dirichlet process. The proposal is tested through an estimation procedure using simulated data sets and with an application to the reliability of hydraulic subsystems in load-haul-dump machines.
Submitted 13 May, 2019;
originally announced May 2019.
-
On a flexible construction of a negative binomial model
Authors:
Fabrizio Leisen,
Ramsés H. Mena,
Freddy Palma Mancilla,
Luca Rossini
Abstract:
This work presents a construction of stationary Markov models with negative-binomial marginal distributions. A simple closed form expression for the corresponding transition probabilities is given, linking the proposal to well-known classes of birth and death processes and thus revealing interesting characterizations. The advantage of having such closed form expressions is tested on simulated and real data.
Submitted 9 April, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.
-
Discussions of the paper "Sparse graphs using exchangeable random measures" by F. Caron and E. B. Fox
Authors:
Julyan Arbel,
Marco Battiston,
Stefano Favaro,
Antonio Lijoi,
Igor Prünster,
Ramsés H. Mena,
Yang Ni,
Peter Müller
Abstract:
These are written discussions of the paper "Sparse graphs using exchangeable random measures" by François Caron and Emily B. Fox, contributed to the Journal of the Royal Statistical Society Series B.
Submitted 4 July, 2017;
originally announced July 2017.
-
A Harris process to model stochastic volatility
Authors:
Michelle Anzarut,
Ramses H. Mena
Abstract:
We present a tractable non-independent increment process which provides high modeling flexibility. The process relies on an extension of the so-called Harris chains to continuous time and is stationary and Feller. We exhibit constructions, properties, and inference methods for the process. Afterwards, we use the process to propose a stochastic volatility model with an arbitrary but fixed invariant distribution, which can be tailored to fit different applied scenarios. We study the model performance through simulation while illustrating its use in practice with empirical work. The model proves to be an interesting competitor to a number of short-range stochastic volatility models.
Submitted 17 May, 2016;
originally announced May 2016.
-
Are Gibbs-type priors the most natural generalization of the Dirichlet process?
Authors:
P. De Blasi,
S. Favaro,
A. Lijoi,
R. H. Mena,
I. Pruenster,
M. Ruggiero
Abstract:
Discrete random probability measures and the exchangeable random partitions they induce are key tools for addressing a variety of estimation and prediction problems in Bayesian inference. Indeed, many popular nonparametric priors, such as the Dirichlet and the Pitman-Yor process priors, select discrete probability distributions almost surely and, therefore, automatically induce exchangeable random partitions. Here we focus on the family of Gibbs-type priors, a recent and elegant generalization of the Dirichlet and the Pitman-Yor process priors. These random probability measures share properties that are appealing both from a theoretical and an applied point of view: (i) they admit an intuitive characterization in terms of their predictive structure justifying their use in terms of a precise assumption on the learning mechanism; (ii) they stand out in terms of mathematical tractability; (iii) they include several interesting special cases besides the Dirichlet and the Pitman-Yor processes. The goal of our paper is to provide a systematic and unified treatment of Gibbs-type priors and highlight their implications for Bayesian nonparametric inference. We will deal with their distributional properties, the resulting estimators, frequentist asymptotic validation and the construction of time-dependent versions. Applications, mainly concerning hierarchical mixture models and species sampling, will serve to convey the main ideas. The intuition inherent to this class of priors and the neat results that can be deduced for it lead one to wonder whether it actually represents the most natural generalization of the Dirichlet process.
Submitted 28 February, 2015;
originally announced March 2015.
-
Dynamic density estimation with diffusive Dirichlet mixtures
Authors:
Ramsés H. Mena,
Matteo Ruggiero
Abstract:
We introduce a new class of nonparametric prior distributions on the space of continuously varying densities, induced by Dirichlet process mixtures which diffuse in time. These select time-indexed random functions without jumps, whose sections are continuous or discrete distributions depending on the choice of kernel. The construction exploits the widely used stick-breaking representation of the Dirichlet process and induces the time dependence by replacing the stick-breaking components with one-dimensional Wright-Fisher diffusions. These features combine appealing properties of the model, inherited from the Wright-Fisher diffusions and the Dirichlet mixture structure, with great flexibility and tractability for posterior computation. The construction can be easily extended to multi-parameter GEM marginal states, which include, for example, the Pitman-Yor process. A full inferential strategy is detailed and illustrated on simulated and real data.
Submitted 9 February, 2016; v1 submitted 9 October, 2014;
originally announced October 2014.
-
A probability for classification based on the mixture of Dirichlet process model
Authors:
Ruth Fuentes-Garcia,
Ramses H Mena,
Stephen G Walker
Abstract:
In this paper, we provide an explicit probability distribution for classification purposes. It is derived from the Bayesian nonparametric mixture of Dirichlet process model, but with suitable modifications which remove unsuitable aspects of the classification based on this model. The resulting approach then more closely resembles a classical hierarchical grouping rule in that it depends on sums of squares of neighboring values. The proposed probability model for classification relies on a simulation algorithm which will be based on a reversible MCMC algorithm for determining the probabilities, and we provide numerical illustrations comparing with alternative ideas for classification.
Submitted 2 May, 2009;
originally announced May 2009.