Search | arXiv e-print repository

A continuous approach of modeling tumorigenesis and axons regulation for the pancreatic cancer

Authors: Marie-Jose Chaaya, Sophie Chauvet, Florence Hubert, Fanny Mann, Mathieu Mezache, Pierre Pudlo

Abstract: The pancreatic innervation undergoes dynamic remodeling during the development of pancreatic ductal adenocarcinoma (PDAC). Denervation experiments have shown that different types of axons can exert either pro- or anti-tumor effects, but conflicting results exist in the literature, leaving the overall influence of the nervous system on PDAC incompletely understood. To address this gap, we propose a continuous mathematical model of nerve-tumor interactions that allows in silico simulation of denervation at different phases of tumor development. This model takes into account the pro- or anti-tumor properties of different types of axons (sympathetic or sensory) and their distinct remodeling dynamics during PDAC development. We observe a "shift effect" where an initial pro-tumor effect of sympathetic axon denervation is later outweighed by the anti-tumor effect of sensory axon denervation, leading to a transition from an overall protective to a deleterious role of the nervous system on PDAC tumorigenesis. Our model also highlights the importance of the impact of sympathetic axon remodeling dynamics on tumor progression. These findings may guide strategies targeting the nervous system to improve PDAC treatment.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2310.06508 [pdf, other]

Topological data analysis of human vowels: Persistent homologies across representation spaces

Authors: Guillem Bonafos, Jean-Marc Freyermuth, Pierre Pudlo, Samuel Tronçon, Arnaud Rey

Abstract: Topological Data Analysis (TDA) has been successfully used for various tasks in signal/image processing, from visualization to supervised/unsupervised classification. Often, topological characteristics are obtained from persistent homology theory. The standard TDA pipeline starts from the raw signal data or a representation of it. It then builds a multiscale topological structure on top of the data using a pre-specified filtration, and finally computes the topological signature to be further exploited. The commonly used topological signature is a persistence diagram (or transformations of it). Current research discusses the consequences of the many ways to exploit topological signatures, much less often the choice of the filtration, but to the best of our knowledge, the choice of the representation of a signal has not been the subject of any study yet. This paper attempts to provide some answers on the latter problem. To this end, we collected real audio data and built a comparative study to assess the quality of the discriminant information of the topological signatures extracted from three different representation spaces. Each audio signal is represented as i) an embedding of observed data in a higher dimensional space using Takens' representation, ii) a spectrogram viewed as a surface in a 3D ambient space, iii) the set of the spectrogram's zeros. From vowel audio recordings, we use topological signatures for three prediction problems: speaker gender, vowel type, and individual.
We show that a topologically-augmented random forest improves the Out-of-Bag Error (OOB) over one based solely on Mel-Frequency Cepstral Coefficients (MFCC) for the last two problems. Our results also suggest that the topological information extracted from different signal representations is complementary, and that the spectrogram's zeros offer the best improvement for gender prediction.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2302.07640 [pdf, other]

Detection and classification of vocal productions in large scale audio recordings

Authors: Guillem Bonafos, Pierre Pudlo, Jean-Marc Freyermuth, Thierry Legou, Joël Fagot, Samuel Tronçon, Arnaud Rey

Abstract: We propose an automatic data processing pipeline to extract vocal productions from large-scale natural audio recordings and classify these vocal productions. The pipeline is based on a deep neural network and addresses both issues simultaneously. Through a series of computational steps (windowing, creation of a noise class, data augmentation, re-sampling, transfer learning, Bayesian optimisation), it automatically trains a neural network without requiring a large sample of labeled data or substantial computing resources. Our end-to-end methodology can handle noisy recordings made under different recording conditions. We test it on two different natural audio data sets, one from a group of Guinea baboons recorded at a primate research center and one from human babies recorded at home. The pipeline trains a model on 72 and 77 minutes of labeled audio recordings, with an accuracy of 94.58% and 99.76%. It is then used to process 443 and 174 hours of natural continuous recordings and it creates two new databases of 38.8 and 35.2 hours, respectively. We discuss the strengths and limitations of this approach that can be applied to any massive audio recording.

Submitted 11 August, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

arXiv:2205.01501 [pdf, other]

Tempered, Anti-truncated, Multiple Importance Sampling

Authors: Grégoire Aufort, Pierre Pudlo, Denis Burgarella

Abstract: Importance sampling is a Monte Carlo method that introduces a proposal distribution to sample the space according to the target distribution. Yet calibration of the proposal distribution is essential to achieving efficiency, thus the resort to adaptive algorithms to tune this distribution. In the paper, we propose a new adaptive importance sampling scheme, named the Tempered Anti-truncated Adaptive Multiple Importance Sampling (TAMIS) algorithm. We combine a tempering scheme and a new nonlinear transformation of the weights we named anti-truncation. For efficiency, we were also concerned not to increase the number of evaluations of the target density. As a result, our proposal is an automatically tuned sequential algorithm that is robust to poor initial proposals, does not require gradient computations and scales well with the dimension.

Submitted 16 June, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2002.07815 [pdf, other]

doi 10.1051/0004-6361/201936788

Constraining the recent star formation history of galaxies: an Approximate Bayesian Computation approach

Authors: G. Aufort, L. Ciesla, P. Pudlo, V. Buat

Abstract: [Abridged] Although galaxies are found to follow a tight relation between their star formation rate and stellar mass, they are expected to exhibit complex star formation histories (SFH), with short-term fluctuations. The goal of this pilot study is to present a method that will identify galaxies that are undergoing a strong variation of star formation activity in the last tens to hundreds of Myr. In other words, the proposed method will determine whether a variation in the last few hundreds of Myr of the SFH is needed to properly model the SED rather than a smooth normal SFH. To do so, we analyze a sample of COSMOS galaxies using high signal-to-noise ratio broad band photometry. We apply Approximate Bayesian Computation, a state-of-the-art statistical method to perform model choice, combined with machine learning algorithms to provide the probability that a flexible SFH is preferred based on the observed flux density ratios of galaxies. We present the method and test it on a sample of simulated SEDs. The input information fed to the algorithm is a set of broadband UV to NIR (rest-frame) flux ratios for each galaxy. The method has an error rate of 21% in recovering the right SFH and is sensitive to SFR variations larger than 1 dex. A more traditional SED fitting method using CIGALE is tested to achieve the same goal, based on fit comparisons through the Bayesian Information Criterion, but the best error rate obtained is higher, 28%. We apply our new method to the COSMOS galaxies sample.
The stellar mass distribution of galaxies with a strong to decisive evidence against the smooth delayed-$τ$ SFH peaks at lower M* compared to galaxies where the smooth delayed-$τ$ SFH is preferred. We discuss the fact that this result does not come from any bias due to our training. Finally, we argue that flexible SFHs are needed to be able to cover the largest SFR-M* parameter space possible.

Submitted 18 February, 2020; originally announced February 2020.

Journal ref: A&A 635, A136 (2020)

arXiv:1910.14227 [pdf, other]

Combined parameter and state inference with automatically calibrated ABC

Authors: Anthony Ebert, Pierre Pudlo, Kerrie Mengersen, Paul Wu, Christopher Drovandi

Abstract: State space models contain time-indexed parameters, termed states, as well as static parameters, simply termed parameters. The problem of inferring both static parameters as well as states simultaneously, based on time-indexed observations, is the subject of much recent literature. This problem is compounded once we consider models with intractable likelihoods. In these situations, some emerging approaches have incorporated existing likelihood-free techniques for static parameters, such as approximate Bayesian computation (ABC) into likelihood-based algorithms for combined inference of parameters and states. These emerging approaches currently require extensive manual calibration of a time-indexed tuning parameter: the acceptance threshold. We design an SMC$^2$ algorithm (Chopin et al., 2013, JRSS B) for likelihood-free approximation with automatically tuned thresholds. We prove consistency of the algorithm and discuss the proposed calibration. We demonstrate this algorithm's performance with three examples. We begin with two examples of state space models. The first example is a toy example, with an emission distribution that is a skew normal distribution. The second example is a stochastic volatility model involving an intractable stable distribution. The last example is the most challenging; it deals with an inhomogeneous Hawkes process.

Submitted 26 May, 2021; v1 submitted 30 October, 2019; originally announced October 2019.

arXiv:1605.05537 [pdf, other]

doi 10.24072/pci.evolbiol.100036

ABC random forests for Bayesian parameter inference

Authors: Louis Raynal, Jean-Michel Marin, Pierre Pudlo, Mathieu Ribatet, Christian P. Robert, Arnaud Estoup

Abstract: This preprint has been reviewed and recommended by Peer Community In Evolutionary Biology (http://dx.doi.org/10.24072/pci.evolbiol.100036). Approximate Bayesian computation (ABC) has grown into a standard methodology that manages Bayesian inference for models associated with intractable likelihood functions. Most ABC implementations require the preliminary selection of a vector of informative statistics summarizing raw data. Furthermore, in almost all existing implementations, the tolerance level that separates acceptance from rejection of simulated parameter values needs to be calibrated. We propose to conduct likelihood-free Bayesian inferences about parameters with no prior selection of the relevant components of the summary statistics and bypassing the derivation of the associated tolerance level. The approach relies on the random forest methodology of Breiman (2001) applied in a (non parametric) regression setting. We advocate the derivation of a new random forest for each component of the parameter vector of interest. When compared with earlier ABC solutions, this method offers significant gains in terms of robustness to the choice of the summary statistics, does not depend on any type of tolerance level, and is a good trade-off in terms of quality of point estimator precision and credible interval estimations for a given computing time. We illustrate the performance of our methodological proposal and compare it with earlier ABC methods on a Normal toy example and a population genetics example dealing with human population evolution.
All methods designed here have been incorporated in the R package abcrf (version 1.7) available on CRAN.

Submitted 2 November, 2018; v1 submitted 18 May, 2016; originally announced May 2016.

Comments: Main text: 24 pages, 6 figures Supplementary Information: 14 pages, 5 figures

arXiv:1604.08403 [pdf, other]

Bayesian functional linear regression with sparse step functions

Authors: Paul-Marie Grollemund, Christophe Abraham, Meïli Baragatti, Pierre Pudlo

Abstract: The functional linear regression model is a common tool to determine the relationship between a scalar outcome and a functional predictor seen as a function of time. This paper focuses on the Bayesian estimation of the support of the coefficient function. To this aim we propose a parsimonious and adaptive decomposition of the coefficient function as a step function, and a model including a prior distribution that we name Bayesian functional Linear regression with Sparse Step functions (Bliss). The aim of the method is to recover the periods of time that most influence the outcome. A Bayes estimator of the support is built with a specific loss function, as well as two Bayes estimators of the coefficient function, a first one which is smooth and a second one which is a step function. The performance of the proposed methodology is analysed on various synthetic datasets and is illustrated on a black Périgord truffle dataset to study the influence of rainfall on the production.

Submitted 6 January, 2017; v1 submitted 28 April, 2016; originally announced April 2016.

arXiv:1603.07237 [pdf, other]

Resampling: an improvement of Importance Sampling in varying population size models

Authors: Coralie Merle, Raphaël Leblois, François Rousset, Pierre Pudlo

Abstract: Sequential importance sampling algorithms have been defined to estimate likelihoods in models of ancestral population processes. However, these algorithms are based on features of the models with constant population size, and become inefficient when the population size varies in time, making likelihood-based inferences difficult in many demographic situations. In this work, we modify a previous sequential importance sampling algorithm to improve the efficiency of the likelihood estimation. Our procedure is still based on features of the model with constant size, but uses a resampling technique with a new resampling probability distribution depending on the pairwise composite likelihood. We tested our algorithm, called sequential importance sampling with resampling (SISR) on simulated data sets under different demographic cases. In most cases, we divided the computational cost by two for the same accuracy of inference, in some cases even by one hundred. This study provides the first assessment of the impact of such resampling techniques on parameter inference using sequential importance sampling, and extends the range of situations where likelihood inferences can be easily performed.

Submitted 23 March, 2016; originally announced March 2016.

arXiv:1602.02606 [pdf, other]

doi 10.1002/sta4.112

Hidden Gibbs random fields model selection using Block Likelihood Information Criterion

Authors: Julien Stoehr, Jean-Michel Marin, Pierre Pudlo

Abstract: Performing model selection between Gibbs random fields is a very challenging task. Indeed, due to the Markovian dependence structure, the normalizing constant of the fields cannot be computed using standard analytical or numerical methods. Furthermore, such unobserved fields cannot be integrated out and the likelihood evaluation is a doubly intractable problem. This forms a central issue when picking the model that best fits the observed data. We introduce a new approximate version of the Bayesian Information Criterion. We partition the lattice into continuous rectangular blocks and we approximate the probability measure of the hidden Gibbs field by the product of some Gibbs distributions over the blocks. On that basis, we estimate the likelihood and derive the Block Likelihood Information Criterion (BLIC) that answers model choice questions such as the selection of the dependency structure or the number of latent states. We study the performance of BLIC for those questions. In addition, we present a comparison with ABC algorithms to point out that the novel criterion offers a better trade-off between time efficiency and reliable results.

Submitted 8 February, 2016; originally announced February 2016.

Journal ref: Stat (2016) 5:158-172

arXiv:1503.07689 [pdf, other]

Likelihood-free Model Choice

Authors: Jean-Michel Marin, Pierre Pudlo, Arnaud Estoup, Christian P. Robert

Abstract: This document is an invited chapter covering the specificities of ABC model choice, intended for the forthcoming Handbook of ABC by Sisson, Fan, and Beaumont (2017). Beyond exposing the potential pitfalls of ABC-based posterior probabilities, the review emphasizes mostly the solution proposed by Pudlo et al. (2016) on the use of random forests for aggregating summary statistics and for estimating the posterior probability of the most likely model via a secondary random forest.

Submitted 16 September, 2016; v1 submitted 26 March, 2015; originally announced March 2015.

Comments: 21 pages, 9 figures, 2 tables

arXiv:1406.6288 [pdf, other]

Reliable ABC model choice via random forests

Authors: Pierre Pudlo, Jean-Michel Marin, Arnaud Estoup, Jean-Marie Cornuet, Mathieu Gautier, Christian P. Robert

Abstract: Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets.
The proposed methodologies are implemented in the R package abcrf available on CRAN.

Submitted 2 September, 2015; v1 submitted 24 June, 2014; originally announced June 2014.

Comments: 39 pages, 15 figures, 6 tables

arXiv:1402.1380 [pdf, other]

doi 10.1007/s11222-014-9514-9

Adaptive ABC model choice and geometric summary statistics for hidden Gibbs random fields

Authors: Julien Stoehr, Pierre Pudlo, Lionel Cucala

Abstract: Selecting between different dependency structures of hidden Markov random fields can be very challenging, due to the intractable normalizing constant in the likelihood. We answer this question with approximate Bayesian computation (ABC) which provides a model choice method in the Bayesian paradigm. This comes after the work of Grelaud et al. (2009) who exhibited sufficient statistics on directly observed Gibbs random fields. But when the random field is latent, sufficiency fails and we complement the set with geometric summary statistics. The general approach to construct these intuitive statistics relies on a clustering analysis of the sites based on the observed colors and plausible latent graphs. The efficiency of ABC model choice based on these statistics is evaluated via a local error rate which may be of independent interest. As a byproduct we derived an ABC algorithm that adapts the dimension of the summary statistics to the dataset without distorting the model selection.

Submitted 16 July, 2014; v1 submitted 6 February, 2014; originally announced February 2014.

Journal ref: Statistics and Computing (2015) 25:129-141

arXiv:1211.2548 [pdf, other]

Consistency of the Adaptive Multiple Importance Sampling

Authors: Jean-Michel Marin, Pierre Pudlo, Mohammed Sedki

Abstract: Among Monte Carlo techniques, the importance sampling requires fine tuning of a proposal distribution, which is now fluently resolved through iterative schemes. The Adaptive Multiple Importance Sampling (AMIS) of Cornuet et al. (2012) provides a significant improvement in stability and effective sample size due to the introduction of a recycling procedure. However, the consistency of the AMIS estimator remains largely open. In this work we prove the convergence of the AMIS, at a cost of a slight modification in the learning process. Contrary to Douc et al. (2007a), results are obtained here in the asymptotic regime where the number of iterations is going to infinity while the number of drawings per iteration is a fixed, but growing sequence of integers. Hence some of the results shed new light on adaptive population Monte Carlo algorithms in that last regime.

Submitted 26 May, 2014; v1 submitted 12 November, 2012; originally announced November 2012.

MSC Class: 65C05 (Primary) 60F17 (Secondary)

arXiv:1210.1388 [pdf, other]

Efficient learning in ABC algorithms

Authors: Mohammed Sedki, Pierre Pudlo, Jean-Michel Marin, Christian P. Robert, Jean-Marie Cornuet

Abstract: Approximate Bayesian Computation has been successfully used in population genetics to bypass the calculation of the likelihood. These methods provide accurate estimates of the posterior distribution by comparing the observed dataset to a sample of datasets simulated from the model. Although parallelization is easily achieved, computation times for ensuring a suitable approximation quality of the posterior distribution are still high. To alleviate the computational burden, we propose an adaptive, sequential algorithm that runs faster than other ABC algorithms but maintains accuracy of the approximation. This proposal relies on the sequential Monte Carlo sampler of Del Moral et al. (2012) but is calibrated to reduce the number of simulations from the model. The paper concludes with numerical experiments on a toy example and on a population genetic study of Apis mellifera, where our algorithm was shown to be faster than traditional ABC schemes.

Submitted 15 March, 2013; v1 submitted 4 October, 2012; originally announced October 2012.

arXiv:1205.5658 [pdf, other]

doi 10.1073/pnas.1208827110

Bayesian computation via empirical likelihood

Authors: K. L. Mengersen, P. Pudlo, C. P. Robert

Abstract: Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choices of the ABC parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The BCel algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models.

Submitted 5 December, 2012; v1 submitted 25 May, 2012; originally announced May 2012.

Comments: 21 pages, 12 figures, revised version of the previous version with a new title

arXiv:1201.1314 [pdf, other]

Some discussions of P. Fearnhead and D. Prangle's Read Paper "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation"

Authors: Christophe Andrieu, Simon Barthelme, Nicolas Chopin, Julien Cornebise, Arnaud Doucet, Mark Girolami, Ioannis Kosmidis, Ajay Jasra, Anthony Lee, Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, Mohammed Sedki, Sumeetpal S. Singh

Abstract: This report is a collection of comments on the Read Paper of Fearnhead and Prangle (2011), to appear in the Journal of the Royal Statistical Society Series B, along with a reply from the authors.

Submitted 5 January, 2012; originally announced January 2012.

Comments: 10 pages

arXiv:1106.5919 [pdf, other]

Monte Carlo algorithms for model assessment via conflicting summaries

Authors: Oliver Ratmann, Pierre Pudlo, Sylvia Richardson, Christian Robert

Abstract: The development of statistical methods and numerical algorithms for model choice is vital to many real-world applications. In practice, the ABC approach can be instrumental for sequential model design; however, the theoretical basis of its use has been questioned. We present a measure-theoretic framework for using the ABC error towards model choice and describe how easily existing rejection, Metropolis-Hastings and sequential importance sampling ABC algorithms are extended for the purpose of model checking. Considering a panel of applications from evolutionary biology to dynamic systems, we discuss the choice of summaries which differs from standard ABC approaches. The methods and algorithms presented here may provide the workhorse machinery for an exploratory approach to ABC model choice, particularly as the application of standard Bayesian tools can prove impossible.

Submitted 29 June, 2011; originally announced June 2011.

Comments: Under review

ACM Class: G.3; I.6.4; J.3

arXiv:1101.0955 [pdf, other]

Approximate Bayesian Computational methods

Authors: Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, Robin Ryder

Abstract: Also known as likelihood-free methods, approximate Bayesian computational (ABC) methods have appeared in the past ten years as the most satisfactory approach to untractable likelihood problems, first in genetics then in a broader spectrum of applications. However, these methods suffer to some degree from calibration difficulties that make them rather volatile in their implementation and thus render them suspicious to the users of more traditional Monte Carlo methods. In this survey, we study the various improvements and extensions made to the original ABC algorithm over the recent years.

Submitted 27 May, 2011; v1 submitted 5 January, 2011; originally announced January 2011.

Comments: 7 figures
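
For readers unfamiliar with the "original ABC algorithm" mentioned in the abstract above, the following is a minimal sketch of the textbook rejection form of ABC. It is not taken from the survey itself. The function name `abc_rejection`, its parameters, and the toy Gaussian model are all assumptions made for illustration. The calibration choices the abstract alludes to (summary statistic, distance, tolerance) appear as explicit arguments.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, distance,
                  tolerance, n_draws, rng):
    """Textbook rejection ABC (illustrative sketch, hypothetical names).

    Draw a parameter from the prior, simulate data from the model, and keep
    the parameter when the simulated summary is within `tolerance` of the
    observed summary.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        s_sim = summary(simulator(theta, rng))
        if distance(s_sim, s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy example (assumed for illustration): infer the mean of a unit-variance
# Gaussian under a uniform prior on [-5, 5], using the sample mean as the
# summary statistic.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    distance=lambda a, b: abs(a - b),
    tolerance=0.2,
    n_draws=5000,
    rng=rng,
)
```

The accepted values approximate a sample from the posterior given the summary statistic. A smaller tolerance tightens the approximation but lowers the acceptance rate, which is one form of the calibration difficulty the abstract describes.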

arXiv:1002.2313 [pdf, ps, other]

Operator norm convergence of spectral clustering on level sets

Authors: Bruno Pelletier, Pierre Pudlo

Abstract: Following Hartigan, a cluster is defined as a connected component of the t-level set of the underlying density, i.e., the set of points for which the density is greater than t. A clustering algorithm which combines a density estimate with spectral clustering techniques is proposed. Our algorithm is composed of two steps. First, a nonparametric density estimate is used to extract the data points for which the estimated density takes a value greater than t. Next, the extracted points are clustered based on the eigenvectors of a graph Laplacian matrix. Under mild assumptions, we prove the almost sure convergence in operator norm of the empirical graph Laplacian operator associated with the algorithm. Furthermore, we give the typical behavior of the representation of the dataset into the feature space, which establishes the strong consistency of our proposed algorithm.

Submitted 11 February, 2010; originally announced February 2010.
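
The two-step procedure described in the abstract above can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation. The Gaussian kernel, the symmetric normalized Laplacian, the row normalization of the embedding, and the small k-means routine with farthest-point initialization are all assumptions, as are the function name `level_set_spectral_clustering` and its parameters.

```python
import numpy as np


def level_set_spectral_clustering(X, t, bandwidth, n_clusters, n_iter=50):
    """Sketch: threshold a density estimate at t, then cluster spectrally."""
    n, d = X.shape
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq_dists / (2.0 * bandwidth**2))

    # Step 1: Gaussian kernel density estimate at each data point; keep the
    # points whose estimated density exceeds the level t.
    density = kernel.sum(axis=1) / (n * (2.0 * np.pi * bandwidth**2) ** (d / 2.0))
    keep = np.flatnonzero(density > t)

    # Step 2: similarity graph on the kept points and its symmetric
    # normalized Laplacian (an assumed choice of Laplacian).
    W = kernel[np.ix_(keep, keep)]
    degree = W.sum(axis=1)
    laplacian = np.eye(len(keep)) - W / np.sqrt(np.outer(degree, degree))

    # Embed each kept point using the eigenvectors of the smallest
    # eigenvalues (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(laplacian)
    embedding = eigvecs[:, :n_clusters]
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)

    # Plain k-means in the embedded space, with deterministic
    # farthest-point initialization.
    centers = [embedding[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((embedding - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(embedding[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((embedding[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = embedding[labels == k].mean(axis=0)
    return keep, labels
```

The function returns the indices of the points retained by the level-set step together with a cluster label for each retained point. Points in low-density regions are discarded before clustering, which is what separates this scheme from spectral clustering applied to the full sample.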

Showing 1–20 of 20 results for author: Pudlo, P