Easily Computed Marginal Likelihoods from Posterior Simulation Using the THAMES Estimator

Authors: Martin Metodiev, Marie Perrot-Dockès, Sarah Ouadah, Nicholas J. Irons, Adrian E. Raftery

Abstract: We propose an easily computed estimator of marginal likelihoods from posterior simulation output, via reciprocal importance sampling, combining earlier proposals of DiCiccio et al. (1997) and Robert and Wraith (2009). This involves only the unnormalized posterior densities from the sampled parameter values, and does not involve additional simulations beyond the main posterior simulation, or additional complicated calculations. It is unbiased for the reciprocal of the marginal likelihood, consistent, has finite variance, and is asymptotically normal. It involves one user-specified control parameter, and we derive an optimal way of specifying this. We illustrate it with several numerical examples.

Submitted 15 May, 2023; originally announced May 2023.
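
The reciprocal importance sampling idea named in the abstract above can be illustrated with a one-dimensional toy sketch. It is not the authors' estimator: the target density, the uniform interval `[-r, r]`, and all function names are assumptions chosen for illustration, with the half-width `r` standing in for the single user-specified control parameter. The sketch averages h(θ)/q(θ) over posterior draws, where q is the unnormalized posterior density and h is a proper density, which gives an unbiased estimate of the reciprocal of the marginal likelihood.

```python
import math
import random


def log_unnormalized_posterior(theta, log_c=1.5):
    # Toy target (assumption): q(theta) = c * exp(-theta^2 / 2),
    # so the true marginal likelihood is Z = c * sqrt(2 * pi).
    return log_c - 0.5 * theta * theta


def reciprocal_is_log_marginal(samples, log_q, r):
    # h is the uniform density on [-r, r]; draws outside contribute zero.
    h = 1.0 / (2.0 * r)
    total = 0.0
    for theta in samples:
        if abs(theta) <= r:
            total += h * math.exp(-log_q(theta))
    inv_z_hat = total / len(samples)  # estimates 1 / Z
    return -math.log(inv_z_hat)       # log of the estimated Z


rng = random.Random(0)
# For this toy target the normalized posterior is N(0, 1), so exact draws
# play the role of the posterior simulation output.
posterior_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

log_z_hat = reciprocal_is_log_marginal(
    posterior_draws, log_unnormalized_posterior, r=1.0
)
log_z_true = 1.5 + 0.5 * math.log(2.0 * math.pi)
print(f"estimated log Z = {log_z_hat:.4f}, true log Z = {log_z_true:.4f}")
```

Only the unnormalized density values at the sampled points are used, and no further simulation is needed, which matches the properties the abstract emphasizes. Restricting h to a bounded region keeps the ratio h/q bounded, which is what gives the toy estimator finite variance.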

arXiv:2105.00288

Post hoc false discovery proportion inference under a Hidden Markov Model

Authors: Marie Perrot-Dockès, Gilles Blanchard, Pierre Neuvial, Etienne Roquain

Abstract: We address the multiple testing problem under the assumption that the true/false hypotheses are driven by a Hidden Markov Model (HMM), which is recognized as a fundamental setting to model multiple testing under dependence since the seminal work of Sun and Cai (2009). While previous work has concentrated on deriving specific procedures with a controlled False Discovery Rate (FDR) under this model, following a recent trend in selective inference, we consider the problem of establishing confidence bounds on the false discovery proportion (FDP), for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way. We develop a methodology to construct such confidence bounds first when the HMM model is known, then when its parameters are unknown and estimated, including the data distribution under the null and the alternative, using a nonparametric approach. In the latter case, we propose a bootstrap-based methodology to take into account the effect of parameter estimation error. We show that taking advantage of the assumed HMM structure allows for a substantial improvement of confidence bound sharpness over existing agnostic (structure-free) methods, as witnessed both via numerical experiments and real data examples.

Submitted 1 May, 2021; originally announced May 2021.

arXiv:2012.03534

Some detection tests for low complexity data models and unknown background distribution

Authors: D. Mary, S. Bourguignon, E. Roquain, S. Sulis, M. Perrot-Dockes

Abstract: We consider several detection situations where, under the alternative hypothesis, the signal admits a low complexity model and, under both the null and the alternative hypotheses, the distribution of the background noise is unknown. We present several detection strategies for such cases, whose design relies on exogenous or on endogenous data. These testing procedures have been inspired by and are applied to two specific problems in Astrophysics, namely the detection of exoplanets from radial velocity curves and of distant galaxies in hyperspectral datacubes.

Submitted 7 December, 2020; originally announced December 2020.

Comments: in Proceedings of iTWIST'20, Paper-ID: 34, Nantes, France, December, 2-4, 2020

arXiv:1806.10093

Estimation of large block structured covariance matrices: Application to "multi-omic" approaches to study seed quality

Authors: Marie Perrot-Dockès, Céline Lévy-Leduc, Loïc Rajjou

Abstract: Motivated by an application in high-throughput genomics and metabolomics, we propose a novel, efficient and fully data-driven approach for estimating large block structured sparse covariance matrices in the case where the number of variables is much larger than the number of samples, without limiting ourselves to block diagonal matrices. Our approach consists in approximating such a covariance matrix by the sum of a low-rank sparse matrix and a diagonal matrix. Our methodology can also deal with matrices for which the block structure appears only if the columns and rows are permuted according to an unknown permutation. Our technique is implemented in the R package BlockCov, which is available from the Comprehensive R Archive Network (CRAN) and from GitHub. To illustrate the statistical and numerical performance of our package, some numerical experiments are provided, as well as a thorough comparison with alternative methods. Finally, our approach is applied to the use of "multi-omic" approaches for studying seed quality.

Submitted 6 December, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

arXiv:1707.04145

Variable selection in multivariate linear models with high-dimensional covariance matrix estimation

Authors: Marie Perrot-Dockès, Céline Lévy-Leduc, Laure Sansonnet, Julien Chiquet

Abstract: In this paper, we propose a novel variable selection approach in the framework of multivariate linear models taking into account the dependence that may exist between the responses. It consists in first estimating the covariance matrix of the responses and then plugging this estimator into a Lasso criterion, in order to obtain a sparse estimator of the coefficient matrix. The properties of our approach are investigated both from a theoretical and a numerical point of view. More precisely, we give general conditions that the estimators of the covariance matrix and its inverse have to satisfy in order to recover the positions of the null and non-null entries of the coefficient matrix when the size of the covariance matrix is not fixed and can tend to infinity. We prove that these conditions are satisfied in the particular case of some Toeplitz matrices. Our approach is implemented in the R package MultiVarSel, available from the Comprehensive R Archive Network (CRAN), and is very attractive since it benefits from a low computational load. We also assess the performance of our methodology using synthetic data and compare it with alternative approaches. Our numerical experiments show that including the estimation of the covariance matrix in the Lasso criterion dramatically improves the variable selection performance in many cases.

Submitted 13 July, 2017; originally announced July 2017.
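
The step of plugging a covariance estimate into the Lasso criterion, described in the abstract above, is commonly realized by whitening the responses with the inverse square root of that covariance before fitting. The sketch below shows only the whitening step, for two responses, and assumes the covariance `sigma` is already given (in the paper it is estimated). The function names are hypothetical and are not the MultiVarSel API.

```python
import math


def inv_sqrt_2x2(sigma):
    # Closed form for a symmetric positive definite 2x2 matrix S:
    # sqrt(S) = (S + s*I) / t, with s = sqrt(det S), t = sqrt(trace S + 2*s).
    (a, b), (_, d) = sigma
    s = math.sqrt(a * d - b * b)
    t = math.sqrt(a + d + 2.0 * s)
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t  # entries of sqrt(S)
    det = ra * rd - rb * rb
    # Invert the 2x2 square root explicitly.
    return [[rd / det, -rb / det], [-rb / det, ra / det]]


def matmul(left, right):
    # Plain nested-list matrix product.
    return [
        [sum(left[i][k] * right[k][j] for k in range(len(right)))
         for j in range(len(right[0]))]
        for i in range(len(left))
    ]


# Assumed covariance between the two responses (illustrative values).
sigma = [[1.0, 0.6], [0.6, 1.0]]
whitener = inv_sqrt_2x2(sigma)

# Each row of `responses` is one observation of the two responses.
responses = [[1.2, 0.9], [-0.4, 0.1], [0.3, 0.8]]
whitened_responses = matmul(responses, whitener)

# After whitening, the implied covariance W * Sigma * W is the identity.
check = matmul(matmul(whitener, sigma), whitener)
print(check)
```

A Lasso fitted to the whitened responses then treats the residuals as uncorrelated, which is how the dependence between responses enters the selection step in this kind of approach.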

arXiv:1704.00076

A multivariate variable selection approach for analyzing LC-MS metabolomics data

Authors: M. Perrot-Dockès, C. Lévy-Leduc, J. Chiquet, L. Sansonnet, M. Brégère, M. -P. Étienne, S. Robin, G. Genta-Jouve

Abstract: Omic data are characterized by the presence of strong dependence structures that result either from data acquisition or from some underlying biological processes. In metabolomics, for instance, data resulting from Liquid Chromatography-Mass Spectrometry (LC-MS), a technique which gives access to a large coverage of metabolites, exhibit such patterns. These data sets are typically used to find the metabolites characterizing a phenotype of interest associated with the samples. However, applying statistical procedures that do not adjust the variable selection step to the dependence pattern may result in a loss of power and the selection of spurious variables. The goal of this paper is to propose a variable selection procedure in the multivariate linear model that accounts for the dependence structure of the multiple outputs, which may lead, in the LC-MS framework, to the selection of more relevant metabolites. We propose a novel Lasso-based approach in the multivariate framework of the general linear model, taking into account the dependence structure by using various models of the covariance matrix of the residuals. Our numerical experiments show that including the estimation of the covariance matrix of the residuals in the Lasso criterion dramatically improves the variable selection performance. Our approach is also successfully applied to an LC-MS data set made of African copal samples, for which it is able to provide a small list of metabolites without altering the phenotype discrimination. Our methodology is implemented in the R package MultiVarSel, which is available from the Comprehensive R Archive Network (CRAN).

Submitted 31 March, 2017; originally announced April 2017.

Showing 1–6 of 6 results for author: Perrot-Dockes, M