Search | arXiv e-print repository

A subsampling approach for Bayesian model selection

Authors: Jon Lachmann, Geir Storvik, Florian Frommlet, Aliaksandr Hubin

Abstract: It is common practice to use Laplace approximations to compute marginal likelihoods in Bayesian versions of generalised linear models (GLM). Marginal likelihoods combined with model priors are then used in different search algorithms to compute the posterior marginal probabilities of models and individual covariates. This allows performing Bayesian model selection and model averaging. For large sample sizes, even the Laplace approximation becomes computationally challenging because the optimisation routine involved needs to evaluate the likelihood on the full set of data in multiple iterations. As a consequence, the algorithm is not scalable for large datasets. To address this problem, we suggest using a version of a popular batch stochastic gradient descent (BSGD) algorithm for estimating the marginal likelihood of a GLM by subsampling from the data. We further combine the algorithm with Markov chain Monte Carlo (MCMC) based methods for Bayesian model selection and provide some theoretical results on the convergence of the estimates. Finally, we report results from experiments illustrating the performance of the proposed algorithm.

Submitted 31 January, 2022; originally announced January 2022.

Comments: 33 pages, 17 figures, tables

MSC Class: 62-02; 62-09; 62F07; 62F15; 62J12; 62J05; 62J99; 62M05; 05A16; 60J22; 92D20; 90C27; 90C59 ACM Class: G.1.2; G.1.6; G.2.1; G.3; I.2.0; I.2.6; I.2.8; I.5.1; I.6; I.6.4

arXiv:2110.05316 [pdf, ps, other]

Reversible Genetically Modified Mode Jumping MCMC

Authors: Aliaksandr Hubin, Florian Frommlet, Geir Storvik

Abstract: In this paper, we introduce a reversible version of a genetically modified mode jumping Markov chain Monte Carlo algorithm (GMJMCMC) for inference on posterior model probabilities in complex model spaces, where the number of explanatory variables is prohibitively large for classical Markov Chain Monte Carlo methods. Unlike the earlier proposed GMJMCMC algorithm, the introduced algorithm is a proper MCMC and its limiting distribution corresponds to the posterior marginal model probabilities in the explored model space under reasonable regularity conditions.

Submitted 15 October, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 6 pages, 2 tables, based on arXiv:1806.02160, which got divided into two revised articles

MSC Class: 62-02; 62-09; 62F07; 62F15; 62J12; 62J05; 62J99; 62M05; 05A16; 60J22; 92D20; 90C27; 90C59

Journal ref: Published in Proceedings of 22nd European Young Statisticians Meeting (ISBN: 978-960-7943-23-1), 2021. URL: https://www.eysm2021.panteion.gr/files/Proceedings_EYSM_2021.pdf Parpoula & Athanasios Rakitzis

arXiv:2011.12154 [pdf, other]

Identifying important predictors in large data bases -- multiple testing and model selection

Authors: Malgorzata Bogdan, Florian Frommlet

Abstract: This is a chapter of the forthcoming Handbook of Multiple Testing. We consider a variety of model selection strategies in a high-dimensional setting, where the number of potential predictors p is large compared to the number of available observations n. In particular modifications of information criteria which are suitable in case of p > n are introduced and compared with a variety of penalized likelihood methods, in particular SLOPE and SLOBE. The focus is on methods which control the FDR in terms of model identification. Theoretical results are provided both with respect to model identification and prediction and various simulation results are presented which illustrate the performance of the different methods in different situations.

Submitted 24 November, 2020; originally announced November 2020.

arXiv:2005.00605 [pdf]

Rejoinder for the discussion of the paper "A novel algorithmic approach to Bayesian Logic Regression"

Authors: Aliaksandr Hubin, Geir Storvik, Florian Frommlet

Abstract: In this rejoinder we summarize the comments, questions and remarks on the paper "A novel algorithmic approach to Bayesian Logic Regression" from the discussants. We then respond to those comments, questions and remarks, provide several extensions of the original model and give a tutorial on our R-package EMJMCMC (http://aliaksah.github.io/EMJMCMC2016/).

Submitted 1 May, 2020; originally announced May 2020.

Comments: published in Bayesian Analysis, Volume 15, Number 1 (2020)

Journal ref: Bayesian Analysis, Volume 15, Number 1 (2020)

arXiv:2003.02929 [pdf, ps, other]

doi 10.1613/jair.1.13047

Flexible Bayesian Nonlinear Model Configuration

Authors: Aliaksandr Hubin, Geir Storvik, Florian Frommlet

Abstract: Regression models are used in a wide range of applications providing a powerful scientific tool for researchers from different fields. Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response. Such relationships can be better described through flexible approaches such as neural networks, but this results in less interpretable models and potential overfitting. Alternatively, specific parametric nonlinear functions can be used, but the specification of such functions is in general complicated. In this paper, we introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models. Nonlinear features are generated hierarchically, similarly to deep learning, but have additional flexibility on the possible types of features to be considered. This flexibility, combined with variable selection, allows us to find a small set of important features and thereby more interpretable models. Within the space of possible functions, a Bayesian approach, introducing priors for functions based on their complexity, is considered. A genetically modified mode jumping Markov chain Monte Carlo algorithm is adopted to perform Bayesian inference and estimate posterior probabilities for model averaging. In various applications, we illustrate how our approach is used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with several machine learning algorithms.

Submitted 23 November, 2021; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: 42 pages; 18 Tables. arXiv admin note: text overlap with arXiv:1806.02160

MSC Class: 62-02; 62-09; 62F07; 62F15; 62J12; 62J05; 62J99; 62M05; 05A16; 60J22; 92D20; 90C27; 90C59

Journal ref: Journal of Artificial Intelligence Research (2021), Volume 72, Pages 901-942

arXiv:1806.02160 [pdf, ps, other]

Deep Bayesian regression models

Authors: Aliaksandr Hubin, Geir Storvik, Florian Frommlet

Abstract: Regression models are used for inference and prediction in a wide range of applications providing a powerful scientific tool for researchers and analysts from different fields. In many research fields the amount of available data as well as the number of potential explanatory variables is rapidly increasing. Variable selection and model averaging have become extremely important tools for improving inference and prediction. However, often linear models are not sufficient and the complex relationship between input variables and a response is better described by introducing non-linearities and complex functional interactions. Deep learning models have been extremely successful in terms of prediction although they are often difficult to specify and potentially suffer from overfitting. The aim of this paper is to bring the ideas of deep learning into a statistical framework which yields more parsimonious models and allows to quantify model uncertainty. To this end we introduce the class of deep Bayesian regression models (DBRM) consisting of a generalized linear model combined with a comprehensive non-linear feature space, where non-linear features are generated just like in deep learning but combined with variable selection in order to include only important features. DBRM can easily be extended to include latent Gaussian variables to model complex correlation structures between observations, which seems to be not easily possible with existing deep learning approaches. Two different algorithms based on MCMC are introduced to fit DBRM and to perform Bayesian inference. The predictive performance of these algorithms is compared with a large number of state of the art algorithms. Furthermore we illustrate how DBRM can be used for model inference in various applications.

Submitted 7 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: 50 pages, 12 tables

MSC Class: 62-02; 62-09; 62F07; 62F15; 62J12; 62J05; 62J99; 62M05; 05A16; 60J22; 92D20; 90C27; 90C59 ACM Class: G.1.2; G.1.6; G.2.1; G.3; I.2.0; I.2.6; I.2.8; I.5.1; I.6; I.6.4

arXiv:1705.07616 [pdf, other]

doi 10.1214/18-BA1141

A novel algorithmic approach to Bayesian Logic Regression

Authors: Aliaksandr Hubin, Geir Storvik, Florian Frommlet

Abstract: Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has (partly due to computational challenges) remained less well known than other approaches to epistatic association mapping. Here we will adapt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila where we identify several interesting epistatic effects. The method is implemented in an R package which is available on github.

Submitted 28 April, 2020; v1 submitted 22 May, 2017; originally announced May 2017.

Comments: 19 pages, 10 tables

MSC Class: 62-02; 62-09; 62F07; 62F15; 62J12; 62J05; 62J99; 62M05; 05A16; 60J22; 92D20; 90C27; 90C59

Journal ref: Bayesian Analysis, Volume 15, Number 1 (2020)

arXiv:1505.01949 [pdf, other]

doi 10.1371/journal.pone.0148620

An adaptive Ridge procedure for L0 regularization

Authors: Florian Frommlet, Gregory Nuel

Abstract: Penalized selection criteria like AIC or BIC are among the most popular methods for variable selection. Their theoretical properties have been studied intensively and are well understood, but making use of them in case of high-dimensional data is difficult due to the non-convex optimization problem induced by L0 penalties. An elegant solution to this problem is provided by the multi-step adaptive lasso, where iteratively weighted lasso problems are solved, whose weights are updated in such a way that the procedure converges towards selection with L0 penalties. In this paper we introduce an adaptive ridge procedure (AR) which mimics the adaptive lasso, but is based on weighted Ridge problems. After introducing AR its theoretical properties are studied in the particular case of orthogonal linear regression. For the non-orthogonal case extensive simulations are performed to assess the performance of AR. In case of Poisson regression and logistic regression it is illustrated how the iterative procedure of AR can be combined with iterative maximization procedures. The paper ends with an efficient implementation of AR in the context of least-squares segmentation.

Submitted 8 May, 2015; originally announced May 2015.

arXiv:1403.6623 [pdf, other]

doi 10.1371/journal.pone.0103322

Analyzing genome-wide association studies with an FDR controlling modification of the Bayesian information criterion

Authors: Erich Dolejsi, Bernhard Bodenstorfer, Florian Frommlet

Abstract: The prevailing method of analyzing GWAS data is still to test each marker individually, although from a statistical point of view it is quite obvious that in case of complex traits such single marker tests are not ideal. Recently several model selection approaches for GWAS have been suggested, most of them based on LASSO-type procedures. Here we will discuss an alternative model selection approach which is based on a modification of the Bayesian Information Criterion (mBIC2) which was previously shown to have certain asymptotic optimality properties in terms of minimizing the misclassification error. Heuristic search strategies are introduced which attempt to find the model which minimizes mBIC2, and which are efficient enough to allow the analysis of GWAS data. Our approach is implemented in a software package called MOSGWA. Its performance in case control GWAS is compared with the two algorithms HLASSO and GWASelect, as well as with single marker tests, where we performed a simulation study based on real SNP data from the POPRES sample. Our results show that MOSGWA performs slightly better than HLASSO, whereas according to our simulations GWASelect does not control the type I error when used to automatically determine the number of important SNPs. We also reanalyze the GWAS data from the Wellcome Trust Case-Control Consortium (WTCCC) and compare the findings of the different procedures.

Submitted 26 March, 2014; originally announced March 2014.

arXiv:1010.0124 [pdf, other]

A model selection approach to genome wide association studies

Authors: Florian Frommlet, Felix Ruhaltinger, Piotr Twarog, Malgorzata Bogdan

Abstract: For the vast majority of genome wide association studies (GWAS) published so far, statistical analysis was performed by testing markers individually. In this article we present some elementary statistical considerations which clearly show that in case of complex traits the approach based on multiple regression or generalized linear models is preferable to multiple testing. We introduce a model selection approach to GWAS based on modifications of Bayesian Information Criterion (BIC) and develop some simple search strategies to deal with the huge number of potential models. Comprehensive simulations based on real SNP data confirm that model selection has larger power than multiple testing to detect causal SNPs in complex models. On the other hand multiple testing has substantial problems with proper ranking of causal SNPs and tends to detect a certain number of false positive SNPs, which are not linked to any of the causal mutations. We show that this behavior is typical in GWAS for complex traits and can be explained by an aggregated influence of many small random sample correlations between genotypes of a SNP under investigation and other causal SNPs. We believe that our findings at least partially explain problems with low power and nonreplicability of results in many real data GWAS. Finally, we discuss the advantages of our model selection approach in the context of real data analysis, where we consider publicly available gene expression data as traits for individuals from the HapMap project.

Submitted 1 October, 2010; originally announced October 2010.

arXiv:1005.4753 [pdf, ps, other]

Asymptotic Bayes optimality under sparsity for generally distributed effect sizes under the alternative

Authors: Florian Frommlet, Arijit Chakrabarti, Magdalena Murawska, Malgorzata Bogdan

Abstract: Recent results concerning asymptotic Bayes-optimality under sparsity (ABOS) of multiple testing procedures are extended to fairly generally distributed effect sizes under the alternative. An asymptotic framework is considered where both the number of tests m and the sample size n go to infinity, while the fraction p of true alternatives converges to zero. It is shown that under mild restrictions on the loss function nontrivial asymptotic inference is possible only if n increases to infinity at least at the rate of log m. Based on this assumption precise conditions are given under which the Bonferroni correction with nominal Family Wise Error Rate (FWER) level alpha and the Benjamini-Hochberg procedure (BH) at FDR level alpha are asymptotically optimal. When n is proportional to log m then alpha can remain fixed, whereas when n increases to infinity at a quicker rate, then alpha has to converge to zero roughly like n^(-1/2). Under these conditions the Bonferroni correction is ABOS in case of extreme sparsity, while BH adapts well to the unknown level of sparsity. In the second part of this article these optimality results are carried over to model selection in the context of multiple regression with orthogonal regressors. Several modifications of Bayesian Information Criterion are considered, controlling either FWER or FDR, and conditions are provided under which these selection criteria are ABOS. Finally the performance of these criteria is examined in a brief simulation study.

Submitted 12 July, 2011; v1 submitted 26 May, 2010; originally announced May 2010.

arXiv:1002.3501 [pdf, ps, other]

doi 10.1214/10-AOS869

Asymptotic Bayes-optimality under sparsity of some multiple testing procedures

Authors: Małgorzata Bogdan, Arijit Chakrabarti, Florian Frommlet, Jayanta K. Ghosh

Abstract: Within a Bayesian decision theoretic framework we investigate some asymptotic optimality properties of a large class of multiple testing rules. A parametric setup is considered, in which observations come from a normal scale mixture model and the total loss is assumed to be the sum of losses for individual tests. Our model can be used for testing point null hypotheses, as well as to distinguish large signals from a multitude of very small effects. A rule is defined to be asymptotically Bayes optimal under sparsity (ABOS), if within our chosen asymptotic framework the ratio of its Bayes risk and that of the Bayes oracle (a rule which minimizes the Bayes risk) converges to one. Our main interest is in the asymptotic scheme where the proportion p of "true" alternatives converges to zero. We fully characterize the class of fixed threshold multiple testing rules which are ABOS, and hence derive conditions for the asymptotic optimality of rules controlling the Bayesian False Discovery Rate (BFDR). We finally provide conditions under which the popular Benjamini-Hochberg (BH) and Bonferroni procedures are ABOS and show that for a wide class of sparsity levels, the threshold of the former can be approximated by a nonrandom threshold.

Submitted 21 November, 2012; v1 submitted 18 February, 2010; originally announced February 2010.

Comments: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-AOS869

Report number: IMS-AOS-AOS869

Journal ref: Annals of Statistics 2011, Vol. 39, No. 3, 1551-1579

arXiv:math/0503174 [pdf, ps, other]

Improving SDP bounds for minimizing quadratic functions over the l1-ball

Authors: Immanuel M. Bomze, Florian Frommlet, Martin Rubey

Abstract: In this note, we establish superiority of the so-called copositive bound over a bound suggested by Nesterov for the quadratic problem to minimize a quadratic form over the l1-ball. We illustrate the improvement by simulation results. The copositive bound has the additional advantage that it can be easily extended to the inhomogeneous case of quadratic objectives including a linear term. We also indicate some improvements of the eigenvalue bound for the quadratic optimization over the lp-ball with 1<p<2, at least for p close to one.

Submitted 22 March, 2005; v1 submitted 9 March, 2005; originally announced March 2005.

Comments: 12 pages, 4 figures, v2: Figure 2a corrected, minor changes

Showing 1–13 of 13 results for author: Frommlet, F