
A Variational Spike-and-Slab Approach for Group Variable Selection

Authors: Buyu Lin, Changhao Ge, Jun S. Liu

Abstract: We introduce a class of generic spike-and-slab priors for high-dimensional linear regression with grouped variables and present a Coordinate-ascent Variational Inference (CAVI) algorithm for obtaining an optimal variational Bayes approximation. Using parameter expansion for a specific, yet comprehensive, family of slab distributions, we obtain a further gain in computational efficiency. The method can be easily extended to fitting additive models. Theoretically, we present general conditions on the generic spike-and-slab priors that enable us to derive the contraction rates for both the true posterior and the VB posterior for linear regression and additive models, of which some previous theoretical results can be viewed as special cases. Our simulation studies and real data application demonstrate that the proposed method is superior to existing methods in both variable selection and parameter estimation. Our algorithm is implemented in the R package GVSSB.

Submitted 28 September, 2023; originally announced September 2023.

Comments: 64 pages, 6 figures

arXiv:2307.02777

On the Optimality of Functional Sliced Inverse Regression

Authors: Rui Chen, Songtao Tian, Dongming Huang, Qian Lin, Jun S. Liu

Abstract: In this paper, we prove that functional sliced inverse regression (FSIR) achieves the optimal (minimax) rate for estimating the central space in functional sufficient dimension reduction problems. First, we provide a concentration inequality for the FSIR estimator of the covariance of the conditional mean, i.e., $\mathrm{Var}(\mathbb{E}[\boldsymbol{X}\mid Y])$. Based on this inequality, we establish the root-$n$ consistency of the FSIR estimator of the image of $\mathrm{Var}(\mathbb{E}[\boldsymbol{X}\mid Y])$. Second, we apply the most widely used truncation scheme to estimate the inverse of the covariance operator and identify the truncation parameter that ensures FSIR achieves the optimal minimax convergence rate for estimating the central space. Finally, we conduct simulations to demonstrate the optimal choice of truncation parameter and the estimation efficiency of FSIR. To the best of our knowledge, this is the first paper to rigorously prove the minimax optimality of FSIR in estimating the central space for multiple-index models and general $Y$ (not necessarily discrete).

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2112.08641

On Gibbs Sampling for Structured Bayesian Models: Discussion of paper by Zanella and Roberts

Authors: Xiaodong Yang, Jun S. Liu

Abstract: This article is a discussion of Zanella and Roberts' paper: Multilevel linear models, Gibbs samplers and multigrid decompositions. We consider several extensions in which the multigrid decomposition would bring us interesting insights, including vector hierarchical models, linear mixed effects models and partial centering parametrizations.

Submitted 16 December, 2021; originally announced December 2021.

Comments: 18 pages

arXiv:2111.15084

Convergence Rate of Multiple-try Metropolis Independent sampler

Authors: Xiaodong Yang, Jun S. Liu

Abstract: The Multiple-try Metropolis (MTM) method is an interesting extension of the classical Metropolis-Hastings algorithm. However, theoretical understanding of its convergence behavior, as well as of whether and how it may help, is still lacking. This paper derives the exact convergence rate for the Multiple-try Metropolis Independent sampler (MTM-IS) via an explicit eigen analysis. As a by-product, we prove that MTM-IS is less efficient than the simpler approach of repeated independent Metropolis-Hastings at the same computational cost. We further explore more variations and find it possible to design more efficient MTM algorithms by creating correlated multiple trials.

Submitted 3 February, 2023; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 34 pages; 7 figures
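To make the MTM-IS entry above more concrete, here is a minimal Python sketch of one common form of the multiple-try Metropolis independent sampler. It assumes a standard normal target, an N(0, 2^2) independent proposal, trial weights w(y) = pi(y)/p(y), and reuse of the k - 1 unselected trials as the reference set; these choices and the function names are illustrative assumptions, not necessarily the exact variant analyzed in the paper.

```python
import math
import random


def log_target(x):
    # Unnormalized log density of the assumed target pi: standard normal.
    return -0.5 * x * x


def log_proposal(x, scale):
    # Unnormalized log density of the assumed independent proposal p: N(0, scale^2).
    return -0.5 * (x / scale) ** 2


def mtm_independent_sampler(n_iter, k=5, scale=2.0, x0=0.0, seed=0):
    rng = random.Random(seed)

    def weight(y):
        # Importance weight w(y) = pi(y) / p(y); normalizing constants cancel below.
        return math.exp(log_target(y) - log_proposal(y, scale))

    x = x0
    chain = []
    for _ in range(n_iter):
        # Draw k independent trials from the proposal.
        trials = [rng.gauss(0.0, scale) for _ in range(k)]
        weights = [weight(y) for y in trials]
        total = sum(weights)
        # Select one trial with probability proportional to its weight.
        j = rng.choices(range(k), weights=weights, k=1)[0]
        # Reference set: the k - 1 unselected trials plus the current state x.
        reference = total - weights[j] + weight(x)
        if rng.random() < min(1.0, total / reference):
            x = trials[j]
        chain.append(x)
    return chain
```

Each iteration costs k target evaluations, which is the budget the abstract uses when comparing MTM-IS with running repeated independent Metropolis-Hastings steps.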

arXiv:2010.08132

Power of Knockoff: The Impact of Ranking Algorithm, Augmented Design, and Symmetric Statistic

Authors: Zheng Tracy Ke, Jun S. Liu, Yucong Ma

Abstract: The knockoff filter is a recent false discovery rate (FDR) control method for high-dimensional linear models. We point out that knockoff has three key components: ranking algorithm, augmented design, and symmetric statistic, and each component admits multiple choices. By considering various combinations of the three components, we obtain a collection of variants of knockoff. All these variants guarantee finite-sample FDR control, and our goal is to compare their power. We assume a Rare and Weak signal model on regression coefficients and compare the power of different variants of knockoff by deriving explicit formulas of false positive rate and false negative rate. Our results provide new insights into how to improve power when controlling FDR at a targeted level. We also compare the power of knockoff with its prototype, a method that uses the same ranking algorithm but has access to an ideal threshold. The comparison reveals the additional price one pays by finding a data-driven threshold to control FDR.

Submitted 13 February, 2024; v1 submitted 15 October, 2020; originally announced October 2020.

Comments: 67 pages, 13 figures

Journal ref: Journal of Machine Learning Research, 2024
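The knockoff entry above refers to a data-driven threshold for FDR control. As a hedged illustration, the Python sketch below computes the knockoff+ threshold of Barber and Candès (2015) from a vector of already computed symmetric statistics W; how W is produced (the ranking algorithm, augmented design, and symmetric statistic that the paper varies) is not modeled here, and the function names are hypothetical.

```python
import math


def knockoff_plus_threshold(W, q=0.1):
    """Smallest t among the nonzero |W_j| whose estimated FDP is at most q."""
    for t in sorted({abs(w) for w in W if w != 0}):
        # Negative statistics at least as extreme as -t estimate false discoveries.
        false_est = 1 + sum(1 for w in W if w <= -t)
        n_selected = max(1, sum(1 for w in W if w >= t))
        if false_est / n_selected <= q:
            return t
    return math.inf  # no feasible threshold: select nothing


def knockoff_select(W, q=0.1):
    # Indices whose statistic clears the data-driven threshold.
    T = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= T]
```

For example, with W = [5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5, -0.3, 0.2] and q = 0.1, the estimated FDP is 2/11 at t = 0.2 and 2/10 at t = 0.3, and first reaches (1 + 0)/10 = 0.1 at t = 0.5, so indices 0 through 9 are selected.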

arXiv:2004.01975

Stratification and Optimal Resampling for Sequential Monte Carlo

Authors: Yichao Li, Wenshuo Wang, Ke Deng, Jun S Liu

Abstract: Sequential Monte Carlo (SMC), also known as particle filters, has been widely accepted as a powerful computational tool for making inference with dynamical systems. A key step in SMC is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been proposed and used in practice, including multinomial resampling, residual resampling (Liu and Chen 1998), optimal resampling (Fearnhead and Clifford 2003), stratified resampling (Kitagawa 1996), and optimal transport resampling (Reich 2013). We show that, in the one-dimensional case, optimal transport resampling is equivalent to stratified resampling on the sorted particles, and they both minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions; in the multidimensional case, the variance of stratified resampling after sorting particles using the Hilbert curve (Gerber et al. 2019) in $\mathbb{R}^d$ is $O(m^{-(1+2/d)})$, an improved rate compared to the original $O(m^{-(1+1/d)})$, where $m$ is the number of resampled particles. This improved rate is the lowest for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these theoretical results, we propose the stratified multiple-descendant growth (SMG) algorithm, which allows us to explore the sample space more efficiently compared to the standard i.i.d. multiple-descendant sampling-resampling approach as measured by the Wasserstein metric. Numerical evidence is provided to demonstrate the effectiveness of our proposed method.

Submitted 7 December, 2020; v1 submitted 4 April, 2020; originally announced April 2020.
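For the one-dimensional case in the resampling entry above, a minimal Python sketch of stratified resampling applied to sorted particles follows; the abstract states that this is equivalent to optimal transport resampling in 1-D. It assumes scalar particles with nonnegative, not-all-zero weights; the Hilbert-curve ordering used in the multidimensional case is not implemented, and the function name is hypothetical.

```python
import bisect
import itertools
import random


def sorted_stratified_resample(particles, weights, m, seed=0):
    """Stratified resampling of m draws after sorting scalar particles by value."""
    rng = random.Random(seed)
    # Sort particle indices by particle value (the 1-D ordering).
    order = sorted(range(len(particles)), key=lambda i: particles[i])
    total = float(sum(weights))
    # Normalized cumulative weights in sorted order.
    cdf = list(itertools.accumulate(weights[i] / total for i in order))
    cdf[-1] = 1.0  # guard against floating-point round-off
    resampled = []
    for i in range(m):
        # One uniform draw inside each stratum [i/m, (i+1)/m).
        u = (i + rng.random()) / m
        # Inverse CDF: first sorted particle whose cumulative weight exceeds u.
        j = min(bisect.bisect_right(cdf, u), len(cdf) - 1)
        resampled.append(particles[order[j]])
    return resampled
```

Because the strata are visited in order and the particles are sorted, the resampled values come out as monotone quantiles of the weighted empirical distribution; when every weight times m is an integer, each particle receives exactly that many copies.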

arXiv:1911.02171

Minimax Nonparametric Two-sample Test under Smoothing

Authors: Xin Xing, Zuofeng Shang, Pang Du, ** Ma, Wenxuan Zhong, Jun S. Liu

Abstract: We consider the problem of comparing probability densities between two groups. A new probabilistic tensor product smoothing spline framework is developed to model the joint density of two variables. Under such a framework, the probability density comparison is equivalent to testing the presence/absence of interactions. We propose a penalized likelihood ratio test for such interaction testing and show that the test statistic is asymptotically chi-square distributed under the null hypothesis. Furthermore, we derive a sharp minimax testing rate based on the Bernstein width for nonparametric two-sample tests and show that our proposed test statistic is minimax optimal. In addition, a data-adaptive tuning criterion is developed to choose the penalty parameter. Simulations and real applications demonstrate that the proposed test outperforms the conventional approaches under various scenarios.

Submitted 11 January, 2021; v1 submitted 5 November, 2019; originally announced November 2019.

arXiv:1907.11985

doi 10.1103/PhysRevE.101.033301

The Wang-Landau Algorithm as Stochastic Optimization and Its Acceleration

Authors: Chenguang Dai, Jun S. Liu

Abstract: We show that the Wang-Landau algorithm can be formulated as a stochastic gradient descent algorithm minimizing a smooth and convex objective function, of which the gradient is estimated using Markov chain Monte Carlo iterations. The optimization formulation provides us with a new way to establish the convergence rate of the Wang-Landau algorithm, by exploiting the fact that almost surely, the density estimates (on the logarithmic scale) remain in a compact set, upon which the objective function is strongly convex. The optimization viewpoint motivates us to improve the efficiency of the Wang-Landau algorithm using popular tools including the momentum method and the adaptive learning rate method. We demonstrate the accelerated Wang-Landau algorithm on a two-dimensional Ising model and a two-dimensional ten-state Potts model.

Submitted 2 February, 2020; v1 submitted 27 July, 2019; originally announced July 2019.

Comments: 10 pages, 3 figures

Journal ref: Phys. Rev. E 101, 033301 (2020)
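As a hedged illustration of the stochastic-approximation view in the Wang-Landau entry above, the sketch below runs a basic Wang-Landau sampler with a decreasing step size gamma_t = min(1, c/t) on a toy system whose density of states is known: the energy E counts the 1-bits among n bits, so g(E) is the binomial coefficient C(n, E). The toy system, the step-size schedule, and the constant c are assumptions chosen for illustration; the momentum and adaptive-learning-rate accelerations proposed in the paper are not implemented, and the function name is hypothetical.

```python
import math
import random


def wang_landau_log_dos(n_bits=6, n_steps=200_000, c=None, seed=0):
    """Estimate log g(E), E = number of 1-bits among n_bits, up to a constant."""
    rng = random.Random(seed)
    n_levels = n_bits + 1
    if c is None:
        c = 5.0 * n_levels  # hypothetical step-size constant
    theta = [0.0] * n_levels  # running estimate of log g(E)
    bits = [0] * n_bits
    energy = 0
    for t in range(1, n_steps + 1):
        # Propose flipping one uniformly chosen bit.
        i = rng.randrange(n_bits)
        new_energy = energy + (1 if bits[i] == 0 else -1)
        # Metropolis step targeting pi(x) proportional to exp(-theta[E(x)]).
        log_ratio = theta[energy] - theta[new_energy]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            bits[i] ^= 1
            energy = new_energy
        # Stochastic-approximation update with decreasing step size gamma_t.
        gamma = min(1.0, c / t)
        theta[energy] += gamma
    return theta
```

Each update raises theta at the currently visited level, which pushes the sampler toward visiting all energy levels equally often; the fixed point is theta = log g + constant, so comparing theta with log C(n, E) after removing the mean is a simple sanity check.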

arXiv:1805.01820

Global testing under the sparse alternatives for single index models

Authors: Qian Lin, Zhigen Zhao, Jun S. Liu

Abstract: For the single index model $y=f(\beta^\tau x,\varepsilon)$ with Gaussian design, where $f$ is unknown and $\beta$ is a sparse $p$-dimensional unit vector with at most $s$ nonzero entries, we are interested in testing the null hypothesis that $\beta$, when viewed as a whole vector, is zero against the alternative that some entries of $\beta$ are nonzero. Assuming that $\mathrm{var}(\mathbb{E}[x \mid y])$ is non-vanishing, we define the generalized signal-to-noise ratio (gSNR) $\lambda$ of the model as the unique non-zero eigenvalue of $\mathrm{var}(\mathbb{E}[x \mid y])$. We show that if $s^{2}\log^2(p)\wedge p$ is of a smaller order than $n$, denoted as $s^{2}\log^2(p)\wedge p\prec n$, where $n$ is the sample size, one can detect the existence of signals if and only if gSNR $\succ\frac{p^{1/2}}{n}\wedge \frac{s\log(p)}{n}$. Furthermore, if the noise is additive (i.e., $y=f(\beta^\tau x)+\varepsilon$), one can detect the existence of the signal if and only if gSNR $\succ\frac{p^{1/2}}{n}\wedge \frac{s\log(p)}{n} \wedge \frac{1}{\sqrt{n}}$. It is rather surprising that the detection boundary for the single index model with additive noise matches that for linear regression models. These results pave the road for thorough theoretical analysis of single/multiple index models in high dimensions.

Submitted 4 May, 2018; originally announced May 2018.

Comments: 22 pages, 4 figures

arXiv:1701.06009

On the optimality of sliced inverse regression in high dimensions

Authors: Qian Lin, Xinran Li, Dongming Huang, Jun S. Liu

Abstract: The central subspace of a pair of random variables $(y,x) \in \mathbb{R}^{p+1}$ is the minimal subspace $\mathcal{S}$ such that $y \perp\!\!\!\perp x\mid P_{\mathcal{S}}x$. In this paper, we consider the minimax rate of estimating the central space of the multiple index models $y=f(\beta_{1}^\tau x,\beta_{2}^\tau x,\dots,\beta_{d}^\tau x,\varepsilon)$ with at most $s$ active predictors where $x \sim N(0,I_{p})$. We first introduce a large class of models depending on the smallest non-zero eigenvalue $\lambda$ of $\mathrm{var}(\mathbb{E}[x\mid y])$, over which we show that an aggregated estimator based on the SIR procedure converges at rate $d\wedge((sd+s\log(ep/s))/(n\lambda))$. We then show that this rate is optimal in two scenarios: the single index models; and the multiple index models with fixed central dimension $d$ and fixed $\lambda$. By assuming a technical conjecture, we can show that this rate is also optimal for multiple index models with bounded dimension of the central space. We believe that these (conditional) optimal rate results bring meaningful insights into general SDR problems in high dimensions.

Submitted 23 January, 2017; v1 submitted 21 January, 2017; originally announced January 2017.

Comments: 40 pages, 2 figures

arXiv:1611.06655

Sparse Sliced Inverse Regression Via Lasso

Authors: Qian Lin, Zhigen Zhao, Jun S. Liu

Abstract: For multiple index models, it has recently been shown that sliced inverse regression (SIR) is consistent for estimating the sufficient dimension reduction (SDR) space if and only if $\rho=\lim\frac{p}{n}=0$, where $p$ is the dimension and $n$ is the sample size. Thus, when $p$ is of the same or a higher order than $n$, additional assumptions such as sparsity must be imposed in order to ensure consistency for SIR. By constructing artificial response variables made up from top eigenvectors of the estimated conditional covariance matrix, we introduce a simple Lasso regression method to obtain an estimate of the SDR space. The resulting algorithm, Lasso-SIR, is shown to be consistent and to achieve the optimal convergence rate under certain sparsity conditions when $p$ is of order $o(n^2\lambda^2)$, where $\lambda$ is the generalized signal-to-noise ratio. We also demonstrate the superior performance of Lasso-SIR compared with existing approaches via extensive numerical studies and several real data examples.

Submitted 17 June, 2018; v1 submitted 21 November, 2016; originally announced November 2016.

Comments: 41 pages, 2 figures

MSC Class: 62J02 (Primary); 62H25 (Secondary)

arXiv:1511.08102

L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs

Authors: Matey Neykov, Jun S. Liu, Tianxi Cai

Abstract: It is known that for a certain class of single index models (SIMs) $Y = f(\boldsymbol{X}_{p \times 1}^\intercal\boldsymbol{\beta}_0, \varepsilon)$, support recovery is impossible when $\boldsymbol{X} \sim \mathcal{N}(0, \mathbb{I}_{p \times p})$ and a model complexity adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested. These algorithms work provably under the assumption that the design $\boldsymbol{X}$ comes from an i.i.d. Gaussian distribution. In the present paper we analyze algorithms based on covariance screening and least squares with $L_1$ penalization (i.e., LASSO) and demonstrate that they can also enjoy optimal (up to a scalar) rescaled sample size in terms of support recovery, albeit under slightly different assumptions on $f$ and $\varepsilon$ compared to the SIR based algorithms. Furthermore, we show, more generally, that LASSO succeeds in recovering the signed support of $\boldsymbol{\beta}_0$ if $\boldsymbol{X} \sim \mathcal{N}(0, \boldsymbol{\Sigma})$ and the covariance $\boldsymbol{\Sigma}$ satisfies the irrepresentable condition. Our work extends existing results on the support recovery of LASSO for the linear model to a more general class of SIMs.

Submitted 22 June, 2016; v1 submitted 25 November, 2015; originally announced November 2015.

Comments: 36 pages; 6 figures; typos corrected; clearer notation introduced

arXiv:1511.02270

Signed Support Recovery for Single Index Models in High-Dimensions

Authors: Matey Neykov, Qian Lin, Jun S. Liu

Abstract: In this paper we study the support recovery problem for single index models $Y=f(\boldsymbol{X}^{\intercal} \boldsymbol{\beta},\varepsilon)$, where $f$ is an unknown link function, $\boldsymbol{X}\sim N_p(0,\mathbb{I}_{p})$ and $\boldsymbol{\beta}$ is an $s$-sparse unit vector such that $\boldsymbol{\beta}_{i}\in \{\pm\frac{1}{\sqrt{s}},0\}$. In particular, we look into the performance of two computationally inexpensive algorithms: (a) the diagonal thresholding sliced inverse regression (DT-SIR) introduced by Lin et al. (2015); and (b) a semi-definite programming (SDP) approach inspired by Amini & Wainwright (2008). When $s=O(p^{1-\delta})$ for some $\delta>0$, we demonstrate that both procedures can succeed in recovering the support of $\boldsymbol{\beta}$ as long as the rescaled sample size $\kappa=\frac{n}{s\log(p-s)}$ is larger than a certain critical threshold. On the other hand, when $\kappa$ is smaller than a critical value, any algorithm fails to recover the support with probability at least $\frac{1}{2}$ asymptotically. In other words, we demonstrate that both DT-SIR and the SDP approach are optimal (up to a scalar) for recovering the support of $\boldsymbol{\beta}$ in terms of sample size. We provide extensive simulations, as well as a real dataset application to help verify our theoretical observations.

Submitted 22 June, 2016; v1 submitted 6 November, 2015; originally announced November 2015.

Comments: 38 pages, 7 figures; 1 table; data set analysis added; typos corrected

arXiv:1510.08986

A Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations

Authors: Matey Neykov, Yang Ning, Jun S. Liu, Han Liu

Abstract: We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations to a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is to establish a unified Z-estimation theory of confidence regions for high dimensional problems. Different from existing methods, all of which require the specification of the likelihood or pseudo-likelihood, our framework is likelihood-free. As a result, our approach provides valid inference for a broad class of high dimensional constrained estimating equation problems, which are not covered by existing methods. Such examples include noisy compressed sensing, instrumental variable regression, undirected graphical models, discriminant analysis and vector autoregressive models. We present detailed theoretical results for all these examples. Finally, we conduct thorough numerical simulations and a real dataset analysis to back up the developed theoretical results.

Submitted 22 June, 2016; v1 submitted 30 October, 2015; originally announced October 2015.

Comments: 67 pages, 2 tables, 1 figure

arXiv:1507.03895

On consistency and sparsity for sliced inverse regression in high dimensions

Authors: Qian Lin, Zhigen Zhao, Jun S. Liu

Abstract: We provide here a framework to analyze the phase transition phenomenon of sliced inverse regression (SIR), a supervised dimension reduction technique introduced by Li (1991). Under mild conditions, the asymptotic ratio $\rho = \lim p/n$ is the phase transition parameter and the SIR estimator is consistent if and only if $\rho = 0$. When dimension $p$ is greater than $n$, we propose a diagonal thresholding screening SIR (DT-SIR) algorithm. This method provides us with an estimate of the eigen-space of the covariance matrix of the conditional expectation $\mathrm{var}(\mathbb{E}[\boldsymbol{x}\mid y])$. The desired dimension reduction space is then obtained by applying the inverse of the covariance matrix to the eigen-space. Under certain sparsity assumptions on both the covariance matrix of predictors and the loadings of the directions, we prove the consistency of DT-SIR in estimating the dimension reduction space in high dimensional data analysis. Extensive numerical experiments demonstrate superior performance of the proposed method in comparison to its competitors.

Submitted 21 November, 2016; v1 submitted 14 July, 2015; originally announced July 2015.

Comments: 49 pages, 4 figures

MSC Class: 62J02 (Primary); 62H25 (Secondary)
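A minimal pure-Python sketch of diagonal thresholding followed by SIR may help clarify the DT-SIR entry above. It assumes an identity predictor covariance (x ~ N(0, I_p)), so the final step of applying the inverse covariance matrix is omitted; the slice count, the fixed threshold value, and the power-iteration eigen-solver are illustrative choices rather than the tuning analyzed in the paper, and the function name is hypothetical.

```python
import math
import random


def dt_sir_direction(X, y, n_slices=10, threshold=0.1, n_iter=200, seed=0):
    """Return (kept coordinates, leading SIR direction on those coordinates)."""
    n, p = len(X), len(X[0])  # requires n >= n_slices
    means = [sum(row[j] for row in X) / n for j in range(p)]
    # Sort observations by the response and cut them into equal-size slices.
    order = sorted(range(n), key=lambda i: y[i])
    slices = [order[h * n // n_slices:(h + 1) * n // n_slices] for h in range(n_slices)]
    # Slice means of the centered predictors estimate E[x | y] within each slice.
    slice_means = []
    for idx in slices:
        m = [sum(X[i][j] for i in idx) / len(idx) - means[j] for j in range(p)]
        slice_means.append((len(idx) / n, m))
    # Diagonal thresholding: keep coordinates with large estimated var(E[x_j | y]).
    diag = [sum(w * m[j] ** 2 for w, m in slice_means) for j in range(p)]
    keep = [j for j in range(p) if diag[j] > threshold]
    if not keep:
        return keep, []
    # Weighted covariance of slice means restricted to the kept coordinates.
    M = [[sum(w * m[a] * m[b] for w, m in slice_means) for b in keep] for a in keep]
    # Power iteration for the leading eigenvector of M.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in keep]
    for _ in range(n_iter):
        mv = [sum(M[a][b] * v[b] for b in range(len(keep))) for a in range(len(keep))]
        norm = math.sqrt(sum(t * t for t in mv))
        v = [t / norm for t in mv]
    return keep, v
```

With correlated predictors one would additionally estimate the covariance matrix and apply its inverse to the estimated eigen-space, as the abstract describes; that step is deliberately left out of this sketch.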

arXiv:1005.5483

doi 10.1111/rssb.12023

Model Selection Principles in Misspecified Models

Authors: **chi Lv, Jun S. Liu

Abstract: Model selection is of fundamental importance to high dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Kullback-Leibler divergence principle and the Bayesian principle, which lead to the Akaike information criterion and Bayesian information criterion when models are correctly specified. Yet model misspecification is unavoidable when we have no knowledge of the true model or when we have the correct family of distributions but miss some true predictor. In this paper, we propose a family of semi-Bayesian principles for model selection in misspecified models, which combine the strengths of the two well-known principles. We derive asymptotic expansions of the semi-Bayesian principles in misspecified generalized linear models, which give the new semi-Bayesian information criteria (SIC). A specific form of SIC admits a natural decomposition into the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty on model misspecification directly. Numerical studies demonstrate the advantage of the newly proposed SIC methodology for model selection in both correctly specified and misspecified models.

Submitted 11 May, 2016; v1 submitted 29 May, 2010; originally announced May 2010.

Comments: 25 pages, 6 tables

MSC Class: 62J12 (Primary); 62B10; 62F07; 62F15; 62J07 (Secondary)

Journal ref: Journal of the Royal Statistical Society Series B 76, 141-167 (2014)

arXiv:math/0611217

doi 10.1214/009053606000000489

Discussion of "Equi-energy sampler" by Kou, Zhou and Wong

Authors: Yves F. Atchadé, Jun S. Liu

Abstract: We congratulate Samuel Kou, Qing Zhou and Wing Wong [math.ST/0507080] (referred to subsequently as KZW) for this beautifully written paper, which opens a new direction in Monte Carlo computation. This discussion has two parts. First, we describe a very closely related method, multicanonical sampling (MCS), and report a simulation example that compares the equi-energy (EE) sampler with MCS. Overall, we found the two algorithms to be of comparable efficiency for the simulation problem considered. In the second part, we develop some additional convergence results for the EE sampler.

Submitted 8 November, 2006; originally announced November 2006.

Comments: Published at http://dx.doi.org/10.1214/009053606000000489 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOS-AOS0088B

Journal ref: Annals of Statistics 2006, Vol. 34, No. 4, 1620-1628

arXiv:math/0610655

Bayesian Clustering of Transcription Factor Binding Motifs

Authors: Shane T. Jensen, Jun S. Liu

Abstract: Genes are often regulated in living cells by proteins called transcription factors (TFs) that bind directly to short segments of DNA in close proximity to specific genes. These binding sites have a conserved nucleotide appearance, which is called a motif. Several recent studies of transcriptional regulation require the reduction of a large collection of motifs into clusters based on the similarity of their nucleotide composition. We present a principled approach to this clustering problem based upon a Bayesian hierarchical model that accounts for both within- and between-motif variability. We use a Dirichlet process prior distribution that allows the number of clusters to vary and we also present a novel generalization that allows the core width of each motif to vary. This clustering model is implemented, using a Gibbs sampling strategy, on several collections of transcription factor motif matrices. Our clusters provide a means by which to organize transcription factors based on binding motif similarities, which can be used to reduce motif redundancy within large databases such as JASPAR and TRANSFAC. Finally, our clustering procedure has been used in combination with discovery of evolutionarily-conserved motifs to predict co-regulated genes. An alternative to our Dirichlet process prior distribution is explored but shows no substantive difference in the clustering results for our datasets.
Our Bayesian clustering model based on the Dirichlet process has several advantages over traditional clustering methods that could make our procedure appropriate and useful for many clustering applications.

Submitted 21 October, 2006; originally announced October 2006.

Comments: Submitted to the Journal of the American Statistical Association
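The motif-clustering entry above relies on a Dirichlet process prior so that the number of clusters need not be fixed in advance. The Python sketch below draws a partition from the Chinese restaurant process, the partition distribution induced by a Dirichlet process with concentration alpha. It illustrates only the prior, not the paper's hierarchical motif model or its Gibbs sampler, and the function name is hypothetical.

```python
import random


def crp_partition(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n):
        # Existing cluster k with weight counts[k]; a new cluster with weight alpha.
        k = rng.choices(range(len(counts) + 1), weights=counts + [alpha], k=1)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels
```

Larger alpha makes new clusters more likely: item i (counting from 0) starts a new cluster with probability alpha / (alpha + i), so the expected number of clusters after n items is the sum of those probabilities for i = 0, ..., n - 1.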

Showing 1–18 of 18 results for author: Liu, J S