Showing 1–46 of 46 results for author: Liu, J S

Searching in archive stat.
  1. arXiv:2405.19058

    stat.ME

    Participation bias in the estimation of heritability and genetic correlation

    Authors: Shuang Song, Stefania Benonisdottir, Jun S. Liu, Augustine Kong

    Abstract: It is increasingly recognized that participation bias can pose problems for genetic studies. Recently, to overcome the challenge that genetic information of non-participants is unavailable, it has been shown that by comparing the IBD (identity by descent) shared and not-shared segments among the participants, one can estimate the genetic component underlying participation. That, however, does not direct…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  2. arXiv:2401.06383

    stat.ME

    Decomposition with Monotone B-splines: Fitting and Testing

    Authors: Lijun Wang, Xiaodan Fan, Hongyu Zhao, Jun S. Liu

    Abstract: A univariate continuous function of bounded variation can always be decomposed as the sum of a non-increasing function and a non-decreasing one. Based on this property, we propose a non-parametric regression method that combines two spline-fitted monotone curves. We demonstrate by extensive simulations that, compared to standard spline-fitting methods, the proposed approach is particularly advantageous in high-noise s…

    Submitted 9 April, 2024; v1 submitted 12 January, 2024; originally announced January 2024.
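    The decomposition into a non-increasing and a non-decreasing part can be illustrated on a sampled function. This is a minimal sketch, assuming the function is known on a grid; it is not the spline-based fitting method of this paper. `monotone_decompose` is a hypothetical helper that accumulates the rises into a non-decreasing part and the falls into a non-increasing part.

```python
import numpy as np

def monotone_decompose(f_values):
    """Split sampled values f[0..n-1] into up + down, where up is
    non-decreasing and down is non-increasing (discrete Jordan-type split)."""
    f = np.asarray(f_values, dtype=float)
    diffs = np.diff(f)
    # Non-decreasing part: start at f[0] and accumulate only the rises.
    up = f[0] + np.concatenate(([0.0], np.cumsum(np.maximum(diffs, 0.0))))
    # Non-increasing part: accumulate only the falls (non-positive increments).
    down = np.concatenate(([0.0], np.cumsum(np.minimum(diffs, 0.0))))
    return up, down

x = np.linspace(0.0, 2.0 * np.pi, 200)
f = np.sin(x)
up, down = monotone_decompose(f)
print(np.allclose(up + down, f))  # True
```

    The split is not unique: adding any non-decreasing sequence to `up` and subtracting it from `down` gives another valid pair. This sketch returns the pair with the smallest total variation.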

  3. arXiv:2309.16855

    stat.ME math.ST

    A Variational Spike-and-Slab Approach for Group Variable Selection

    Authors: Buyu Lin, Changhao Ge, Jun S. Liu

    Abstract: We introduce a class of generic spike-and-slab priors for high-dimensional linear regression with grouped variables and present a Coordinate-ascent Variational Inference (CAVI) algorithm for obtaining an optimal variational Bayes approximation. Using parameter expansion for a specific, yet comprehensive, family of slab distributions, we obtain a further gain in computational efficiency. The method…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 64 pages, 6 figures

  4. arXiv:2308.15370

    stat.ML cs.LG

    Multi-Response Heteroscedastic Gaussian Process Models and Their Inference

    Authors: Taehee Lee, Jun S. Liu

    Abstract: Despite the widespread utilization of Gaussian process models for versatile nonparametric modeling, they exhibit limitations in effectively capturing abrupt changes in function smoothness and accommodating relationships with heteroscedastic errors. Addressing these shortcomings, the heteroscedastic Gaussian process (HeGP) regression seeks to introduce flexibility by acknowledging the variability o…

    Submitted 30 August, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: submitted to the Journal of the American Statistical Association (JASA)

  5. arXiv:2307.01748

    stat.ME astro-ph.IM stat.CO

    Monotone Cubic B-Splines with a Neural-Network Generator

    Authors: Lijun Wang, Xiaodan Fan, Huabai Li, Jun S. Liu

    Abstract: We present a method for fitting monotone curves using cubic B-splines, which is equivalent to putting a monotonicity constraint on the coefficients. We explore different ways of enforcing this constraint and analyze their theoretical and empirical properties. We propose two algorithms for solving the spline fitting problem: one that uses standard optimization techniques and one that trains a Multi…

    Submitted 17 November, 2023; v1 submitted 4 July, 2023; originally announced July 2023.
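    The coefficient constraint mentioned in this abstract rests on a standard fact: a B-spline whose coefficients are non-decreasing is itself a non-decreasing function. This is a minimal sketch, assuming a clamped cubic knot vector on [0, 1] and an exp-then-cumulative-sum parameterization of the coefficients (both are illustrative choices, not the paper's); `bspline_basis` is a hypothetical helper implementing the Cox-de Boor recursion. Neither of the paper's two fitting algorithms is reproduced.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """B-spline basis matrix via the Cox-de Boor recursion (0/0 treated as 0)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree-0 basis: indicators of the half-open intervals [t_i, t_{i+1}).
    B = np.array([(x >= t[i]) & (x < t[i + 1]) for i in range(len(t) - 1)], dtype=float).T
    # Close the last non-empty interval so the right endpoint is covered.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    for k in range(1, degree + 1):
        B_new = np.zeros((len(x), len(t) - k - 1))
        for i in range(len(t) - k - 1):
            left = right = 0.0
            if t[i + k] > t[i]:
                left = (x - t[i]) / (t[i + k] - t[i]) * B[:, i]
            if t[i + k + 1] > t[i + 1]:
                right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * B[:, i + 1]
            B_new[:, i] = left + right
        B = B_new
    return B

# Assumed clamped cubic knot vector on [0, 1] with three interior knots.
degree = 3
knots = np.concatenate([np.zeros(4), [0.25, 0.5, 0.75], np.ones(4)])
n_coef = len(knots) - degree - 1          # 7 coefficients

# Monotone coefficients: a starting value plus cumulative sums of positive
# increments exp(theta), where theta is unconstrained.
rng = np.random.default_rng(0)
theta = rng.normal(size=n_coef - 1)
coef = np.concatenate([[-1.0], -1.0 + np.cumsum(np.exp(theta))])

x = np.linspace(0.0, 1.0, 201)
B = bspline_basis(x, knots, degree)
curve = B @ coef                           # non-decreasing in x
```

    Because every unconstrained `theta` maps to strictly increasing coefficients, a least-squares fit over `theta` with a generic optimizer stays inside the monotone class.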

  6. arXiv:2201.10063

    stat.ME

    Varying Coefficient Model via Adaptive Spline Fitting

    Authors: Xufei Wang, Bo Jiang, Jun S. Liu

    Abstract: The varying coefficient model has received broad attention from researchers as it is a powerful dimension reduction tool for non-parametric modeling. Most existing varying coefficient models fitted with polynomial splines assume equidistant knots and take the number of knots as the hyperparameter. However, imposing equidistant knots appears to be too rigid, and determining the optimal number of kno…

    Submitted 14 June, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

  7. arXiv:2112.08641

    stat.CO math.PR

    On Gibbs Sampling for Structured Bayesian Models Discussion of paper by Zanella and Roberts

    Authors: Xiaodong Yang, Jun S. Liu

    Abstract: This article is a discussion of Zanella and Roberts' paper: Multilevel linear models, Gibbs samplers and multigrid decompositions. We consider several extensions in which the multigrid decomposition would bring us interesting insights, including vector hierarchical models, linear mixed effects models and partial centering parametrizations.

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: 18 pages

  8. arXiv:2111.15084

    stat.CO math.ST

    Convergence Rate of Multiple-try Metropolis Independent sampler

    Authors: Xiaodong Yang, Jun S. Liu

    Abstract: The Multiple-try Metropolis (MTM) method is an interesting extension of the classical Metropolis-Hastings algorithm. However, its convergence behavior, and whether and how it helps, are still not well understood theoretically. This paper derives the exact convergence rate of the Multiple-try Metropolis Independent sampler (MTM-IS) via an explicit eigen analysis. As a by-product, we prove…

    Submitted 3 February, 2023; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: 34 pages; 7 figures
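    For readers unfamiliar with MTM-IS, here is a minimal sketch of one transition, assuming a one-dimensional target known up to a constant and an independent Gaussian proposal; `log_target`, `mtm_independent_step`, and the proposal parameters are hypothetical. It draws k independent trials, selects one, y_j, with probability proportional to the importance weight w(y) = pi(y)/q(y), and accepts it with probability min{1, sum_i w(y_i) / (sum_i w(y_i) - w(y_j) + w(x))}. It illustrates the sampler only, not the paper's eigen analysis.

```python
import numpy as np

def log_target(x):
    # Hypothetical target: standard normal density, up to an additive constant.
    return -0.5 * x**2

def mtm_independent_step(x, k, rng, prop_mean=0.0, prop_sd=2.0):
    """One MTM step with an independent N(prop_mean, prop_sd^2) proposal."""
    def log_w(z):
        # Log importance weight log pi(z) - log q(z), dropping constants.
        return log_target(z) + 0.5 * ((z - prop_mean) / prop_sd) ** 2

    ys = rng.normal(prop_mean, prop_sd, size=k)   # k independent trial points
    lw = log_w(ys)
    shift = lw.max()                               # stabilize the exponentials
    w = np.exp(lw - shift)
    j = rng.choice(k, p=w / w.sum())               # pick y_j with prob. proportional to w(y_j)
    w_x = np.exp(log_w(x) - shift)
    # Reference set: the k - 1 unselected trials plus the current state x.
    accept_prob = min(1.0, w.sum() / (w.sum() - w[j] + w_x))
    return ys[j] if rng.uniform() < accept_prob else x

rng = np.random.default_rng(0)
x, draws = 0.0, []
for _ in range(20000):
    x = mtm_independent_step(x, k=5, rng=rng)
    draws.append(x)
print(np.mean(draws), np.var(draws))   # should be close to 0 and 1
```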

  9. arXiv:2110.15406

    stat.ME

    Kernel-based Partial Permutation Test for Detecting Heterogeneous Functional Relationship

    Authors: Xinran Li, Bo Jiang, Jun S. Liu

    Abstract: We propose a kernel-based partial permutation test for checking whether the functional relationship between response and covariates is the same across different groups. The main idea, which is intuitive and easy to implement, is to keep the projections of the response vector $\boldsymbol{Y}$ on leading principal components of a kernel matrix fixed and permute $\boldsymbol{Y}$'s projections on the remaining…

    Submitted 28 October, 2021; originally announced October 2021.
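    The partial permutation idea can be sketched as follows, assuming a Gaussian kernel on a one-dimensional covariate and a plain permutation of the trailing projections. The kernel, its bandwidth, the number r of fixed components, and `partial_permute` are all hypothetical choices; the abstract is truncated before it specifies exactly how the remaining projections are permuted. The sketch produces one permuted response; the test statistic and group structure are not shown.

```python
import numpy as np

def gaussian_kernel(x, bandwidth=1.0):
    # Hypothetical kernel choice: Gaussian (RBF) kernel on 1-D covariates.
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def partial_permute(y, K, r, rng):
    """Keep y's projections on the r leading eigenvectors of K fixed and
    randomly permute its projections on the remaining eigenvectors."""
    eigvals, U = np.linalg.eigh(K)              # eigenvalues in ascending order
    U = U[:, np.argsort(eigvals)[::-1]]         # leading components first
    coef = U.T @ y                              # projections of y
    coef[r:] = rng.permutation(coef[r:])        # permute only the tail projections
    return U @ coef

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=50)
K = gaussian_kernel(x, bandwidth=0.2)
y_star = partial_permute(y, K, r=5, rng=rng)
```

    Because the eigenvectors are orthonormal, the permuted response keeps both the leading projections and the overall norm of `y`.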

  10. arXiv:2104.07261

    stat.ME

    Partition-Mallows Model and Its Inference for Rank Aggregation

    Authors: Wanchuang Zhu, Yingkai Jiang, Jun S. Liu, Ke Deng

    Abstract: Learning how to aggregate ranking lists has been an active research area for many years and its advances have played a vital role in many applications ranging from bioinformatics to internet commerce. The problem of discerning the reliability of rankers based only on the rank data is of great interest to many practitioners but has received less attention from researchers. By dividing the ranked entit…

    Submitted 15 April, 2021; originally announced April 2021.

  11. arXiv:2010.06175

    stat.ML cs.LG

    Neural Gaussian Mirror for Controlled Feature Selection in Neural Networks

    Authors: Xin Xing, Yu Gui, Chenguang Dai, Jun S. Liu

    Abstract: Deep neural networks (DNNs) have become increasingly popular and achieved outstanding performance in predictive tasks. However, the DNN framework itself cannot inform the user which features are more or less relevant for making the prediction, which limits its applicability in many scientific fields. We introduce neural Gaussian mirrors (NGMs), in which mirrored features are created, via a structu…

    Submitted 13 October, 2020; originally announced October 2020.

  12. arXiv:2007.07498

    stat.ML cs.LG stat.ME

    Measurement error models: from nonparametric methods to deep neural networks

    Authors: Zhirui Hu, Zheng Tracy Ke, Jun S Liu

    Abstract: The success of deep learning has inspired recent interest in applying neural networks to statistical inference. In this paper, we investigate the use of deep neural networks for nonparametric regression with measurement errors. We propose an efficient neural network design for estimating measurement error models, in which we use a fully connected feed-forward neural network (FNN) to approximate t…

    Submitted 15 July, 2020; originally announced July 2020.

    Comments: 37 pages, 8 figures

  13. arXiv:2007.06136

    stat.AP

    Bayesian Bi-clustering Methods with Applications in Computational Biology

    Authors: Han Yan, Jiexing Wu, Yang Li, Jun S. Liu

    Abstract: Bi-clustering is a useful approach for analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach to tackling bi-clustering problems in moderate to high dimensions, and propose three Bayesian bi-clustering models on categorical data, which increase in complexity in their modeling of the distributions of fe…

    Submitted 9 February, 2021; v1 submitted 12 July, 2020; originally announced July 2020.

  14. arXiv:2007.01237

    stat.ME

    A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models

    Authors: Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu

    Abstract: Generalized linear models (GLMs) have been widely used in practice to model non-Gaussian response variables. When the number of explanatory features is relatively large, researchers are often interested in performing controlled feature selection in order to simplify the downstream analysis. This paper introduces a new framework for feature selection in GLMs that can achieve false discovery r…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: 60 pages, 13 pages

  15. arXiv:2006.01055

    stat.ME

    On Posterior Consistency of Bayesian Factor Models in High Dimensions

    Authors: Yucong Ma, Jun S. Liu

    Abstract: As a principled dimension reduction technique, factor models have been widely adopted in social science, economics, bioinformatics, and many other fields. However, in high-dimensional settings, conducting a 'correct' Bayesian factor analysis can be subtle since it requires both a careful prescription of the prior distribution and a suitable computational strategy. In particular, we analyze the issu…

    Submitted 1 January, 2021; v1 submitted 1 June, 2020; originally announced June 2020.

  16. arXiv:2006.00767

    stat.ME

    Generative Multiple-purpose Sampler for Weighted M-estimation

    Authors: Minsuk Shin, Shijie Wang, Jun S Liu

    Abstract: To overcome the computational bottleneck of various data perturbation procedures such as the bootstrap and cross-validation, we propose the Generative Multiple-purpose Sampler (GMS), which constructs a generator function to produce solutions of weighted M-estimators from a set of given weights and tuning parameters. The GMS is implemented by a single optimization without having to repeatedly eval…

    Submitted 16 October, 2023; v1 submitted 1 June, 2020; originally announced June 2020.

  17. arXiv:2004.01975

    stat.ME math.ST stat.CO

    Stratification and Optimal Resampling for Sequential Monte Carlo

    Authors: Yichao Li, Wenshuo Wang, Ke Deng, Jun S Liu

    Abstract: Sequential Monte Carlo (SMC), also known as particle filters, has been widely accepted as a powerful computational tool for making inference with dynamical systems. A key step in SMC is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been proposed and used in practice, including multinomial resampling, residual resampling (Liu and Che…

    Submitted 7 December, 2020; v1 submitted 4 April, 2020; originally announced April 2020.
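    Three textbook resampling schemes (multinomial, residual, and stratified) fit in a few lines, assuming normalized weights. These are the standard versions, not the optimal resampling scheme derived in the paper, and the function names are hypothetical.

```python
import numpy as np

def multinomial_resample(w, rng):
    # n i.i.d. draws from the categorical distribution given by w.
    n = len(w)
    return rng.choice(n, size=n, p=w)

def residual_resample(w, rng):
    n = len(w)
    counts = np.floor(n * w).astype(int)             # deterministic copies
    n_rest = n - counts.sum()
    if n_rest > 0:
        resid = n * w - counts                        # leftover fractional mass
        extra = rng.choice(n, size=n_rest, p=resid / resid.sum())
        counts += np.bincount(extra, minlength=n)
    return np.repeat(np.arange(n), counts)

def stratified_resample(w, rng):
    n = len(w)
    u = (np.arange(n) + rng.uniform(size=n)) / n     # one uniform per stratum [k/n, (k+1)/n)
    cdf = np.cumsum(w)
    cdf[-1] = 1.0                                     # guard against round-off
    return np.searchsorted(cdf, u, side="right")      # inverse-CDF lookup

rng = np.random.default_rng(2)
w = rng.dirichlet(np.ones(100))
idx_mult = multinomial_resample(w, rng)
idx_res = residual_resample(w, rng)
idx_str = stratified_resample(w, rng)
```

    Residual and stratified resampling both keep each particle's copy count close to n times its weight, which reduces the extra Monte Carlo variance introduced by multinomial resampling.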

  18. arXiv:2002.08542

    stat.ME

    False Discovery Rate Control via Data Splitting

    Authors: Chenguang Dai, Buyu Lin, Xin Xing, Jun S. Liu

    Abstract: Selecting relevant features associated with a given response variable is an important issue in many scientific fields. Quantifying the quality and uncertainty of a selection result via false discovery rate (FDR) control has been of recent interest. This paper introduces a way of using data-splitting strategies to asymptotically control the FDR while maintaining high power. For each feature, the meth…

    Submitted 15 December, 2020; v1 submitted 19 February, 2020; originally announced February 2020.
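    The abstract is truncated before describing the statistic, so the following is only a sketch of one common mirror-statistic construction for data splitting, assuming a linear model fitted by least squares on each half: M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|), with the threshold chosen as the smallest t such that #{M_j < -t} / max(#{M_j > t}, 1) <= q. `mirror_statistics` and `select_by_mirror` are hypothetical names.

```python
import numpy as np

def mirror_statistics(X, y, rng):
    """Fit OLS on two random halves and combine the coefficient vectors into
    mirror statistics M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|)."""
    n = X.shape[0]
    idx = rng.permutation(n)
    h1, h2 = idx[: n // 2], idx[n // 2 :]
    b1 = np.linalg.lstsq(X[h1], y[h1], rcond=None)[0]
    b2 = np.linalg.lstsq(X[h2], y[h2], rcond=None)[0]
    return np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

def select_by_mirror(M, q):
    """Select {j: M_j > t} for the smallest t whose estimated FDP is <= q."""
    for t in np.sort(np.abs(M)):
        fdp_hat = np.sum(M < -t) / max(np.sum(M > t), 1)
        if fdp_hat <= q:
            return np.where(M > t)[0]
    return np.array([], dtype=int)

rng = np.random.default_rng(3)
n, p = 1000, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:10] = 1.0                        # first 10 features are relevant
y = X @ beta + rng.normal(size=n)
M = mirror_statistics(X, y, rng)
selected = select_by_mirror(M, q=0.1)
```

    The idea is that a null feature's two half-sample estimates are independent and centered at zero, so its mirror statistic is roughly symmetric about zero, and the count of large negative values estimates the count of false positives among the large positive ones.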

  19. arXiv:1911.09761

    stat.ME

    Controlling False Discovery Rate Using Gaussian Mirrors

    Authors: Xin Xing, Zhigen Zhao, Jun S. Liu

    Abstract: Simultaneously finding multiple influential variables and controlling the false discovery rate (FDR) for linear regression models is a fundamental problem. We here propose the Gaussian Mirror (GM) method, which creates for each predictor variable a pair of mirror variables by adding and subtracting a randomly generated Gaussian perturbation, and proceeds with a certain regression method, such as t…

    Submitted 19 March, 2021; v1 submitted 21 November, 2019; originally announced November 2019.
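    A minimal sketch of building one Gaussian mirror pair and a mirror statistic, assuming ordinary least squares as the regression method and a fixed perturbation scale c = 1 (the abstract is truncated before it states how the method is tuned). The statistic used here, M_j = |b+ + b-| - |b+ - b-|, is large and positive for a relevant predictor because x_j = (x_j+ + x_j-)/2 splits the true effect evenly between the pair, and it stays close to zero for a null predictor. `gaussian_mirror_stat` is a hypothetical helper.

```python
import numpy as np

def gaussian_mirror_stat(X, y, j, rng, c=1.0):
    """Replace column j by the mirror pair x_j + c*z and x_j - c*z, refit OLS,
    and return M_j = |b_plus + b_minus| - |b_plus - b_minus|."""
    n = X.shape[0]
    z = rng.normal(size=n)                     # random Gaussian perturbation
    x_plus = X[:, j] + c * z
    x_minus = X[:, j] - c * z
    others = np.delete(X, j, axis=1)
    design = np.column_stack([x_plus, x_minus, others])
    coef = np.linalg.lstsq(design, y, rcond=None)[0]
    b_plus, b_minus = coef[0], coef[1]
    return abs(b_plus + b_minus) - abs(b_plus - b_minus)

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 20))
beta = np.zeros(20)
beta[:3] = 2.0                                 # first 3 predictors are relevant
y = X @ beta + rng.normal(size=500)
M_signal = gaussian_mirror_stat(X, y, 0, rng)
M_null = gaussian_mirror_stat(X, y, 10, rng)
```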

  20. arXiv:1911.02171

    stat.ME math.ST stat.AP stat.ML

    Minimax Nonparametric Two-sample Test under Smoothing

    Authors: Xin Xing, Zuofeng Shang, Pang Du, Ping Ma, Wenxuan Zhong, Jun S. Liu

    Abstract: We consider the problem of comparing probability densities between two groups. A new probabilistic tensor product smoothing spline framework is developed to model the joint density of two variables. Under such a framework, the probability density comparison is equivalent to testing the presence/absence of interactions. We propose a penalized likelihood ratio test for such interaction testing and s…

    Submitted 11 January, 2021; v1 submitted 5 November, 2019; originally announced November 2019.

  21. arXiv:1909.05922

    stat.CO stat.ME

    Monte Carlo Approximation of Bayes Factors via Mixing with Surrogate Distributions

    Authors: Chenguang Dai, Jun S. Liu

    Abstract: By mixing the target posterior distribution with a surrogate distribution whose normalizing constant is tractable, we propose a method for estimating the marginal likelihood using the Wang-Landau algorithm. We show that faster convergence of the proposed method can be achieved via momentum acceleration. Two implementation strategies are detailed: (i) facilitating global jumps between…

    Submitted 15 December, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

  22. arXiv:1907.11985

    stat.CO cs.LG math.OC

    The Wang-Landau Algorithm as Stochastic Optimization and Its Acceleration

    Authors: Chenguang Dai, Jun S. Liu

    Abstract: We show that the Wang-Landau algorithm can be formulated as a stochastic gradient descent algorithm minimizing a smooth and convex objective function whose gradient is estimated using Markov chain Monte Carlo iterations. The optimization formulation provides a new way to establish the convergence rate of the Wang-Landau algorithm, by exploiting the fact that almost surely, the density e…

    Submitted 2 February, 2020; v1 submitted 27 July, 2019; originally announced July 2019.

    Comments: 10 pages, 3 figures

    Journal ref: Phys. Rev. E 101, 033301 (2020)
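    For orientation, here is a minimal sketch of the classical flat-histogram Wang-Landau algorithm (not the accelerated or stochastic-optimization form studied in the paper), assuming a toy state space of length-L bit strings whose energy E is the number of ones, so the exact density of states g(E) is a binomial coefficient. The function name, the flatness level 0.8, and the schedule that halves ln f after each flat histogram are conventional choices assumed here.

```python
import math
import numpy as np

def wang_landau_binary(L=5, log_f_final=1e-5, flatness=0.8, check_every=1000, seed=0):
    """Flat-histogram Wang-Landau estimate of log g(E), where E is the number
    of ones in a length-L bit string (exact answer: log C(L, E))."""
    rng = np.random.default_rng(seed)
    state = np.zeros(L, dtype=int)
    energy = 0
    log_g = np.zeros(L + 1)          # running estimate of the log density of states
    log_f = 1.0                      # modification factor ln f
    while log_f > log_f_final:
        hist = np.zeros(L + 1)
        while True:
            for _ in range(check_every):
                i = rng.integers(L)
                new_energy = energy + 1 - 2 * state[i]   # flipping bit i moves E by +1 or -1
                # Accept with probability min(1, g(E_old) / g(E_new)).
                if rng.uniform() < math.exp(min(0.0, log_g[energy] - log_g[new_energy])):
                    state[i] ^= 1
                    energy = new_energy
                log_g[energy] += log_f                   # penalize the current energy level
                hist[energy] += 1
            if hist.min() > flatness * hist.mean():      # histogram is "flat enough"
                break
        log_f /= 2.0                                     # refine the modification factor
    return log_g - log_g[0]                              # normalize so that g(0) = 1

est = wang_landau_binary(L=5)
exact = np.array([math.log(math.comb(5, k)) for k in range(6)])
print(np.round(est, 2))
print(np.round(exact, 2))
```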

  23. arXiv:1905.12440

    cs.LG stat.ME stat.ML

    Generative Parameter Sampler For Scalable Uncertainty Quantification

    Authors: Minsuk Shin, Young Lee, Jun S. Liu

    Abstract: Uncertainty quantification has been a core problem in statistical machine learning, but its computational bottleneck has been a serious challenge for both Bayesians and frequentists. We propose a model-based framework for quantifying uncertainty, called predictive-matching Generative Parameter Sampler (GPS). This procedure considers an Uncertainty Quantification (UQ) distribution on the targeted parame…

    Submitted 2 June, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

  24. arXiv:1810.02658

    stat.ML cs.LG

    IMMIGRATE: A Margin-based Feature Selection Method with Interaction Terms

    Authors: Ruzhang Zhao, Pengyu Hong, Jun S Liu

    Abstract: Relief-based algorithms have often been claimed to uncover feature interactions. However, it is still unclear whether and how interaction terms will be differentiated from marginal effects. In this paper, we propose the IMMIGRATE algorithm by including and training weights for interaction terms. Besides applying the large margin principle, we focus on the robustness of the contributors of margin and c…

    Submitted 3 March, 2020; v1 submitted 5 October, 2018; originally announced October 2018.

    Comments: R package ('Immigrate') available on CRAN

    Journal ref: Entropy. 2020; 22(3):291

  25. arXiv:1810.00141

    stat.ME

    Neuronized Priors for Bayesian Sparse Linear Regression

    Authors: Minsuk Shin, Jun S Liu

    Abstract: Although Bayesian variable selection methods have been intensively studied, their routine use in practice has not caught up with their non-Bayesian counterparts such as Lasso, likely due to difficulties in both computation and the flexibility of prior choices. To ease these challenges, we propose the neuronized priors to unify and extend some popular shrinkage priors, such as Laplace, Cauchy, horse…

    Submitted 5 July, 2021; v1 submitted 28 September, 2018; originally announced October 2018.

  26. arXiv:1808.06109

    stat.AP

    Bayesian Hidden Markov Tree Models for Clustering Genes with Shared Evolutionary History

    Authors: Yang Li, Shaoyang Ning, Sarah E. Calvo, Vamsi K. Mootha, Jun S. Liu

    Abstract: Determination of functions for poorly characterized genes is crucial for understanding biological processes and studying human diseases. Functionally associated genes are often gained and lost together through evolution. Therefore, identifying co-evolution of genes can predict functional gene-gene associations. We describe here the full statistical model and computational strategies underlying the…

    Submitted 18 August, 2018; originally announced August 2018.

    Comments: 34 pages, 8 figures

  27. arXiv:1807.01635

    stat.ME stat.AP

    Randomization Inference for Peer Effects

    Authors: Xinran Li, Peng Ding, Qian Lin, Dawei Yang, Jun S. Liu

    Abstract: Many previous causal inference studies require no interference, that is, the potential outcomes of a unit do not depend on the treatments of other units. However, this no-interference assumption becomes unreasonable when a unit interacts with other units in the same group or cluster. In a motivating application, a university in China admits students through two channels: the college entrance exam…

    Submitted 20 December, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  28. arXiv:1611.08649

    stat.ME

    Robust Variable and Interaction Selection for Logistic Regression and Multiple Index Models

    Authors: Yang Li, Jun S. Liu

    Abstract: We propose Stepwise cOnditional likelihood variable selection for Discriminant Analysis (SODA) to detect both main and quadratic interaction effects in logistic regression and quadratic discriminant analysis (QDA) models. In the forward stage, SODA adds in important predictors evaluated based on their overall contributions, whereas in the backward stage SODA removes unimportant terms so as to opti…

    Submitted 29 May, 2017; v1 submitted 25 November, 2016; originally announced November 2016.

  29. arXiv:1611.06655

    math.ST stat.ME

    Sparse Sliced Inverse Regression Via Lasso

    Authors: Qian Lin, Zhigen Zhao, Jun S. Liu

    Abstract: For multiple index models, it has recently been shown that the sliced inverse regression (SIR) is consistent for estimating the sufficient dimension reduction (SDR) space if and only if $\rho=\lim\frac{p}{n}=0$, where $p$ is the dimension and $n$ is the sample size. Thus, when $p$ is of the same order as $n$ or higher, additional assumptions such as sparsity must be imposed in order to ensure consi…

    Submitted 17 June, 2018; v1 submitted 21 November, 2016; originally announced November 2016.

    Comments: 41 pages, 2 figures

    MSC Class: 62J02 (Primary); 62H25 (Secondary)
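    A minimal sketch of classical (non-sparse) SIR, assuming n much larger than p (the regime in which the abstract notes SIR is consistent), equal-size slices, and a single direction; `sliced_inverse_regression` is a hypothetical name and the Lasso-based sparse variant of the paper is not implemented. It whitens X, averages the whitened predictors within slices of the sorted response, and takes the leading eigenvector of the weighted covariance of the slice means.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = Xc @ inv_sqrt                                     # whitened predictors
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):           # slices of (nearly) equal size
        m = Z[idx].mean(axis=0)                           # slice mean of Z
        M += (len(idx) / n) * np.outer(m, m)
    w, V = np.linalg.eigh(M)
    top = V[:, np.argsort(w)[::-1][:n_directions]]        # leading eigenvectors
    return inv_sqrt @ top                                 # map back to the X scale

rng = np.random.default_rng(5)
n, p = 2000, 6
X = rng.normal(size=(n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = (X @ beta) ** 3 + 0.5 * rng.normal(size=n)            # nonlinear single-index link
b_hat = sliced_inverse_regression(X, y)[:, 0]
cosine = abs(b_hat @ beta) / np.linalg.norm(b_hat)
```

    SIR only identifies the direction of beta, not its scale or sign, so the recovered vector is compared with the truth through the absolute cosine.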

  30. arXiv:1607.06051

    stat.ME

    Bayesian Analysis of Rank Data with Covariates and Heterogeneous Rankers

    Authors: Xinran Li, Dingdong Yi, Jun S. Liu

    Abstract: Data in the form of ranking lists are frequently encountered, and combining ranking results from different sources can potentially generate a better ranking list and help understand behaviors of the rankers. Of interest here are the rank data under the following settings: (i) covariate information available for the ranked entities; (ii) rankers of varying qualities or having different opinions; an…

    Submitted 16 July, 2020; v1 submitted 20 July, 2016; originally announced July 2016.

  31. arXiv:1604.02736

    stat.ME

    Generalized R-squared for Detecting Dependence

    Authors: Xufei Wang, Bo Jiang, Jun S. Liu

    Abstract: Detecting dependence between two random variables is a fundamental problem. Although the Pearson correlation is effective for capturing linear dependency, it can be entirely powerless for detecting nonlinear and/or heteroscedastic patterns. We introduce a new measure, G-squared, to test whether two univariate random variables are independent and to measure the strength of their relationship. The G…

    Submitted 17 November, 2016; v1 submitted 10 April, 2016; originally announced April 2016.

  32. arXiv:1511.08102

    math.ST stat.ML

    L1-Regularized Least Squares for Support Recovery of High Dimensional Single Index Models with Gaussian Designs

    Authors: Matey Neykov, Jun S. Liu, Tianxi Cai

    Abstract: It is known that for a certain class of single index models (SIMs) $Y = f(\boldsymbol{X}_{p \times 1}^\intercal\boldsymbol{\beta}_0, \varepsilon)$, support recovery is impossible when $\boldsymbol{X} \sim \mathcal{N}(0, \mathbb{I}_{p \times p})$ and a model complexity adjusted sample size is below a critical threshold. Recently, optimal algorithms based on Sliced Inverse Regression (SIR) were suggested.…

    Submitted 22 June, 2016; v1 submitted 25 November, 2015; originally announced November 2015.

    Comments: 36 pages; 6 figures; typos corrected; clearer notation introduced

  33. arXiv:1511.02270

    math.ST stat.ML

    Signed Support Recovery for Single Index Models in High-Dimensions

    Authors: Matey Neykov, Qian Lin, Jun S. Liu

    Abstract: In this paper we study the support recovery problem for single index models $Y=f(\boldsymbol{X}^{\intercal} \boldsymbol{\beta},\varepsilon)$, where $f$ is an unknown link function, $\boldsymbol{X}\sim N_p(0,\mathbb{I}_{p})$ and $\boldsymbol{\beta}$ is an $s$-sparse unit vector such that $\boldsymbol{\beta}_{i}\in \{\pm\frac{1}{\sqrt{s}},0\}$. In particular, we look into the performance of two computationally inexpe…

    Submitted 22 June, 2016; v1 submitted 6 November, 2015; originally announced November 2015.

    Comments: 38 pages, 7 figures; 1 table; data set analysis added; typos corrected

  34. arXiv:1510.08986

    math.ST stat.ME stat.ML

    A Unified Theory of Confidence Regions and Testing for High Dimensional Estimating Equations

    Authors: Matey Neykov, Yang Ning, Jun S. Liu, Han Liu

    Abstract: We propose a new inferential framework for constructing confidence regions and testing hypotheses in statistical models specified by a system of high dimensional estimating equations. We construct an influence function by projecting the fitted estimating equations to a sparse direction obtained by solving a large-scale linear program. Our main theoretical contribution is to establish a unified Z-e…

    Submitted 22 June, 2016; v1 submitted 30 October, 2015; originally announced October 2015.

    Comments: 67 pages, 2 tables, 1 figure

  35. arXiv:1510.07158

    stat.ME

    Fast Parameter Estimation in Loss Tomography for Networks of General Topology

    Authors: Ke Deng, Yang Li, Weiping Zhu, Jun S. Liu

    Abstract: As a technique to investigate link-level loss rates of a computer network with low operational cost, loss tomography has received considerable attention in recent years. A number of parameter estimation methods have been proposed for loss tomography of networks with a tree structure as well as a general topological structure. However, these methods suffer from either high computational cost or in…

    Submitted 24 October, 2015; originally announced October 2015.

    Comments: To appear in Annals of Applied Statistics

  36. arXiv:1506.08852

    stat.CO

    Locally weighted Markov chain Monte Carlo

    Authors: Espen Bernton, Shihao Yang, Yang Chen, Neil Shephard, Jun S. Liu

    Abstract: We propose a weighting scheme for the proposals within Markov chain Monte Carlo algorithms and show how this can improve statistical efficiency at no extra computational cost. These methods are most powerful when combined with multi-proposal MCMC algorithms such as multiple-try Metropolis, which can efficiently exploit modern computer architectures with large numbers of cores. The locally weighted…

    Submitted 29 June, 2015; originally announced June 2015.

  37. arXiv:1506.02371

    stat.ML

    Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests

    Authors: Viktoriya Krakovna, Jiong Du, Jun S. Liu

    Abstract: It is becoming increasingly important for machine learning methods to make predictions that are interpretable as well as accurate. In many practical applications, it is of interest to know which features and feature interactions are relevant to the prediction task. We present a novel method, Selective Bayesian Forest Classifier, that strikes a balance between predictive power and interpretability by simul…

    Submitted 7 February, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: R package: github.com/vkrakovna/sbfc

  38. arXiv:1411.3070

    stat.ME

    Bayesian nonparametric tests via sliced inverse modeling

    Authors: Bo Jiang, Chao Ye, Jun S. Liu

    Abstract: We study the problem of independence and conditional independence tests between categorical covariates and a continuous response variable, which has an immediate application in genetics. Instead of estimating the conditional distribution of the response given values of covariates, we model the conditional distribution of covariates given the discretized response (aka "slices"). By assigning a prio…

    Submitted 1 May, 2015; v1 submitted 12 November, 2014; originally announced November 2014.

    Comments: 32 pages, 7 figures

  39. Variable selection for general index models via sliced inverse regression

    Authors: Bo Jiang, Jun S. Liu

    Abstract: Variable selection, also known as feature selection in machine learning, plays an important role in modeling high dimensional data and is key to data-driven scientific discoveries. We consider here the problem of detecting influential variables under the general index model, in which the response depends on the predictors through an unknown function of one or more linear combinations of them. Ins…

    Submitted 23 September, 2014; v1 submitted 15 April, 2013; originally announced April 2013.

    Comments: Published at http://dx.doi.org/10.1214/14-AOS1233 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOS-AOS1233

    Journal ref: Annals of Statistics 2014, Vol. 42, No. 5, 1751-1786

  40. Lookahead Strategies for Sequential Monte Carlo

    Authors: Ming Lin, Rong Chen, Jun S. Liu

    Abstract: Based on the principles of importance sampling and resampling, sequential Monte Carlo (SMC) encompasses a large set of powerful techniques dealing with complex stochastic dynamic systems. Many of these systems possess strong memory, with which future information can help sharpen the inference about the current state. By providing theoretical justification of several existing algorithms and introdu…

    Submitted 21 February, 2013; originally announced February 2013.

    Comments: Published at http://dx.doi.org/10.1214/12-STS401 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS401

    Journal ref: Statistical Science 2013, Vol. 28, No. 1, 69-94

  41. Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data

    Authors: Yu Zhang, Jing Zhang, Jun S. Liu

    Abstract: Interactions among multiple genes across the genome may contribute to the risks of many complex human diseases. Whole-genome single nucleotide polymorphisms (SNPs) data collected for many thousands of SNP markers from thousands of individuals under the case-control design promise to shed light on our understanding of such interactions. However, nearby SNPs are highly correlated due to linkage dis…

    Submitted 25 November, 2011; originally announced November 2011.

    Comments: Published at http://dx.doi.org/10.1214/11-AOAS469 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS469

    Journal ref: Annals of Applied Statistics 2011, Vol. 5, No. 3, 2052-2077

  42. arXiv:1104.2180

    stat.ME q-bio.GN q-bio.QM

    The EM Algorithm and the Rise of Computational Biology

    Authors: Xiaodan Fan, Yuan Yuan, Jun S. Liu

    Abstract: In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology prob…

    Submitted 12 April, 2011; originally announced April 2011.

    Comments: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-STS-STS312

    Journal ref: Statistical Science 2010, Vol. 25, No. 4, 476-491

  43. Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle

    Authors: Xiaodan Fan, Saumyadipta Pyne, Jun S. Liu

    Abstract: The effort to identify genes with periodic expression during the cell cycle from genome-wide microarray time series data has been ongoing for a decade. However, the lack of rigorous modeling of periodic expression as well as the lack of a comprehensive model for integrating information across genes and experiments has impaired the effort for the accurate identification of periodically expressed ge…

    Submitted 9 November, 2010; originally announced November 2010.

    Comments: Published at http://dx.doi.org/10.1214/09-AOAS300 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS300

    Journal ref: Annals of Applied Statistics 2010, Vol. 4, No. 2, 988-1013

  44. arXiv:1005.5483

    math.ST stat.ME

    Model Selection Principles in Misspecified Models

    Authors: Jinchi Lv, Jun S. Liu

    Abstract: Model selection is of fundamental importance to high dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Kullback-Leibler divergence principle and the Bayesian principle, which lead to the Akaike information criterion and Bayesian information criterion when models are correctly specified. Yet model misspecification is unavoidable whe…

    Submitted 11 May, 2016; v1 submitted 29 May, 2010; originally announced May 2010.

    Comments: 25 pages, 6 tables

    MSC Class: 62J12(Primary); 62B10; 62F07; 62F15; 62J07(Secondary)

    Journal ref: Journal of the Royal Statistical Society Series B 76, 141-167 (2014)
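    For reference, the classical criteria named in this abstract, for a correctly specified model with k free parameters, maximized log-likelihood l_n(theta-hat), and sample size n, are shown below (smaller is better); the paper's versions for misspecified models are not reproduced here.

```latex
\begin{aligned}
\mathrm{AIC} &= -2\,\ell_n(\hat{\theta}) + 2k, \\
\mathrm{BIC} &= -2\,\ell_n(\hat{\theta}) + k \log n.
\end{aligned}
```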

  45. Doubly stochastic continuous-time hidden Markov approach for analyzing genome tiling arrays

    Authors: W. Evan Johnson, X. Shirley Liu, Jun S. Liu

    Abstract: Microarrays have been developed that tile the entire nonrepetitive genomes of many different organisms, allowing for the unbiased mapping of active transcription regions or protein binding sites across the entire genome. These tiling array experiments produce massive correlated data sets that have many experimental artifacts, presenting many challenges to researchers that require innovative anal…

    Submitted 12 October, 2009; originally announced October 2009.

    Comments: Published at http://dx.doi.org/10.1214/09-AOAS248 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-AOAS-AOAS248

    Journal ref: Annals of Applied Statistics 2009, Vol. 3, No. 3, 1183-1203

  46. arXiv:math/0610655

    math.ST q-bio.GN stat.ME

    Bayesian Clustering of Transcription Factor Binding Motifs

    Authors: Shane T. Jensen, Jun S. Liu

    Abstract: Genes are often regulated in living cells by proteins called transcription factors (TFs) that bind directly to short segments of DNA in close proximity to specific genes. These binding sites have a conserved nucleotide appearance, which is called a motif. Several recent studies of transcriptional regulation require the reduction of a large collection of motifs into clusters based on the similari…

    Submitted 21 October, 2006; originally announced October 2006.

    Comments: Submitted to the Journal of the American Statistical Association