Search | arXiv e-print repository

Neural-g: A Deep Learning Framework for Mixing Density Estimation

Authors: Shijie Wang, Saptarshi Chakraborty, Qian Qin, Ray Bai

Abstract: Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a valid probability density. Under default hyperparameters, we show that neural-$g$ is very flexible and capable of capturing many unknown densities, including those with flat regions, heavy tails, and/or discontinuities. In contrast, existing methods struggle to capture all of these prior shapes. We provide justification for neural-$g$ by establishing a new universal approximation theorem regarding the capability of neural networks to learn arbitrary probability mass functions. To accelerate convergence of our numerical implementation, we utilize a weighted average gradient descent approach to update the network parameters. Finally, we extend neural-$g$ to multivariate prior density estimation. We illustrate the efficacy of our approach through simulations and analyses of real datasets. A software package to implement neural-$g$ is publicly available at https://github.com/shijiew97/neuralG.
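The key device in this abstract is a softmax output that keeps the estimated prior a valid probability mass function. That idea can be illustrated without a neural network. The sketch below is a simplification built on several assumptions, and it is not neural-$g$ itself:

- the prior lives on a fixed grid of support points;
- the likelihood is $y_i \mid \theta_i \sim N(\theta_i, 1)$;
- free softmax logits, rather than network outputs, are fitted by plain gradient ascent on the marginal log-likelihood $\sum_i \log \sum_j w_j \phi(y_i - \theta_j)$.

The function names, grid, step size and simulated data are all hypothetical.

```python
import math
import random


def softmax(z):
    """Map real logits to nonnegative weights that sum to one."""
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def fit_grid_prior(y, grid, steps=300, lr=1.0):
    """Hypothetical helper: gradient ascent on the average marginal
    log-likelihood with respect to the softmax logits."""
    phi = [[normal_pdf(yi - t) for t in grid] for yi in y]  # likelihood matrix
    z = [0.0] * len(grid)  # start from a uniform prior
    for _ in range(steps):
        w = softmax(z)
        grad = [0.0] * len(grid)
        for row in phi:
            marginal = sum(wj * pj for wj, pj in zip(w, row))
            for j in range(len(grid)):
                # d/dz_j log(marginal) = posterior weight of grid point j minus w_j
                grad[j] += w[j] * row[j] / marginal - w[j]
        z = [zj + lr * gj / len(y) for zj, gj in zip(z, grad)]
    return softmax(z)


# Usage: simulated data from a two-point prior at -2 and +2 (illustration only).
random.seed(0)
theta = [random.choice([-2.0, 2.0]) for _ in range(200)]
y = [t + random.gauss(0.0, 1.0) for t in theta]
grid = [0.5 * i for i in range(-8, 9)]  # -4.0, -3.5, ..., 4.0
w = fit_grid_prior(y, grid)
```

Because the weights are always produced by `softmax`, every iterate is a valid pmf by construction; no projection or renormalization step is needed.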

Submitted 9 June, 2024; originally announced June 2024.

Comments: 40 pages, 8 figures, 5 tables

arXiv:2405.10194

Multivariate strong invariance principle and uncertainty assessment for time in-homogeneous cyclic MCMC samplers

Authors: Haoxiang Li, Qian Qin

Abstract: Time in-homogeneous cyclic Markov chain Monte Carlo (MCMC) samplers, including deterministic scan Gibbs samplers and Metropolis within Gibbs samplers, are extensively used for sampling from multi-dimensional distributions. We establish a multivariate strong invariance principle (SIP) for Markov chains associated with these samplers. The rate of this SIP essentially aligns with the tightest rate available for time homogeneous Markov chains. The SIP implies the strong law of large numbers (SLLN) and the central limit theorem (CLT), and plays an essential role in uncertainty assessments. Using the SIP, we give conditions under which the multivariate batch means estimator for estimating the covariance matrix in the multivariate CLT is strongly consistent. Additionally, we provide conditions for a multivariate fixed volume sequential termination rule, which is associated with the concept of effective sample size (ESS), to be asymptotically valid. Our uncertainty assessment tools are demonstrated through various numerical experiments.
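The two uncertainty-assessment tools named here can be sketched in their standard textbook forms.

- The multivariate batch means (mBM) estimator splits a chain of length $n$ into $a$ batches of size $b$ and computes $\hat\Sigma = \frac{b}{a-1}\sum_{k=1}^{a}(\bar Y_k - \bar Y)(\bar Y_k - \bar Y)^\top$.
- The multivariate ESS is $n(\det\hat\Lambda/\det\hat\Sigma)^{1/p}$, where $\hat\Lambda$ is the sample covariance of the draws.

This is a hedged sketch, not the authors' code. For simplicity it feeds in a time-homogeneous AR(1) chain as a stand-in for a cyclic Gibbs sampler. The batch size $b = 100$ and all function names are assumptions.

```python
import random


def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scaled_scatter(rows, center, scale):
    """scale * sum_r (r - center)(r - center)^T as a list-of-lists matrix."""
    p = len(center)
    S = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                S[i][j] += d[i] * d[j]
    return [[scale * v for v in row] for row in S]


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    p, d = len(A), 1.0
    for k in range(p):
        piv = max(range(k, p), key=lambda i: abs(A[i][k]))
        if A[piv][k] == 0.0:
            return 0.0
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            d = -d
        d *= A[k][k]
        for i in range(k + 1, p):
            f = A[i][k] / A[k][k]
            for j in range(k, p):
                A[i][j] -= f * A[k][j]
    return d


def batch_means_cov(chain, b):
    """Multivariate batch means estimate of the CLT covariance matrix."""
    a = len(chain) // b  # number of full batches; leftover draws are dropped
    batch_means = [mean_vec(chain[k * b:(k + 1) * b]) for k in range(a)]
    overall = mean_vec(chain[:a * b])
    return scaled_scatter(batch_means, overall, b / (a - 1))


def multivariate_ess(chain, b):
    n, p = len(chain), len(chain[0])
    lam = scaled_scatter(chain, mean_vec(chain), 1.0 / (n - 1))  # sample covariance
    return n * (det(lam) / det(batch_means_cov(chain, b))) ** (1.0 / p)


def ar1_chain(n, rho, p=2, seed=0):
    """Stand-in chain: p independent AR(1) coordinates x_t = rho * x_{t-1} + N(0, 1)."""
    rng = random.Random(seed)
    x, out = [0.0] * p, []
    for _ in range(n):
        x = [rho * xi + rng.gauss(0.0, 1.0) for xi in x]
        out.append(x)
    return out


ess_iid = multivariate_ess(ar1_chain(10_000, 0.0), b=100)
ess_sticky = multivariate_ess(ar1_chain(10_000, 0.9), b=100)
```

For the independent chain, `ess_iid` should be close to $n$. For an AR(1) coordinate, the ratio of asymptotic variance to stationary variance is $(1+\rho)/(1-\rho)$, so with $\rho = 0.9$ the ESS should fall to roughly $n(1-\rho)/(1+\rho) \approx 526$, up to batch means bias and noise.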

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2212.01712

Convergence Analysis of Data Augmentation Algorithms for Bayesian Robust Multivariate Linear Regression with Incomplete Data

Authors: Haoxiang Li, Qian Qin, Galin L. Jones

Abstract: Gaussian mixtures are commonly used for modeling heavy-tailed error distributions in robust linear regression. Combining the likelihood of a multivariate robust linear regression model with a standard improper prior distribution yields an analytically intractable posterior distribution that can be sampled using a data augmentation algorithm. When the response matrix has missing entries, there are unique challenges to the application and analysis of the convergence properties of the algorithm. Conditions for geometric ergodicity are provided when the incomplete data have a "monotone" structure. In the absence of a monotone structure, an intermediate imputation step is necessary for implementing the algorithm. In this case, we provide sufficient conditions for the algorithm to be Harris ergodic. Finally, we show that, when there is a monotone structure and intermediate imputation is unnecessary, intermediate imputation slows the convergence of the underlying Monte Carlo Markov chain, while post hoc imputation does not. An R package for the data augmentation algorithm is provided.
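The data augmentation idea (Gaussian scale-mixture errors, sampled by alternating latent weights and regression coefficients) can be sketched in a much simpler setting than the paper's. Every simplification below is an assumption:

- a scalar response rather than a response matrix;
- Student-$t_\nu$ errors with known $\nu$ and unit scale rather than an unknown covariance;
- complete data, so no imputation step;
- a flat prior on the coefficients.

Under these assumptions the two blocks are $w_i \mid \beta \sim \text{Gamma}\big(\tfrac{\nu+1}{2}, \text{rate}=\tfrac{\nu + r_i^2}{2}\big)$ and $\beta \mid w \sim N\big((X^\top W X)^{-1}X^\top W y,\ (X^\top W X)^{-1}\big)$. The function name is hypothetical.

```python
import math
import random


def da_t_regression(x, y, nu, iters, seed=0):
    """Hypothetical two-block DA sampler for y_i = b0 + b1 * x_i + e_i,
    e_i ~ Student-t(nu) with unit scale, flat prior on (b0, b1)."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    draws = []
    for _ in range(iters):
        # Block 1: latent weights w_i | beta ~ Gamma(shape=(nu+1)/2, rate=(nu + r_i^2)/2).
        # random.gammavariate takes a *scale* parameter, hence 2 / (nu + r_i^2).
        w = [rng.gammavariate((nu + 1) / 2, 2 / (nu + (yi - b0 - b1 * xi) ** 2))
             for xi, yi in zip(x, y)]
        # Block 2: beta | w ~ N((X'WX)^{-1} X'Wy, (X'WX)^{-1}) with X = [1, x].
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * yi for wi, yi in zip(w, y))
        t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        d = s0 * s2 - s1 * s1
        m0, m1 = (s2 * t0 - s1 * t1) / d, (s0 * t1 - s1 * t0) / d
        # Cholesky factor of the 2x2 covariance [[s2, -s1], [-s1, s0]] / d.
        l00 = math.sqrt(s2 / d)
        l10 = (-s1 / d) / l00
        l11 = math.sqrt(1.0 / s2)  # equals sqrt(s0 / d - l10 ** 2)
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b0, b1 = m0 + l00 * z0, m1 + l10 * z0 + l11 * z1
        draws.append((b0, b1))
    return draws


# Simulated data (illustration only): intercept 1, slope 2, t_3 errors.
rng = random.Random(1)
nu = 3.0
x = [rng.uniform(-2.0, 2.0) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2 / nu))
     for xi in x]
kept = da_t_regression(x, y, nu, iters=2000)[500:]  # drop burn-in
post_b0 = sum(b[0] for b in kept) / len(kept)
post_b1 = sum(b[1] for b in kept) / len(kept)
```

The paper's setting adds an unknown covariance matrix, a matrix response and, for non-monotone missingness, an extra imputation step between the two blocks. None of these is shown here.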

Submitted 4 January, 2023; v1 submitted 3 December, 2022; originally announced December 2022.

MSC Class: 60J05; 62F15

arXiv:2208.11299

Spectral Telescope: Convergence Rate Bounds for Random-Scan Gibbs Samplers Based on a Hierarchical Structure

Authors: Qian Qin, Guanyang Wang

Abstract: Random-scan Gibbs samplers possess a natural hierarchical structure. The structure connects Gibbs samplers targeting higher dimensional distributions to those targeting lower dimensional ones. This leads to a quasi-telescoping property of their spectral gaps. Based on this property, we derive three new bounds on the spectral gaps and convergence rates of Gibbs samplers on general domains. The three bounds relate a chain's spectral gap to, respectively, the correlation structure of the target distribution, a class of random walk chains, and a collection of influence matrices. Notably, one of our results generalizes the technique of spectral independence, which has received considerable attention for its success on finite domains, to general state spaces. We illustrate our methods through a sampler targeting the uniform distribution on a corner of an $n$-cube.

Submitted 13 October, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

MSC Class: 60J05; ACM Class: G.3

arXiv:2009.11119

Text Classification with Novelty Detection

Authors: Qi Qin, Wenpeng Hu, Bing Liu

Abstract: This paper studies the problem of detecting novel or unexpected instances in text classification. In traditional text classification, the classes that appear in testing must have been seen in training. However, in many applications, this is not the case because in testing, we may see unexpected instances that are not from any of the training classes. In this paper, we propose a significantly more effective approach that converts the original problem to a pairwise matching problem and then outputs the probability that two instances belong to the same class. Under this approach, we present two models. The more effective model uses two embedding matrices of a pair of instances as two channels of a CNN. The output probabilities from such pairs are used to judge whether a test instance is from a seen class or is novel/unexpected. Experimental results show that the proposed method substantially outperforms the state-of-the-art baselines.

Submitted 23 September, 2020; originally announced September 2020.

arXiv:1810.08899

To Compress, or Not to Compress: Characterizing Deep Learning Model Compression for Embedded Inference

Authors: Qing Qin, Jie Ren, Jialong Yu, Ling Gao, Hai Wang, Jie Zheng, Yansong Feng, Jianbin Fang, Zheng Wang

Abstract: The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address the computation issue of deep inference on embedded devices. These techniques are highly attractive, as they do not rely on specialized hardware, or on computation offloading that is often infeasible due to privacy concerns or high latency. However, it remains unclear how model compression techniques perform across a wide range of DNNs. To design efficient embedded deep learning solutions, we need to understand their behaviors. This work develops a quantitative approach to characterize model compression techniques on a representative embedded deep learning architecture, the NVIDIA Jetson TX2. We perform extensive experiments by considering 11 influential neural network architectures from the image classification and the natural language processing domains. We experimentally show how two mainstream compression techniques, data quantization and pruning, perform on these network architectures, and the implications of compression techniques for model storage size, inference time, energy consumption and performance metrics. We demonstrate that there are opportunities to achieve fast deep inference on embedded systems, but one must carefully choose the compression settings. Our results provide insights on when and how to apply model compression techniques and guidelines for designing efficient embedded deep learning systems.
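The two mainstream techniques named in this abstract can be illustrated in their most generic textbook forms:

- symmetric per-tensor linear quantization to 8-bit integers;
- unstructured magnitude pruning.

This is a minimal sketch under those assumptions. The paper's actual settings and toolchain on the Jetson TX2 are not described in the abstract, and the helper names and weight values below are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127] with one per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 for an all-zero tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


weights = [0.82, -0.05, 0.31, -1.27, 0.004, 0.66, -0.48, 0.12]
q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
pruned = magnitude_prune(weights, sparsity=0.5)
```

Quantization stores each weight in 8 bits instead of 32 at the cost of a per-weight rounding error of at most `scale / 2`. Pruning at 50% sparsity zeroes the four smallest-magnitude weights here. Whether either change actually speeds up inference depends on hardware and kernel support, which is the kind of question the paper studies empirically.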

Submitted 21 October, 2018; originally announced October 2018.

Comments: 8 pages, to appear in ISPA 2018

arXiv:1810.08826

Wasserstein-based methods for convergence complexity analysis of MCMC with applications

Authors: Qian Qin, James P. Hobert

Abstract: Over the last 25 years, techniques based on drift and minorization (d&m) have been mainstays in the convergence analysis of MCMC algorithms. However, results presented herein suggest that d&m may be less useful in the emerging area of convergence complexity analysis, which is the study of how the convergence behavior of Monte Carlo Markov chains scales with sample size, $n$, and/or number of covariates, $p$. The problem appears to be that minorization can become a serious liability as dimension increases. Alternative methods for constructing convergence rate bounds (with respect to total variation distance) that do not require minorization are investigated. Based on Wasserstein distances and random mappings, these methods can produce bounds that are substantially more robust to increasing dimension than those based on d&m. The Wasserstein-based bounds are used to develop strong convergence complexity results for MCMC algorithms used in Bayesian probit regression and random effects models in the challenging asymptotic regime where $n$ and $p$ are both large.
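The random-mapping idea behind Wasserstein-based bounds shows up in a one-dimensional toy chain. This is an illustrative assumption, not one of the paper's probit or random effects chains.

Write a Gaussian AR(1) chain as the random map $x \mapsto \rho x + U$ and drive two copies with the same $U$ at every step. This synchronous coupling gives $|X_t - Y_t| = \rho^t |x_0 - y_0|$ exactly, so the Wasserstein-1 distance between the two $t$-step distributions is at most $\rho^t |x_0 - y_0|$. No minorization condition is used anywhere. The step from such a Wasserstein bound to the paper's total variation bounds is not shown, and the names and $\rho = 0.7$ are hypothetical.

```python
import random


def ar1_step(x, u, rho=0.7):
    """Random map x -> rho * x + u; the same u is applied to every copy of the chain."""
    return rho * x + u


def coupled_distance(x0, y0, steps, rho=0.7, seed=0):
    """Run two copies of the chain under a synchronous coupling and return |X_t - Y_t|."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(steps):
        u = rng.gauss(0.0, 1.0)  # common random input shared by both copies
        x, y = ar1_step(x, u, rho), ar1_step(y, u, rho)
    return abs(x - y)


# Under this coupling |X_t - Y_t| = rho^t * |x0 - y0|, an upper bound on W1 at time t.
d10 = coupled_distance(5.0, -5.0, steps=10)
```

The contraction factor $\rho$ does not depend on how far apart the starting points are. This is the kind of dimension-robust behavior the abstract contrasts with minorization-based arguments.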

Submitted 14 October, 2020; v1 submitted 20 October, 2018; originally announced October 2018.

MSC Class: 60J05

arXiv:1801.06934

On the Iteration Complexity Analysis of Stochastic Primal-Dual Hybrid Gradient Approach with High Probability

Authors: Linbo Qiao, Tianyi Lin, Qi Qin, Xicheng Lu

Abstract: In this paper, we propose a stochastic Primal-Dual Hybrid Gradient (PDHG) approach for solving a wide spectrum of regularized stochastic minimization problems, where the regularization term is composed with a linear function. It has been recognized that solving this kind of problem is challenging since the closed-form solution of the proximal mapping associated with the regularization term is not available due to the imposed linear composition, and the per-iteration cost of computing the full gradient of the expected objective function is extremely high when the number of input data samples is considerably large. Our new approach overcomes these issues by exploiting the special structure of the regularization term and sampling a few data points at each iteration. Rather than analyzing the convergence in expectation, we provide a detailed high-probability iteration complexity analysis for both uniformly and non-uniformly averaged iterates. This strongly supports the good practical performance of the proposed approach. Numerical experiments demonstrate the efficiency of stochastic PDHG, which outperforms other competing algorithms, as expected from the high-probability convergence analysis.
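The structure this abstract describes can be illustrated with a generic Chambolle–Pock-style stochastic primal-dual iteration. The regularizer $g(Dx)$ is handled through its conjugate in a dual step, so the prox of $g \circ D$ is never needed, and one sampled data point gives the primal gradient. The test problem is a hypothetical fused-lasso-type objective:

$$\min_x \tfrac{1}{n}\sum_i \tfrac12(a_i^\top x - b_i)^2 + \lambda\|Dx\|_1, \qquad D = \text{first differences}.$$

This is a sketch under assumptions, not the paper's algorithm. The update order, the $\tau_t = \tau_0/\sqrt{t+1}$ schedule, $\sigma$, $\lambda$ and all names are choices made here for illustration. It returns the uniformly averaged iterate, echoing the averaged iterates in the abstract.

```python
import random


def diff(x):
    """D x: first differences, length len(x) - 1."""
    return [x[k + 1] - x[k] for k in range(len(x) - 1)]


def diff_t(y, p):
    """D^T y for the first-difference operator D."""
    out = [0.0] * p
    for k, v in enumerate(y):
        out[k] -= v
        out[k + 1] += v
    return out


def objective(x, A, b, lam):
    loss = sum(0.5 * (sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b)) / len(b)
    return loss + lam * sum(abs(v) for v in diff(x))


def stochastic_pdhg(A, b, lam, iters=5000, sigma=0.5, tau0=0.1, seed=0):
    rng = random.Random(seed)
    p = len(A[0])
    x, x_bar, y, x_avg = [0.0] * p, [0.0] * p, [0.0] * (p - 1), [0.0] * p
    for t in range(iters):
        # Dual step: the prox of sigma * g^* for g = lam * ||.||_1 is projection onto [-lam, lam].
        y = [min(lam, max(-lam, yk + sigma * dk)) for yk, dk in zip(y, diff(x_bar))]
        # Primal step with a one-sample stochastic gradient of the smooth loss.
        i = rng.randrange(len(b))
        resid = sum(a * xj for a, xj in zip(A[i], x)) - b[i]
        tau = tau0 / (t + 1) ** 0.5
        x_new = [xj - tau * (resid * a + g) for xj, a, g in zip(x, A[i], diff_t(y, p))]
        x_bar = [2 * xn - xj for xn, xj in zip(x_new, x)]  # extrapolation step
        x = x_new
        x_avg = [(t * m + xn) / (t + 1) for m, xn in zip(x_avg, x)]  # uniform average
    return x_avg


# Simulated problem (illustration only): piecewise-constant coefficients.
rng = random.Random(1)
x_true = [1.0, 1.0, 1.0, 3.0, 3.0]
A = [[rng.gauss(0.0, 1.0) for _ in x_true] for _ in range(200)]
b = [sum(a * xt for a, xt in zip(row, x_true)) + rng.gauss(0.0, 0.1) for row in A]
x_hat = stochastic_pdhg(A, b, lam=0.05)
```

The step sizes are chosen so that $\tau_t \sigma \|D\|^2 \le 0.1 \cdot 0.5 \cdot 4 < 1$, a common stability condition for PDHG-type methods.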

Submitted 1 February, 2018; v1 submitted 21 January, 2018; originally announced January 2018.

arXiv:1704.00850

Estimating the spectral gap of a trace-class Markov operator

Authors: Qian Qin, James P. Hobert, Kshitij Khare

Abstract: The utility of a Markov chain Monte Carlo algorithm is, in large part, determined by the size of the spectral gap of the corresponding Markov operator. However, calculating (and even approximating) the spectral gaps of practical Monte Carlo Markov chains in statistics has proven to be an extremely difficult and often insurmountable task, especially when these chains move on continuous state spaces. In this paper, a method for accurate estimation of the spectral gap is developed for general state space Markov chains whose operators are non-negative and trace-class. The method is based on the fact that the second largest eigenvalue (and hence the spectral gap) of such operators can be bounded above and below by simple functions of the power sums of the eigenvalues. These power sums often have nice integral representations. A classical Monte Carlo method is proposed to estimate these integrals, and a simple sufficient condition for finite variance is provided. This leads to asymptotically valid confidence intervals for the second largest eigenvalue (and the spectral gap) of the Markov operator. In contrast with previously existing techniques, our method is not based on a near-stationary version of the Markov chain, which, paradoxically, cannot be obtained in a principled manner without bounds on the spectral gap. On the other hand, it can be quite expensive from a computational standpoint. The efficiency of the method is studied both theoretically and empirically.
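The power-sum idea can be checked on a finite-state toy chain, which is an assumption made here for illustration. Suppose the transition matrix is symmetric with eigenvalues $1 = \lambda_0 > \lambda_1 \ge \lambda_2 \ge \dots \ge 0$. Then the power sums $s_k = \sum_i \lambda_i^k$ are simply traces $\operatorname{tr}(P^k)$, and non-negativity alone gives two bounds:

- $\lambda_1^k \le s_k - 1$;
- $\sum_{i\ge1}\lambda_i^{k+1} \le \lambda_1 \sum_{i\ge1}\lambda_i^k$.

Together these give $(s_{k+1}-1)/(s_k-1) \le \lambda_1 \le (s_k-1)^{1/k}$. The paper targets general state spaces, where $s_k$ is an integral estimated by classical Monte Carlo, and its exact bounds may differ from this pair. The example chain and function names are hypothetical.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def power_sums(P, kmax):
    """s_k = trace(P^k) = sum of the k-th powers of the eigenvalues, for k = 1..kmax."""
    sums, Pk = [], P
    for _ in range(kmax):
        sums.append(sum(Pk[i][i] for i in range(len(P))))
        Pk = mat_mul(Pk, P)
    return sums


def lambda1_bounds(s_k, s_k1, k):
    """Lower/upper bounds on the second largest eigenvalue from s_k and s_{k+1},
    assuming all eigenvalues lie in [0, 1] and eigenvalue 1 is simple."""
    lower = (s_k1 - 1.0) / (s_k - 1.0)
    upper = (s_k - 1.0) ** (1.0 / k)
    return lower, upper


# Lazy random walk on a triangle: stay with probability 1/2, else move to one of
# the other two states. Its eigenvalues are 1, 0.25, 0.25 (spectral gap 0.75).
P = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
s = power_sums(P, 9)  # s[k - 1] holds s_k
bounds = {k: lambda1_bounds(s[k - 1], s[k], k) for k in (2, 4, 8)}
```

Here $s_k = 1 + 2 \cdot 0.25^k$, so the lower bound equals $0.25$ exactly because the two nontrivial eigenvalues tie. The upper bound $2^{1/k} \cdot 0.25$ tightens toward $0.25$ as $k$ grows. Larger $k$ gives sharper bounds, but in the Monte Carlo setting the integrals for $s_k$ become more expensive and harder to estimate.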

Submitted 4 April, 2019; v1 submitted 3 April, 2017; originally announced April 2017.

Showing 1–9 of 9 results for author: Qin, Q