Search | arXiv e-print repository

Variational inference based on a subclass of closed skew normals

Abstract: Gaussian distributions are widely used in Bayesian variational inference to approximate intractable posterior densities, but the ability to accommodate skewness can improve approximation accuracy significantly, especially when data or prior information is scarce. We study the properties of a subclass of closed skew normals constructed using affine transformation of independent standardized univariate skew normals as the variational density, and illustrate how this subclass provides increased flexibility and accuracy in approximating the joint posterior density in a variety of applications by overcoming limitations in existing skew normal variational approximations. The evidence lower bound is optimized using stochastic gradient ascent, where analytic natural gradient updates are derived. We also demonstrate how problems in maximum likelihood estimation of skew normal parameters occur similarly in stochastic variational inference and can be resolved using the centered parametrization.

Submitted 5 June, 2023; originally announced June 2023.

Comments: keywords: Closed skew normal; Gaussian variational approximation; natural gradient; centered parametrization; LU decomposition

arXiv:2210.10566 [pdf, other]

Second order stochastic gradient update for Cholesky factor in Gaussian variational approximation from Stein's Lemma

Authors: Linda S. L. Tan

Abstract: In stochastic variational inference, use of the reparametrization trick for the multivariate Gaussian gives rise to efficient updates for the mean and Cholesky factor of the covariance matrix, which depend on the first order derivative of the log joint model density. In this article, we show that an alternative unbiased gradient estimate for the Cholesky factor which depends on the second order derivative of the log joint model density can be derived using Stein's Lemma. This leads to a second order stochastic gradient update for the Cholesky factor which is able to improve convergence, as it has variance lower than the first order update (almost negligible) when close to the mode. We also derive second order update for the Cholesky factor of the precision matrix, which is useful when the precision matrix has a sparse structure reflecting conditional independence in the true posterior distribution. Our results can be used to obtain second order natural gradient updates for the Cholesky factor as well, which are more robust compared to updates based on Euclidean gradients.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 15 pages, 2 figures

arXiv:2109.00375 [pdf, other]

Analytic natural gradient updates for Cholesky factor in Gaussian variational approximation

Authors: Linda S. L. Tan

Abstract: Natural gradients can improve convergence in stochastic variational inference significantly but inverting the Fisher information matrix is daunting in high dimensions. Moreover, in Gaussian variational approximation, natural gradient updates of the precision matrix do not ensure positive definiteness. To tackle this issue, we derive analytic natural gradient updates of the Cholesky factor of the covariance or precision matrix, and consider sparsity constraints representing different posterior correlation structures. Stochastic normalized natural gradient ascent with momentum is proposed for implementation in generalized linear mixed models and deep neural networks.

Submitted 19 May, 2024; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: 47 pages, 10 figures

arXiv:1904.09591 [pdf, other]

Conditionally structured variational Gaussian approximation with importance weights

Authors: Linda S. L. Tan, Aishwarya Bhaskaran, David J. Nott

Abstract: We develop flexible methods of deriving variational inference for models with complex latent variable structure. By splitting the variables in these models into "global" parameters and "local" latent variables, we define a class of variational approximations that exploit this partitioning and go beyond Gaussian variational approximation. This approximation is motivated by the fact that in many hierarchical models, there are global variance parameters which determine the scale of local latent variables in their posterior conditional on the global parameters. We also consider parsimonious parametrizations by using conditional independence structure, and improved estimation of the log marginal likelihood and variational density using importance weights. These methods are shown to improve significantly on Gaussian variational approximation methods for a similar computational cost. Application of the methodology is illustrated using generalized linear mixed models and state space models.

Submitted 21 April, 2019; originally announced April 2019.

Comments: 18 pages, 7 figures

arXiv:1811.04249 [pdf, other]

Bayesian variational inference for exponential random graph models

Authors: Linda S. L. Tan, Nial Friel

Abstract: Deriving Bayesian inference for exponential random graph models (ERGMs) is a challenging "doubly intractable" problem as the normalizing constants of the likelihood and posterior density are both intractable. Markov chain Monte Carlo (MCMC) methods which yield Bayesian inference for ERGMs, such as the exchange algorithm, are asymptotically exact but computationally intensive, as a network has to be drawn from the likelihood at every step using, for instance, a "tie no tie" sampler. In this article, we develop a variety of variational methods for Gaussian approximation of the posterior density and model selection. These include nonconjugate variational message passing based on an adjusted pseudolikelihood and stochastic variational inference. To overcome the computational hurdle of drawing a network from the likelihood at each iteration, we propose stochastic gradient ascent with biased but consistent gradient estimates computed using adaptive self-normalized importance sampling. These methods provide attractive fast alternatives to MCMC for posterior approximation. We illustrate the variational methods using real networks and compare their accuracy with results obtained via MCMC and Laplace approximation.

Submitted 23 November, 2019; v1 submitted 10 November, 2018; originally announced November 2018.

Comments: 45 pages

arXiv:1805.07267 [pdf, ps, other]

Use of model reparametrization to improve variational Bayes

Authors: Linda S. L. Tan

Abstract: We propose using model reparametrization to improve variational Bayes inference for hierarchical models whose variables can be classified as global (shared across observations) or local (observation specific). Posterior dependence between local and global variables is minimized by applying an invertible affine transformation on the local variables. The functional form of this transformation is deduced by approximating the posterior distribution of each local variable conditional on the global variables by a Gaussian density via a second order Taylor expansion. Variational Bayes inference for the reparametrized model is then obtained using stochastic approximation. Our approach can be readily extended to large datasets via a divide and recombine strategy. Using generalized linear mixed models, we demonstrate that reparametrized variational Bayes (RVB) provides improvements in both accuracy and convergence rate compared to state of the art Gaussian variational approximation methods.

Submitted 7 March, 2020; v1 submitted 18 May, 2018; originally announced May 2018.

Journal ref: Journal of the Royal Statistical Society Series B: Statistical Methodology, 2020

arXiv:1712.08887 [pdf, other]

Efficient data augmentation techniques for some classes of state space models

Authors: Linda S. L. Tan

Abstract: Data augmentation improves the convergence of iterative algorithms, such as the EM algorithm and Gibbs sampler by introducing carefully designed latent variables. In this article, we first propose a data augmentation scheme for the first-order autoregression plus noise model, where optimal values of working parameters introduced for recentering and rescaling of the latent states, can be derived analytically by minimizing the fraction of missing information in the EM algorithm. The proposed data augmentation scheme is then utilized to design efficient Markov chain Monte Carlo (MCMC) algorithms for Bayesian inference of some non-Gaussian and nonlinear state space models, via a mixture of normals approximation coupled with a block-specific reparametrization strategy. Applications on simulated and benchmark real datasets indicate that the proposed MCMC sampler can yield improvements in simulation efficiency compared with centering, noncentering and even the ancillarity-sufficiency interweaving strategy.

Submitted 4 July, 2022; v1 submitted 24 December, 2017; originally announced December 2017.

Comments: Keywords: Data augmentation, State space model, Stochastic volatility model, EM algorithm, Reparametrization, Markov chain Monte Carlo, Ancillarity-sufficiency interweaving strategy

arXiv:1705.09088 [pdf, other]

doi 10.1177/1471082X18770760

Dynamic degree-corrected blockmodels for social networks: a nonparametric approach

Authors: Linda S. L. Tan, Maria De Iorio

Abstract: A nonparametric approach to the modeling of social networks using degree-corrected stochastic blockmodels is proposed. The model for static network consists of a stochastic blockmodel using a probit regression formulation and popularity parameters are incorporated to account for degree heterogeneity. Dirichlet processes are used to detect community structure as well as induce clustering in the popularity parameters. This approach is flexible yet parsimonious as it allows the appropriate number of communities and popularity clusters to be determined automatically by the data. We further discuss some ways of extending the static model to dynamic networks. We consider a Bayesian approach and derive Gibbs samplers for posterior inference. The models are illustrated using several real-world benchmark social networks.

Submitted 25 May, 2017; originally announced May 2017.

Journal ref: Statistical Modelling (2019), 19, 386-411

arXiv:1605.05622 [pdf, other]

doi 10.1007/s11222-017-9729-7

Gaussian variational approximation with sparse precision matrices

Authors: Linda S. L. Tan, David J. Nott

Abstract: We consider the problem of learning a Gaussian variational approximation to the posterior distribution for a high-dimensional parameter, where we impose sparsity in the precision matrix to reflect appropriate conditional independence structure in the model. Incorporating sparsity in the precision matrix allows the Gaussian variational distribution to be both flexible and parsimonious, and the sparsity is achieved through parameterization in terms of the Cholesky factor. Efficient stochastic gradient methods which make appropriate use of gradient information for the target distribution are developed for the optimization. We consider alternative estimators of the stochastic gradients which have lower variation and are more stable. Our approach is illustrated using generalized linear mixed models and state space models for time series.

Submitted 12 April, 2017; v1 submitted 18 May, 2016; originally announced May 2016.

Comments: 18 pages, 9 figures

Journal ref: Statistics and Computing 28 (2018) 259-275

arXiv:1603.06358 [pdf, other]

doi 10.1214/17-AOAS1076

Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks

Authors: Linda S. L. Tan, Ajay Jasra, Maria De Iorio, Timothy M. D. Ebbels

Abstract: We investigate the effect of cadmium (a toxic environmental pollutant) on the correlation structure of a number of urinary metabolites using Gaussian graphical models (GGMs). The inferred metabolic associations can provide important information on the physiological state of a metabolic system and insights on complex metabolic relationships. Using the fitted GGMs, we construct differential networks, which highlight significant changes in metabolite interactions under different experimental conditions. The analysis of such metabolic association networks can reveal differences in the underlying biological reactions caused by cadmium exposure. We consider Bayesian inference and propose using the multiplicative (or Chung-Lu random graph) model as a prior on the graphical space. In the multiplicative model, each edge is chosen independently with probability equal to the product of the connectivities of the end nodes. This class of prior is parsimonious yet highly flexible; it can be used to encourage sparsity or graphs with a pre-specified degree distribution when such prior knowledge is available. We extend the multiplicative model to multiple GGMs linking the probability of edge inclusion through logistic regression and demonstrate how this leads to joint inference for multiple GGMs. A sequential Monte Carlo (SMC) algorithm is developed for estimating the posterior distribution of the graphs.

Submitted 13 April, 2017; v1 submitted 21 March, 2016; originally announced March 2016.

Journal ref: Ann. Appl. Stat. 11 (2017) 2222-2251

arXiv:1502.07190 [pdf, other]

doi 10.1214/15-AOAS887

Topic-adjusted visibility metric for scientific articles

Authors: Linda S. L. Tan, Aik Hui Chan, Tian Zheng

Abstract: Measuring the impact of scientific articles is important for evaluating the research output of individual scientists, academic institutions and journals. While citations are raw data for constructing impact measures, there exist biases and potential issues if factors affecting citation patterns are not properly accounted for. In this work, we address the problem of field variation and introduce an article level metric useful for evaluating individual articles' visibility. This measure derives from joint probabilistic modeling of the content in the articles and the citations amongst them using latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMSB). Our proposed model provides a visibility metric for individual articles adjusted for field variation in citation rates, a structural understanding of citation behavior in different fields, and article recommendations which take into account article visibility and citation patterns. We develop an efficient algorithm for model fitting using variational methods. To scale up to large networks, we develop an online variant using stochastic gradient methods and case-control likelihood approximation. We apply our methods to the benchmark KDD Cup 2003 dataset with approximately 30,000 high energy physics papers.

Submitted 16 October, 2015; v1 submitted 25 February, 2015; originally announced February 2015.

Journal ref: Annals of Applied Statistics, Volume 10, Number 1 (2016), 1-31

arXiv:1405.5623 [pdf, ps, other]

doi 10.1007/s11222-015-9618-x

Stochastic variational inference for large-scale discrete choice models using adaptive batch sizes

Authors: Linda S. L. Tan

Abstract: Discrete choice models describe the choices made by decision makers among alternatives and play an important role in transportation planning, marketing research and other applications. The mixed multinomial logit (MMNL) model is a popular discrete choice model that captures heterogeneity in the preferences of decision makers through random coefficients. While Markov chain Monte Carlo methods provide the Bayesian analogue to classical procedures for estimating MMNL models, computations can be prohibitively expensive for large datasets. Approximate inference can be obtained using variational methods at a lower computational cost with competitive accuracy. In this paper, we develop variational methods for estimating MMNL models that allow random coefficients to be correlated in the posterior and can be extended easily to large-scale datasets. We explore three alternatives: (1) Laplace variational inference, (2) nonconjugate variational message passing and (3) stochastic linear regression. Their performances are compared using real and simulated data. To accelerate convergence for large datasets, we develop stochastic variational inference for MMNL models using each of the above alternatives. Stochastic variational inference allows data to be processed in minibatches by optimizing global variational parameters using stochastic gradient approximation. A novel strategy for increasing minibatch sizes adaptively within stochastic variational inference is proposed.

Submitted 8 October, 2015; v1 submitted 21 May, 2014; originally announced May 2014.

Journal ref: Statistics and Computing (2017) 27 pp 237-257

arXiv:1306.1999 [pdf, ps, other]

doi 10.1007/s11222-015-9600-7

Variational inference for sparse spectrum Gaussian process regression

Authors: Linda S. L. Tan, Victor M. H. Ong, David J. Nott, Ajay Jasra

Abstract: We develop a fast variational approximation scheme for Gaussian process (GP) regression, where the spectrum of the covariance function is subjected to a sparse approximation. Our approach enables uncertainty in covariance function hyperparameters to be treated without using Monte Carlo methods and is robust to overfitting. Our article makes three contributions. First, we present a variational Bayes algorithm for fitting sparse spectrum GP regression models that uses nonconjugate variational message passing to derive fast and efficient updates. Second, we propose a novel adaptive neighbourhood technique for obtaining predictive inference that is effective in dealing with nonstationarity. Regression is performed locally at each point to be predicted and the neighbourhood is determined using a measure defined based on lengthscales estimated from an initial fit. Weighting dimensions according to lengthscales, this downweights variables of little relevance, leading to automatic variable selection and improved prediction. Third, we introduce a technique for accelerating convergence in nonconjugate variational message passing by adapting step sizes in the direction of the natural gradient of the lower bound. Our adaptive strategy can be easily implemented and empirical results indicate significant speedups.

Submitted 26 January, 2015; v1 submitted 9 June, 2013; originally announced June 2013.

Comments: 20 pages, 11 figures, 1 table

Journal ref: Statistics and Computing (2016) 26 pp 1243-1261

arXiv:1208.4949 [pdf, other]

doi 10.1214/14-BA885

A stochastic variational framework for fitting and diagnosing generalized linear mixed models

Authors: Linda S. L. Tan, David J. Nott

Abstract: In stochastic variational inference, the variational Bayes objective function is optimized using stochastic gradient approximation, where gradients computed on small random subsets of data are used to approximate the true gradient over the whole data set. This enables complex models to be fit to large data sets as data can be processed in mini-batches. In this article, we extend stochastic variational inference for conjugate-exponential models to nonconjugate models and present a stochastic nonconjugate variational message passing algorithm for fitting generalized linear mixed models that is scalable to large data sets. In addition, we show that diagnostics for prior-likelihood conflict, which are useful for Bayesian model criticism, can be obtained from nonconjugate variational message passing automatically, as an alternative to simulation-based Markov chain Monte Carlo methods. Finally, we demonstrate that for moderate-sized data sets, convergence can be accelerated by using the stochastic version of nonconjugate variational message passing in the initial stage of optimization before switching to the standard version.

Submitted 28 March, 2014; v1 submitted 24 August, 2012; originally announced August 2012.

Comments: 42 pages, 13 figures, 9 tables

Journal ref: Bayesian Analysis (2014), 9, 963-1004

arXiv:1205.3906 [pdf, ps, other]

doi 10.1214/13-STS418

Variational Inference for Generalized Linear Mixed Models Using Partially Noncentered Parametrizations

Authors: Linda S. L. Tan, David J. Nott

Abstract: The effects of different parametrizations on the convergence of Bayesian computational algorithms for hierarchical models are well explored. Techniques such as centering, noncentering and partial noncentering can be used to accelerate convergence in MCMC and EM algorithms but are still not well studied for variational Bayes (VB) methods. As a fast deterministic approach to posterior approximation, VB is attracting increasing interest due to its suitability for large high-dimensional data. Use of different parametrizations for VB has not only computational but also statistical implications, as different parametrizations are associated with different factorized posterior approximations. We examine the use of partially noncentered parametrizations in VB for generalized linear mixed models (GLMMs). Our paper makes four contributions. First, we show how to implement an algorithm called nonconjugate variational message passing for GLMMs. Second, we show that the partially noncentered parametrization can adapt to the quantity of information in the data and determine a parametrization close to optimal. Third, we show that partial noncentering can accelerate convergence and produce more accurate posterior approximations than centering or noncentering. Finally, we demonstrate how the variational lower bound, produced as part of the computation, can be useful for model selection.

Submitted 11 June, 2013; v1 submitted 17 May, 2012; originally announced May 2012.

Comments: Published at http://dx.doi.org/10.1214/13-STS418 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-STS-STS418

Journal ref: Statistical Science 2013, Vol. 28, No. 2, 168-188

Showing 1–15 of 15 results for author: Tan, L S L