Search | arXiv e-print repository

The VampPrior Mixture Model

Abstract: Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a-priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture m… ▽ More Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a-priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and Empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters. △ Less

Submitted 4 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2212.09184 [pdf, other]

Faithful Heteroscedastic Regression with Neural Networks

Authors: Andrew Stirn, Hans-Hermann Wessels, Megan Schertzer, Laura Pereira, Neville E. Sanjana, David A. Knowles

Abstract: Heteroscedastic regression models a Gaussian variable's mean and variance as a function of covariates. Parametric methods that employ neural networks for these parameter maps can capture complex relationships in the data. Yet, optimizing network parameters via log likelihood gradients can yield suboptimal mean and uncalibrated variance estimates. Current solutions side-step this optimization probl… ▽ More Heteroscedastic regression models a Gaussian variable's mean and variance as a function of covariates. Parametric methods that employ neural networks for these parameter maps can capture complex relationships in the data. Yet, optimizing network parameters via log likelihood gradients can yield suboptimal mean and uncalibrated variance estimates. Current solutions side-step this optimization problem with surrogate objectives or Bayesian treatments. Instead, we make two simple modifications to optimization. Notably, their combination produces a heteroscedastic model with mean estimates that are provably as accurate as those from its homoscedastic counterpart (i.e.~fitting the mean under squared error loss). For a wide variety of network and task complexities, we find that mean estimates from existing heteroscedastic solutions can be significantly less accurate than those from an equivalently expressive mean-only model. Our approach provably retains the accuracy of an equally flexible mean-only model while also offering best-in-class variance calibration. Lastly, we show how to leverage our method to recover the underlying heteroscedastic noise variance. △ Less

Submitted 18 December, 2022; originally announced December 2022.

arXiv:2006.04910 [pdf, other]

Variational Variance: Simple, Reliable, Calibrated Heteroscedastic Noise Variance Parameterization

Authors: Andrew Stirn, David A. Knowles

Abstract: Brittle optimization has been observed to adversely impact model likelihoods for regression and VAEs when simultaneously fitting neural network map**s from a (random) variable onto the mean and variance of a dependent Gaussian variable. Previous works have bolstered optimization and improved likelihoods, but fail other basic posterior predictive checks (PPCs). Under the PPC framework, we propose… ▽ More Brittle optimization has been observed to adversely impact model likelihoods for regression and VAEs when simultaneously fitting neural network map**s from a (random) variable onto the mean and variance of a dependent Gaussian variable. Previous works have bolstered optimization and improved likelihoods, but fail other basic posterior predictive checks (PPCs). Under the PPC framework, we propose critiques to test predictive mean and variance calibration and the predictive distribution's ability to generate sensible data. We find that our attractively simple solution, to treat heteroscedastic variance variationally, sufficiently regularizes variance to pass these PPCs. We consider a diverse gamut of existing and novel priors and find our methods preserve or outperform existing model likelihoods while significantly improving parameter calibration and sample quality for regression and VAEs. △ Less

Submitted 30 October, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: 17 pages, 6 figures, 10 tables

arXiv:1905.12052 [pdf, other]

A New Distribution on the Simplex with Auto-Encoding Applications

Authors: Andrew Stirn, Tony Jebara, David A Knowles

Abstract: We construct a new distribution for the simplex using the Kumaraswamy distribution and an ordered stick-breaking process. We explore and develop the theoretical properties of this new distribution and prove that it exhibits symmetry under the same conditions as the well-known Dirichlet. Like the Dirichlet, the new distribution is adept at capturing sparsity but, unlike the Dirichlet, has an exact… ▽ More We construct a new distribution for the simplex using the Kumaraswamy distribution and an ordered stick-breaking process. We explore and develop the theoretical properties of this new distribution and prove that it exhibits symmetry under the same conditions as the well-known Dirichlet. Like the Dirichlet, the new distribution is adept at capturing sparsity but, unlike the Dirichlet, has an exact and closed form reparameterization--making it well suited for deep variational Bayesian modeling. We demonstrate the distribution's utility in a variety of semi-supervised auto-encoding tasks. In all cases, the resulting models achieve competitive performance commensurate with their simplicity, use of explicit probability models, and abstinence from adversarial training. △ Less

Submitted 14 December, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

Comments: 15 pages, 6 figures, 1 tables

arXiv:1509.01631 [pdf, other]

Stochastic gradient variational Bayes for gamma approximating distributions

Authors: David A. Knowles

Abstract: While stochastic variational inference is relatively well known for scaling inference in Bayesian probabilistic models, related methods also offer ways to circumnavigate the approximation of analytically intractable expectations. The key challenge in either setting is controlling the variance of gradient estimates: recent work has shown that for continuous latent variables, particularly multivaria… ▽ More While stochastic variational inference is relatively well known for scaling inference in Bayesian probabilistic models, related methods also offer ways to circumnavigate the approximation of analytically intractable expectations. The key challenge in either setting is controlling the variance of gradient estimates: recent work has shown that for continuous latent variables, particularly multivariate Gaussians, this can be achieved by using the gradient of the log posterior. In this paper we apply the same idea to gamma distributed latent variables given gamma variational distributions, enabling straightforward "black box" variational inference in models where sparsity and non-negativity are appropriate. We demonstrate the method on a recently proposed gamma process model for network data, as well as a novel sparse factor analysis. We outperform generic sampling algorithms and the approach of using Gaussian variational distributions on transformed variables. △ Less

Submitted 4 September, 2015; originally announced September 2015.

arXiv:1506.08180 [pdf, other]

An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process

Authors: Amar Shah, David A. Knowles, Zoubin Ghahramani

Abstract: Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we… ▽ More Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we investigate whether the understanding gleaned from LDA applies in the setting of sparse latent factor models, specifically beta process factor analysis (BPFA). We demonstrate that the big picture is consistent: using Gibbs sampling within SVI to maintain certain posterior dependencies is extremely effective. However, we find that different posterior dependencies are important in BPFA relative to LDA. Particularly, approximations able to model intra-local variable dependence perform best. △ Less

Submitted 26 June, 2015; originally announced June 2015.

Comments: ICML, 12 pages. Volume 37: Proceedings of The 32nd International Conference on Machine Learning, 2015

arXiv:1408.3378 [pdf, other]

Beta diffusion trees and hierarchical feature allocations

Authors: Creighton Heaukulani, David A. Knowles, Zoubin Ghahramani

Abstract: We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlap** subsets of objects, known as a feature allocation. A generative process for the tree structure is defined in terms of particles (representing the objects) diffusing in some continuous space, analogously to the Dirichlet diffusion tree (Neal, 2003), which defines a tree structure… ▽ More We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlap** subsets of objects, known as a feature allocation. A generative process for the tree structure is defined in terms of particles (representing the objects) diffusing in some continuous space, analogously to the Dirichlet diffusion tree (Neal, 2003), which defines a tree structure over partitions (i.e., non-overlap** subsets) of the objects. Unlike in the Dirichlet diffusion tree, multiple copies of a particle may exist and diffuse along multiple branches in the beta diffusion tree, and an object may therefore belong to multiple subsets of particles. We demonstrate how to build a hierarchically-clustered factor analysis model with the beta diffusion tree and how to perform inference over the random tree structures with a Markov chain Monte Carlo algorithm. We conclude with several numerical experiments on missing data problems with data sets of gene expression microarrays, international development statistics, and intranational socioeconomic measurements. △ Less

Submitted 3 April, 2015; v1 submitted 14 August, 2014; originally announced August 2014.

Comments: 43 pages, 13 figures. Major revision to the proof of Thm. 2. Large portions of Chs. 2 & 4 moved into the appendix. Added Fig. 4. Revisions throughout

arXiv:1403.4206 [pdf, other]

A reversible infinite HMM using normalised random measures

Authors: Konstantina Palla, David A. Knowles, Zoubin Ghahramani

Abstract: We present a nonparametric prior over reversible Markov chains. We use completely random measures, specifically gamma processes, to construct a countably infinite graph with weighted edges. By enforcing symmetry to make the edges undirected we define a prior over random walks on graphs that results in a reversible Markov chain. The resulting prior over infinite transition matrices is closely relat… ▽ More We present a nonparametric prior over reversible Markov chains. We use completely random measures, specifically gamma processes, to construct a countably infinite graph with weighted edges. By enforcing symmetry to make the edges undirected we define a prior over random walks on graphs that results in a reversible Markov chain. The resulting prior over infinite transition matrices is closely related to the hierarchical Dirichlet process but enforces reversibility. A reinforcement scheme has recently been proposed with similar properties, but the de Finetti measure is not well characterised. We take the alternative approach of explicitly constructing the mixing measure, which allows more straightforward and efficient inference at the cost of no longer having a closed form predictive distribution. We use our process to construct a reversible infinite HMM which we apply to two real datasets, one from epigenomics and one ion channel recording. △ Less

Submitted 17 March, 2014; originally announced March 2014.

Comments: 9 pages, 6 figures

arXiv:1401.1022 [pdf, ps, other]

On Using Control Variates with Stochastic Approximation for Variational Bayes and its Connection to Stochastic Linear Regression

Authors: Tim Salimans, David A. Knowles

Abstract: Recently, we and several other authors have written about the possibilities of using stochastic approximation techniques for fitting variational approximations to intractable Bayesian posterior distributions. Naive implementations of stochastic approximation suffer from high variance in this setting. Several authors have therefore suggested using control variates to reduce this variance, while we… ▽ More Recently, we and several other authors have written about the possibilities of using stochastic approximation techniques for fitting variational approximations to intractable Bayesian posterior distributions. Naive implementations of stochastic approximation suffer from high variance in this setting. Several authors have therefore suggested using control variates to reduce this variance, while we have taken a different but analogous approach to reducing the variance which we call stochastic linear regression. In this note we take the former perspective and derive the ideal set of control variates for stochastic approximation variational Bayes under a certain set of assumptions. We then show that using these control variates is closely related to using the stochastic linear regression approximation technique we proposed earlier. A simple example shows that our method for constructing control variates leads to stochastic estimators with much lower variance compared to other approaches. △ Less

Submitted 12 January, 2014; v1 submitted 6 January, 2014; originally announced January 2014.

arXiv:1309.6858 [pdf]

The Supervised IBP: Neighbourhood Preserving Infinite Latent Feature Models

Authors: Novi Quadrianto, Viktoriia Sharmanska, David A. Knowles, Zoubin Ghahramani

Abstract: We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables, and their values. The latent variables preserve neighbourhood structure of the data in a sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dis… ▽ More We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables, and their values. The latent variables preserve neighbourhood structure of the data in a sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dissimilar latent values. We formulate the supervised infinite latent variable problem based on an intuitive principle of pulling objects together if they are of the same type, and pushing them apart if they are not. We then combine this principle with a flexible Indian Buffet Process prior on the latent variables. We show that the inferred supervised latent variables can be directly used to perform a nearest neighbour search for the purpose of retrieval. We introduce a new application of dynamically extending hash codes, and show how to effectively couple the structure of the hash codes with continuously growing structure of the neighbourhood preserving infinite latent feature space. △ Less

Submitted 26 September, 2013; originally announced September 2013.

Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

Report number: UAI-P-2013-PG-527-536

arXiv:1303.3265 [pdf, other]

A dependent partition-valued process for multitask clustering and time evolving network modelling

Authors: Konstantina Palla, David A. Knowles, Zoubin Ghahramani

Abstract: The fundamental aim of clustering algorithms is to partition data points. We consider tasks where the discovered partition is allowed to vary with some covariate such as space or time. One approach would be to use fragmentation-coagulation processes, but these, being Markov processes, are restricted to linear or tree structured covariate spaces. We define a partition-valued process on an arbitrary… ▽ More The fundamental aim of clustering algorithms is to partition data points. We consider tasks where the discovered partition is allowed to vary with some covariate such as space or time. One approach would be to use fragmentation-coagulation processes, but these, being Markov processes, are restricted to linear or tree structured covariate spaces. We define a partition-valued process on an arbitrary covariate space using Gaussian processes. We use the process to construct a multitask clustering model which partitions datapoints in a similar way across multiple data sources, and a time series model of network data which allows cluster assignments to vary over time. We describe sampling algorithms for inference and apply our method to defining cancer subtypes based on different types of cellular characteristics, finding regulatory modules from gene expression data from multiple human populations, and discovering time varying community structure in a social network. △ Less

Submitted 31 October, 2013; v1 submitted 13 March, 2013; originally announced March 2013.

Comments: 9 pages, 7 figures, submitted for review

arXiv:1206.6679 [pdf, other]

doi 10.1214/13-BA858

Fixed-Form Variational Posterior Approximation through Stochastic Linear Regression

Authors: Tim Salimans, David A. Knowles

Abstract: We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any d… ▽ More We propose a general algorithm for approximating nonstandard Bayesian posterior distributions. The algorithm minimizes the Kullback-Leibler divergence of an approximating distribution to the intractable posterior distribution. Our method can be used to approximate any posterior distribution, provided that it is given in closed form up to the proportionality constant. The approximation can be any distribution in the exponential family or any mixture of such distributions, which means that it can be made arbitrarily precise. Several examples illustrate the speed and accuracy of our approximation method in practice. △ Less

Submitted 28 July, 2014; v1 submitted 28 June, 2012; originally announced June 2012.

MSC Class: 62F15

Journal ref: Bayesian Analysis, Volume 8, Number 4 (2013), 837-882

arXiv:1110.4411 [pdf, other]

Gaussian Process Regression Networks

Authors: Andrew Gordon Wilson, David A. Knowles, Zoubin Ghahramani

Abstract: We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes. This model accommodates input dependent signal and noise correlations between multiple response variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distr… ▽ More We introduce a new regression framework, Gaussian process regression networks (GPRN), which combines the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian processes. This model accommodates input dependent signal and noise correlations between multiple response variables, input dependent length-scales and amplitudes, and heavy-tailed predictive distributions. We derive both efficient Markov chain Monte Carlo and variational Bayes inference procedures for this model. We apply GPRN as a multiple output regression and multivariate volatility model, demonstrating substantially improved performance over eight popular multiple output (multi-task) Gaussian process models and three multivariate volatility models on benchmark datasets, including a 1000 dimensional gene expression dataset. △ Less

Submitted 19 October, 2011; originally announced October 2011.

Comments: 17 pages, 3 figures, 1 table. Submitted for publication

arXiv:1106.2494 [pdf, other]

Pitman-Yor Diffusion Trees

Authors: David A. Knowles, Zoubin Ghahramani

Abstract: We introduce the Pitman Yor Diffusion Tree (PYDT) for hierarchical clustering, a generalization of the Dirichlet Diffusion Tree (Neal, 2001) which removes the restriction to binary branching structure. The generative process is described and shown to result in an exchangeable distribution over data points. We prove some theoretical properties of the model and then present two inference methods: a… ▽ More We introduce the Pitman Yor Diffusion Tree (PYDT) for hierarchical clustering, a generalization of the Dirichlet Diffusion Tree (Neal, 2001) which removes the restriction to binary branching structure. The generative process is described and shown to result in an exchangeable distribution over data points. We prove some theoretical properties of the model and then present two inference methods: a collapsed MCMC sampler which allows us to model uncertainty over tree structures, and a computationally efficient greedy Bayesian EM search algorithm. Both algorithms use message passing on the tree structure. The utility of the model and algorithms is demonstrated on synthetic and real world data, both continuous and binary. △ Less

Submitted 16 June, 2011; v1 submitted 13 June, 2011; originally announced June 2011.

Comments: 8 pages, to be presented at UAI 2011

MSC Class: 62G07; 62H30 ACM Class: G.3.7

Showing 1–14 of 14 results for author: Knowles, D A