Showing 1–50 of 96 results for author: Ghahramani, Z

  1. arXiv:2207.07411  [pdf, other]

    cs.LG stat.ML

    Plex: Towards Reliability using Pretrained Large Model Extensions

    Authors: Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, et al. (1 additional author not shown)

    Abstract: A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive per…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Code available at https://goo.gle/plex-code

  2. arXiv:2207.03084  [pdf, other]

    cs.LG cs.AI stat.ML

    Pre-training helps Bayesian optimization too

    Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

    Abstract: Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs o…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: ICML2022 Workshop on Adaptive Experimental Design and Active Learning in the Real World. arXiv admin note: substantial text overlap with arXiv:2109.08215

  3. arXiv:2206.03992  [pdf, other]

    stat.ML cs.LG

    Neural Diffusion Processes

    Authors: Vincent Dutordoir, Alan Saul, Zoubin Ghahramani, Fergus Simpson

    Abstract: Neural network approaches for meta-learning distributions over functions have desirable properties such as increased flexibility and a reduced complexity of inference. Building on the successes of denoising diffusion models for generative modelling, we propose Neural Diffusion Processes (NDPs), a novel approach that learns to sample from a rich distribution over functions through its finite margin…

    Submitted 6 June, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: 23 pages, Proceedings of the 40th International Conference on Machine Learning, PMLR 202

  4. arXiv:2109.08215  [pdf, other]

    cs.LG stat.ML

    Pre-trained Gaussian processes for Bayesian optimization

    Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

    Abstract: Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs o…

    Submitted 6 July, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

  5. arXiv:2105.04504  [pdf, other]

    stat.ML cs.LG

    Deep Neural Networks as Point Estimates for Deep Gaussian Processes

    Authors: Vincent Dutordoir, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, Nicolas Durrande

    Abstract: Neural networks and Gaussian processes are complementary in their strengths and weaknesses. Having a better understanding of their relationship comes with the promise to make each method benefit from the strengths of the other. In this work, we establish an equivalence between the forward passes of neural networks and (deep) sparse Gaussian process models. The theory we develop is based on interpr…

    Submitted 9 December, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  6. arXiv:2004.09703  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Continuous Treatment Policy and Bipartite Embeddings for Matching with Heterogeneous Causal Effects

    Authors: Will Y. Zou, Smitha Shyam, Michael Mui, Mingshi Wang, Jan Pedersen, Zoubin Ghahramani

    Abstract: Causal inference methods are widely applied in the fields of medicine, policy, and economics. Central to these applications is the estimation of treatment effects to make decisions. Current methods make binary yes-or-no decisions based on the treatment effect of a single outcome dimension. These methods are unable to capture continuous space treatment policies with a measure of intensity. They als…

    Submitted 20 April, 2020; originally announced April 2020.

  7. arXiv:2004.06231  [pdf, other]

    cs.LG stat.ML

    Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits

    Authors: Robert Peharz, Steven Lang, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Guy Van den Broeck, Kristian Kersting, Zoubin Ghahramani

    Abstract: Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent "deep-learning-style" implementations of PCs strive for better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a n…

    Submitted 13 April, 2020; originally announced April 2020.

  8. arXiv:2002.02702  [pdf]

    cs.LG cs.PL stat.ML

    DynamicPPL: Stan-like Speed for Dynamic Probabilistic Models

    Authors: Mohamed Tarek, Kai Xu, Martin Trapp, Hong Ge, Zoubin Ghahramani

    Abstract: We present the preliminary high-level design and features of DynamicPPL.jl, a modular library providing a lightning-fast infrastructure for probabilistic programming. Besides a computational performance that is often close to or better than Stan, DynamicPPL provides an intuitive DSL that allows the rapid development of complex dynamic probabilistic programs. Being entirely written in Julia, a high…

    Submitted 7 February, 2020; originally announced February 2020.

  9. arXiv:2001.03048  [pdf, other]

    stat.ML cs.LG

    Resource-Efficient Neural Networks for Embedded Systems

    Authors: Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani

    Abstract: While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges…

    Submitted 7 April, 2024; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: arXiv admin note: text overlap with arXiv:1812.02240; accepted at JMLR

  10. arXiv:1905.10884  [pdf, other]

    cs.LG stat.ML

    Bayesian Learning of Sum-Product Networks

    Authors: Martin Trapp, Robert Peharz, Hong Ge, Franz Pernkopf, Zoubin Ghahramani

    Abstract: Sum-product networks (SPNs) are flexible density estimators and have received significant attention due to their attractive inference properties. While parameter learning in SPNs is well developed, structure learning leaves something to be desired: Even though there is a plethora of SPN structure learners, most of them are somewhat ad-hoc and based on intuition rather than a clear learning princip…

    Submitted 4 November, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019; See conference page for supplement

  11. arXiv:1812.02240  [pdf, other]

    cs.LG stat.ML

    Efficient and Robust Machine Learning for Real-World Systems

    Authors: Franz Pernkopf, Wolfgang Roth, Matthias Zoehrer, Lukas Pfeifenberger, Guenther Schindler, Holger Froening, Sebastian Tschiatschek, Robert Peharz, Matthew Mattina, Zoubin Ghahramani

    Abstract: While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation and the vision of the Internet-of-Things fuel the interest in resource efficient approaches. These approaches require a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. On top of this, it is crucial to treat uncertainty in a consisten…

    Submitted 5 December, 2018; originally announced December 2018.

  12. arXiv:1810.00555  [pdf, other]

    stat.ML cs.AI cs.LG

    Probabilistic Meta-Representations Of Neural Networks

    Authors: Theofanis Karaletsos, Peter Dayan, Zoubin Ghahramani

    Abstract: Existing Bayesian treatments of neural networks are typically characterized by weak prior and approximate posterior distributions according to which all the weights are drawn independently. Here, we consider a richer prior distribution in which units in the network are represented by latent variables, and the weights between units are drawn conditionally on the values of the collection of those va…

    Submitted 1 October, 2018; originally announced October 2018.

    Comments: presented at UAI 2018 Uncertainty In Deep Learning Workshop (UDL AUG. 2018)

  13. arXiv:1807.09306  [pdf, other]

    stat.ML cs.LG

    Automatic Bayesian Density Analysis

    Authors: Antonio Vergari, Alejandro Molina, Robert Peharz, Zoubin Ghahramani, Kristian Kersting, Isabel Valera

    Abstract: Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous…

    Submitted 10 February, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

    Comments: In proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

  14. arXiv:1807.03653  [pdf, other]

    cs.LG cs.AI stat.ML

    Handling Incomplete Heterogeneous Data using VAEs

    Authors: Alfredo Nazabal, Pablo M. Olmos, Zoubin Ghahramani, Isabel Valera

    Abstract: Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applica…

    Submitted 22 May, 2020; v1 submitted 10 July, 2018; originally announced July 2018.

  15. arXiv:1807.01969  [pdf, other]

    stat.ML cs.LG

    Variational Bayesian dropout: pitfalls and fixes

    Authors: Jiri Hron, Alexander G. de G. Matthews, Zoubin Ghahramani

    Abstract: Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; fro…

    Submitted 5 July, 2018; originally announced July 2018.

    Comments: Extended version of the paper accepted to ICML 2018: more details in the proofs, few minor modifications

  16. arXiv:1807.00400  [pdf, other]

    stat.ML cs.LG

    Antithetic and Monte Carlo kernel estimators for partial rankings

    Authors: Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani

    Abstract: In the modern age, rankings data is ubiquitous and it is useful for a variety of applications such as recommender systems, multi-object tracking and preference learning. However, most rankings data encountered in the real world is incomplete, which prevents the direct application of existing modelling tools for complete rankings. Our contribution is a novel way to extend kernel methods for complet…

    Submitted 25 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

  17. arXiv:1806.01910  [pdf, other]

    cs.LG cs.AI stat.ML

    Probabilistic Deep Learning using Random Sum-Product Networks

    Authors: Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Kristian Kersting, Zoubin Ghahramani

    Abstract: The need for consistent treatment of uncertainty has recently triggered increased interest in probabilistic deep learning methods. However, most current approaches have severe limitations when it comes to inference, since many of these models do not even permit the evaluation of exact data likelihoods. Sum-product networks (SPNs), on the other hand, are an excellent architecture in that regard, as they…

    Submitted 22 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

  18. arXiv:1804.11271  [pdf, other]

    stat.ML cs.LG

    Gaussian Process Behaviour in Wide Deep Neural Networks

    Authors: Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani

    Abstract: Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architectur…

    Submitted 16 August, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: This work substantially extends the work of Matthews et al. (2018) published at the International Conference on Learning Representations (ICLR) 2018

  19. arXiv:1802.10031  [pdf, other]

    cs.LG stat.ML

    The Mirage of Action-Dependent Baselines in Reinforcement Learning

    Authors: George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

    Abstract: Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To be…

    Submitted 19 November, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: Updated to ICML final submission

  20. arXiv:1802.03039  [pdf, other]

    stat.ML cs.LG cs.NE

    Few-shot learning of neural networks from scratch by pseudo example optimization

    Authors: Akisato Kimura, Zoubin Ghahramani, Koh Takeuchi, Tomoharu Iwata, Naonori Ueda

    Abstract: In this paper, we propose a simple but effective method for training neural networks with a limited amount of training data. Our approach inherits the idea of knowledge distillation that transfers knowledge from a deep or wide reference model to a shallow or narrow target model. The proposed method employs this idea to mimic predictions of reference estimators that are more robust against overfitt…

    Submitted 5 July, 2018; v1 submitted 8 February, 2018; originally announced February 2018.

    Comments: 14 pages, 2 figures, will be presented at BMVC2018

  21. arXiv:1711.02989  [pdf, other]

    stat.ML

    Variational Gaussian Dropout is not Bayesian

    Authors: Jiri Hron, Alexander G. de G. Matthews, Zoubin Ghahramani

    Abstract: Gaussian multiplicative noise is commonly used as a stochastic regularisation technique in training of deterministic neural networks. A recent paper reinterpreted the technique as a specific algorithm for approximate inference in Bayesian neural networks; several extensions ensued. We show that the log-uniform prior used in all the above publications does not generally induce a proper posterior, a…

    Submitted 8 November, 2017; originally announced November 2017.

  22. arXiv:1707.08352  [pdf, other]

    stat.ML cs.LG

    General Latent Feature Modeling for Data Exploration Tasks

    Authors: Isabel Valera, Melanie F. Pradier, Zoubin Ghahramani

    Abstract: This paper introduces a general Bayesian nonparametric latent feature model suitable to perform automatic exploratory analysis of heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while it can be inferred in linear time with r…

    Submitted 26 July, 2017; originally announced July 2017.

    Comments: presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

  23. arXiv:1707.05922  [pdf, other]

    stat.ML

    Improving Output Uncertainty Estimation and Generalization in Deep Learning via Neural Network Gaussian Processes

    Authors: Tomoharu Iwata, Zoubin Ghahramani

    Abstract: We propose a simple method that combines neural networks and Gaussian processes. The proposed method can estimate the uncertainty of outputs and flexibly adjust target functions where training data exist, which are advantages of Gaussian processes. The proposed method can also achieve high generalization performance for unseen input configurations, which is an advantage of neural networks. With th…

    Submitted 18 July, 2017; originally announced July 2017.

  24. arXiv:1707.05562  [pdf, other]

    stat.ML cs.LG

    One-Shot Learning in Discriminative Neural Networks

    Authors: Jordan Burgess, James Robert Lloyd, Zoubin Ghahramani

    Abstract: We consider the task of one-shot learning of visual categories. In this paper we explore a Bayesian procedure for updating a pretrained convnet to classify a novel image category for which data is limited. We decompose this convnet into a fixed feature extractor and softmax classifier. We assume that the target weights for the new task come from the same distribution as the pretrained softmax weig…

    Submitted 18 July, 2017; originally announced July 2017.

    Comments: 3 pages, 3 figures

  25. arXiv:1707.02476  [pdf, other]

    stat.ML

    Adversarial Examples, Uncertainty, and Transfer Testing Robustness in Gaussian Process Hybrid Deep Networks

    Authors: John Bradshaw, Alexander G. de G. Matthews, Zoubin Ghahramani

    Abstract: Deep neural networks (DNNs) have excellent representative power and are state of the art classifiers on many tasks. However, they often do not capture their own uncertainties well, making them less robust in the real world as they overconfidently extrapolate and do not notice domain shift. Gaussian processes (GPs) with RBF kernels on the other hand have better calibrated uncertainties and do not ov…

    Submitted 8 July, 2017; originally announced July 2017.

  26. arXiv:1706.04161  [pdf, other]

    stat.ML cs.LG

    Lost Relatives of the Gumbel Trick

    Authors: Matej Balog, Nilesh Tripuraneni, Zoubin Ghahramani, Adrian Weller

    Abstract: The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each time solving for the most likely configuration. We derive an entire family of related methods, of which the Gumbel trick is one member, and show that the new m…

    Submitted 13 June, 2017; originally announced June 2017.

    Comments: 34th International Conference on Machine Learning (ICML 2017)
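
The perturb-and-maximize idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code and does not cover the family of related methods the paper derives; it shows only the classical Gumbel-max trick for a small categorical distribution, together with the standard identity that the expected perturbed maximum equals log Z plus the Euler-Mascheroni constant. The function names (`gumbel_noise`, `gumbel_max_sample`, `estimate_log_partition`) and the default sample size are assumptions chosen for illustration.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse-CDF sampling."""
    u = rng.random()
    # rng.random() lies in [0, 1); redraw on exactly 0.0, where log is undefined.
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(log_weights, rng):
    """Return index i with probability proportional to exp(log_weights[i]).

    Each log-weight is perturbed with independent Gumbel noise and the
    most likely configuration (the argmax) is returned.
    """
    perturbed = [lw + gumbel_noise(rng) for lw in log_weights]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def estimate_log_partition(log_weights, rng, num_samples=20000):
    """Estimate log Z, where Z = sum_i exp(log_weights[i]).

    Uses E[max_i (log_weights[i] + G_i)] = log Z + Euler-Mascheroni constant.
    """
    euler_gamma = 0.5772156649015329
    total = 0.0
    for _ in range(num_samples):
        total += max(lw + gumbel_noise(rng) for lw in log_weights)
    return total / num_samples - euler_gamma
```

For example, calling `gumbel_max_sample([math.log(1), math.log(2), math.log(3)], random.Random(0))` repeatedly should return index 2 about half the time, and `estimate_log_partition` on the same log-weights should land near `math.log(6)`. In this toy setting the argmax is found by enumeration; the trick becomes interesting when the maximization can be done efficiently but the sum cannot.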

  27. arXiv:1706.03779  [pdf, other]

    stat.ML

    General Latent Feature Models for Heterogeneous Datasets

    Authors: Isabel Valera, Melanie F. Pradier, Maria Lomeli, Zoubin Ghahramani

    Abstract: Latent feature modeling allows capturing the latent structure responsible for generating the observed properties of a set of objects. It is often used to make predictions either for new values of interest or missing information in the original data, as well as to perform exploratory data analysis. However, although there is an extensive literature on latent feature models for homogeneous datasets,…

    Submitted 8 March, 2018; v1 submitted 12 June, 2017; originally announced June 2017.

    Comments: Software library available at https://github.com/ivaleraM/GLFM

  28. arXiv:1703.02910  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Bayesian Active Learning with Image Data

    Authors: Yarin Gal, Riashat Islam, Zoubin Ghahramani

    Abstract: Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. First, active learning (AL) methods generally rely on being able to learn and update models from small amounts of data. Recent advances in deep learning, on the other hand, are notorious for the…

    Submitted 8 March, 2017; originally announced March 2017.

  29. arXiv:1702.08239  [pdf, other]

    stat.ML

    Bayesian inference on random simple graphs with power law degree distributions

    Authors: Juho Lee, Creighton Heaukulani, Zoubin Ghahramani, Lancelot F. James, Seungjin Choi

    Abstract: We present a model for random simple graphs with a degree distribution that obeys a power law (i.e., is heavy-tailed). To attain this behavior, the edge probabilities in the graph are constructed from Bertoin-Fujita-Roynette-Yor (BFRY) random variables, which have been recently utilized in Bayesian statistics for the construction of power law models in several applications. Our construction readil…

    Submitted 18 June, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

  30. arXiv:1610.08733  [pdf, other]

    stat.ML

    GPflow: A Gaussian process library using TensorFlow

    Authors: Alexander G. de G. Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, James Hensman

    Abstract: GPflow is a Gaussian process library that uses TensorFlow for its core computations and Python for its front end. The distinguishing features of GPflow are that it uses variational inference as the primary approximation method, provides concise code through the use of automatic differentiation, has been engineered with a particular emphasis on software testing and is able to exploit GPU hardware.

    Submitted 27 October, 2016; originally announced October 2016.

  31. arXiv:1607.02738  [pdf, other]

    stat.ML

    Magnetic Hamiltonian Monte Carlo

    Authors: Nilesh Tripuraneni, Mark Rowland, Zoubin Ghahramani, Richard Turner

    Abstract: Hamiltonian Monte Carlo (HMC) exploits Hamiltonian dynamics to construct efficient proposals for Markov chain Monte Carlo (MCMC). In this paper, we present a generalization of HMC which exploits non-canonical Hamiltonian dynamics. We refer to this algorithm as magnetic HMC, since in 3 dimensions a subset of the dynamics map onto the mechanics of a charged particle coupled to a magnetic fi…

    Submitted 19 August, 2017; v1 submitted 10 July, 2016; originally announced July 2016.

    Comments: 34th International Conference on Machine Learning (ICML 2017)

  32. arXiv:1606.05241  [pdf, other]

    stat.ML

    The Mondrian Kernel

    Authors: Matej Balog, Balaji Lakshminarayanan, Zoubin Ghahramani, Daniel M. Roy, Yee Whye Teh

    Abstract: We introduce the Mondrian kernel, a fast random feature approximation to the Laplace kernel. It is suitable for both batch and online learning, and admits a fast kernel-width-selection procedure as the random features can be re-used efficiently for all kernel widths. The features are constructed by sampling trees via a Mondrian process [Roy and Teh, 2009], and we highlight the connection to Mondri…

    Submitted 16 June, 2016; originally announced June 2016.

    Comments: Accepted for presentation at the 32nd Conference on Uncertainty in Artificial Intelligence (UAI 2016)

  33. arXiv:1604.07928  [pdf, ps, other]

    cs.LG cs.AI cs.DC stat.ML

    Distributed Flexible Nonlinear Tensor Factorization

    Authors: Shandian Zhe, Kai Zhang, Pengyuan Wang, Kuang-chih Lee, Zenglin Xu, Yuan Qi, Zoubin Ghahramani

    Abstract: Tensor factorization is a powerful tool to analyse multi-way data. Compared with traditional multi-linear methods, nonlinear tensor factorization models are capable of capturing more complex relationships in the data. However, they are computationally expensive and may suffer severe learning bias in case of extreme data sparsity. To overcome these limitations, in this paper we propose a distribute…

    Submitted 21 May, 2016; v1 submitted 27 April, 2016; originally announced April 2016.

    Comments: Gaussian process, tensor factorization, multidimensional arrays, large scale, spark, map-reduce

    ACM Class: I.5.1; I.5.4

  34. arXiv:1512.05287  [pdf, other]

    stat.ML

    A Theoretically Grounded Application of Dropout in Recurrent Neural Networks

    Authors: Yarin Gal, Zoubin Ghahramani

    Abstract: Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results at the intersection of Bayesian modelling and deep learning offer a Bayesian interpretation of common deep learning techniques such as dropout. This gr…

    Submitted 5 October, 2016; v1 submitted 16 December, 2015; originally announced December 2015.

    Comments: Added clarifications; Published in NIPS 2016

  35. arXiv:1511.09422  [pdf, other]

    stat.ML

    A General Framework for Constrained Bayesian Optimization using Information-based Search

    Authors: José Miguel Hernández-Lobato, Michael A. Gelbart, Ryan P. Adams, Matthew W. Hoffman, Zoubin Ghahramani

    Abstract: We present an information-theoretic framework for solving global black-box optimization problems that also have black-box constraints. Of particular interest to us is to efficiently solve problems with decoupled constraints, in which subsets of the objective and constraint functions may be evaluated independently. For example, when the objective is evaluated on a CPU and the constraints are evalua…

    Submitted 4 September, 2016; v1 submitted 30 November, 2015; originally announced November 2015.

  36. arXiv:1511.07130  [pdf, other]

    cs.LG stat.ML

    Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions

    Authors: Amar Shah, Zoubin Ghahramani

    Abstract: We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewe…

    Submitted 23 November, 2015; originally announced November 2015.

    Comments: 12 pages in Neural Information Processing Systems 2015

  37. arXiv:1511.02543  [pdf, other]

    stat.ML cs.LG stat.CO

    Sandwiching the marginal likelihood using bidirectional Monte Carlo

    Authors: Roger B. Grosse, Zoubin Ghahramani, Ryan P. Adams

    Abstract: Computing the marginal likelihood (ML) of a model requires marginalizing out all of the parameters and latent variables, a difficult high-dimensional summation or integration problem. To make matters worse, it is often hard to measure the accuracy of one's ML estimates. We present bidirectional Monte Carlo, a technique for obtaining accurate log-ML estimates on data simulated from a model. This me…

    Submitted 8 November, 2015; originally announced November 2015.

  38. arXiv:1509.04781  [pdf, other]

    stat.ML

    Dirichlet Fragmentation Processes

    Authors: Hong Ge, Yarin Gal, Zoubin Ghahramani

    Abstract: Tree structures are ubiquitous in data across many domains, and many datasets are naturally modelled by unobserved tree structures. In this paper, first we review the theory of random fragmentation processes [Bertoin, 2006], and a number of existing methods for modelling trees, including the popular nested Chinese restaurant process (nCRP). Then we define a general class of probability distributio…

    Submitted 15 September, 2015; originally announced September 2015.

  39. arXiv:1506.09039  [pdf, other]

    stat.ML cs.LG

    Scalable Discrete Sampling as a Multi-Armed Bandit Problem

    Authors: Yutian Chen, Zoubin Ghahramani

    Abstract: Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, an…

    Submitted 27 April, 2016; v1 submitted 30 June, 2015; originally announced June 2015.

  40. arXiv:1506.08180  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO stat.ME

    An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process

    Authors: Amar Shah, David A. Knowles, Zoubin Ghahramani

    Abstract: Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we…

    Submitted 26 June, 2015; originally announced June 2015.

    Comments: ICML, 12 pages. Volume 37: Proceedings of The 32nd International Conference on Machine Learning, 2015

  41. arXiv:1506.04000  [pdf, other]

    stat.ML

    MCMC for Variationally Sparse Gaussian Processes

    Authors: James Hensman, Alexander G. de G. Matthews, Maurizio Filippone, Zoubin Ghahramani

    Abstract: Gaussian process (GP) models form a core part of probabilistic machine learning. Considerable research effort has been made into attacking three issues with GP models: how to compute efficiently when the number of data is large; how to approximate the posterior when the likelihood is not Gaussian and how to estimate covariance function parameter posteriors. This paper simultaneously addresses thes…

    Submitted 12 June, 2015; originally announced June 2015.

    Comments: 16 pages

  42. arXiv:1506.03338  [pdf, ps, other]

    cs.LG stat.ML

    Neural Adaptive Sequential Monte Carlo

    Authors: Shixiang Gu, Zoubin Ghahramani, Richard E. Turner

    Abstract: Sequential Monte Carlo (SMC), or particle filtering, is a popular class of methods for sampling from an intractable target distribution using a sequence of simpler intermediate distributions. Like other importance sampling-based methods, performance is critically dependent on the proposal distribution: a bad proposal can lead to arbitrarily inaccurate estimates of the target distribution. This pap…

    Submitted 16 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

  43. arXiv:1506.02158  [pdf, other]

    stat.ML cs.LG

    Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference

    Authors: Yarin Gal, Zoubin Ghahramani

    Abstract: Convolutional neural networks (CNNs) work well on large datasets. But labelled data is hard to collect, and in some applications larger amounts of data are not available. The problem then is how to use CNNs with small data -- as CNNs overfit quickly. We present an efficient Bayesian CNN, offering better robustness to over-fitting on small data than traditional approaches. This is by placing a prob…

    Submitted 18 January, 2016; v1 submitted 6 June, 2015; originally announced June 2015.

    Comments: 12 pages, 3 figures, ICLR format, updated with reviewer comments

  44. arXiv:1506.02157  [pdf, other]

    stat.ML

    Dropout as a Bayesian Approximation: Appendix

    Authors: Yarin Gal, Zoubin Ghahramani

    Abstract: We show that a neural network with arbitrary depth and non-linearities, with dropout applied before every weight layer, is mathematically equivalent to an approximation to a well known Bayesian model. This interpretation might offer an explanation to some of dropout's key properties, such as its robustness to over-fitting. Our interpretation allows us to reason about uncertainty in deep learning,…

    Submitted 25 May, 2016; v1 submitted 6 June, 2015; originally announced June 2015.

    Comments: 20 pages, 1 figure; ICML proceedings version

  45. arXiv:1506.02142  [pdf, other]

    stat.ML cs.LG

    Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

    Authors: Yarin Gal, Zoubin Ghahramani

    Abstract: Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropou…

    Submitted 4 October, 2016; v1 submitted 6 June, 2015; originally announced June 2015.

    Comments: 12 pages, 6 figures; fixed a mistake with standard error and added a new table with updated results (marked "Update [October 2016]"); Published in ICML 2016

  46. arXiv:1505.03906  [pdf, other]

    stat.ML cs.LG

    Training generative neural networks via Maximum Mean Discrepancy optimization

    Authors: Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani

    Abstract: We consider training a deep neural network to generate samples from an unknown distribution given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic---informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an unbiased estimate of the maximum mea…

    Submitted 14 May, 2015; originally announced May 2015.

    Comments: 10 pages, to appear in Uncertainty in Artificial Intelligence (UAI) 2015

  47. Bayesian cluster analysis: Point estimation and credible balls

    Authors: Sara Wade, Zoubin Ghahramani

    Abstract: Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. As opposed to classical algorithms which return a single clustering solution, Bayesian nonparametric models provide a posterior over the entire space of partitions, allowing one to assess statistical properties, such as uncertainty on the number of clusters. However, an important problem is h…

    Submitted 8 February, 2019; v1 submitted 13 May, 2015; originally announced May 2015.

    Journal ref: Bayesian Anal., Volume 13, Number 2 (2018), 559-626

  48. arXiv:1505.00428  [pdf, other]

    stat.ML

    A Linear-Time Particle Gibbs Sampler for Infinite Hidden Markov Models

    Authors: Nilesh Tripuraneni, Shane Gu, Hong Ge, Zoubin Ghahramani

    Abstract: Infinite Hidden Markov Models (iHMM's) are an attractive, nonparametric generalization of the classical Hidden Markov Model which can automatically infer the number of hidden states in the system. However, due to the infinite-dimensional nature of transition dynamics performing inference in the iHMM is difficult. In this paper, we present an infinite-state Particle Gibbs (PG) algorithm to resample…

    Submitted 9 June, 2015; v1 submitted 3 May, 2015; originally announced May 2015.

  49. arXiv:1504.07027  [pdf, ps, other]

    stat.ML

    On Sparse variational methods and the Kullback-Leibler divergence between stochastic processes

    Authors: Alexander G. de G. Matthews, James Hensman, Richard E. Turner, Zoubin Ghahramani

    Abstract: The variational framework for learning inducing variables (Titsias, 2009a) has had a large impact on the Gaussian process literature. The framework may be interpreted as minimizing a rigorously defined Kullback-Leibler divergence between the approximating and posterior processes. To our knowledge this connection has thus far gone unremarked in the literature. In this paper we give a substantial ge…

    Submitted 4 December, 2015; v1 submitted 27 April, 2015; originally announced April 2015.

    Comments: 9 pages. No figures

  50. arXiv:1503.02182  [pdf, other]

    stat.ML

    Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data

    Authors: Yarin Gal, Yutian Chen, Zoubin Ghahramani

    Abstract: Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity. The number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have gained significant improvement in supervised tasks with this data. These models embed obse…

    Submitted 7 March, 2015; originally announced March 2015.

    Comments: 11 pages, 6 figures