Showing 1–50 of 69 results for author: Ghahramani, Z

  1. arXiv:2404.07839  [pdf, other]

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var…

    Submitted 11 April, 2024; originally announced April 2024.

  2. arXiv:2403.08295  [pdf, other]

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge…

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  3. arXiv:2207.07411  [pdf, other]

    cs.LG stat.ML

    Plex: Towards Reliability using Pretrained Large Model Extensions

    Authors: Dustin Tran, Jeremiah Liu, Michael W. Dusenberry, Du Phan, Mark Collier, Jie Ren, Kehang Han, Zi Wang, Zelda Mariet, Huiyi Hu, Neil Band, Tim G. J. Rudner, Karan Singhal, Zachary Nado, Joost van Amersfoort, Andreas Kirsch, Rodolphe Jenatton, Nithum Thain, Honglin Yuan, Kelly Buchanan, Kevin Murphy, D. Sculley, Yarin Gal, Zoubin Ghahramani, Jasper Snoek, et al. (1 additional author not shown)

    Abstract: A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive per…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Code available at https://goo.gle/plex-code

  4. arXiv:2207.03084  [pdf, other]

    cs.LG cs.AI stat.ML

    Pre-training helps Bayesian optimization too

    Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

    Abstract: Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs o…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: ICML2022 Workshop on Adaptive Experimental Design and Active Learning in the Real World. arXiv admin note: substantial text overlap with arXiv:2109.08215

  5. arXiv:2206.03992  [pdf, other]

    stat.ML cs.LG

    Neural Diffusion Processes

    Authors: Vincent Dutordoir, Alan Saul, Zoubin Ghahramani, Fergus Simpson

    Abstract: Neural network approaches for meta-learning distributions over functions have desirable properties such as increased flexibility and a reduced complexity of inference. Building on the successes of denoising diffusion models for generative modelling, we propose Neural Diffusion Processes (NDPs), a novel approach that learns to sample from a rich distribution over functions through its finite margin…

    Submitted 6 June, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

    Comments: 23 pages, Proceedings of the 40th International Conference on Machine Learning, PMLR 202

  6. arXiv:2109.08215  [pdf, other]

    cs.LG stat.ML

    Pre-trained Gaussian processes for Bayesian optimization

    Authors: Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

    Abstract: Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs o…

    Submitted 6 July, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

  7. arXiv:2105.04504  [pdf, other]

    stat.ML cs.LG

    Deep Neural Networks as Point Estimates for Deep Gaussian Processes

    Authors: Vincent Dutordoir, James Hensman, Mark van der Wilk, Carl Henrik Ek, Zoubin Ghahramani, Nicolas Durrande

    Abstract: Neural networks and Gaussian processes are complementary in their strengths and weaknesses. Having a better understanding of their relationship comes with the promise to make each method benefit from the strengths of the other. In this work, we establish an equivalence between the forward passes of neural networks and (deep) sparse Gaussian process models. The theory we develop is based on interpr…

    Submitted 9 December, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  8. arXiv:2004.09703  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Continuous Treatment Policy and Bipartite Embeddings for Matching with Heterogeneous Causal Effects

    Authors: Will Y. Zou, Smitha Shyam, Michael Mui, Mingshi Wang, Jan Pedersen, Zoubin Ghahramani

    Abstract: Causal inference methods are widely applied in the fields of medicine, policy, and economics. Central to these applications is the estimation of treatment effects to make decisions. Current methods make binary yes-or-no decisions based on the treatment effect of a single outcome dimension. These methods are unable to capture continuous space treatment policies with a measure of intensity. They als…

    Submitted 20 April, 2020; originally announced April 2020.

  9. arXiv:2004.06231  [pdf, other]

    cs.LG stat.ML

    Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits

    Authors: Robert Peharz, Steven Lang, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Guy Van den Broeck, Kristian Kersting, Zoubin Ghahramani

    Abstract: Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent "deep-learning-style" implementations of PCs strive for a better scalability, but are still difficult to train on real-world data, due to their sparsely connected computational graphs. In this paper, we propose Einsum Networks (EiNets), a n…

    Submitted 13 April, 2020; originally announced April 2020.

  10. arXiv:2002.02702  [pdf]

    cs.LG cs.PL stat.ML

    DynamicPPL: Stan-like Speed for Dynamic Probabilistic Models

    Authors: Mohamed Tarek, Kai Xu, Martin Trapp, Hong Ge, Zoubin Ghahramani

    Abstract: We present the preliminary high-level design and features of DynamicPPL.jl, a modular library providing a lightning-fast infrastructure for probabilistic programming. Besides a computational performance that is often close to or better than Stan, DynamicPPL provides an intuitive DSL that allows the rapid development of complex dynamic probabilistic programs. Being entirely written in Julia, a high…

    Submitted 7 February, 2020; originally announced February 2020.

  11. arXiv:2001.03048  [pdf, other]

    stat.ML cs.LG

    Resource-Efficient Neural Networks for Embedded Systems

    Authors: Wolfgang Roth, Günther Schindler, Bernhard Klein, Robert Peharz, Sebastian Tschiatschek, Holger Fröning, Franz Pernkopf, Zoubin Ghahramani

    Abstract: While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation, and the vision of the Internet of Things fuel the interest in resource-efficient approaches. These approaches aim for a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. The development of such approaches is among the major challenges…

    Submitted 7 April, 2024; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: arXiv admin note: text overlap with arXiv:1812.02240; accepted at JMLR

  12. arXiv:1905.10884  [pdf, other]

    cs.LG stat.ML

    Bayesian Learning of Sum-Product Networks

    Authors: Martin Trapp, Robert Peharz, Hong Ge, Franz Pernkopf, Zoubin Ghahramani

    Abstract: Sum-product networks (SPNs) are flexible density estimators and have received significant attention due to their attractive inference properties. While parameter learning in SPNs is well developed, structure learning leaves something to be desired: Even though there is a plethora of SPN structure learners, most of them are somewhat ad-hoc and based on intuition rather than a clear learning princip…

    Submitted 4 November, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019; See conference page for supplement

  13. arXiv:1812.02240  [pdf, other]

    cs.LG stat.ML

    Efficient and Robust Machine Learning for Real-World Systems

    Authors: Franz Pernkopf, Wolfgang Roth, Matthias Zoehrer, Lukas Pfeifenberger, Guenther Schindler, Holger Froening, Sebastian Tschiatschek, Robert Peharz, Matthew Mattina, Zoubin Ghahramani

    Abstract: While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation and the vision of the Internet-of-Things fuel the interest in resource efficient approaches. These approaches require a carefully chosen trade-off between performance and resource consumption in terms of computation and energy. On top of this, it is crucial to treat uncertainty in a consisten…

    Submitted 5 December, 2018; originally announced December 2018.

  14. arXiv:1810.00555  [pdf, other]

    stat.ML cs.AI cs.LG

    Probabilistic Meta-Representations Of Neural Networks

    Authors: Theofanis Karaletsos, Peter Dayan, Zoubin Ghahramani

    Abstract: Existing Bayesian treatments of neural networks are typically characterized by weak prior and approximate posterior distributions according to which all the weights are drawn independently. Here, we consider a richer prior distribution in which units in the network are represented by latent variables, and the weights between units are drawn conditionally on the values of the collection of those va…

    Submitted 1 October, 2018; originally announced October 2018.

    Comments: presented at UAI 2018 Uncertainty In Deep Learning Workshop (UDL AUG. 2018)

  15. arXiv:1807.09306  [pdf, other]

    stat.ML cs.LG

    Automatic Bayesian Density Analysis

    Authors: Antonio Vergari, Alejandro Molina, Robert Peharz, Zoubin Ghahramani, Kristian Kersting, Isabel Valera

    Abstract: Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous…

    Submitted 10 February, 2019; v1 submitted 24 July, 2018; originally announced July 2018.

    Comments: In proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

  16. arXiv:1807.03653  [pdf, other]

    cs.LG cs.AI stat.ML

    Handling Incomplete Heterogeneous Data using VAEs

    Authors: Alfredo Nazabal, Pablo M. Olmos, Zoubin Ghahramani, Isabel Valera

    Abstract: Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with missing data at random), which is indeed common in real-world applica…

    Submitted 22 May, 2020; v1 submitted 10 July, 2018; originally announced July 2018.

  17. arXiv:1807.01969  [pdf, other]

    stat.ML cs.LG

    Variational Bayesian dropout: pitfalls and fixes

    Authors: Jiri Hron, Alexander G. de G. Matthews, Zoubin Ghahramani

    Abstract: Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; fro…

    Submitted 5 July, 2018; originally announced July 2018.

    Comments: Extended version of the paper accepted to ICML 2018: more details in the proofs, few minor modifications

  18. arXiv:1807.00400  [pdf, other]

    stat.ML cs.LG

    Antithetic and Monte Carlo kernel estimators for partial rankings

    Authors: Maria Lomeli, Mark Rowland, Arthur Gretton, Zoubin Ghahramani

    Abstract: In the modern age, rankings data is ubiquitous and it is useful for a variety of applications such as recommender systems, multi-object tracking and preference learning. However, most rankings data encountered in the real world is incomplete, which prevents the direct application of existing modelling tools for complete rankings. Our contribution is a novel way to extend kernel methods for complet…

    Submitted 25 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

  19. arXiv:1806.01910  [pdf, other]

    cs.LG cs.AI stat.ML

    Probabilistic Deep Learning using Random Sum-Product Networks

    Authors: Robert Peharz, Antonio Vergari, Karl Stelzner, Alejandro Molina, Martin Trapp, Kristian Kersting, Zoubin Ghahramani

    Abstract: The need for consistent treatment of uncertainty has recently triggered increased interest in probabilistic deep learning methods. However, most current approaches have severe limitations when it comes to inference, since many of these models do not even permit to evaluate exact data likelihoods. Sum-product networks (SPNs), on the other hand, are an excellent architecture in that regard, as they…

    Submitted 22 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

  20. arXiv:1804.11271  [pdf, other]

    stat.ML cs.LG

    Gaussian Process Behaviour in Wide Deep Neural Networks

    Authors: Alexander G. de G. Matthews, Mark Rowland, Jiri Hron, Richard E. Turner, Zoubin Ghahramani

    Abstract: Whilst deep neural networks have shown great empirical success, there is still much work to be done to understand their theoretical properties. In this paper, we study the relationship between random, wide, fully connected, feedforward networks with more than one hidden layer and Gaussian processes with a recursive kernel definition. We show that, under broad conditions, as we make the architectur…

    Submitted 16 August, 2018; v1 submitted 30 April, 2018; originally announced April 2018.

    Comments: This work substantially extends the work of Matthews et al. (2018) published at the International Conference on Learning Representations (ICLR) 2018

  21. arXiv:1802.10031  [pdf, other]

    cs.LG stat.ML

    The Mirage of Action-Dependent Baselines in Reinforcement Learning

    Authors: George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

    Abstract: Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To be…

    Submitted 19 November, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: Updated to ICML final submission

  22. arXiv:1802.04668  [pdf, other]

    cs.CV cs.MM cs.SI

    Weakly supervised collective feature learning from curated media

    Authors: Yusuke Mukuta, Akisato Kimura, David B Adrian, Zoubin Ghahramani

    Abstract: The current state-of-the-art in feature learning relies on the supervised learning of large-scale datasets consisting of target content items and their respective category labels. However, constructing such large-scale fully-labeled datasets generally requires painstaking manual effort. One possible solution to this problem is to employ community contributed text tags as weak labels, however, the…

    Submitted 13 February, 2018; originally announced February 2018.

    Comments: Published in the Proceedings of AAAI Conference on Artificial Intelligence (AAAI2018)

  23. arXiv:1802.03039  [pdf, other]

    stat.ML cs.LG cs.NE

    Few-shot learning of neural networks from scratch by pseudo example optimization

    Authors: Akisato Kimura, Zoubin Ghahramani, Koh Takeuchi, Tomoharu Iwata, Naonori Ueda

    Abstract: In this paper, we propose a simple but effective method for training neural networks with a limited amount of training data. Our approach inherits the idea of knowledge distillation that transfers knowledge from a deep or wide reference model to a shallow or narrow target model. The proposed method employs this idea to mimic predictions of reference estimators that are more robust against overfitt…

    Submitted 5 July, 2018; v1 submitted 8 February, 2018; originally announced February 2018.

    Comments: 14 pages, 2 figures, will be presented at BMVC2018

  24. Denotational validation of higher-order Bayesian inference

    Authors: Adam Ścibior, Ohad Kammar, Matthijs Vákár, Sam Staton, Hongseok Yang, Yufei Cai, Klaus Ostermann, Sean K. Moss, Chris Heunen, Zoubin Ghahramani

    Abstract: We present a modular semantic account of Bayesian inference algorithms for probabilistic programming languages, as used in data science and machine learning. Sophisticated inference algorithms are often explained in terms of composition of smaller parts. However, neither their theoretical justification nor their implementation reflects this modularity. We show how to conceptualise and analyse such…

    Submitted 8 November, 2017; originally announced November 2017.

    Journal ref: Proc. ACM Program. Lang. 2, POPL, Article 60 (January 2018)

  25. arXiv:1707.08352  [pdf, other]

    stat.ML cs.LG

    General Latent Feature Modeling for Data Exploration Tasks

    Authors: Isabel Valera, Melanie F. Pradier, Zoubin Ghahramani

    Abstract: This paper introduces a general Bayesian nonparametric latent feature model suitable to perform automatic exploratory analysis of heterogeneous datasets, where the attributes describing each object can be either discrete, continuous or mixed variables. The proposed model presents several important properties. First, it accounts for heterogeneous data while it can be inferred in linear time with r…

    Submitted 26 July, 2017; originally announced July 2017.

    Comments: presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

  26. arXiv:1707.05562  [pdf, other]

    stat.ML cs.LG

    One-Shot Learning in Discriminative Neural Networks

    Authors: Jordan Burgess, James Robert Lloyd, Zoubin Ghahramani

    Abstract: We consider the task of one-shot learning of visual categories. In this paper we explore a Bayesian procedure for updating a pretrained convnet to classify a novel image category for which data is limited. We decompose this convnet into a fixed feature extractor and softmax classifier. We assume that the target weights for the new task come from the same distribution as the pretrained softmax weig…

    Submitted 18 July, 2017; originally announced July 2017.

    Comments: 3 pages, 3 figures

  27. arXiv:1706.04161  [pdf, other]

    stat.ML cs.LG

    Lost Relatives of the Gumbel Trick

    Authors: Matej Balog, Nilesh Tripuraneni, Zoubin Ghahramani, Adrian Weller

    Abstract: The Gumbel trick is a method to sample from a discrete probability distribution, or to estimate its normalizing partition function. The method relies on repeatedly applying a random perturbation to the distribution in a particular way, each time solving for the most likely configuration. We derive an entire family of related methods, of which the Gumbel trick is one member, and show that the new m…

    Submitted 13 June, 2017; originally announced June 2017.

    Comments: 34th International Conference on Machine Learning (ICML 2017)

  28. arXiv:1706.00387  [pdf, other]

    cs.LG cs.AI cs.RO

    Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

    Authors: Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Bernhard Schölkopf, Sergey Levine

    Abstract: Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical resul…

    Submitted 1 June, 2017; originally announced June 2017.

  29. arXiv:1703.02910  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Bayesian Active Learning with Image Data

    Authors: Yarin Gal, Riashat Islam, Zoubin Ghahramani

    Abstract: Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. First, active learning (AL) methods generally rely on being able to learn and update models from small amounts of data. Recent advances in deep learning, on the other hand, are notorious for the…

    Submitted 8 March, 2017; originally announced March 2017.

  30. arXiv:1611.02247  [pdf, other]

    cs.LG

    Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic

    Authors: Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E. Turner, Sergey Levine

    Abstract: Model-free deep reinforcement learning (RL) methods have been successful in a wide variety of simulated domains. However, a major obstacle facing deep RL in the real world is their high sample complexity. Batch policy gradient methods offer stable learning, but at the cost of high variance, which often requires large batches. TD-style methods, such as off-policy actor-critic and Q-learning, are mo…

    Submitted 27 February, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: Conference Paper at the International Conference on Learning Representations (ICLR) 2017

  31. arXiv:1608.00853  [pdf, other]

    cs.CV cs.LG

    A study of the effect of JPG compression on adversarial images

    Authors: Gintare Karolina Dziugaite, Zoubin Ghahramani, Daniel M. Roy

    Abstract: Neural network image classifiers are known to be vulnerable to adversarial images, i.e., natural images which have been modified by an adversarial perturbation specifically designed to be imperceptible to humans yet fool the classifier. Not only can adversarial images be generated easily, but these images will often be adversarial for networks trained on disjoint subsets of data or with different…

    Submitted 2 August, 2016; originally announced August 2016.

    Comments: 8 pages, 4 figures

  32. arXiv:1604.07928  [pdf, ps, other]

    cs.LG cs.AI cs.DC stat.ML

    Distributed Flexible Nonlinear Tensor Factorization

    Authors: Shandian Zhe, Kai Zhang, Pengyuan Wang, Kuang-chih Lee, Zenglin Xu, Yuan Qi, Zoubin Ghahramani

    Abstract: Tensor factorization is a powerful tool to analyse multi-way data. Compared with traditional multi-linear methods, nonlinear tensor factorization models are capable of capturing more complex relationships in the data. However, they are computationally expensive and may suffer severe learning bias in case of extreme data sparsity. To overcome these limitations, in this paper we propose a distribute…

    Submitted 21 May, 2016; v1 submitted 27 April, 2016; originally announced April 2016.

    Comments: Gaussian process, tensor factorization, multidimensional arrays, large scale, spark, map-reduce

    ACM Class: I.5.1; I.5.4

  33. arXiv:1511.07130  [pdf, other]

    cs.LG stat.ML

    Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions

    Authors: Amar Shah, Zoubin Ghahramani

    Abstract: We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well known strategies exist for suggesting a single evaluation point based on previous observations, while far fewe…

    Submitted 23 November, 2015; originally announced November 2015.

    Comments: 12 pages in Neural Information Processing Systems 2015

  34. arXiv:1511.02543  [pdf, other]

    stat.ML cs.LG stat.CO

    Sandwiching the marginal likelihood using bidirectional Monte Carlo

    Authors: Roger B. Grosse, Zoubin Ghahramani, Ryan P. Adams

    Abstract: Computing the marginal likelihood (ML) of a model requires marginalizing out all of the parameters and latent variables, a difficult high-dimensional summation or integration problem. To make matters worse, it is often hard to measure the accuracy of one's ML estimates. We present bidirectional Monte Carlo, a technique for obtaining accurate log-ML estimates on data simulated from a model. This me…

    Submitted 8 November, 2015; originally announced November 2015.

  35. arXiv:1506.09039  [pdf, other]

    stat.ML cs.LG

    Scalable Discrete Sampling as a Multi-Armed Bandit Problem

    Authors: Yutian Chen, Zoubin Ghahramani

    Abstract: Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, an…

    Submitted 27 April, 2016; v1 submitted 30 June, 2015; originally announced June 2015.

  36. arXiv:1506.08180  [pdf, other]

    stat.ML cs.LG stat.AP stat.CO stat.ME

    An Empirical Study of Stochastic Variational Algorithms for the Beta Bernoulli Process

    Authors: Amar Shah, David A. Knowles, Zoubin Ghahramani

    Abstract: Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we…

    Submitted 26 June, 2015; originally announced June 2015.

    Comments: ICML, 12 pages. Volume 37: Proceedings of The 32nd International Conference on Machine Learning, 2015

  37. arXiv:1506.03338  [pdf, ps, other]

    cs.LG stat.ML

    Neural Adaptive Sequential Monte Carlo

    Authors: Shixiang Gu, Zoubin Ghahramani, Richard E. Turner

    Abstract: Sequential Monte Carlo (SMC), or particle filtering, is a popular class of methods for sampling from an intractable target distribution using a sequence of simpler intermediate distributions. Like other importance sampling-based methods, performance is critically dependent on the proposal distribution: a bad proposal can lead to arbitrarily inaccurate estimates of the target distribution. This pap…

    Submitted 16 November, 2015; v1 submitted 10 June, 2015; originally announced June 2015.

  38. arXiv:1506.02158  [pdf, other]

    stat.ML cs.LG

    Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference

    Authors: Yarin Gal, Zoubin Ghahramani

    Abstract: Convolutional neural networks (CNNs) work well on large datasets. But labelled data is hard to collect, and in some applications larger amounts of data are not available. The problem then is how to use CNNs with small data -- as CNNs overfit quickly. We present an efficient Bayesian CNN, offering better robustness to over-fitting on small data than traditional approaches. This is by placing a prob…

    Submitted 18 January, 2016; v1 submitted 6 June, 2015; originally announced June 2015.

    Comments: 12 pages, 3 figures, ICLR format, updated with reviewer comments

  39. arXiv:1506.02142  [pdf, other]

    stat.ML cs.LG

    Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

    Authors: Yarin Gal, Zoubin Ghahramani

    Abstract: Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropou…

    Submitted 4 October, 2016; v1 submitted 6 June, 2015; originally announced June 2015.

    Comments: 12 pages, 6 figures; fixed a mistake with standard error and added a new table with updated results (marked "Update [October 2016]"); Published in ICML 2016

  40. arXiv:1505.03906  [pdf, other]

    stat.ML cs.LG

    Training generative neural networks via Maximum Mean Discrepancy optimization

    Authors: Gintare Karolina Dziugaite, Daniel M. Roy, Zoubin Ghahramani

    Abstract: We consider training a deep neural network to generate samples from an unknown distribution given i.i.d. data. We frame learning as an optimization minimizing a two-sample test statistic---informally speaking, a good generator network produces samples that cause a two-sample test to fail to reject the null hypothesis. As our two-sample test statistic, we use an unbiased estimate of the maximum mea…

    Submitted 14 May, 2015; originally announced May 2015.

    Comments: 10 pages, to appear in Uncertainty in Artificial Intelligence (UAI) 2015

  41. arXiv:1501.04684  [pdf, other]

    cs.AI cs.PL

    Slice Sampling for Probabilistic Programming

    Authors: Razvan Ranca, Zoubin Ghahramani

    Abstract: We introduce the first, general purpose, slice sampling inference engine for probabilistic programs. This engine is released as part of StocPy, a new Turing-Complete probabilistic programming language, available as a Python library. We present a transdimensional generalisation of slice sampling which is necessary for the inference engine to work on traces with different numbers of random variables…

    Submitted 19 January, 2015; originally announced January 2015.

    Comments: 11 pages

  42. arXiv:1408.2061  [pdf]

    cs.LG stat.ML

    Warped Mixtures for Nonparametric Cluster Shapes

    Authors: Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani

    Abstract: A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density ma…

    Submitted 9 August, 2014; originally announced August 2014.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-311-320

  43. arXiv:1406.2541  [pdf, other]

    stat.ML cs.LG

    Predictive Entropy Search for Efficient Global Optimization of Black-box Functions

    Authors: José Miguel Hernández-Lobato, Matthew W. Hoffman, Zoubin Ghahramani

    Abstract: We propose a novel information-theoretic approach for Bayesian optimization called Predictive Entropy Search (PES). At each iteration, PES selects the next evaluation point that maximizes the expected information gained with respect to the global maximum. PES codifies this intractable acquisition function in terms of the expected reduction in the differential entropy of the predictive distribution…

    Submitted 10 June, 2014; originally announced June 2014.

  44. arXiv:1402.5836  [pdf, other]

    stat.ML cs.LG

    Avoiding pathologies in very deep networks

    Authors: David Duvenaud, Oren Rippel, Ryan P. Adams, Zoubin Ghahramani

    Abstract: Choosing appropriate architectures and regularization strategies for deep networks is crucial to good predictive performance. To shed light on this problem, we analyze the analogous problem of constructing useful priors on compositions of functions. Specifically, we study the deep Gaussian process, a type of infinitely-wide, deep neural network. We show that in standard architectures, the represen…

    Submitted 8 July, 2016; v1 submitted 24 February, 2014; originally announced February 2014.

    Comments: Fixed a typo regarding number of layers

  45. arXiv:1402.4306  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Student-t Processes as Alternatives to Gaussian Processes

    Authors: Amar Shah, Andrew Gordon Wilson, Zoubin Ghahramani

    Abstract: We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the covariance kernel of a Gaussian process model. We show surprising equivalences between different hierarc…

    Submitted 19 February, 2014; v1 submitted 18 February, 2014; originally announced February 2014.

    Comments: 13 pages, 6 figures, 1 table. To appear in "The Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2014."

  46. arXiv:1402.4304  [pdf, other]

    stat.ML cs.LG

    Automatic Construction and Natural-Language Description of Nonparametric Regression Models

    Authors: James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani

    Abstract: This paper presents the beginnings of an automatic statistician, focusing on regression problems. Our system explores an open-ended space of statistical models to discover a good explanation of a data set, and then produces a detailed report with figures and natural-language text. Our approach treats unknown regression functions nonparametrically using Gaussian processes, which has two important c…

    Submitted 24 April, 2014; v1 submitted 18 February, 2014; originally announced February 2014.

  47. arXiv:1402.4293  [pdf, other]

    stat.ML cs.LG

    The Random Forest Kernel and other kernels for big data from random partitions

    Authors: Alex Davies, Zoubin Ghahramani

    Abstract: We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels…

    Submitted 18 February, 2014; originally announced February 2014.

  48. arXiv:1402.0119  [pdf, other]

    stat.ML cs.LG

    Randomized Nonlinear Component Analysis

    Authors: David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schölkopf

    Abstract: Classical methods such as Principal Component Analysis (PCA) and Canonical Correlation Analysis (CCA) are ubiquitous in statistics. However, these techniques are only able to reveal linear relationships in data. Although nonlinear variants of PCA and CCA have been proposed, these are computationally prohibitive in the large scale. In a separate strand of recent research, randomized methods have…

    Submitted 13 May, 2014; v1 submitted 1 February, 2014; originally announced February 2014.

    Comments: Appearing in ICML 2014

  49. arXiv:1309.6862  [pdf]

    cs.LG stat.ML

    Determinantal Clustering Processes - A Nonparametric Bayesian Approach to Kernel Based Semi-Supervised Clustering

    Authors: Amar Shah, Zoubin Ghahramani

    Abstract: Semi-supervised clustering is the task of clustering data points into clusters where only a fraction of the points are labelled. The true number of clusters in the data is often unknown and most models require this parameter as an input. Dirichlet process mixture models are appealing as they can infer the number of clusters from the data. However, these models do not deal with high dimensional dat…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-566-575

  50. arXiv:1309.6858  [pdf]

    cs.LG stat.ML

    The Supervised IBP: Neighbourhood Preserving Infinite Latent Feature Models

    Authors: Novi Quadrianto, Viktoriia Sharmanska, David A. Knowles, Zoubin Ghahramani

    Abstract: We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables, and their values. The latent variables preserve neighbourhood structure of the data in a sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dis…

    Submitted 26 September, 2013; originally announced September 2013.

    Comments: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)

    Report number: UAI-P-2013-PG-527-536