Showing 1–24 of 24 results for author: Mnih, A

  1. arXiv:2209.14249  [pdf, other]

    cs.LG stat.ML

    Compositional Score Modeling for Simulation-based Inference

    Authors: Tomas Geffner, George Papamakarios, Andriy Mnih

    Abstract: Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they tend to require a large number of simulator calls to learn accurate approximations. In contrast, Neural Likelihood Estimation methods can handle multiple observations at inference time after learning from individual…

    Submitted 9 July, 2023; v1 submitted 28 September, 2022; originally announced September 2022.

  2. arXiv:2109.11817  [pdf, other]

    cs.LG stat.ML

    Unbiased Gradient Estimation with Balanced Assignments for Mixtures of Experts

    Authors: Wouter Kool, Chris J. Maddison, Andriy Mnih

    Abstract: Training large-scale mixture of experts models efficiently on modern hardware requires assigning datapoints in a batch to different experts, each with a limited capacity. Recently proposed assignment procedures lack a probabilistic interpretation and use biased estimators for training. As an alternative, we propose two unbiased estimators based on principled stochastic assignment procedures: one t…

    Submitted 8 December, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: I (Still) Can't Believe It's Not Better Workshop at NeurIPS 2021

  3. arXiv:2106.08056  [pdf, other]

    cs.LG stat.ML

    Coupled Gradient Estimators for Discrete Latent Variables

    Authors: Zhe Dong, Andriy Mnih, George Tucker

    Abstract: Training models with discrete latent variables is challenging due to the high variance of unbiased gradient estimators. While low-variance reparameterization gradients of a continuous relaxation can provide an effective solution, a continuous relaxation is not always available or tractable. Dong et al. (2020) and Yin et al. (2020) introduced a performant estimator that does not rely on continuous…

    Submitted 15 November, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Published in NeurIPS 2021

  4. arXiv:2101.11046  [pdf, other]

    stat.ML cs.LG

    Generalized Doubly Reparameterized Gradient Estimators

    Authors: Matthias Bauer, Andriy Mnih

    Abstract: Efficient low-variance gradient estimation enabled by the reparameterization trick (RT) has been essential to the success of variational autoencoders. Doubly-reparameterized gradients (DReGs) improve on the RT for multi-sample variational bounds by applying reparameterization a second time for an additional reduction in variance. Here, we develop two generalizations of the DReGs estimator and show…

    Submitted 13 July, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

    Journal ref: 38th International Conference on Machine Learning (ICML 2021)

  5. arXiv:2006.10680  [pdf, other]

    cs.LG stat.ML

    DisARM: An Antithetic Gradient Estimator for Binary Latent Variables

    Authors: Zhe Dong, Andriy Mnih, George Tucker

    Abstract: Training models with discrete latent variables is challenging due to the difficulty of estimating the gradients accurately. Much of the recent progress has been achieved by taking advantage of continuous relaxations of the system, which are not always available or even possible. The Augment-REINFORCE-Merge (ARM) estimator provides an alternative that, instead of relaxation, uses continuous augment…

    Submitted 3 December, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Journal ref: Part of Advances in Neural Information Processing Systems 33 proceedings (NeurIPS 2020)

  6. arXiv:2006.04710  [pdf, other]

    stat.ML cs.LG

    The Lipschitz Constant of Self-Attention

    Authors: Hyunjik Kim, George Papamakarios, Andriy Mnih

    Abstract: Lipschitz constants of neural networks have been explored in various contexts in deep learning, such as provable adversarial robustness, estimating Wasserstein distance, stabilising training of GANs, and formulating invertible neural networks. Such works have focused on bounding the Lipschitz constant of fully connected or convolutional networks, composed of linear maps and pointwise non-lineariti…

    Submitted 9 June, 2021; v1 submitted 8 June, 2020; originally announced June 2020.

  7. arXiv:2001.08116  [pdf, other]

    cs.LG cs.AI stat.ML

    Q-Learning in enormous action spaces via amortized approximate maximization

    Authors: Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih

    Abstract: Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization over all actions with a maximization over a small subset of possible actions sampled from a learned proposal distribution. The resulting approach, which we dub…

    Submitted 22 January, 2020; originally announced January 2020.

    Comments: A previous version of this work appeared at the Deep Reinforcement Learning Workshop, NeurIPS 2018

  8. arXiv:1910.10596  [pdf, other]

    stat.ML cs.LG

    Sparse Orthogonal Variational Inference for Gaussian Processes

    Authors: Jiaxin Shi, Michalis K. Titsias, Andriy Mnih

    Abstract: We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points, which can lead to more scalable algorithms than previous methods. It is based on decomposing a Gaussian process as a sum of two independent processes: one spanned by a finite basis of inducing points and the other capturing the remaining variation. We show that this formulation reco…

    Submitted 24 February, 2024; v1 submitted 23 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020

  9. arXiv:1906.10652  [pdf, other]

    stat.ML cs.LG math.OC

    Monte Carlo Gradient Estimation in Machine Learning

    Authors: Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih

    Abstract: This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis. In machine learning research, this gradient…

    Submitted 29 September, 2020; v1 submitted 25 June, 2019; originally announced June 2019.

    Comments: 62 pages

    Journal ref: Journal of Machine Learning Research, 21(132):1-62, 2020

  10. arXiv:1901.05761  [pdf, other]

    cs.LG stat.ML

    Attentive Neural Processes

    Authors: Hyunjik Kim, Andriy Mnih, Jonathan Schwarz, Marta Garnelo, Ali Eslami, Dan Rosenbaum, Oriol Vinyals, Yee Whye Teh

    Abstract: Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, an…

    Submitted 9 July, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

  11. arXiv:1810.11428  [pdf, other]

    stat.ML cs.LG

    Resampled Priors for Variational Autoencoders

    Authors: Matthias Bauer, Andriy Mnih

    Abstract: We propose Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function. This work is motivated by recent analyses of the VAE objective, which pointed out that commonly used simple priors can lead to underfitting. As the distribution induced by LARS involves an intractable normalizing constant, we show how to estimate it…

    Submitted 26 April, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019

  12. arXiv:1805.08498  [pdf, other]

    cs.LG stat.ML

    Implicit Reparameterization Gradients

    Authors: Michael Figurnov, Shakir Mohamed, Andriy Mnih

    Abstract: By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distributions. We introduce an alternative approach to computing reparameterization gradients based on implicit…

    Submitted 30 January, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: NeurIPS 2018

  13. arXiv:1802.05983  [pdf, other]

    stat.ML cs.LG

    Disentangling by Factorising

    Authors: Hyunjik Kim, Andriy Mnih

    Abstract: We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon $β$-VAE by providing a better trade-off between disentangleme…

    Submitted 9 July, 2019; v1 submitted 16 February, 2018; originally announced February 2018.

    Comments: Shorter version appeared in Learning Disentangled Representations: From Perception to Control workshop at NIPS, 2017: https://sites.google.com/corp/view/disentanglenips2017

  14. arXiv:1709.07116  [pdf, other]

    cs.LG

    Variational Memory Addressing in Generative Models

    Authors: Jörg Bornschein, Andriy Mnih, Daniel Zoran, Danilo J. Rezende

    Abstract: Aiming to augment generative models with external memory, we interpret the output of a memory module with stochastic addressing as a conditional mixture distribution, where a read operation corresponds to sampling a discrete memory address and retrieving the corresponding content from memory. This perspective allows us to apply variational inference to memory addressing, which enables effective tr…

    Submitted 20 September, 2017; originally announced September 2017.

  15. arXiv:1705.09279  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Filtering Variational Objectives

    Authors: Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Mohammad Norouzi, Andriy Mnih, Arnaud Doucet, Yee Whye Teh

    Abstract: When used as a surrogate objective for maximum likelihood estimation in latent variable models, the evidence lower bound (ELBO) produces state-of-the-art results. Inspired by this, we consider the extension of the ELBO to a family of lower bounds defined by a particle filter's estimator of the marginal likelihood, the filtering variational objectives (FIVOs). FIVOs take the same arguments as the E…

    Submitted 12 November, 2017; v1 submitted 25 May, 2017; originally announced May 2017.

  16. arXiv:1703.07370  [pdf, other]

    cs.LG stat.ML

    REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models

    Authors: George Tucker, Andriy Mnih, Chris J. Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

    Abstract: Learning in models with discrete latent variables is challenging due to high variance gradient estimators. Generally, approaches have relied on control variates to reduce the variance of the REINFORCE estimator. Recent work (Jang et al. 2016, Maddison et al. 2016) has taken a different approach, introducing a continuous relaxation of discrete variables to produce low-variance, but biased, gradient…

    Submitted 6 November, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: NIPS 2017

  17. arXiv:1703.05820  [pdf, other]

    cs.LG cs.AI

    Particle Value Functions

    Authors: Chris J. Maddison, Dieterich Lawson, George Tucker, Nicolas Heess, Arnaud Doucet, Andriy Mnih, Yee Whye Teh

    Abstract: The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example. This risk-sensitive value fu…

    Submitted 16 March, 2017; originally announced March 2017.

  18. arXiv:1611.00712  [pdf, other]

    cs.LG stat.ML

    The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

    Authors: Chris J. Maddison, Andriy Mnih, Yee Whye Teh

    Abstract: The reparameterization trick enables optimizing large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low variance unbiased estimators of…

    Submitted 5 March, 2017; v1 submitted 2 November, 2016; originally announced November 2016.

  19. arXiv:1602.06725  [pdf, other]

    cs.LG stat.ML

    Variational inference for Monte Carlo objectives

    Authors: Andriy Mnih, Danilo J. Rezende

    Abstract: Recent progress in deep latent variable models has largely been driven by the development of flexible and scalable variational inference methods. Variational training of this type involves maximizing a lower bound on the log-likelihood, using samples from the variational posterior to compute the required gradients. Recently, Burda et al. (2016) have derived a tighter lower bound using a multi-samp…

    Submitted 1 June, 2016; v1 submitted 22 February, 2016; originally announced February 2016.

    Comments: Appears in Proceedings of the 33rd International Conference on Machine Learning (ICML), New York, NY, USA, 2016. JMLR: W&CP volume 48

  20. arXiv:1511.05176  [pdf, other]

    cs.LG

    MuProp: Unbiased Backpropagation for Stochastic Neural Networks

    Authors: Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih

    Abstract: Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm. Stochastic neural networks combine the power of large parametric functions with that of graphical models, which makes it possible to learn very complex distributions. However, as backpropagation is not directly applicable to stochastic networks that include discrete sampling ope…

    Submitted 25 February, 2016; v1 submitted 16 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016

  21. arXiv:1402.0030  [pdf, ps, other]

    cs.LG stat.ML

    Neural Variational Inference and Learning in Belief Networks

    Authors: Andriy Mnih, Karol Gregor

    Abstract: Highly expressive directed latent variable models, such as sigmoid belief networks, are difficult to train on large datasets because exact inference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the…

    Submitted 4 June, 2014; v1 submitted 31 January, 2014; originally announced February 2014.

    Journal ref: Proceedings of the 31st International Conference on Machine Learning (ICML), JMLR: W&CP volume 32, 2014 pgs 1791-1799

  22. arXiv:1310.8499  [pdf, other]

    cs.LG stat.ML

    Deep AutoRegressive Networks

    Authors: Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra

    Abstract: We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL)…

    Submitted 20 May, 2014; v1 submitted 31 October, 2013; originally announced October 2013.

    Comments: Appears in Proceedings of the 31st International Conference on Machine Learning (ICML), Beijing, China, 2014

    Journal ref: Karol Gregor, Ivo Danihelka, Andriy Mnih, Charles Blundell, Daan Wierstra. Deep AutoRegressive Networks. In Proceedings of the 31st International Conference on Machine Learning (ICML), JMLR: W&CP volume 32, 2014

  23. arXiv:1206.6426  [pdf]

    cs.CL cs.LG

    A Fast and Simple Algorithm for Training Neural Probabilistic Language Models

    Authors: Andriy Mnih, Yee Whye Teh

    Abstract: In spite of their superior performance, neural probabilistic language models (NPLMs) remain far less widely used than n-gram models due to their notoriously long training times, which are measured in weeks even for moderately-sized datasets. Training NPLMs is computationally expensive because they are explicitly normalized, which leads to having to consider all words in the vocabulary when computi…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

    Journal ref: In Proceedings of the 29th International Conference on Machine Learning, pages 1751-1758, 2012

  24. arXiv:1109.5894  [pdf, ps, other]

    cs.LG stat.ML

    Learning Item Trees for Probabilistic Modelling of Implicit Feedback

    Authors: Andriy Mnih, Yee Whye Teh

    Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take ad…

    Submitted 27 September, 2011; originally announced September 2011.

    Comments: 8 pages