Search | arXiv e-print repository

arXiv:1906.10652 [pdf, other]

Monte Carlo Gradient Estimation in Machine Learning

Authors: Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih

Abstract: This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis. In machine learning research, this gradient… ▽ More This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis. In machine learning research, this gradient problem lies at the core of many learning problems, in supervised, unsupervised and reinforcement learning. We will generally seek to rewrite such gradients in a form that allows for Monte Carlo estimation, allowing them to be easily and efficiently used and analysed. We explore three strategies--the pathwise, score function, and measure-valued gradient estimators--exploring their historical development, derivation, and underlying assumptions. We describe their use in other fields, show how they are related and can be combined, and expand on their possible generalisations. Wherever Monte Carlo gradient estimators have been derived and deployed in the past, important advances have followed. A deeper and more widely-held understanding of this problem will lead to further advances, and it is these advances that we wish to support. △ Less

Submitted 29 September, 2020; v1 submitted 25 June, 2019; originally announced June 2019.

Comments: 62 pages

Journal ref: Journal of Machine Learning Research, 21(132):1-62, 2020

arXiv:1806.02382 [pdf, other]

Variational Autoencoder with Arbitrary Conditioning

Authors: Oleg Ivanov, Michael Figurnov, Dmitry Vetrov

Abstract: We propose a single neural probabilistic model based on variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features in "one shot". The features may be both real-valued and categorical. Training of the model is performed by stochastic variational Bayes. The experimental evaluation on synthetic data, as well as feature imputation… ▽ More We propose a single neural probabilistic model based on variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features in "one shot". The features may be both real-valued and categorical. Training of the model is performed by stochastic variational Bayes. The experimental evaluation on synthetic data, as well as feature imputation and image inpainting problems, shows the effectiveness of the proposed approach and diversity of the generated samples. △ Less

Submitted 27 June, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

Comments: Published as a conference paper at ICLR 2019

arXiv:1805.08498 [pdf, other]

Implicit Reparameterization Gradients

Authors: Michael Figurnov, Shakir Mohamed, Andriy Mnih

Abstract: By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distributions. We introduce an alternative approach to computing reparameterization gradients based on implicit… ▽ More By providing a simple and efficient way of computing low-variance gradients of continuous random variables, the reparameterization trick has become the technique of choice for training a variety of latent variable models. However, it is not applicable to a number of important continuous distributions. We introduce an alternative approach to computing reparameterization gradients based on implicit differentiation and demonstrate its broader applicability by applying it to Gamma, Beta, Dirichlet, and von Mises distributions, which cannot be used with the classic reparameterization trick. Our experiments show that the proposed approach is faster and more accurate than the existing gradient estimators for these distributions. △ Less

Submitted 30 January, 2019; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: NeurIPS 2018

arXiv:1801.01928 [pdf, ps, other]

Tensor Train decomposition on TensorFlow (T3F)

Authors: Alexander Novikov, Pavel Izmailov, Valentin Khrulkov, Michael Figurnov, Ivan Oseledets

Abstract: Tensor Train decomposition is used across many branches of machine learning. We present T3F -- a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization meth… ▽ More Tensor Train decomposition is used across many branches of machine learning. We present T3F -- a library for Tensor Train decomposition based on TensorFlow. T3F supports GPU execution, batch processing, automatic differentiation, and versatile functionality for the Riemannian optimization framework, which takes into account the underlying manifold structure to construct efficient optimization methods. The library makes it easier to implement machine learning papers that rely on the Tensor Train decomposition. T3F includes documentation, examples and 94% test coverage. △ Less

Submitted 2 March, 2020; v1 submitted 5 January, 2018; originally announced January 2018.

arXiv:1712.00386 [pdf, other]

Probabilistic Adaptive Computation Time

Authors: Michael Figurnov, Artem Sobolev, Dmitry Vetrov

Abstract: We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational o… ▽ More We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed Adaptive Computation Time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose Concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint. △ Less

Submitted 1 December, 2017; originally announced December 2017.

arXiv:1612.02297 [pdf, other]

Spatially Adaptive Computation Time for Residual Networks

Authors: Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, Ruslan Salakhutdinov

Abstract: This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segment… ▽ More This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation. We present experimental results showing that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets. Additionally, we evaluate the computation time maps on the visual saliency dataset cat2000 and find that they correlate surprisingly well with human eye fixation positions. △ Less

Submitted 2 July, 2017; v1 submitted 7 December, 2016; originally announced December 2016.

Comments: CVPR 2017

arXiv:1611.09226 [pdf, other]

Robust Variational Inference

Authors: Michael Figurnov, Kirill Struminsky, Dmitry Vetrov

Abstract: Variational inference is a powerful tool for approximate inference. However, it mainly focuses on the evidence lower bound as variational objective and the development of other measures for variational inference is a promising area of research. This paper proposes a robust modification of evidence and a lower bound for the evidence, which is applicable when the majority of the training set samples… ▽ More Variational inference is a powerful tool for approximate inference. However, it mainly focuses on the evidence lower bound as variational objective and the development of other measures for variational inference is a promising area of research. This paper proposes a robust modification of evidence and a lower bound for the evidence, which is applicable when the majority of the training set samples are random noise objects. We provide experiments for variational autoencoders to show advantage of the objective over the evidence lower bound on synthetic datasets obtained by adding uninformative noise objects to MNIST and OMNIGLOT. Additionally, for the original MNIST and OMNIGLOT datasets we observe a small improvement over the non-robust evidence lower bound. △ Less

Submitted 28 November, 2016; originally announced November 2016.

Comments: NIPS 2016 Workshop, Advances in Approximate Bayesian Inference

arXiv:1504.08362 [pdf, other]

PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Authors: Michael Figurnov, Aijan Ibraimova, Dmitry Vetrov, Pushmeet Kohli

Abstract: We propose a novel approach to reduce the computational cost of evaluation of convolutional neural networks, a factor that has hindered their deployment in low-power devices such as mobile phones. Inspired by the loop perforation technique from source code optimization, we speed up the bottleneck convolutional layers by skip** their evaluation in some of the spatial positions. We propose and ana… ▽ More We propose a novel approach to reduce the computational cost of evaluation of convolutional neural networks, a factor that has hindered their deployment in low-power devices such as mobile phones. Inspired by the loop perforation technique from source code optimization, we speed up the bottleneck convolutional layers by skip** their evaluation in some of the spatial positions. We propose and analyze several strategies of choosing these positions. We demonstrate that perforation can accelerate modern convolutional networks such as AlexNet and VGG-16 by a factor of 2x - 4x. Additionally, we show that perforation is complementary to the recently proposed acceleration method of Zhang et al. △ Less

Submitted 15 October, 2016; v1 submitted 30 April, 2015; originally announced April 2015.

Comments: NIPS 2016

Showing 1–8 of 8 results for author: Figurnov, M