Ensemble sampling for linear bandits: small ensembles suffice

Authors: David Janz, Alexander E. Litvak, Csaba Szepesvári

Abstract: We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size of order $\smash{d \log T}$ incurs regret at most of the order $\smash{(d \log T)^{5/2} \sqrt{T}}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\smash{\sqrt{T}}$ order regret. Ours is also the first result that allows infinite action sets.

Submitted 6 March, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.07565

Exploration via linearly perturbed loss minimisation

Authors: David Janz, Shuai Liu, Alex Ayoub, Csaba Szepesvári

Abstract: We introduce exploration via linear loss perturbations (EVILL), a randomised exploration method for structured stochastic bandit problems that works by solving for the minimiser of a linearly perturbed regularised negative log-likelihood function. We show that, for the case of generalised linear bandits, EVILL reduces to perturbed history exploration (PHE), a method where exploration is done by training on randomly perturbed rewards. In doing so, we provide a simple and clean explanation of when and why random reward perturbations give rise to good bandit algorithms. We propose data-dependent perturbations not present in previous PHE-type methods that allow EVILL to match the performance of Thompson-sampling-style parameter-perturbation methods, both in theory and in practice. Moreover, we show an example outside generalised linear bandits where PHE leads to inconsistent estimates, and thus linear regret, while EVILL remains performant. Like PHE, EVILL can be implemented in just a few lines of code.

Submitted 6 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.
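Illustration (not taken from the paper): the abstract describes PHE as exploring by training on randomly perturbed rewards. A minimal sketch of that idea for a finite-action linear bandit might look as follows. It refits a ridge regression on freshly perturbed rewards each round and plays the greedy action. The function name, the Gaussian perturbation, its scale and the regulariser are all assumptions made for illustration; this is neither EVILL nor its data-dependent perturbations.

```python
import numpy as np

def phe_linear_bandit(actions, theta_star, horizon, noise_sd=0.1,
                      perturb_sd=0.1, reg=1.0, seed=0):
    """Simplified perturbed-history exploration for a linear bandit.

    Every name and default value here is a hypothetical choice.
    Returns the index of the action played in each round.
    """
    rng = np.random.default_rng(seed)
    _, d = actions.shape
    gram = reg * np.eye(d)  # regularised Gram matrix: reg * I + sum of x x^T
    feats, rewards, played = [], [], []
    for _ in range(horizon):
        if rewards:
            X = np.asarray(feats)
            # Add fresh Gaussian noise to every reward observed so far.
            noisy = np.asarray(rewards) + perturb_sd * rng.standard_normal(len(rewards))
            # Ridge regression on the perturbed rewards.
            theta_hat = np.linalg.solve(gram, X.T @ noisy)
        else:
            theta_hat = rng.standard_normal(d)  # no data yet: arbitrary start
        idx = int(np.argmax(actions @ theta_hat))  # act greedily on the estimate
        x = actions[idx]
        reward = float(x @ theta_star) + noise_sd * rng.standard_normal()
        feats.append(x)
        rewards.append(reward)
        played.append(idx)
        gram += np.outer(x, x)
    return np.array(played)

# Toy problem: three orthogonal actions with assumed mean rewards 0.1, 0.5, 0.9.
actions = np.eye(3)
theta_star = np.array([0.1, 0.5, 0.9])
played = phe_linear_bandit(actions, theta_star, horizon=200)
```

The only source of exploration in this sketch is the reward noise injected before each refit; no confidence bounds or posterior samples are computed.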

arXiv:2310.20581

Stochastic Gradient Descent for Gaussian Processes Done Right

Authors: Jihao Andreas Lin, Shreyas Padhy, Javier Antorán, Austin Tripp, Alexander Terenin, Csaba Szepesvári, José Miguel Hernández-Lobato, David Janz

Abstract: As is well known, both sampling from the posterior and computing the mean of the posterior in Gaussian process regression reduce to solving a large linear system of equations. We study the use of stochastic gradient descent for solving this linear system, and show that when \emph{done right} -- by which we mean using specific insights from the optimisation and kernel communities -- stochastic gradient descent is highly effective. To that end, we introduce a particularly simple \emph{stochastic dual descent} algorithm, explain its design in an intuitive manner and illustrate the design choices through a series of ablation studies. Further experiments demonstrate that our new method is highly competitive. In particular, our evaluations on the UCI regression tasks and on Bayesian optimisation set our approach apart from preconditioned conjugate gradients and variational Gaussian process approximations. Moreover, our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.

Submitted 28 April, 2024; v1 submitted 31 October, 2023; originally announced October 2023.
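Illustration (not taken from the paper): the abstract notes that computing the posterior mean in Gaussian process regression reduces to solving a linear system. The sketch below solves the standard system (K + σ²I) α = y once directly and once by plain full-batch gradient descent on the quadratic ½ αᵀ(K + σ²I)α − αᵀy. This is not the paper's stochastic dual descent algorithm; the kernel, data, noise variance and step count are assumptions chosen so that the toy problem is well conditioned.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel on 1-D inputs (an illustrative choice)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def mean_weights_direct(K, y, noise_var):
    """Solve (K + noise_var * I) alpha = y with a dense solver."""
    return np.linalg.solve(K + noise_var * np.eye(len(y)), y)

def mean_weights_gd(K, y, noise_var, steps=500):
    """Minimise 0.5 * a^T A a - a^T y by gradient descent, A = K + noise_var * I."""
    A = K + noise_var * np.eye(len(y))
    step = 1.0 / np.linalg.eigvalsh(A)[-1]  # 1 / largest eigenvalue of A
    alpha = np.zeros_like(y)
    for _ in range(steps):
        alpha -= step * (A @ alpha - y)  # gradient of the quadratic is A a - y
    return alpha

# Toy data: noisy sine observations (all values assumed for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)
K = rbf_kernel(x, x)

alpha_direct = mean_weights_direct(K, y, noise_var=0.5)
alpha_gd = mean_weights_gd(K, y, noise_var=0.5)

# Posterior mean at new inputs is k(x_new, x) @ alpha.
x_new = np.array([0.25, 0.75])
mean_new = rbf_kernel(x_new, x) @ alpha_gd
```

The direct solve costs cubic time in the number of data points, whereas each gradient step only needs a matrix-vector product, which is the motivation for iterative solvers at scale.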

arXiv:2306.11589

Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

Authors: Jihao Andreas Lin, Javier Antorán, Shreyas Padhy, David Janz, José Miguel Hernández-Lobato, Alexander Terenin

Abstract: Gaussian processes are a powerful framework for quantifying uncertainty and for sequential decision-making but are limited by the requirement of solving linear systems. In general, this has a cubic cost in dataset size and is sensitive to conditioning. We explore stochastic gradient algorithms as a computationally efficient method of approximately solving these linear systems: we develop low-variance optimization objectives for sampling from the posterior and extend these to inducing points. Counterintuitively, stochastic gradient descent often produces accurate predictions, even in cases where it does not converge quickly to the optimum. We explain this through a spectral characterization of the implicit bias from non-convergence. We show that stochastic gradient descent produces predictive distributions close to the true posterior both in regions with sufficient data coverage, and in regions sufficiently far away from the data. Experimentally, stochastic gradient descent achieves state-of-the-art performance on sufficiently large-scale or ill-conditioned regression tasks. Its uncertainty estimates match the performance of significantly more expensive baselines on a large-scale Bayesian optimization task.

Submitted 15 January, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

Journal ref: Advances in Neural Information Processing Systems, 2023

arXiv:2210.04994

Sampling-based inference for large linear models, with application to linearised Laplace

Authors: Javier Antorán, Shreyas Padhy, Riccardo Barbano, Eric Nalisnick, David Janz, José Miguel Hernández-Lobato

Abstract: Large-scale linear models are ubiquitous throughout machine learning, with contemporary application as surrogate models for neural network uncertainty quantification; that is, the linearised Laplace method. Alas, the computational cost associated with Bayesian linear models constrains this method's application to small networks, small output spaces and small datasets. We address this limitation by introducing a scalable sample-based Bayesian inference method for conjugate Gaussian multi-output linear models, together with a matching method for hyperparameter (regularisation) selection. Furthermore, we use a classic feature normalisation method (the g-prior) to resolve a previously highlighted pathology of the linearised Laplace method. Together, these contributions allow us to perform linearised neural network inference with ResNet-18 on CIFAR100 (11M parameters, 100 outputs x 50k datapoints), with ResNet-50 on Imagenet (50M parameters, 1000 outputs x 1.2M datapoints) and with a U-Net on a high-resolution tomographic reconstruction task (2M parameters, 251k output dimensions).

Submitted 16 March, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: Published at ICLR 2023. This latest arXiv version is extended with a demonstration of the proposed methods on the Imagenet dataset

arXiv:2206.08900

Adapting the Linearised Laplace Model Evidence for Modern Deep Learning

Authors: Javier Antorán, David Janz, James Urquhart Allingham, Erik Daxberger, Riccardo Barbano, Eric Nalisnick, José Miguel Hernández-Lobato

Abstract: The linearised Laplace method for estimating model uncertainty has received renewed attention in the Bayesian deep learning community. The method provides reliable error bars and admits a closed-form expression for the model evidence, allowing for scalable selection of model hyperparameters. In this work, we examine the assumptions behind this method, particularly in conjunction with model selection. We show that these interact poorly with some now-standard tools of deep learning--stochastic approximation methods and normalisation layers--and make recommendations for how to better adapt this classic method to the modern setting. We provide theoretical support for our recommendations and validate them empirically on MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers.

Submitted 8 December, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: Paper appearing at ICML 2022

arXiv:2001.10396

Bandit optimisation of functions in the Matérn kernel RKHS

Authors: David Janz, David R. Burt, Javier González

Abstract: We consider the problem of optimising functions in the reproducing kernel Hilbert space (RKHS) of a Matérn kernel with smoothness parameter $ν$ over the domain $[0,1]^d$ under noisy bandit feedback. Our contribution, the $π$-GP-UCB algorithm, is the first practical approach with guaranteed sublinear regret for all $ν>1$ and $d \geq 1$. Empirical validation suggests better performance and drastically improved computational scalability compared with its predecessor, Improved GP-UCB.

Submitted 26 February, 2023; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: Included an erratum highlighting an omission in the proof of lemma 1 and pointing to a fix in the author's thesis; the omission does not affect the main result

arXiv:1810.06530

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Authors: David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

Abstract: Posterior sampling for reinforcement learning (PSRL) is an effective method for balancing exploration and exploitation in reinforcement learning. Randomised value functions (RVF) can be viewed as a promising approach to scaling PSRL. However, we show that most contemporary algorithms combining RVF with neural network function approximation do not possess the properties which make PSRL effective, and provably fail in sparse reward problems. Moreover, we find that propagation of uncertainty, a property of PSRL previously thought important for exploration, does not preclude this failure. We use these insights to design Successor Uncertainties (SU), a cheap and easy to implement RVF algorithm that retains key properties of PSRL. SU is highly effective on hard tabular exploration benchmarks. Furthermore, on the Atari 2600 domain, it surpasses human performance on 38 of 49 games tested (achieving a median human normalised score of 2.09), and outperforms its closest RVF competitor, Bootstrapped DQN, on 36 of those.

Submitted 3 December, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

Comments: Camera ready version, NeurIPS 2019

arXiv:1807.00412

Learning to Drive in a Day

Authors: Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, Amar Shah

Abstract: We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a continuous, model-free deep reinforcement learning algorithm, with all exploration and optimisation performed on-vehicle. This demonstrates a new framework for autonomous driving which moves away from reliance on defined logical rules, mapping, and direct supervision. We discuss the challenges and opportunities to scale this approach to a broader range of autonomous driving tasks.

Submitted 11 September, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

Comments: Further results and demo videos can be viewed at: https://wayve.ai/blog/l2diad

arXiv:1712.01664

Learning a Generative Model for Validity in Complex Discrete Structures

Authors: David Janz, Jos van der Westhuizen, Brooks Paige, Matt J. Kusner, José Miguel Hernández-Lobato

Abstract: Deep generative models have been successfully used to learn representations for high-dimensional discrete spaces by representing discrete objects as sequences and employing powerful sequence-based deep models. Unfortunately, these sequence-based models often produce invalid sequences: sequences which do not represent any underlying discrete structure; invalid sequences hinder the utility of such models. As a step towards solving this problem, we propose to learn a deep recurrent validator model, which can estimate whether a partial sequence can function as the beginning of a full, valid sequence. This validator provides insight as to how individual sequence elements influence the validity of the overall sequence, and can be used to constrain sequence based models to generate valid sequences -- and thus faithfully model discrete objects. Our approach is inspired by reinforcement learning, where an oracle which can evaluate validity of complete sequences provides a sparse reward signal. We demonstrate its effectiveness as a generative model of Python 3 source code for mathematical expressions, and in improving the ability of a variational autoencoder trained on SMILES strings to decode valid molecular structures.

Submitted 1 November, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

Comments: Conference paper at ICLR 2018. Code available online

arXiv:1708.04465

Actively Learning what makes a Discrete Sequence Valid

Authors: David Janz, Jos van der Westhuizen, José Miguel Hernández-Lobato

Abstract: Deep learning techniques have been hugely successful for traditional supervised and unsupervised machine learning problems. In large part, these techniques solve continuous optimization problems. Recently however, discrete generative deep learning models have been successfully used to efficiently search high-dimensional discrete spaces. These methods work by representing discrete objects as sequences, for which powerful sequence-based deep models can be employed. Unfortunately, these techniques are significantly hindered by the fact that these generative models often produce invalid sequences. As a step towards solving this problem, we propose to learn a deep recurrent validator model. Given a partial sequence, our model learns the probability of that sequence occurring as the beginning of a full valid sequence. Thus this identifies valid versus invalid sequences and crucially it also provides insight about how individual sequence elements influence the validity of discrete objects. To learn this model we propose an approach inspired by seminal work in Bayesian active learning. On a synthetic dataset, we demonstrate the ability of our model to distinguish valid and invalid sequences. We believe this is a key step toward learning generative models that faithfully produce valid discrete objects.

Submitted 15 August, 2017; originally announced August 2017.

Comments: 6 pages, 2 figures

arXiv:1611.06863

Probabilistic structure discovery in time series data

Authors: David Janz, Brooks Paige, Tom Rainforth, Jan-Willem van de Meent, Frank Wood

Abstract: Existing methods for structure discovery in time series data construct interpretable, compositional kernels for Gaussian process regression models. While the learned Gaussian process model provides posterior mean and variance estimates, typically the structure is learned via a greedy optimization procedure. This restricts the space of possible solutions and leads to over-confident uncertainty estimates. We introduce a fully Bayesian approach, inferring a full posterior over structures, which more reliably captures the uncertainty of the model.

Submitted 21 November, 2016; originally announced November 2016.

Showing 1–12 of 12 results for author: Janz, D