Showing 1–26 of 26 results for author: Osband, I

Searching in archive stat.
  1. arXiv:2206.03633  [pdf, other]

    cs.LG cs.AI stat.ML

    Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrap

    Authors: Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

    Abstract: In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions. A common approach to uncertainty estimation maintains an ensemble of models. In recent years, several approaches have been proposed for training ensembles, and conflicting views prevail with regards to the importance of various ingredients of these approaches. In this paper…

    Submitted 7 June, 2022; originally announced June 2022.

  2. arXiv:2202.13509  [pdf, other]

    stat.ML cs.AI cs.LG

    Evaluating High-Order Predictive Distributions in Deep Learning

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Most work on supervised learning research has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive distributions with inputs sampled i.i.d. from the testing distribution. With low-dimensional inputs, these methods distinguish agents that effectively estimate u…

    Submitted 27 February, 2022; originally announced February 2022.

  3. arXiv:2110.04629  [pdf, other]

    cs.LG cs.AI stat.ML

    The Neural Testbed: Evaluating Joint Predictions

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

    Abstract: Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a…

    Submitted 1 November, 2022; v1 submitted 9 October, 2021; originally announced October 2021.

  4. arXiv:2107.09224  [pdf, ps, other]

    cs.LG stat.ML

    From Predictions to Decisions: The Importance of Joint Predictive Distributions

    Authors: Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy

    Abstract: A fundamental challenge for any intelligent system is prediction: given some inputs, can you predict corresponding outcomes? Most work on supervised learning has focused on producing accurate marginal predictions for each input. However, we show that for a broad class of decision problems, accurate joint predictions are required to deliver good performance. In particular, we establish several resu…

    Submitted 23 May, 2022; v1 submitted 19 July, 2021; originally announced July 2021.

  5. arXiv:2107.08924  [pdf, other]

    cs.LG cs.AI stat.ML

    Epistemic Neural Networks

    Authors: Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

    Abstract: Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. In principle, ensemble-based approaches produce effective joint predictions, but the computational costs of training large ensembles can become prohibitive. We introduce the epinet: an architecture that can supplement any…

    Submitted 17 May, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

  6. arXiv:2006.07464  [pdf, other]

    cs.LG math.OC stat.ML

    Hypermodels for Exploration

    Authors: Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy

    Abstract: We study the use of hypermodels to represent epistemic uncertainty and guide exploration. This generalizes and extends the use of ensembles to approximate Thompson sampling. The computational cost of training an ensemble grows with its size, and as such, prior work has typically been limited to ensembles with tens of elements. We show that alternative hypermodels can enjoy dramatic efficiency gain…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: Published as a conference paper at ICLR 2020

  7. arXiv:2006.05145  [pdf, other]

    cs.LG stat.CO stat.ML

    Matrix games with bandit feedback

    Authors: Brendan O'Donoghue, Tor Lattimore, Ian Osband

    Abstract: We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each others actions and a noisy payoff. This generalizes the usual matrix game, where the payoff matrix is known to the players. Despite numerous applications, this problem has received relatively little attention. Although adversarial bandit algorithms achieve lo…

    Submitted 12 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  8. arXiv:1908.03568  [pdf, other]

    cs.LG cs.AI stat.ML

    Behaviour Suite for Reinforcement Learning

    Authors: Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt

    Abstract: This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to stud…

    Submitted 14 February, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

  9. arXiv:1905.03030  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-learning of Sequential Strategies

    Authors: Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg

    Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal pred…

    Submitted 18 July, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Comments: DeepMind Technical Report (15 pages, 6 figures). Version V1.1

  10. arXiv:1806.03335  [pdf, other]

    stat.ML cs.AI cs.LG

    Randomized Prior Functions for Deep Reinforcement Learning

    Authors: Ian Osband, John Aslanides, Albin Cassirer

    Abstract: Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly-suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why t…

    Submitted 15 November, 2018; v1 submitted 8 June, 2018; originally announced June 2018.

  11. arXiv:1805.08948  [pdf, other]

    cs.LG cs.AI stat.ML

    Scalable Coordinated Exploration in Concurrent Reinforcement Learning

    Authors: Maria Dimakopoulou, Ian Osband, Benjamin Van Roy

    Abstract: We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale. Our approach builds on seed sampling (Dimakopoulou and Van Roy, 2018) and randomized value function learning (Osband et al., 2016). We demonstrate that, for simple tabular contexts, t…

    Submitted 16 December, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

    Comments: NIPS 2018

  12. arXiv:1709.05380  [pdf, other]

    cs.AI cs.LG math.OC stat.ML

    The Uncertainty Bellman Equation and Exploration

    Authors: Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

    Abstract: We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar \textit{uncertainty} Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-s…

    Submitted 22 October, 2018; v1 submitted 15 September, 2017; originally announced September 2017.

  13. arXiv:1706.10295  [pdf, other]

    cs.LG stat.ML

    Noisy Networks for Exploration

    Authors: Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

    Abstract: We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find…

    Submitted 9 July, 2019; v1 submitted 30 June, 2017; originally announced June 2017.

    Comments: ICLR 2018

  14. arXiv:1706.04241  [pdf, other]

    stat.ML cs.LG

    On Optimistic versus Randomized Exploration in Reinforcement Learning

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning. Optimistic approaches presented in the literature apply an optimistic boost to the value estimate at each state-action pair and select actions that are greedy with respect to the resulting optimistic value function. Randomized approaches sample from among statistically plausible value f…

    Submitted 13 June, 2017; originally announced June 2017.

    Comments: Extended abstract for RLDM 2017

  15. arXiv:1703.07608  [pdf, other]

    stat.ML cs.AI cs.LG

    Deep Exploration via Randomized Value Functions

    Authors: Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

    Abstract: We study the use of randomized value functions to guide deep exploration in reinforcement learning. This offers an elegant means for synthesizing statistically and computationally efficient exploration with common practical approaches to value function learning. We present several reinforcement learning algorithms that leverage randomized value functions and demonstrate their efficacy through comp…

    Submitted 23 September, 2019; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: Accepted for publication in Journal of Machine Learning Research 2019

  16. arXiv:1703.05449  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Minimax Regret Bounds for Reinforcement Learning

    Authors: Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

    Abstract: We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs. We show that an optimistic modification to value iteration achieves a regret bound of $\tilde{O}( \sqrt{HSAT} + H^2S^2A+H\sqrt{T})$ where $H$ is the time horizon, $S$ the number of states, $A$ the number of actions and $T$ the number of time-steps. This result improves over the best previous…

    Submitted 1 July, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

  17. arXiv:1702.04126  [pdf, other]

    stat.ML cs.LG math.PR

    Gaussian-Dirichlet Posterior Dominance in Sequential Learning

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: We consider the problem of sequential learning from categorical observations bounded in [0,1]. We establish an ordering between the Dirichlet posterior over categorical outcomes and a Gaussian posterior under observations with N(0,1) noise. We establish that, conditioned upon identical data with at least two observations, the posterior mean of the categorical distribution will always second-order…

    Submitted 9 February, 2018; v1 submitted 14 February, 2017; originally announced February 2017.

  18. arXiv:1608.02732  [pdf, other]

    stat.ML cs.LG

    On Lower Bounds for Regret in Reinforcement Learning

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning. In particular, this paper: - Reproduces a lower bound on regret for reinforcement learning, similar to the result of Theorem 5 in the journal UCRL2 paper (Jaksch et al 2010). - Clarifies that the proposed proof of Theorem 6 in the REGAL paper (Bartlett and Tewari 2009) does not hold using…

    Submitted 9 August, 2016; originally announced August 2016.

  19. arXiv:1608.02731  [pdf, ps, other]

    stat.ML cs.LG

    Posterior Sampling for Reinforcement Learning Without Episodes

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: This is a brief technical note to clarify some of the issues with applying the application of the algorithm posterior sampling for reinforcement learning (PSRL) in environments without fixed episodes. In particular, this paper aims to: - Review some of results which have been proven for finite horizon MDPs (Osband et al 2013, 2014a, 2014b, 2016) and also for MDPs with finite ergodic structure (G…

    Submitted 9 August, 2016; originally announced August 2016.

  20. arXiv:1607.00215  [pdf, other]

    stat.ML cs.AI cs.LG

    Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2. We provide insight into the extent of this performance boost and the phenomenon that drives it. We leverage this insight to establish an $\tilde{O}(H\sqrt{SAT})$ Bayesian expected regret bound for PSRL in finite-horizon episodic Markov d…

    Submitted 13 June, 2017; v1 submitted 1 July, 2016; originally announced July 2016.

  21. arXiv:1602.04621  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Deep Exploration via Bootstrapped DQN

    Authors: Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

    Abstract: Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; thi…

    Submitted 4 July, 2016; v1 submitted 15 February, 2016; originally announced February 2016.

  22. arXiv:1507.00300  [pdf, other]

    stat.ML cs.LG

    Bootstrapped Thompson Sampling and Deep Exploration

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical…

    Submitted 1 July, 2015; originally announced July 2015.

  23. arXiv:1406.1853  [pdf, ps, other]

    stat.ML cs.LG

    Model-based Reinforcement Learning and the Eluder Dimension

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than cardinality, of the system. We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$ where $T$ is time elapsed, $d_K$ is the Kolmogorov…

    Submitted 31 October, 2014; v1 submitted 6 June, 2014; originally announced June 2014.

  24. arXiv:1403.3741  [pdf, ps, other]

    stat.ML cs.LG

    Near-optimal Reinforcement Learning in Factored MDPs

    Authors: Ian Osband, Benjamin Van Roy

    Abstract: Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $Ω(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = Ω(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be s…

    Submitted 31 October, 2014; v1 submitted 14 March, 2014; originally announced March 2014.

  25. arXiv:1402.0635  [pdf, other]

    stat.ML cs.AI cs.LG eess.SY

    Generalization and Exploration via Randomized Value Functions

    Authors: Ian Osband, Benjamin Van Roy, Zheng Wen

    Abstract: We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or epsilon-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency…

    Submitted 15 February, 2016; v1 submitted 4 February, 2014; originally announced February 2014.

    Comments: arXiv admin note: text overlap with arXiv:1307.4847

  26. arXiv:1306.0940  [pdf, other]

    stat.ML cs.LG

    (More) Efficient Reinforcement Learning via Posterior Sampling

    Authors: Ian Osband, Daniel Russo, Benjamin Van Roy

    Abstract: Most provably-efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates a prior distribution over Markov decision proce…

    Submitted 26 December, 2013; v1 submitted 4 June, 2013; originally announced June 2013.

    Comments: 10 pages