Showing 1–29 of 29 results for author: Hessel, M

  1. arXiv:2110.12840  [pdf, other]

    cs.LG cs.AI stat.ML

    Self-Consistent Models and Values

    Authors: Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver

    Abstract: Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. In particular, models enable planning, i.e. using more computation to improve value functions or policies, without requiring additional environment interactions. In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a le…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  2. arXiv:2106.11779  [pdf, other]

    cs.LG stat.ML

    Emphatic Algorithms for Deep Reinforcement Learning

    Authors: Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

    Abstract: Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation and off-policy sampling - this is known as the "deadly triad". Emphatic temporal difference (ETD($λ$)) algorithm ensures convergence in the linear case by app…

    Submitted 21 June, 2021; originally announced June 2021.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  3. arXiv:2106.07093  [pdf]

    physics.optics physics.app-ph

    Switchable induced-transmission filters enabled by vanadium dioxide

    Authors: Chenghao Wan, David Woolf, Colin M. Hessel, Jad Salman, Yuzhe Xiao, Chunhui Yao, Albert Wright, Joel M. Hensley, Mikhail A. Kats

    Abstract: An induced-transmission filter (ITF) uses an ultrathin layer of metal positioned at an electric-field node within a dielectric thin-film bandpass filter to select one transmission band while suppressing other transmission bands that would have been present without the metal layer. Here, we introduce a switchable mid-infrared ITF where the metal film can be "switched on and off", enabling…

    Submitted 13 June, 2021; originally announced June 2021.

    Comments: Main text + supplementary

  4. arXiv:2104.06272  [pdf, other]

    cs.LG

    Podracer architectures for scalable Reinforcement Learning

    Authors: Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt

    Abstract: Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration, with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such as TensorFlow, PyTorch and JAX allow users to transparently make use of accelerators, such as TPUs and GPUs, to offload the more computationally intensive part…

    Submitted 13 April, 2021; originally announced April 2021.

  5. arXiv:2104.06159  [pdf, other]

    cs.LG cs.AI

    Muesli: Combining Improvements in Policy Optimization

    Authors: Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt

    Abstract: We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by ex…

    Submitted 31 March, 2022; v1 submitted 13 April, 2021; originally announced April 2021.

  6. arXiv:2102.06741  [pdf, other]

    cs.LG cs.AI

    Discovery of Options via Meta-Learned Subgoals

    Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, the problem of discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based…

    Submitted 12 February, 2021; originally announced February 2021.

  7. arXiv:2007.08794  [pdf, other]

    cs.LG cs.AI

    Discovering Reinforcement Learning Algorithms

    Authors: Junhyuk Oh, Matteo Hessel, Wojciech M. Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, David Silver

    Abstract: Reinforcement learning (RL) algorithms update an agent's parameters according to one of several possible rules, discovered manually through years of research. Automating the discovery of update rules from data could lead to more efficient algorithms, or algorithms that are better adapted to specific environments. Although there have been prior attempts at addressing this significant scientific cha…

    Submitted 5 January, 2021; v1 submitted 17 July, 2020; originally announced July 2020.

  8. arXiv:2007.08433  [pdf, other]

    cs.LG cs.AI stat.ML

    Meta-Gradient Reinforcement Learning with an Objective Discovered Online

    Authors: Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, David Silver

    Abstract: Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its o…

    Submitted 16 July, 2020; originally announced July 2020.

  9. arXiv:2007.01839  [pdf, other]

    cs.LG cs.AI stat.ML

    Expected Eligibility Traces

    Authors: Hado van Hasselt, Sephora Madjiheurem, Matteo Hessel, David Silver, André Barreto, Diana Borsa

    Abstract: The question of how to determine which states and actions are responsible for a certain outcome is known as the credit assignment problem and remains a central research question in reinforcement learning and artificial intelligence. Eligibility traces enable efficient credit assignment to the recent sequence of states and actions experienced by the agent, but not to counterfactual sequences that c…

    Submitted 8 February, 2021; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: AAAI, distinguished paper award

  10. arXiv:2002.12928  [pdf, other]

    stat.ML cs.LG

    A Self-Tuning Actor-Critic Algorithm

    Authors: Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using metagradients to automatically adapt hyperparameters online by meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-…

    Submitted 14 April, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

  11. arXiv:1912.05500  [pdf, other]

    cs.AI cs.LG

    What Can Learned Intrinsic Rewards Capture?

    Authors: Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful…

    Submitted 21 August, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: ICML 2020. The first two authors contributed equally

  12. arXiv:1909.11583  [pdf, other]

    cs.LG cs.AI stat.ML

    Off-Policy Actor-Critic with Shared Experience Replay

    Authors: Simon Schmitt, Matteo Hessel, Karen Simonyan

    Abstract: We investigate the combination of actor-critic reinforcement learning algorithms with uniform large-scale experience replay and propose solutions for two challenges: (a) efficient actor-critic learning with experience replay (b) stability of off-policy learning where agents learn from other agents' behaviour. We employ those insights to accelerate hyper-parameter sweeps in which all participating a…

    Submitted 18 November, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

  13. arXiv:1909.04607  [pdf, other]

    cs.AI cs.LG

    Discovery of Useful Questions as Auxiliary Tasks

    Authors: Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value…

    Submitted 10 September, 2019; originally announced September 2019.

  14. arXiv:1908.03568  [pdf, other]

    cs.LG cs.AI stat.ML

    Behaviour Suite for Reinforcement Learning

    Authors: Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt

    Abstract: This paper introduces the Behaviour Suite for Reinforcement Learning, or bsuite for short. bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives. First, to collect clear, informative and scalable problems that capture key issues in the design of general and efficient learning algorithms. Second, to stud…

    Submitted 14 February, 2020; v1 submitted 9 August, 2019; originally announced August 2019.

  15. arXiv:1907.03687  [pdf, other]

    cs.LG cs.AI stat.ML

    General non-linear Bellman equations

    Authors: Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto

    Abstract: We consider a general class of non-linear Bellman equations. These open up a design space of algorithms that have interesting properties, which has two potential advantages. First, we can perhaps better model natural phenomena. For instance, hyperbolic discounting has been proposed as a mathematical model that matches human and animal data well, and can therefore be used to explain preference orde…

    Submitted 8 July, 2019; originally announced July 2019.

  16. arXiv:1907.02908  [pdf, other]

    cs.LG cs.AI stat.ML

    On Inductive Biases in Deep Reinforcement Learning

    Authors: Matteo Hessel, Hado van Hasselt, Joseph Modayil, David Silver

    Abstract: Many deep reinforcement learning algorithms contain inductive biases that sculpt the agent's objective and its interface to the environment. These inductive biases can take many forms, including domain knowledge and pretuned hyper-parameters. In general, there is a trade-off between generality and performance when algorithms use such biases. Stronger biases can lead to faster learning, but weaker…

    Submitted 5 July, 2019; originally announced July 2019.

  17. arXiv:1906.05243  [pdf, other]

    cs.LG cs.AI stat.ML

    When to use parametric models in reinforcement learning?

    Authors: Hado van Hasselt, Matteo Hessel, John Aslanides

    Abstract: We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and beh…

    Submitted 12 June, 2019; originally announced June 2019.

    Journal ref: NeurIPS 2019

  18. arXiv:1901.10964  [pdf, other]

    cs.LG cs.AI

    Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

    Authors: André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos

    Abstract: The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SFs & GPI framework in two ways. One of the basic…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Published at ICML 2018

  19. arXiv:1901.02517  [pdf]

    physics.optics physics.app-ph

    Optical properties of thin-film vanadium dioxide from the visible to the far infrared

    Authors: Chenghao Wan, Zhen Zhang, David Woolf, Colin M. Hessel, Jura Rensberg, Joel M. Hensley, Yuzhe Xiao, Alireza Shahsafi, Jad Salman, Steffen Richter, Yifei Sun, M. Mumtaz Qazilbash, Rüdiger Schmidt-Grund, Carsten Ronning, Shriram Ramanathan, Mikhail A. Kats

    Abstract: The insulator-to-metal transition (IMT) in vanadium dioxide (VO2) can enable a variety of optics applications, including switching and modulation, optical limiting, and tuning of optical resonators. Despite the widespread interest in optics, the optical properties of VO2 across its IMT are scattered throughout the literature, and are not available in some wavelength regions. We characterized the c…

    Submitted 8 January, 2019; originally announced January 2019.

    Comments: Main text + supplementary information

  20. arXiv:1812.05979  [pdf, ps, other]

    cs.LG cs.CR cs.NE

    Scaling shared model governance via model splitting

    Authors: Miljan Martic, Jan Leike, Andrew Trask, Matteo Hessel, Shane Legg, Pushmeet Kohli

    Abstract: Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation. Unfortunately, neither of these techniques is applicable to the training of large neural networks due to their large computational and communication overheads. As a scalable technique for shared model governance, we propose splitting deep learning model betwee…

    Submitted 14 December, 2018; originally announced December 2018.

    Comments: 9 pages

  21. arXiv:1812.02648  [pdf, other]

    cs.AI cs.LG

    Deep Reinforcement Learning and the Deadly Triad

    Authors: Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil

    Abstract: We know from reinforcement learning theory that temporal difference learning can fail in certain cases. Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can diverge with the value estimates becoming unbounded. However, several algorithms successfully combine these three properties,…

    Submitted 6 December, 2018; originally announced December 2018.

  22. arXiv:1809.04474  [pdf, other]

    cs.LG stat.ML

    Multi-task Deep Reinforcement Learning with PopArt

    Authors: Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech Czarnecki, Simon Schmitt, Hado van Hasselt

    Abstract: The reinforcement learning community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at the time, each new task requiring to train a brand new agent instance. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this w…

    Submitted 12 September, 2018; originally announced September 2018.

  23. arXiv:1805.11593  [pdf, other]

    cs.LG cs.AI stat.ML

    Observe and Look Further: Achieving Consistent Performance on Atari

    Authors: Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Večerík, Matteo Hessel, Rémi Munos, Olivier Pietquin

    Abstract: Despite significant advances in the field of deep Reinforcement Learning (RL), today's algorithms still fail to learn human-level policies consistently over a set of diverse tasks such as Atari 2600 games. We identify three key challenges that any algorithm needs to master in order to perform well on all games: processing diverse reward distributions, reasoning over long time horizons, and explori…

    Submitted 29 May, 2018; originally announced May 2018.

  24. arXiv:1803.00933  [pdf, other]

    cs.LG

    Distributed Prioritized Experience Replay

    Authors: Dan Horgan, John Quan, David Budden, Gabriel Barth-Maron, Matteo Hessel, Hado van Hasselt, David Silver

    Abstract: We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shar…

    Submitted 2 March, 2018; originally announced March 2018.

    Comments: Accepted to International Conference on Learning Representations 2018

  25. arXiv:1802.08294  [pdf, other]

    cs.LG

    Unicorn: Continual Learning with a Universal, Off-policy Agent

    Authors: Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul

    Abstract: Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the f…

    Submitted 3 July, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

  26. arXiv:1710.02298  [pdf, other]

    cs.AI cs.LG

    Rainbow: Combining Improvements in Deep Reinforcement Learning

    Authors: Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

    Abstract: The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 260…

    Submitted 6 October, 2017; originally announced October 2017.

    Comments: Under review as a conference paper at AAAI 2018

  27. arXiv:1612.08810  [pdf, other]

    cs.LG cs.AI cs.NE

    The Predictron: End-To-End Learning and Planning

    Authors: David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris

    Abstract: One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple "imagined" planning steps. Each forward pass of the predictron accumulates internal rewards and…

    Submitted 20 July, 2017; v1 submitted 28 December, 2016; originally announced December 2016.

    Comments: Camera-ready version, ICML 2017, with supplement

  28. arXiv:1602.07714  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Learning values across many orders of magnitude

    Authors: Hado van Hasselt, Arthur Guez, Matteo Hessel, Volodymyr Mnih, David Silver

    Abstract: Most learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games…

    Submitted 16 August, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

    Comments: Paper accepted for publication at NIPS 2016. This version includes the appendix

  29. arXiv:1511.06581  [pdf, other]

    cs.LG

    Dueling Network Architectures for Deep Reinforcement Learning

    Authors: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

    Abstract: In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state…

    Submitted 5 April, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: 15 pages, 5 figures, and 5 tables