Showing 1–19 of 19 results for author: Farquhar, G

  1. arXiv:2310.02782  [pdf, other]

    cs.LG cs.AI

    Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design

    Authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster

    Abstract: The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), th…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023

  2. arXiv:2209.11303  [pdf, other]

    cs.LG

    An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

    Authors: Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar

    Abstract: Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms. Estimation of meta-gradients is central to the performance of these meta-algorithms, and has been studied in the setting of MAML-style short-horizon meta-RL problems. In this context, prior work has investigated the estimation of the Hessian of the RL objective, as well as tackli…

    Submitted 22 September, 2022; originally announced September 2022.

  3. arXiv:2112.04153  [pdf, other]

    cs.LG cs.AI

    Model-Value Inconsistency as a Signal for Epistemic Uncertainty

    Authors: Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero

    Abstract: Using a model of the environment and a value function, an agent can construct many estimates of a state's value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). Consequently, the discrepancy between these estimates c…

    Submitted 29 June, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: The first three authors contributed equally. Accepted at ICML 2022

  4. arXiv:2110.12840  [pdf, other]

    cs.LG cs.AI stat.ML

    Self-Consistent Models and Values

    Authors: Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver

    Abstract: Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment. In particular, models enable planning, i.e. using more computation to improve value functions or policies, without requiring additional environment interactions. In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a le…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  5. arXiv:2106.10316  [pdf, other]

    cs.AI cs.LG

    Proper Value Equivalence

    Authors: Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh

    Abstract: One of the main challenges in model-based reinforcement learning (RL) is to decide which aspects of the environment should be modeled. The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning. Technically, VE distinguishes models based on a set of policies and a set of functions:…

    Submitted 12 December, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Journal ref: NeurIPS 2021

  6. arXiv:2102.12560  [pdf, other]

    cs.LG cs.AI

    PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning

    Authors: Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar

    Abstract: We study reinforcement learning (RL) with no-reward demonstrations, a setting in which an RL agent has access to additional data from the interaction of other agents with the same environment. However, it has no access to the rewards or goals of these agents, and their objectives and levels of expertise may vary widely. These assumptions are common in multi-agent settings, such as autonomous drivi…

    Submitted 10 June, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

    Comments: The last two authors contributed equally. Accepted at ICML 2021

  7. arXiv:2006.10800  [pdf, other]

    cs.LG cs.MA stat.ML

    Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

    Authors: Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson

    Abstract: QMIX is a popular $Q$-learning algorithm for cooperative MARL in the centralised training and decentralised execution paradigm. In order to enable easy decentralisation, QMIX restricts the joint action $Q$-values it can represent to be a monotonic mixing of each agent's utilities. However, this restriction prevents it from representing value functions in which an agent's ordering over its actions…

    Submitted 22 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

  8. arXiv:2006.05826  [pdf, other]

    cs.LG cs.AI stat.ML

    Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning

    Authors: Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson

    Abstract: Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Due to the transience of this non-stationarity, it is often not explicitly addressed in deep RL and a single neural network is continually updated. However, we find evidence that neural networks exh…

    Submitted 22 September, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

  9. arXiv:2003.08839  [pdf, other]

    cs.LG cs.MA stat.ML

    Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

    Authors: Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

    Abstract: In many real-world settings, a team of agents must coordinate its behaviour while acting in a decentralised fashion. At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised l…

    Submitted 27 August, 2020; v1 submitted 19 March, 2020; originally announced March 2020.

    Comments: Extended version of the ICML 2018 conference paper (arXiv:1803.11485)

    Journal ref: Journal of Machine Learning Research 21(178):1-51, 2020

  10. arXiv:1909.10549  [pdf, other]

    cs.LG stat.ML

    Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

    Authors: Gregory Farquhar, Shimon Whiteson, Jakob Foerster

    Abstract: Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives. We derive an objective that, under automatic differentiation, produces low-variance unbiased estimators of derivatives at any order. Our objective is compatible with arbitrary advantage estimators, which allows the control of the bias and variance of…

    Submitted 23 September, 2019; originally announced September 2019.

  11. arXiv:1906.12266  [pdf, other]

    cs.LG cs.AI stat.ML

    Growing Action Spaces

    Authors: Gregory Farquhar, Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, Gabriel Synnaeve

    Abstract: In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress. In this work, we use a curriculum of progressively growing action spaces to accelerate learning. We assume the environment is out of our control, but that the agent may set an internal curriculum by initially restricting its action space. Our ap…

    Submitted 28 June, 2019; originally announced June 2019.

  12. arXiv:1906.03926  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    A Survey of Reinforcement Learning Informed by Natural Language

    Authors: Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

    Abstract: To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand. Recent advances in representation learning for language make it possible to build models that acquire world knowledge from text corpora and integrate this knowledge into downstream decision making pr…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: Published at IJCAI'19

  13. arXiv:1902.04043  [pdf, other]

    cs.LG cs.MA stat.ML

    The StarCraft Multi-Agent Challenge

    Authors: Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson

    Abstract: In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behaviour while conditioning only on their private observations. This is an attractive research area since such p…

    Submitted 9 December, 2019; v1 submitted 11 February, 2019; originally announced February 2019.

  14. arXiv:1810.11702  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Multi-Agent Common Knowledge Reinforcement Learning

    Authors: Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

    Abstract: Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can recons…

    Submitted 11 January, 2020; v1 submitted 27 October, 2018; originally announced October 2018.

    Comments: Advances in Neural Information Processing Systems, 9924-9935

  15. arXiv:1803.11485  [pdf, other]

    cs.LG cs.MA stat.ML

    QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

    Authors: Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

    Abstract: In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an att…

    Submitted 6 June, 2018; v1 submitted 30 March, 2018; originally announced March 2018.

    Comments: Camera-ready version, International Conference on Machine Learning 2018

  16. arXiv:1802.05098  [pdf, other]

    cs.LG cs.AI cs.NE

    DiCE: The Infinitely Differentiable Monte-Carlo Estimator

    Authors: Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

    Abstract: The score function estimator is widely used for estimating gradients of stochastic objectives in stochastic computation graphs (SCG), e.g., in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order derivatives is more challengin…

    Submitted 19 September, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

  17. arXiv:1710.11417  [pdf, other]

    cs.AI cs.LG cs.NE stat.ML

    TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

    Authors: Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson

    Abstract: Combining deep model-free reinforcement learning with on-line planning is a promising approach to building on the successes of deep RL. On-line planning with look-ahead trees has proven successful in environments where transition models are known a priori. However, in complex environments where transition models need to be learned from data, the deficiencies of learned models have limited their ut…

    Submitted 8 March, 2018; v1 submitted 31 October, 2017; originally announced October 2017.

  18. arXiv:1705.08926  [pdf, other]

    cs.AI cs.MA

    Counterfactual Multi-Agent Policy Gradients

    Authors: Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

    Abstract: Cooperative multi-agent systems can be naturally used to model many real world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) pol…

    Submitted 14 December, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

  19. arXiv:1702.08887  [pdf, other]

    cs.AI cs.LG cs.MA

    Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

    Authors: Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson

    Abstract: Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that indep…

    Submitted 21 May, 2018; v1 submitted 28 February, 2017; originally announced February 2017.

    Comments: Camera-ready version, International Conference on Machine Learning 2017; updated to fix print-breaking image