
Showing 1–38 of 38 results for author: Schaul, T

  1. arXiv:2406.04268  [pdf, other]

    cs.LG cs.AI

    Open-Endedness is Essential for Artificial Superhuman Intelligence

    Authors: Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktäschel

    Abstract: In recent years there has been a tremendous surge in the general capabilities of AI systems, mainly fuelled by training foundation models on internet-scale data. Nevertheless, the creation of open-ended, ever self-improving AI remains elusive. In this position paper, we argue that the ingredients are now in place to achieve open-endedness in AI systems with respect to a human observer. Furthermore, w…

    Submitted 6 June, 2024; originally announced June 2024.

  2. arXiv:2312.09187  [pdf, other]

    cs.LG

    Vision-Language Models as a Source of Rewards

    Authors: Kate Baumli, Satinder Baveja, Feryal Behbahani, Harris Chan, Gheorghe Comanici, Sebastian Flennerhag, Maxime Gazeau, Kristian Holsheimer, Dan Horgan, Michael Laskin, Clare Lyle, Hussain Masoom, Kay McKinney, Volodymyr Mnih, Alexander Neitz, Fabio Pardo, Jack Parker-Holder, John Quan, Tim Rocktäschel, Himanshu Sahni, Tom Schaul, Yannick Schroecker, Stephen Spencer, Richie Steigerwald, Luyu Wang, et al. (1 additional author not shown)

    Abstract: Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of…

    Submitted 21 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures

  3. arXiv:2304.03995  [pdf, other]

    cs.NE cs.LG

    Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

    Authors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag

    Abstract: Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution. While they provide a general-purpose tool for optimization, their particular instantiations can be heuristic and motivated by loose biological intuition. In this work we explore a fundamentally different approach: Given a sufficiently flexible parametriza…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: 14 pages, 31 figures

  4. arXiv:2302.04693  [pdf, other]

    cs.LG cs.AI

    Scaling Goal-based Exploration via Pruning Proto-goals

    Authors: Akhil Bagaria, Ray Jiang, Ramana Kumar, Tom Schaul

    Abstract: One of the gnarliest challenges in reinforcement learning (RL) is exploration that scales to vast domains, where novelty-, or coverage-seeking behaviour falls short. Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space. The core challenge in goal discovery is finding the right balance between generality (not hand-crafted) and tractability (useful, not too m…

    Submitted 9 February, 2023; originally announced February 2023.

  5. arXiv:2211.11260  [pdf, other]

    cs.NE cs.AI

    Discovering Evolution Strategies via Meta-Black-Box Optimization

    Authors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag

    Abstract: Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are oftentimes heuristic and inflexible - exactly the limitations that meta-learning can address. Hence, we propose to discover effective update rules for evolution strategies via meta-learning. Concretely, our approach employs a search str…

    Submitted 2 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 25 pages, 21 figures

    Journal ref: 11th International Conference on Learning Representations, ICLR 2023

  6. arXiv:2206.00730  [pdf, other]

    cs.LG cs.AI stat.ML

    The Phenomenon of Policy Churn

    Authors: Tom Schaul, André Barreto, John Quan, Georg Ostrovski

    Abstract: We identify and study the phenomenon of policy churn, that is, the rapid change of the greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly rapid pace, changing the greedy action in a large fraction of states within a handful of learning updates (in a typical deep RL set-up such as DQN on Atari). We characterise the phenomenon empirically, verifying that it…

    Submitted 20 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: Published at NeurIPS 2022

    MSC Class: 68T07 ACM Class: I.2.6

  7. arXiv:2112.04153  [pdf, other]

    cs.LG cs.AI

    Model-Value Inconsistency as a Signal for Epistemic Uncertainty

    Authors: Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero

    Abstract: Using a model of the environment and a value function, an agent can construct many estimates of a state's value, by unrolling the model for different lengths and bootstrapping with its value function. Our key insight is that one can treat this set of value estimates as a type of ensemble, which we call an implicit value ensemble (IVE). Consequently, the discrepancy between these estimates c…

    Submitted 29 June, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: The first three authors contributed equally. Accepted at ICML 2022

  8. arXiv:2108.11811  [pdf, other]

    cs.LG cs.AI

    When should agents explore?

    Authors: Miruna Pîslar, David Szepesvari, Georg Ostrovski, Diana Borsa, Tom Schaul

    Abstract: Exploration remains a central challenge for reinforcement learning (RL). Virtually all existing methods share the feature of a monolithic behaviour policy that changes only gradually (at best). In contrast, the exploratory behaviours of animals and humans exhibit a rich diversity, namely including forms of switching between modes. This paper presents an initial study of mode-switching, non-monolit…

    Submitted 4 March, 2022; v1 submitted 26 August, 2021; originally announced August 2021.

    MSC Class: 68T05 ACM Class: I.2.6

  9. arXiv:2105.05347  [pdf, other]

    cs.LG cs.AI stat.ML

    Return-based Scaling: Yet Another Normalisation Trick for Deep RL

    Authors: Tom Schaul, Georg Ostrovski, Iurii Kemaev, Diana Borsa

    Abstract: Scaling issues are mundane yet irritating for practitioners of reinforcement learning. Error scales vary across domains, tasks, and stages of learning; sometimes by many orders of magnitude. This can be detrimental to learning speed and stability, create interference between learning tasks, and necessitate substantial tuning. We revisit this topic for agents based on temporal-difference learning,…

    Submitted 11 May, 2021; originally announced May 2021.

  10. arXiv:2002.11833  [pdf, other]

    cs.LG cs.AI stat.ML

    Policy Evaluation Networks

    Authors: Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

    Abstract: Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states. This approach opens up the possibility of performing direct gradient ascent in policy…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: 12 pages, 11 figures

  11. arXiv:1912.06910  [pdf, other]

    cs.LG cs.AI stat.ML

    Adapting Behaviour for Learning Progress

    Authors: Tom Schaul, Diana Borsa, David Ding, David Szepesvari, Georg Ostrovski, Will Dabney, Simon Osindero

    Abstract: Determining what experience to generate to best facilitate learning (i.e. exploration) is one of the distinguishing features and open challenges in reinforcement learning. The advent of distributed agents that interact with parallel instances of the environment has enabled larger scales and greater flexibility, but has not removed the need to tune exploration to the task, because the ideal data fo…

    Submitted 14 December, 2019; originally announced December 2019.

  12. arXiv:1910.07479  [pdf, other]

    cs.LG stat.ML

    Conditional Importance Sampling for Off-Policy Learning

    Authors: Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

    Abstract: The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms th…

    Submitted 30 July, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: AISTATS 2020 camera-ready version

  13. arXiv:1906.03139  [pdf, other]

    cs.NE cs.LG stat.ML

    Non-Differentiable Supervised Learning with Evolution Strategies and Hybrid Methods

    Authors: Karel Lenc, Erich Elsen, Tom Schaul, Karen Simonyan

    Abstract: In this work we show that Evolution Strategies (ES) are a viable method for learning non-differentiable parameters of large supervised models. ES are black-box optimization algorithms that estimate distributions of model parameters; however they have only been used for relatively small problems so far. We show that it is possible to scale ES to more complex tasks and models with millions of parame…

    Submitted 7 June, 2019; originally announced June 2019.

  14. arXiv:1904.11455  [pdf, other]

    cs.LG cs.AI stat.ML

    Ray Interference: a Source of Plateaus in Deep Reinforcement Learning

    Authors: Tom Schaul, Diana Borsa, Joseph Modayil, Razvan Pascanu

    Abstract: Rather than proposing a new method, this paper investigates an issue present in existing learning algorithms. We study the learning dynamics of reinforcement learning (RL), specifically a characteristic coupling between learning and data generation that arises because RL agents control their future data distribution. In the presence of function approximation, this coupling can lead to a problemati…

    Submitted 25 April, 2019; originally announced April 2019.

    Comments: Full version of RLDM abstract

  15. arXiv:1901.10964  [pdf, other]

    cs.LG cs.AI

    Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement

    Authors: André Barreto, Diana Borsa, John Quan, Tom Schaul, David Silver, Matteo Hessel, Daniel Mankowitz, Augustin Žídek, Rémi Munos

    Abstract: The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SFs & GPI framework in two ways. One of the basic…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Published at ICML 2018

  16. arXiv:1812.07626  [pdf, other]

    cs.LG cs.AI stat.ML

    Universal Successor Features Approximators

    Authors: Diana Borsa, André Barreto, John Quan, Daniel Mankowitz, Rémi Munos, Hado van Hasselt, David Silver, Tom Schaul

    Abstract: The ability of a reinforcement learning (RL) agent to learn about many reward functions at the same time has many potential benefits, such as the decomposition of complex tasks into simpler ones, the exchange of information between tasks, and the reuse of skills. We focus on one aspect in particular, namely the ability to generalise to unseen tasks. Parametric generalisation relies on the interpol…

    Submitted 18 December, 2018; originally announced December 2018.

  17. arXiv:1811.07004  [pdf, ps, other]

    cs.AI cs.LG

    The Barbados 2018 List of Open Issues in Continual Learning

    Authors: Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

    Abstract: We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a week-…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: NIPS Continual Learning Workshop 2018

  18. arXiv:1806.07917  [pdf, other]

    cs.NE cs.AI cs.LG

    Meta-Learning by the Baldwin Effect

    Authors: Chrisantha Thomas Fernando, Jakub Sygnowski, Simon Osindero, Jane Wang, Tom Schaul, Denis Teplyashin, Pablo Sprechmann, Alexander Pritzel, Andrei A. Rusu

    Abstract: The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan. To this date there has been no demonstration of its necessity in empirically challenging tasks. Here we show that the Baldwin effect is capable of evolving few-shot supervised and reinforcement learning mechanisms, by shaping the hyperparameters and the initi…

    Submitted 22 June, 2018; v1 submitted 6 June, 2018; originally announced June 2018.

  19. arXiv:1802.08294  [pdf, other]

    cs.LG

    Unicorn: Continual Learning with a Universal, Off-policy Agent

    Authors: Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul

    Abstract: Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the f…

    Submitted 3 July, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

  20. arXiv:1711.08378  [pdf]

    cs.AI

    Building Machines that Learn and Think for Themselves: Commentary on Lake et al., Behavioral and Brain Sciences, 2017

    Authors: M. Botvinick, D. G. T. Barrett, P. Battaglia, N. de Freitas, D. Kumaran, J. Z. Leibo, T. Lillicrap, J. Modayil, S. Mohamed, N. C. Rabinowitz, D. J. Rezende, A. Santoro, T. Schaul, C. Summerfield, G. Wayne, T. Weber, D. Wierstra, S. Legg, D. Hassabis

    Abstract: We agree with Lake and colleagues on their list of key ingredients for building humanlike intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand-engineering. We believe an approac…

    Submitted 22 November, 2017; originally announced November 2017.

  21. arXiv:1710.02298  [pdf, other]

    cs.AI cs.LG

    Rainbow: Combining Improvements in Deep Reinforcement Learning

    Authors: Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

    Abstract: The deep reinforcement learning community has made several independent improvements to the DQN algorithm. However, it is unclear which of these extensions are complementary and can be fruitfully combined. This paper examines six extensions to the DQN algorithm and empirically studies their combination. Our experiments show that the combination provides state-of-the-art performance on the Atari 260…

    Submitted 6 October, 2017; originally announced October 2017.

    Comments: Under review as a conference paper at AAAI 2018

  22. arXiv:1708.04782  [pdf, other]

    cs.LG cs.AI

    StarCraft II: A New Challenge for Reinforcement Learning

    Authors: Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing

    Abstract: This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially o…

    Submitted 16 August, 2017; originally announced August 2017.

    Comments: Collaboration between DeepMind & Blizzard. 20 pages, 9 figures, 2 tables

  23. arXiv:1704.03732  [pdf, ps, other]

    cs.AI cs.LG

    Deep Q-learning from Demonstrations

    Authors: Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

    Abstract: Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world…

    Submitted 22 November, 2017; v1 submitted 12 April, 2017; originally announced April 2017.

    Comments: Published at AAAI 2018. Previously on arxiv as "Learning from Demonstrations for Real World Reinforcement Learning"

  24. arXiv:1703.01161  [pdf, other]

    cs.AI

    FeUdal Networks for Hierarchical Reinforcement Learning

    Authors: Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

    Abstract: We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels -- allowing it to utilise different resolutions of time. Our framework employs a Manager module and a Worker module. The Ma…

    Submitted 6 March, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

  25. arXiv:1612.08810  [pdf, other]

    cs.LG cs.AI cs.NE

    The Predictron: End-To-End Learning and Planning

    Authors: David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, André Barreto, Thomas Degris

    Abstract: One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple "imagined" planning steps. Each forward pass of the predictron accumulates internal rewards and…

    Submitted 20 July, 2017; v1 submitted 28 December, 2016; originally announced December 2016.

    Comments: Camera-ready version, ICML 2017, with supplement

  26. arXiv:1611.05397  [pdf, other]

    cs.LG cs.NE

    Reinforcement Learning with Unsupervised Auxiliary Tasks

    Authors: Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu

    Abstract: Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervi…

    Submitted 16 November, 2016; originally announced November 2016.

  27. arXiv:1606.05312  [pdf, other]

    cs.AI

    Successor Features for Transfer in Reinforcement Learning

    Authors: André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado van Hasselt, David Silver

    Abstract: Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. We propose a transfer framework for the scenario where the reward function changes between tasks but the environment's dynamics remain the same. Our approach rests on two key ideas: "successor features", a value function representation that decouples the dynamics o…

    Submitted 12 April, 2018; v1 submitted 16 June, 2016; originally announced June 2016.

    Comments: Published at NIPS 2017

  28. arXiv:1606.04474  [pdf, other]

    cs.NE cs.LG

    Learning to learn by gradient descent by gradient descent

    Authors: Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas

    Abstract: The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms…

    Submitted 30 November, 2016; v1 submitted 14 June, 2016; originally announced June 2016.

  29. arXiv:1606.01868  [pdf, other]

    cs.AI cs.LG stat.ML

    Unifying Count-Based Exploration and Intrinsic Motivation

    Authors: Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos

    Abstract: We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitra…

    Submitted 7 November, 2016; v1 submitted 6 June, 2016; originally announced June 2016.

  30. arXiv:1511.06581  [pdf, other]

    cs.LG

    Dueling Network Architectures for Deep Reinforcement Learning

    Authors: Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, Nando de Freitas

    Abstract: In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state…

    Submitted 5 April, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

    Comments: 15 pages, 5 figures, and 5 tables

  31. arXiv:1511.05952  [pdf, other]

    cs.LG

    Prioritized Experience Replay

    Authors: Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

    Abstract: Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience,…

    Submitted 25 February, 2016; v1 submitted 18 November, 2015; originally announced November 2015.

    Comments: Published at ICLR 2016

  32. arXiv:1312.6055  [pdf, other]

    cs.LG

    Unit Tests for Stochastic Optimization

    Authors: Tom Schaul, Ioannis Antonoglou, David Silver

    Abstract: Optimization by stochastic gradient descent is an important component of many large-scale machine learning algorithms. A wide variety of such optimization algorithms have been devised; however, it is unclear whether these algorithms are robust and widely applicable across many different optimization landscapes. In this paper we develop a collection of unit tests for stochastic optimization. Each u…

    Submitted 25 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: Final submission to ICLR 2014 (revised according to reviews, additional results added)

  33. arXiv:1301.3764  [pdf, other]

    cs.LG cs.AI stat.ML

    Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

    Authors: Tom Schaul, Yann LeCun

    Abstract: Recent work has established an empirically successful framework for adapting learning rates for stochastic gradient descent (SGD). This effectively removes all needs for tuning, while automatically reducing learning rates over time on stationary problems, and permitting learning rates to grow appropriately in non-stationary tasks. Here, we extend the idea in three directions, addressing proper min…

    Submitted 27 March, 2013; v1 submitted 16 January, 2013; originally announced January 2013.

    Comments: Published at the First International Conference on Learning Representations (ICLR-2013). Public reviews are available at http://openreview.net/document/c14f2204-fd66-4d91-bed4-153523694041#c14f2204-fd66-4d91-bed4-153523694041

  34. arXiv:1209.5853  [pdf, other]

    cs.AI

    Efficient Natural Evolution Strategies

    Authors: Yi Sun, Daan Wierstra, Tom Schaul, Juergen Schmidhuber

    Abstract: Efficient Natural Evolution Strategies (eNES) is a novel alternative to conventional evolutionary algorithms, using the natural gradient to adapt the mutation distribution. Unlike previous methods based on natural gradients, eNES uses a fast algorithm to calculate the inverse of the exact Fisher information matrix, thus increasing both robustness and performance of its evolution gradient estimatio…

    Submitted 26 September, 2012; originally announced September 2012.

    Comments: Published in GECCO'2009

  35. arXiv:1206.1106  [pdf, other]

    stat.ML cs.LG

    No More Pesky Learning Rates

    Authors: Tom Schaul, Sixin Zhang, Yann LeCun

    Abstract: The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable f…

    Submitted 18 February, 2013; v1 submitted 5 June, 2012; originally announced June 2012.

  36. arXiv:1109.1314  [pdf, ps, other]

    cs.AI

    Measuring Intelligence through Games

    Authors: Tom Schaul, Julian Togelius, Jürgen Schmidhuber

    Abstract: Artificial general intelligence (AGI) refers to research aimed at tackling the full problem of artificial intelligence, that is, create truly intelligent agents. This sets it apart from most AI research which aims at solving relatively narrow domains, such as character recognition, motion planning, or increasing player satisfaction in games. But how do we know when an agent is truly intelligent? A…

    Submitted 6 September, 2011; originally announced September 2011.

  37. arXiv:1106.4487  [pdf, ps, other]

    stat.ML cs.NE

    Natural Evolution Strategies

    Authors: Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jürgen Schmidhuber

    Abstract: This paper presents Natural Evolution Strategies (NES), a recent family of algorithms that constitute a more principled approach to black-box optimization than established evolutionary algorithms. NES maintains a parameterized distribution on the set of solution candidates, and the natural gradient is used to update the distribution's parameters in the direction of higher expected fitness. We intr…

    Submitted 22 June, 2011; originally announced June 2011.

  38. arXiv:1106.1998  [pdf, other]

    cs.AI

    A Linear Time Natural Evolution Strategy for Non-Separable Functions

    Authors: Yi Sun, Faustino Gomez, Tom Schaul, Juergen Schmidhuber

    Abstract: We present a novel Natural Evolution Strategy (NES) variant, the Rank-One NES (R1-NES), which uses a low rank approximation of the search distribution covariance matrix. The algorithm allows computation of the natural gradient with cost linear in the dimensionality of the parameter space, and excels in solving high-dimensional non-separable problems, including the best result to date on the Rosenb…

    Submitted 13 June, 2011; v1 submitted 10 June, 2011; originally announced June 2011.