Showing 1–26 of 26 results for author: Jaderberg, M

Searching in archive cs.
  1. arXiv:2109.13800

    cs.NE stat.ML

    Faster Improvement Rate Population Based Training

    Authors: Valentin Dalibard, Max Jaderberg

    Abstract: The successful training of neural networks typically involves careful and time consuming hyperparameter tuning. Population Based Training (PBT) has recently been proposed to automate this process. PBT trains a population of neural networks concurrently, frequently mutating their hyperparameters throughout their training. However, the decision mechanisms of PBT are greedy and favour short-term impr…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 9 pages, 5 figures

  2. arXiv:2107.12808

    cs.LG cs.AI cs.MA

    Open-Ended Learning Leads to Generally Capable Agents

    Authors: Open Ended Learning Team, Adam Stooke, Anuj Mahajan, Catarina Barros, Charlie Deck, Jakob Bauer, Jakub Sygnowski, Maja Trebacz, Max Jaderberg, Michael Mathieu, Nat McAleese, Nathalie Bradley-Schmieg, Nathaniel Wong, Nicolas Porcel, Roberta Raileanu, Steph Hughes-Fitt, Valentin Dalibard, Wojciech Marian Czarnecki

    Abstract: In this work we create agents that can perform well beyond a single, individual task, that exhibit much wider generalisation of behaviour to a massive, rich space of challenges. We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are generally capable across this vast space and beyond. The environment is natively multi-agent, spanning the con…

    Submitted 31 July, 2021; v1 submitted 27 July, 2021; originally announced July 2021.

  3. arXiv:2006.15223

    cs.AI cs.LG

    Perception-Prediction-Reaction Agents for Deep Reinforcement Learning

    Authors: Adam Stooke, Valentin Dalibard, Siddhant M. Jayakumar, Wojciech M. Czarnecki, Max Jaderberg

    Abstract: We introduce a new recurrent agent architecture and associated auxiliary losses which improve reinforcement learning in partially observable tasks requiring long-term memory. We employ a temporal hierarchy, using a slow-ticking recurrent core to allow information to flow more easily over long time spans, and three fast-ticking recurrent cores with connections designed to create an information asym…

    Submitted 26 June, 2020; originally announced June 2020.

  4. arXiv:2004.09468

    cs.LG stat.ML

    Real World Games Look Like Spinning Tops

    Authors: Wojciech Marian Czarnecki, Gauthier Gidel, Brendan Tracey, Karl Tuyls, Shayegan Omidshafiei, David Balduzzi, Max Jaderberg

    Abstract: This paper investigates the geometrical properties of real world games (e.g. Tic-Tac-Toe, Go, StarCraft II). We hypothesise that their geometrical structure resembles a spinning top, with the upright axis representing transitive strength, and the radial axis, which corresponds to the number of cycles that exist at a particular transitive strength, representing the non-transitive dimension. We prove…

    Submitted 17 June, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

  5. arXiv:1912.07559

    cs.LG stat.ML

    A Deep Neural Network's Loss Surface Contains Every Low-dimensional Pattern

    Authors: Wojciech Marian Czarnecki, Simon Osindero, Razvan Pascanu, Max Jaderberg

    Abstract: The work "Loss Landscape Sightseeing with Multi-Point Optimization" (Skorokhodov and Burtsev, 2019) demonstrated that one can empirically find arbitrary 2D binary patterns inside loss surfaces of popular neural networks. In this paper we prove that: (i) this is a general property of deep universal approximators; and (ii) this property holds for arbitrary smooth patterns, for other dimensionalities…

    Submitted 2 January, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

  6. arXiv:1910.06764

    cs.LG cs.AI stat.ML

    Stabilizing Transformers for Reinforcement Learning

    Authors: Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphael Lopez Kaufman, Aidan Clark, Seb Noury, Matthew M. Botvinick, Nicolas Heess, Raia Hadsell

    Abstract: Owing to their ability to both effectively integrate information over long time horizons and scale to massive amounts of data, self-attention architectures have recently shown breakthrough success in natural language processing (NLP), achieving state-of-the-art results in domains such as language modeling and machine translation. Harnessing the transformer's ability to process long time horizons o…

    Submitted 13 October, 2019; originally announced October 2019.

  7. arXiv:1902.02186

    cs.LG cs.AI stat.ML

    Distilling Policy Distillation

    Authors: Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg

    Abstract: The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning. This process, referred to as distillation, has been used to great success, for example, by enhancing the optimisation of agents, leading to stronger performance faster, on harder domains [26, 32, 5, 8]. Despite the widespread use and conceptual simplicity of distillation, many different formul…

    Submitted 6 February, 2019; originally announced February 2019.

    Comments: Accepted at AISTATS 2019

  8. arXiv:1902.01894

    cs.AI cs.DC cs.LG cs.NE

    A Generalized Framework for Population Based Training

    Authors: Ang Li, Aleksandra Spyra, Sagi Perel, Valentin Dalibard, Max Jaderberg, Chenjie Gu, David Budden, Tim Harley, Pramod Gupta

    Abstract: Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters, periodically copying the weights of the best performers and mutating hyperparameters during training. Previous PBT implementations have been synchronized glass-box systems. We propose a general, black-box PBT framework that distributes many asynchronous "trials" (a small number of…

    Submitted 5 February, 2019; originally announced February 2019.

    Comments: 9 pages

  9. arXiv:1901.08106

    cs.LG cs.GT cs.MA stat.ML

    Open-ended Learning in Symmetric Zero-sum Games

    Authors: David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech M. Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel

    Abstract: Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective -- we want agen…

    Submitted 13 May, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

    Comments: ICML 2019, final version

  10. arXiv:1807.01281

    cs.LG cs.AI stat.ML

    Human-level performance in first-person multiplayer games with population-based deep reinforcement learning

    Authors: Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel

    Abstract: Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents, and environments reflecting this degree of complexity remain an open challenge. I…

    Submitted 3 July, 2018; originally announced July 2018.

  11. arXiv:1806.01780

    cs.LG stat.ML

    Mix&Match - Agent Curricula for Reinforcement Learning

    Authors: Wojciech Marian Czarnecki, Siddhant M. Jayakumar, Max Jaderberg, Leonard Hasenclever, Yee Whye Teh, Simon Osindero, Nicolas Heess, Razvan Pascanu

    Abstract: We introduce Mix&Match (M&M) - a training framework designed to facilitate rapid and effective learning in RL agents, especially those that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping fr…

    Submitted 5 June, 2018; originally announced June 2018.

    Comments: ICML 2018

  12. arXiv:1711.09846

    cs.LG cs.NE

    Population Based Training of Neural Networks

    Authors: Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, Chrisantha Fernando, Koray Kavukcuoglu

    Abstract: Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present Population Based Training (PBT), a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget…

    Submitted 28 November, 2017; v1 submitted 27 November, 2017; originally announced November 2017.

  13. arXiv:1706.06551

    cs.CL cs.LG stat.ML

    Grounded Language Learning in a Simulated 3D World

    Authors: Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom

    Abstract: We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to…

    Submitted 26 June, 2017; v1 submitted 20 June, 2017; originally announced June 2017.

    Comments: 16 pages, 8 figures

  14. arXiv:1706.05296

    cs.AI

    Value-Decomposition Networks For Cooperative Multi-Agent Learning

    Authors: Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel

    Abstract: We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal. This class of learning problems is difficult because of the often large combined action and observation spaces. In the fully centralized and decentralized approaches, we find the problem of spurious rewards and a phenomenon we call the "lazy agent" problem, which arises due to partial observab…

    Submitted 16 June, 2017; originally announced June 2017.

    ACM Class: I.2.11

  15. arXiv:1706.04859

    cs.LG

    Sobolev Training for Neural Networks

    Authors: Wojciech Marian Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Świrszcz, Razvan Pascanu

    Abstract: At the heart of deep learning we aim to use neural networks as function approximators - training them to produce outputs from inputs in emulation of a ground truth function or data creation process. In many cases we only have access to input-output pairs from the ground truth, however it is becoming more common to have access to derivatives of the target output with respect to the input - for exam…

    Submitted 26 July, 2017; v1 submitted 15 June, 2017; originally announced June 2017.

  16. arXiv:1703.01161

    cs.AI

    FeUdal Networks for Hierarchical Reinforcement Learning

    Authors: Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

    Abstract: We introduce FeUdal Networks (FuNs): a novel architecture for hierarchical reinforcement learning. Our approach is inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels -- allowing it to utilise different resolutions of time. Our framework employs a Manager module and a Worker module. The Ma…

    Submitted 6 March, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

  17. arXiv:1703.00522

    cs.LG cs.NE

    Understanding Synthetic Gradients and Decoupled Neural Interfaces

    Authors: Wojciech Marian Czarnecki, Grzegorz Świrszcz, Max Jaderberg, Simon Osindero, Oriol Vinyals, Koray Kavukcuoglu

    Abstract: When training neural networks, the use of Synthetic Gradients (SG) allows layers or modules to be trained without update locking - without waiting for a true error gradient to be backpropagated - resulting in Decoupled Neural Interfaces (DNIs). This unlocked ability of being able to update parts of a neural network asynchronously and with only local information was demonstrated to work empirically…

    Submitted 1 March, 2017; originally announced March 2017.

  18. arXiv:1611.05397

    cs.LG cs.NE

    Reinforcement Learning with Unsupervised Auxiliary Tasks

    Authors: Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, Koray Kavukcuoglu

    Abstract: Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervi…

    Submitted 16 November, 2016; originally announced November 2016.

  19. arXiv:1608.05343

    cs.LG

    Decoupled Neural Interfaces using Synthetic Gradients

    Authors: Max Jaderberg, Wojciech Marian Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, Koray Kavukcuoglu

    Abstract: Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In…

    Submitted 3 July, 2017; v1 submitted 18 August, 2016; originally announced August 2016.

  20. arXiv:1607.00662

    cs.CV cs.LG stat.ML

    Unsupervised Learning of 3D Structure from Images

    Authors: Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess

    Abstract: A key goal of computer vision is to recover the underlying 3D structure from 2D observations of the world. In this paper we learn strong deep generative models of 3D structures, and recover these structures from 3D and 2D images via probabilistic inference. We demonstrate high-quality samples and report log-likelihoods on several datasets, including ShapeNet [2], and establish the first benchmarks…

    Submitted 19 June, 2018; v1 submitted 3 July, 2016; originally announced July 2016.

    Comments: Appears in Advances in Neural Information Processing Systems 29 (NIPS 2016)

  21. arXiv:1606.02580

    cs.NE cs.CV cs.LG

    Convolution by Evolution: Differentiable Pattern Producing Networks

    Authors: Chrisantha Fernando, Dylan Banarse, Malcolm Reynolds, Frederic Besse, David Pfau, Max Jaderberg, Marc Lanctot, Daan Wierstra

    Abstract: In this work we introduce a differentiable version of the Compositional Pattern Producing Network, called the DPPN. Unlike a standard CPPN, the topology of a DPPN is evolved but the weights are learned. A Lamarckian algorithm, that combines evolution and learning, produces DPPNs to reconstruct an image. Our main result is that DPPNs can be evolved/trained to compress the weights of a denoising aut…

    Submitted 8 June, 2016; originally announced June 2016.

  22. arXiv:1506.02025

    cs.CV

    Spatial Transformer Networks

    Authors: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu

    Abstract: Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module…

    Submitted 4 February, 2016; v1 submitted 5 June, 2015; originally announced June 2015.

  23. arXiv:1412.5903

    cs.CV

    Deep Structured Output Learning for Unconstrained Text Recognition

    Authors: Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

    Abstract: We develop a representation suitable for the unconstrained recognition of words in natural images: the general case of no fixed lexicon and unknown length. To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN th…

    Submitted 10 April, 2015; v1 submitted 18 December, 2014; originally announced December 2014.

    Comments: arXiv admin note: text overlap with arXiv:1406.2227

  24. arXiv:1412.1842

    cs.CV

    Reading Text in the Wild with Convolutional Neural Networks

    Authors: Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this work we present an end-to-end system for text spotting -- localising and recognising text in natural scene images -- and text based image retrieval. This system is based on a region proposal mechanism for detection and deep convolutional neural networks for recognition. Our pipeline uses a novel combination of complementary proposal generation techniques to ensure high recall, and a fast s…

    Submitted 4 December, 2014; originally announced December 2014.

  25. arXiv:1406.2227

    cs.CV

    Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

    Authors: Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

    Abstract: In this work we present a framework for the recognition of natural scene text. Our framework does not require any human-labelled data, and performs word recognition on the whole image holistically, departing from the character based recognition systems of the past. The deep neural network models at the centre of this framework are trained solely on data produced by a synthetic text generation engi…

    Submitted 9 December, 2014; v1 submitted 9 June, 2014; originally announced June 2014.

  26. arXiv:1405.3866

    cs.CV

    Speeding up Convolutional Neural Networks with Low Rank Expansions

    Authors: Max Jaderberg, Andrea Vedaldi, Andrew Zisserman

    Abstract: The focus of this paper is speeding up the evaluation of convolutional neural networks. While delivering impressive results across a range of computer vision and machine learning tasks, these networks are computationally demanding, limiting their deployability. Convolutional layers generally consume the bulk of the processing time, and so in this work we present two simple schemes for drastically…

    Submitted 15 May, 2014; originally announced May 2014.