
Showing 1–11 of 11 results for author: Muldal, A

  1. arXiv:2211.11602

    cs.LG cs.HC cs.MA

    Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback

    Authors: Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Jirka Lhotka, Timothy Lillicrap, Alistair Muldal, George Powell, Adam Santoro, Guy Scully, Sanjana Srivastava, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu

    Abstract: An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulate…

    Submitted 21 November, 2022; originally announced November 2022.

  2. arXiv:2206.03139

    cs.LG cs.AI cs.CL

    Intra-agent speech permits zero-shot task acquisition

    Authors: Chen Yan, Federico Carnevale, Petko Georgiev, Adam Santoro, Aurelia Guy, Alistair Muldal, Chia-Chun Hung, Josh Abramson, Timothy Lillicrap, Gregory Wayne

    Abstract: Human language learners are exposed to a trickle of informative, context-sensitive language, but a flood of raw sensory data. Through both social language use and internal processes of rehearsal and practice, language learners are able to build high-level, semantic representations that explain their perceptions. Here, we take inspiration from such processes of "inner speech" in humans (Vygotsky, 1…

    Submitted 7 June, 2022; originally announced June 2022.

  3. arXiv:2205.13274

    cs.LG cs.AI

    Evaluating Multimodal Interactive Agents

    Authors: Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Timothy Lillicrap, Alistair Muldal, Blake Richards, Adam Santoro, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan

    Abstract: Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often do not correlate well with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and prese…

    Submitted 14 July, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

  4. arXiv:2202.08137

    cs.LG

    A data-driven approach for learning to control computers

    Authors: Peter C Humphreys, David Raposo, Toby Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Alex Goldin, Adam Santoro, Timothy Lillicrap

    Abstract: It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse,…

    Submitted 11 November, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  5. arXiv:2112.03763

    cs.LG

    Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning

    Authors: DeepMind Interactive Agents Team, Josh Abramson, Arun Ahuja, Arthur Brussee, Federico Carnevale, Mary Cassin, Felix Fischer, Petko Georgiev, Alex Goldin, Mansi Gupta, Tim Harley, Felix Hill, Peter C Humphreys, Alden Hung, Jessica Landon, Timothy Lillicrap, Hamza Merzic, Alistair Muldal, Adam Santoro, Guy Scully, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan, Rui Zhu

    Abstract: A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. We show that imitation learning of human-human interactions in a…

    Submitted 2 February, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

  6. arXiv:2012.05672

    cs.LG cs.AI cs.MA

    Imitating Interactive Intelligence

    Authors: Josh Abramson, Arun Ahuja, Iain Barr, Arthur Brussee, Federico Carnevale, Mary Cassin, Rachita Chhaparia, Stephen Clark, Bogdan Damoc, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Soňa Mokrá, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, et al. (4 additional authors not shown)

    Abstract: A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central cha…

    Submitted 20 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

  7. arXiv:2009.05524

    cs.AI cs.LG

    Physically Embedded Planning Problems: New Challenges for Reinforcement Learning

    Authors: Mehdi Mirza, Andrew Jaegle, Jonathan J. Hunt, Arthur Guez, Saran Tunyasuvunakool, Alistair Muldal, Théophane Weber, Peter Karkus, Sébastien Racanière, Lars Buesing, Timothy Lillicrap, Nicolas Heess

    Abstract: Recent work in deep reinforcement learning (RL) has produced algorithms capable of mastering challenging games such as Go, chess, or shogi. In these works the RL agent directly observes the natural state of the game and controls that state directly with its actions. However, when humans play such games, they do not just reason about the moves but also interact with their physical environment. They…

    Submitted 29 October, 2020; v1 submitted 11 September, 2020; originally announced September 2020.

    Comments: 17 pages + appendix. Updated text and references

  8. dm_control: Software and Tasks for Continuous Control

    Authors: Yuval Tassa, Saran Tunyasuvunakool, Alistair Muldal, Yotam Doron, Piotr Trochim, Siqi Liu, Steven Bohez, Josh Merel, Tom Erez, Timothy Lillicrap, Nicolas Heess

    Abstract: The dm_control software package is a collection of Python libraries and task suites for reinforcement learning agents in an articulated-body simulation. A MuJoCo wrapper provides convenient bindings to functions and data structures. The PyMJCF and Composer libraries enable procedural model manipulation and task authoring. The Control Suite is a fixed set of tasks with standardised structure, inten…

    Submitted 7 September, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: arXiv admin note: text overlap with arXiv:1801.00690

  9. arXiv:1804.08617

    cs.LG cs.AI stat.ML

    Distributed Distributional Deterministic Policy Gradients

    Authors: Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap

    Abstract: This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improve…

    Submitted 23 April, 2018; originally announced April 2018.

  10. arXiv:1804.06318

    cs.AI cs.NE cs.RO

    Learning Awareness Models

    Authors: Brandon Amos, Laurent Dinh, Serkan Cabi, Thomas Rothörl, Sergio Gómez Colmenarejo, Alistair Muldal, Tom Erez, Yuval Tassa, Nando de Freitas, Misha Denil

    Abstract: We consider the setting of an agent with a fixed body interacting with an unknown and uncertain external world. We show that models trained to predict proprioceptive information about the agent's body come to represent objects in the external world. In spite of being trained with only internally available signals, these dynamic body models come to represent external objects through the necessity o…

    Submitted 17 April, 2018; originally announced April 2018.

    Comments: Accepted to ICLR 2018

  11. arXiv:1801.00690

    cs.AI

    DeepMind Control Suite

    Authors: Yuval Tassa, Yotam Doron, Alistair Muldal, Tom Erez, Yazhe Li, Diego de Las Casas, David Budden, Abbas Abdolmaleki, Josh Merel, Andrew Lefrancq, Timothy Lillicrap, Martin Riedmiller

    Abstract: The DeepMind Control Suite is a set of continuous control tasks with a standardised structure and interpretable rewards, intended to serve as performance benchmarks for reinforcement learning agents. The tasks are written in Python and powered by the MuJoCo physics engine, making them easy to use and modify. We include benchmarks for several learning algorithms. The Control Suite is publicly avail…

    Submitted 2 January, 2018; originally announced January 2018.

    Comments: 24 pages, 7 figures, 2 tables