
Showing 1–20 of 20 results for author: Yarats, D

Searching in archive cs.
  1. arXiv:2206.15469  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Watch and Match: Supercharging Imitation with Regularized Optimal Transport

    Authors: Siddhant Haldar, Vaibhav Mathur, Denis Yarats, Lerrel Pinto

    Abstract: Imitation learning holds tremendous promise in learning policies efficiently for complex decision making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions…

    Submitted 20 February, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: Code and robot videos are available at https://rot-robot.github.io/

  2. arXiv:2202.00161  [pdf, other]

    cs.LG cs.AI

    CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery

    Authors: Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel

    Abstract: We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery that maximizes the mutual information between state-transitions and latent skill vectors. CIC utilizes contrastive learning between state-transitions and skills to learn behavior embeddings and maximizes the entropy of these embeddings as an intrinsic reward to encourage behavioral diversity. We evaluat…

    Submitted 29 March, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: Project website: https://sites.google.com/view/cicrl/

  3. arXiv:2201.13425  [pdf, other]

    cs.LG cs.AI

    Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning

    Authors: Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto

    Abstract: Recent progress in deep learning has relied on access to large and diverse datasets. Such data-driven progress has been less evident in offline reinforcement learning (RL), because offline RL data is usually collected to optimize specific target tasks limiting the data's diversity. In this work, we propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL. ExORL first…

    Submitted 5 April, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

  4. arXiv:2110.15191  [pdf, other]

    cs.LG cs.AI cs.RO

    URLB: Unsupervised Reinforcement Learning Benchmark

    Authors: Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel

    Abstract: Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: Code for the Unsupervised Reinforcement Learning Benchmark is available at https://github.com/rll-research/url_benchmark

  5. arXiv:2109.10957  [pdf, other]

    cs.RO stat.AP

    Real Robot Challenge: A Robotics Competition in the Cloud

    Authors: Stefan Bauer, Felix Widmaier, Manuel Wüthrich, Annika Buchholz, Sebastian Stark, Anirudh Goyal, Thomas Steinbrenner, Joel Akpo, Shruti Joshi, Vincent Berenz, Vaibhav Agrawal, Niklas Funk, Julen Urain De Jesus, Jan Peters, Joe Watson, Claire Chen, Krishnan Srinivasan, Junwu Zhang, Jeffrey Zhang, Matthew R. Walter, Rishabh Madan, Charles Schaff, Takahiro Maeda, Takuma Yoneda, Denis Yarats, et al. (17 additional authors not shown)

    Abstract: Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able…

    Submitted 10 June, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

  6. arXiv:2107.09645  [pdf, other]

    cs.AI cs.LG

    Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

    Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

    Abstract: We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from…

    Submitted 20 July, 2021; originally announced July 2021.

  7. arXiv:2102.11271  [pdf, other]

    cs.LG cs.AI

    Reinforcement Learning with Prototypical Representations

    Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

    Abstract: Learning effective representations in image-based environments is crucial for sample efficient Reinforcement Learning (RL). Unfortunately, in RL, representation learning is confounded with the exploratory experience of the agent -- learning a useful representation requires diverse data, while effective exploration is only possible with coherent representations. Furthermore, we would like to learn…

    Submitted 20 July, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Journal ref: ICML 2021

  8. arXiv:2011.12255  [pdf, other]

    cs.RO

    Learning Navigation Skills for Legged Robots with Learned Robot Embeddings

    Authors: Joanne Truong, Denis Yarats, Tianyu Li, Franziska Meier, Sonia Chernova, Dhruv Batra, Akshara Rai

    Abstract: Recent work has shown results on learning navigation policies for idealized cylinder agents in simulation and transferring them to real wheeled robots. Deploying such navigation policies on legged robots can be challenging due to their complex dynamics, and the large dynamical difference between cylinder agents and legged systems. In this work, we learn hierarchical navigation policies that accoun…

    Submitted 13 September, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

  9. arXiv:2008.12775  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    On the model-based stochastic value gradient for continuous reinforcement learning

    Authors: Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson

    Abstract: For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents. While model-based agents are conceptually appealing, their policies tend to lag behind those of model-free agents in terms of final reward, especially in non-trivial environments. In response, researchers have pro…

    Submitted 27 May, 2021; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: L4DC 2021

  10. arXiv:2006.12862  [pdf, other]

    cs.LG cs.AI

    Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

    Authors: Roberta Raileanu, Max Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

    Abstract: Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios, even when they are trained on many instances of semantically similar environments. Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents. However, different tasks tend to benefit from different kinds of data augmentation. In this paper, we compare three approac…

    Submitted 20 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

  11. arXiv:2004.13649  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

    Authors: Ilya Kostrikov, Denis Yarats, Rob Fergus

    Abstract: We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to regularize the value function. Existing model-free approaches, such as Soft Actor-Critic…

    Submitted 7 March, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

  12. arXiv:1910.04209  [pdf, other]

    cs.LG cs.NE stat.ML

    On the adequacy of untuned warmup for adaptive optimization

    Authors: Jerry Ma, Denis Yarats

    Abstract: Adaptive optimization algorithms such as Adam are widely used in deep learning. The stability of such algorithms is often improved with a warmup schedule for the learning rate. Motivated by the difficulty of choosing and tuning warmup schedules, recent work proposes automatic variance rectification of Adam's adaptive learning rate, claiming that this rectified approach ("RAdam") surpasses the vani…

    Submitted 19 March, 2021; v1 submitted 9 October, 2019; originally announced October 2019.

    Comments: AAAI 2021

  13. arXiv:1910.01741  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

    Authors: Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus

    Abstract: Training an agent to solve control tasks directly from high-dimensional images with model-free reinforcement learning (RL) has proven difficult. A promising approach is to learn a latent representation together with the control policy. However, fitting a high-capacity encoder using a scarce reward signal is sample inefficient and leads to poor performance. Prior work has shown that auxiliary losse…

    Submitted 9 July, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

  14. arXiv:1910.01727  [pdf, other]

    cs.LG stat.ML

    Generalized Inner Loop Meta-Learning

    Authors: Edward Grefenstette, Brandon Amos, Denis Yarats, Phu Mon Htut, Artem Molchanov, Franziska Meier, Douwe Kiela, Kyunghyun Cho, Soumith Chintala

    Abstract: Many (but not all) approaches self-qualifying as "meta-learning" in deep learning and reinforcement learning fit a common pattern of approximating the solution to a nested optimization problem. In this paper, we give a formalization of this shared pattern, which we call GIMLI, prove its general requirements, and derive a general-purpose algorithm for implementing similar approaches. Based on this…

    Submitted 7 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

    Comments: 17 pages, 3 figures, 1 algorithm

  15. arXiv:1909.12830  [pdf, other]

    cs.LG cs.RO math.OC stat.ML

    The Differentiable Cross-Entropy Method

    Authors: Brandon Amos, Denis Yarats

    Abstract: We study the cross-entropy method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant that enables us to differentiate the output of CEM with respect to the objective function's parameters. In the machine learning setting this brings CEM inside of the end-to-end learning pipeline where this has otherwise been impossible.…

    Submitted 14 August, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

    Comments: ICML 2020

  16. arXiv:1906.00744  [pdf, other]

    cs.AI cs.CL

    Hierarchical Decision Making by Generating and Following Natural Language Instructions

    Authors: Hengyuan Hu, Denis Yarats, Qucheng Gong, Yuandong Tian, Mike Lewis

    Abstract: We explore using latent natural language instructions as an expressive and compositional representation of complex actions for hierarchical decision making. Rather than directly selecting micro-actions, our agent first generates a latent plan in natural language, which is then executed by a separate model. We introduce a challenging real-time strategy game environment in which the actions of a lar…

    Submitted 2 October, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

  17. arXiv:1810.06801  [pdf, other]

    cs.LG stat.ML

    Quasi-hyperbolic momentum and Adam for deep learning

    Authors: Jerry Ma, Denis Yarats

    Abstract: Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that Q…

    Submitted 2 May, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Published as a conference paper at ICLR 2019. This version corrects one typographical error in the published text

  18. arXiv:1712.05846  [pdf, other]

    cs.CL

    Hierarchical Text Generation and Planning for Strategic Dialogue

    Authors: Denis Yarats, Mike Lewis

    Abstract: End-to-end models for goal-orientated dialogue are challenging to train, because linguistic and strategic aspects are entangled in latent state vectors. We introduce an approach to learning representations of messages in dialogues by maximizing the likelihood of subsequent sentences and actions, which decouples the semantics of the dialogue utterance from its linguistic realization. We then use th…

    Submitted 4 June, 2018; v1 submitted 15 December, 2017; originally announced December 2017.

  19. arXiv:1706.05125  [pdf, ps, other]

    cs.AI cs.CL

    Deal or No Deal? End-to-End Learning for Negotiation Dialogues

    Authors: Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra

    Abstract: Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other'…

    Submitted 15 June, 2017; originally announced June 2017.

  20. arXiv:1705.03122  [pdf, other]

    cs.CL

    Convolutional Sequence to Sequence Learning

    Authors: Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

    Abstract: The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed…

    Submitted 24 July, 2017; v1 submitted 8 May, 2017; originally announced May 2017.
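
The QHM abstract (entry 17) describes the algorithm as averaging a plain SGD step with a momentum step. The following is a minimal plain-Python sketch of that idea, assuming the momentum buffer is an exponential moving average of gradients with discount `beta` and the averaging weight is `nu`. The function names, hyperparameter values, and the toy quadratic are illustrative assumptions, not taken from the paper or any library.

```python
from typing import List, Tuple


def qhm_step(theta: List[float], buf: List[float], grad: List[float],
             lr: float, beta: float, nu: float) -> Tuple[List[float], List[float]]:
    """One quasi-hyperbolic momentum update (illustrative sketch).

    buf is an exponential moving average of past gradients (the momentum
    buffer). The step direction is a weighted average of the current plain
    gradient (weight 1 - nu) and the updated momentum buffer (weight nu).
    """
    new_buf = [beta * b + (1.0 - beta) * g for b, g in zip(buf, grad)]
    new_theta = [t - lr * ((1.0 - nu) * g + nu * b)
                 for t, g, b in zip(theta, grad, new_buf)]
    return new_theta, new_buf


def minimize_quadratic(steps: int = 200) -> List[float]:
    """Run the sketch on a hypothetical toy objective with minimum at (3, -1)."""
    # f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, so grad f = (2(x - 3), 4(y + 1)).
    def grad_f(p: List[float]) -> List[float]:
        return [2.0 * (p[0] - 3.0), 4.0 * (p[1] + 1.0)]

    theta, buf = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        theta, buf = qhm_step(theta, buf, grad_f(theta), lr=0.1, beta=0.9, nu=0.7)
    return theta


if __name__ == "__main__":
    print(minimize_quadratic())
```

Under this formulation, `nu=0` reduces the step to plain SGD and `nu=1` to momentum SGD with a normalized (dampened) buffer, which is what makes the rule an interpolation between the two.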
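
Entry 15 builds on the cross-entropy method (CEM) for non-convex optimization of a continuous objective. For reference, here is a sketch of plain, non-differentiable CEM only; the differentiable variant introduced in that paper is not shown. The axis-aligned Gaussian sampling distribution, the sample and elite counts, the iteration budget, and the function name `cem_minimize` are all illustrative assumptions.

```python
import random
import statistics
from typing import Callable, List


def cem_minimize(f: Callable[[List[float]], float],
                 mean: List[float], std: List[float],
                 n_samples: int = 100, n_elite: int = 20,
                 n_iters: int = 50, seed: int = 0) -> List[float]:
    """Plain cross-entropy method with an axis-aligned Gaussian (sketch)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    mean, std = list(mean), list(std)
    for _ in range(n_iters):
        # Draw candidate solutions from the current sampling distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Keep the n_elite lowest-cost candidates.
        elite = sorted(samples, key=f)[:n_elite]
        # Refit the mean and per-dimension std to the elite set; the small
        # floor keeps the distribution from collapsing to zero width.
        mean = [statistics.fmean(col) for col in zip(*elite)]
        std = [statistics.pstdev(col) + 1e-6 for col in zip(*elite)]
    return mean


if __name__ == "__main__":
    # Hypothetical toy objective with minimum at (2, -1).
    print(cem_minimize(lambda p: (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2,
                       mean=[0.0, 0.0], std=[3.0, 3.0]))
```

Because the elite selection step (`sorted(...)[:n_elite]`) is a hard, piecewise-constant choice, the output of this plain version cannot be differentiated with respect to parameters of `f`, which is the limitation the abstract's differentiable variant targets.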