
Showing 1–23 of 23 results for author: Kostrikov, I

Searching in archive cs.
  1. arXiv:2305.13301

    cs.LG cs.AI cs.CV

    Training Diffusion Models with Reinforcement Learning

    Authors: Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

    Abstract: Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-perceived image quality or drug effectiveness. In this paper, we investigate reinforcement learning methods for directly optimizing diffusion mod…

    Submitted 4 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 23 pages, 16 figures

  2. arXiv:2304.10573

    cs.LG cs.AI

    IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies

    Authors: Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, Jakub Grudzien Kuba, Sergey Levine

    Abstract: Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizi…

    Submitted 19 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: 9 Pages, 4 Figures, 3 Tables

  3. arXiv:2304.10466

    cs.LG cs.AI stat.ML

    Efficient Deep Reinforcement Learning Requires Regulating Overfitting

    Authors: Qiyang Li, Aviral Kumar, Ilya Kostrikov, Sergey Levine

    Abstract: Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained unclear. Consequently, it has bee…

    Submitted 20 April, 2023; originally announced April 2023.

    Comments: 26 pages, 18 figures, 3 tables, The International Conference on Learning Representations (ICLR) 2023

  4. arXiv:2304.09831

    cs.RO cs.AI cs.LG

    FastRLAP: A System for Learning High-Speed Driving via Deep RL and Autonomous Practicing

    Authors: Kyle Stachowicz, Dhruv Shah, Arjun Bhorkar, Ilya Kostrikov, Sergey Levine

    Abstract: We present a system that enables an autonomous small-scale RC car to drive aggressively from visual observations using reinforcement learning (RL). Our system, FastRLAP (faster lap), trains autonomously in the real world, without human interventions, and without requiring any simulation or expert demonstrations. Our system integrates a number of important components to make this possible: we initi…

    Submitted 19 April, 2023; originally announced April 2023.

  5. arXiv:2302.02948

    cs.LG cs.AI

    Efficient Online Reinforcement Learning with Offline Data

    Authors: Philip J. Ball, Laura Smith, Ilya Kostrikov, Sergey Levine

    Abstract: Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this dat…

    Submitted 31 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: Short Presentation at ICML 2023; to reproduce our results and use our codebase, see https://github.com/ikostrikov/rlpd

  6. arXiv:2212.08244

    cs.RO cs.CV cs.LG

    Offline Reinforcement Learning for Visual Navigation

    Authors: Dhruv Shah, Arjun Bhorkar, Hrish Leen, Ilya Kostrikov, Nick Rhinehart, Sergey Levine

    Abstract: Reinforcement learning can enable robots to navigate to distant goals while optimizing user-specified reward functions, including preferences for following lanes, staying on paved paths, or avoiding freshly mowed grass. However, online learning from trial-and-error for real-world robots is logistically challenging, and methods that instead can utilize existing datasets of robotic navigation data c…

    Submitted 15 December, 2022; originally announced December 2022.

    Comments: Project page https://sites.google.com/view/revind/home

  7. arXiv:2208.07860

    cs.RO cs.AI

    A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning

    Authors: Laura Smith, Ilya Kostrikov, Sergey Levine

    Abstract: Deep reinforcement learning is a promising approach to learning policies in uncontrolled environments that do not require domain knowledge. Unfortunately, due to sample inefficiency, deep RL applications have primarily focused on simulated environments. In this work, we demonstrate that the recent advancements in machine learning algorithms and libraries combined with a carefully tuned robot contr…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: First two authors contributed equally. Project website: https://sites.google.com/berkeley.edu/walk-in-the-park

  8. arXiv:2206.11871

    cs.CL cs.LG

    Offline RL for Natural Language Generation with Implicit Language Q Learning

    Authors: Charlie Snell, Ilya Kostrikov, Yi Su, Mengjiao Yang, Sergey Levine

    Abstract: Large language models distill broad knowledge from text corpora. However, they can be inconsistent when it comes to completing user specified tasks. This issue can be addressed by finetuning such models via supervised learning on curated datasets, or via reinforcement learning. In this work, we propose a novel offline RL method, implicit language Q-learning (ILQL), designed for use on language mod…

    Submitted 1 May, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

  9. arXiv:2201.04122

    cs.LG cs.AI cs.CV

    In Defense of the Unitary Scalarization for Deep Multi-Task Learning

    Authors: Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. Pawan Kumar

    Abstract: Recent multi-task learning research argues against unitary scalarization, where training simply minimizes the sum of the task losses. Several ad-hoc multi-task optimization algorithms have instead been proposed, inspired by various hypotheses about what makes multi-task settings difficult. The majority of these optimizers require per-task gradients, and introduce significant memory, runtime, and i…

    Submitted 8 March, 2023; v1 submitted 11 January, 2022; originally announced January 2022.

    Comments: NeurIPS 2022 camera-ready version, fixed training loss y axis scale

  10. arXiv:2112.10751

    cs.LG cs.AI stat.ML

    RvS: What is Essential for Offline RL via Supervised Learning?

    Authors: Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine

    Abstract: Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two…

    Submitted 10 May, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

  11. arXiv:2111.14629

    cs.LG cs.AI

    Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

    Authors: Bogdan Mazoure, Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

    Abstract: Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but still exhibit difficulty in generalizing to scenarios not seen during training. While prior online approaches demonstrated that using additional signals beyond the reward function can lead to better generalization capabilities in RL agents, i.e. using self-supervised learning (SSL), they st…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: Offline RL workshop at NeurIPS 2021

  12. arXiv:2110.06169

    cs.LG

    Offline Reinforcement Learning with Implicit Q-Learning

    Authors: Ilya Kostrikov, Ashvin Nair, Sergey Levine

    Abstract: Offline reinforcement learning requires reconciling two conflicting aims: learning a policy that improves over the behavior policy that collected the dataset, while at the same time minimizing the deviation from the behavior policy so as to avoid errors due to distributional shift. This trade-off is critical, because most current offline reinforcement learning methods need to query the value of un…

    Submitted 12 October, 2021; originally announced October 2021.

  13. arXiv:2103.08050

    cs.LG

    Offline Reinforcement Learning with Fisher Divergence Critic Regularization

    Authors: Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum

    Abstract: Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, whic…

    Submitted 14 March, 2021; originally announced March 2021.

  14. arXiv:2007.13609

    cs.LG stat.ML

    Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation

    Authors: Ilya Kostrikov, Ofir Nachum

    Abstract: In reinforcement learning, it is typical to use the empirically observed transitions and rewards to estimate the value of a policy via either model-based or Q-fitting approaches. Although straightforward, these techniques in general yield biased estimates of the true value of the policy. In this work, we investigate the potential for statistical bootstrapping to be used as a way to take these bias…

    Submitted 27 July, 2020; originally announced July 2020.

  15. arXiv:2006.12862

    cs.LG cs.AI

    Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

    Authors: Roberta Raileanu, Max Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

    Abstract: Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios, even when they are trained on many instances of semantically similar environments. Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents. However, different tasks tend to benefit from different kinds of data augmentation. In this paper, we compare three approac…

    Submitted 20 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

  16. arXiv:2004.13649

    cs.LG cs.CV eess.IV stat.ML

    Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

    Authors: Ilya Kostrikov, Denis Yarats, Rob Fergus

    Abstract: We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to regularize the value function. Existing model-free approaches, such as Soft Actor-Critic…

    Submitted 7 March, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

  17. arXiv:1912.05032

    cs.LG stat.ML

    Imitation Learning via Off-Policy Distribution Matching

    Authors: Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

    Abstract: When performing imitation learning from expert demonstrations, distribution matching is a popular approach, in which one alternates between estimating distribution ratios and then using these ratios as rewards in a standard reinforcement learning (RL) algorithm. Traditionally, estimation of the distribution ratio requires on-policy data, which has caused previous work to either be exorbitantly dat…

    Submitted 10 December, 2019; originally announced December 2019.

  18. arXiv:1912.02074

    cs.LG cs.AI

    AlgaeDICE: Policy Gradient from Arbitrary Experience

    Authors: Ofir Nachum, Bo Dai, Ilya Kostrikov, Yinlam Chow, Lihong Li, Dale Schuurmans

    Abstract: In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility. This presents a challenge to traditional RL algorithms since the max-return objective involves an expectation over on-policy samples. We introduce a new formulation of max-return optimization that allows the problem to be re-expressed by an expectation over an a…

    Submitted 4 December, 2019; originally announced December 2019.

  19. arXiv:1910.01741

    cs.LG cs.AI cs.RO stat.ML

    Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

    Authors: Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus

    Abstract: Training an agent to solve control tasks directly from high-dimensional images with model-free reinforcement learning (RL) has proven difficult. A promising approach is to learn a latent representation together with the control policy. However, fitting a high-capacity encoder using a scarce reward signal is sample inefficient and leads to poor performance. Prior work has shown that auxiliary losse…

    Submitted 9 July, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

  20. arXiv:1809.02925

    cs.LG stat.ML

    Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning

    Authors: Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine, Jonathan Tompson

    Abstract: We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first problem is implicit bias present in the reward functions used in these algorithms. While these biases might work well for some environments, they can also lead to sub-optimal behavior in others. Secondly, even though these algorithms can learn from few expert demonstrations, they r…

    Submitted 15 October, 2018; v1 submitted 9 September, 2018; originally announced September 2018.

  21. arXiv:1705.10819

    stat.ML cs.GR cs.LG

    Surface Networks

    Authors: Ilya Kostrikov, Zhongshi Jiang, Daniele Panozzo, Denis Zorin, Joan Bruna

    Abstract: We study data-driven representations for three-dimensional triangle meshes, which are one of the prevalent objects used to represent 3D geometry. Recent works have developed models that exploit the intrinsic geometry of manifolds and graphs, namely the Graph Neural Networks (GNNs) and its spectral variants, which learn from the local metric tensor via the Laplacian operator. Despite offering excel…

    Submitted 18 June, 2018; v1 submitted 30 May, 2017; originally announced May 2017.

    Journal ref: CVPR 2018

  22. arXiv:1703.05407

    cs.LG

    Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

    Authors: Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

    Abstract: We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be res…

    Submitted 27 April, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Published in ICLR 2018

  23. PlaNet - Photo Geolocation with Convolutional Neural Networks

    Authors: Tobias Weyand, Ilya Kostrikov, James Philbin

    Abstract: Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow on…

    Submitted 17 February, 2016; originally announced February 2016.
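
Entry 5 above (Efficient Online Reinforcement Learning with Offline Data) proposes using offline data in online RL, but its abstract is cut off before the recipe. One simple mechanism, assumed here for illustration, is to fill every minibatch half from the online replay buffer and half from the offline dataset, so the prior data neither drowns out nor is drowned out by the small early online buffer. The helper `symmetric_batch` and the 50/50 split are assumptions; the authors' exact recipe lives in the repository linked in that entry.

```python
import numpy as np

def symmetric_batch(online, offline, batch_size, rng):
    """Hypothetical helper: draw half of a minibatch from each buffer.

    `online` and `offline` are arrays of transitions with matching trailing
    shapes; indices are sampled uniformly with replacement within each buffer.
    """
    n_offline = batch_size // 2
    n_online = batch_size - n_offline
    on_idx = rng.integers(0, len(online), size=n_online)
    off_idx = rng.integers(0, len(offline), size=n_offline)
    return np.concatenate([online[on_idx], offline[off_idx]], axis=0)

rng = np.random.default_rng(0)
online = np.zeros((10, 3))     # stand-in: a small, freshly collected online buffer
offline = np.ones((1000, 3))   # stand-in: a large prior dataset
batch = symmetric_batch(online, offline, 256, rng)
print(batch.shape, batch[:, 0].mean())  # (256, 3) 0.5
```

Because the split is fixed per batch rather than proportional to buffer sizes, the 10 online transitions still make up half of every update.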
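
Entry 9 above (In Defense of the Unitary Scalarization for Deep Multi-Task Learning) defines unitary scalarization as training that simply minimizes the sum of the task losses. A minimal sketch, assuming two hypothetical linear-regression tasks that share one weight vector: the summed loss needs a single gradient per step, in contrast to the per-task gradients that the abstract says most specialized multi-task optimizers require.

```python
import numpy as np

# Two hypothetical regression tasks that share one weight vector w.
rng = np.random.default_rng(0)
n_tasks, n, d = 2, 64, 3
X = rng.normal(size=(n_tasks, n, d))
w_task = np.array([[1.0, -2.0, 0.5],
                   [1.2, -1.8, 0.4]])
y = np.einsum("tnd,td->tn", X, w_task)

def losses_and_grad(w):
    """Per-task MSE losses and the gradient of their sum (unitary scalarization)."""
    residual = np.einsum("tnd,d->tn", X, w) - y              # shape (n_tasks, n)
    per_task = (residual ** 2).mean(axis=1)                   # one MSE per task
    grad = (2.0 / n) * np.einsum("tn,tnd->d", residual, X)    # d/dw of per_task.sum()
    return per_task, grad

w = np.zeros(d)
for step in range(500):
    per_task, grad = losses_and_grad(w)   # one gradient for the summed loss
    w -= 0.05 * grad
print("total loss:", per_task.sum(), "w:", np.round(w, 3))
```

Since both tasks have the same number of samples, minimizing the summed MSE is the same as ordinary least squares on the stacked data, which gives an easy correctness check.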
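
Entry 12 above (Offline Reinforcement Learning with Implicit Q-Learning) is cut off before its method. As described in the IQL paper, the state-value function is fit by expectile regression on the Q-values of dataset actions, using the asymmetric squared loss L(u) = |tau - 1(u < 0)| * u^2; as tau approaches 1 the fitted value moves toward the best in-dataset Q-value without ever querying unseen actions. The sketch below fits a single scalar to a few hypothetical Q-values by gradient descent to show that effect; it is an illustration, not the authors' implementation.

```python
import numpy as np

def expectile_weight(diff, tau):
    """|tau - 1(diff < 0)|: weight tau above the estimate, 1 - tau below it."""
    return np.where(diff < 0, 1.0 - tau, tau)

def expectile_loss(v, samples, tau):
    """Mean asymmetric squared loss of a scalar estimate v."""
    diff = samples - v
    return float(np.mean(expectile_weight(diff, tau) * diff ** 2))

def fit_expectile(samples, tau, lr=0.1, steps=2000):
    """Gradient descent on expectile_loss over a scalar v (the loss is convex in v)."""
    v = float(np.mean(samples))
    for _ in range(steps):
        diff = samples - v
        grad = -2.0 * np.mean(expectile_weight(diff, tau) * diff)
        v -= lr * grad
    return v

# Hypothetical Q(s, a) values for the dataset actions at one state.
q_values = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
for tau in (0.5, 0.9, 0.99):
    v = fit_expectile(q_values, tau)
    print(f"tau={tau}: V={v:.3f}, loss={expectile_loss(v, q_values, tau):.3f}")
```

With tau = 0.5 the loss is symmetric and V is the plain mean; larger tau pulls V toward the largest Q-value in the data.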
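
Entry 14 above (Statistical Bootstrapping for Uncertainty Estimation in Off-Policy Evaluation) investigates statistical bootstrapping as a way to handle biased value estimates; the abstract is truncated before the details. A standard use of the bootstrap is to attach a confidence interval to an estimate: resample the data with replacement, rerun the estimator on each resample, and read off percentiles. In this minimal sketch the per-episode returns are synthetic and the plain mean stands in for the model-based or Q-fitting estimators the abstract mentions; the paper's procedure may resample at a different granularity.

```python
import numpy as np

def bootstrap_ci(episode_returns, estimator=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a value estimate.

    Resamples whole episodes with replacement, reruns the estimator on each
    resample, and reports the alpha/2 and 1 - alpha/2 percentiles.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(episode_returns, dtype=float)
    n = len(data)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        resample = data[rng.integers(0, n, size=n)]
        stats[b] = estimator(resample)
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return estimator(data), lo, hi

# Synthetic per-episode returns, for illustration only.
returns = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=200)
point, lo, hi = bootstrap_ci(returns)
print(f"estimate={point:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Passing a different `estimator` (any function from an array of samples to a scalar) reuses the same resampling loop unchanged.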
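
Entry 15 above (Automatic Data Augmentation for Generalization in Deep Reinforcement Learning) notes that different tasks benefit from different augmentations before its abstract is cut off. One generic way to choose an augmentation automatically, assumed here for illustration, is to treat the candidates as arms of a multi-armed bandit and pick with the UCB1 rule, updating each arm's running mean with the return observed after using it. The augmentation names, per-arm gains, and noise level are hypothetical stand-ins for real training feedback.

```python
import math
import random

def ucb_select(counts, means, t, c=1.0):
    """UCB1: try every arm once, then maximize mean + c * sqrt(log t / count)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]))

augmentations = ["crop", "color-jitter", "cutout"]   # hypothetical candidate set
true_gain = [0.2, 0.8, 0.5]                          # hypothetical mean return per choice
counts = [0] * len(augmentations)
means = [0.0] * len(augmentations)
rng = random.Random(0)
for t in range(1, 501):
    arm = ucb_select(counts, means, t)
    reward = true_gain[arm] + rng.gauss(0.0, 0.1)     # noisy feedback after using this arm
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # incremental running mean
print(dict(zip(augmentations, counts)))
```

The exploration bonus shrinks as an arm is used more, so most selections end up on the arm with the highest observed return while weaker arms are still occasionally rechecked.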
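
Entry 16 above (Image Augmentation Is All You Need) regularizes the value function with input perturbations commonly used in computer vision. A common perturbation of this kind, assumed here, is a small random translation: pad the frame by a few pixels with edge replication, then crop back to the original size at a random offset. The 4-pixel pad, the 84x84x3 frame, and the function name `random_shift` are assumptions for illustration.

```python
import numpy as np

def random_shift(obs, pad=4, rng=None):
    """Randomly translate an (H, W, C) image by up to `pad` pixels per axis.

    Pads with edge replication, then crops a random H x W window.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = obs.shape[:2]
    padded = np.pad(obs, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)    # offset in [0, 2 * pad]
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

obs = np.arange(84 * 84 * 3, dtype=np.float32).reshape(84, 84, 3)  # stand-in frame
aug = random_shift(obs, rng=np.random.default_rng(0))
print(aug.shape)  # (84, 84, 3)
```

The output keeps the input shape, so the augmentation can be dropped in front of an existing encoder without other changes.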
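
Entry 17 above (Imitation Learning via Off-Policy Distribution Matching) describes distribution matching as alternating between estimating distribution ratios and using them as rewards. In the tabular case the ratio can be read directly off normalized visit counts, which makes that loop easy to see. The smoothing constant `eps`, the log transform, and the visit data are assumptions; note that this naive count-based estimate needs fresh visits from the current policy, which is the on-policy requirement that abstract attributes to traditional ratio estimation.

```python
import numpy as np

def log_ratio_rewards(expert_states, policy_states, n_states, eps=1e-3):
    """Per-state reward log(d_expert(s) / d_policy(s)) from smoothed visit counts."""
    d_e = np.bincount(expert_states, minlength=n_states) + eps
    d_p = np.bincount(policy_states, minlength=n_states) + eps
    d_e = d_e / d_e.sum()
    d_p = d_p / d_p.sum()
    return np.log(d_e / d_p)

expert = [0, 1, 1, 2, 2, 2]   # hypothetical expert state visits
policy = [0, 0, 0, 1, 2, 3]   # hypothetical visits under the current policy
rewards = log_ratio_rewards(expert, policy, n_states=4)
print(np.round(rewards, 2))   # positive where the expert visits more often
```

An RL step would then maximize these rewards, the policy's visits would change, and the ratios would be re-estimated.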
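
Entry 20 above (Discriminator-Actor-Critic) reports an implicit bias in the reward functions of adversarial imitation learning. A quick numerical check, assuming two common reward transforms of a discriminator output D in (0, 1): -log(1 - D) is positive for every D and log(D) is negative for every D, so in episodic tasks the first rewards an agent for simply surviving longer and the second for ending episodes early, regardless of how expert-like its behavior is. Which transforms the paper analyzes is not visible in the truncated abstract.

```python
import numpy as np

# Hypothetical discriminator outputs D(s, a) in (0, 1).
d = np.linspace(0.01, 0.99, 99)

survival_reward = -np.log(1.0 - d)  # > 0 for every D: each extra step adds reward
penalty_reward = np.log(d)          # < 0 for every D: each extra step costs reward

print(bool(survival_reward.min() > 0), bool(penalty_reward.max() < 0))  # True True
```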
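
Entry 22 above (Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play) pits Alice, who proposes a task, against Bob, who tries to complete it, but the truncated abstract does not give the rewards. The sketch below assumes one natural scheme: Bob is penalized for every step he takes, and Alice is rewarded only by how much longer Bob needs than she did, which pushes her toward tasks that are quick to set up yet hard for Bob. The function name `self_play_rewards` and the scale `gamma` are hypothetical.

```python
def self_play_rewards(t_alice, t_bob, gamma=0.01):
    """Assumed per-episode rewards for the Alice/Bob scheme (hypothetical helper).

    t_alice: steps Alice spent before handing the task to Bob.
    t_bob:   steps Bob needed to complete it (the step limit if he failed).
    gamma:   hypothetical reward scale.
    """
    reward_bob = -gamma * t_bob                      # Bob wants to finish quickly
    reward_alice = gamma * max(0, t_bob - t_alice)   # Alice wants Bob to need longer than she did
    return reward_alice, reward_bob

print(self_play_rewards(t_alice=10, t_bob=30))  # Bob struggled: Alice is rewarded
print(self_play_rewards(t_alice=30, t_bob=10))  # Bob was faster: Alice gets nothing
```

Because Alice's reward is clipped at zero, she gains nothing from tasks Bob solves faster than she set them up, and the -t_alice term discourages her from spending many steps on setup.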