Showing 1–18 of 18 results for author: Swamy, G

  1. arXiv:2406.11905  [pdf, other]

    cs.NE cs.LG

    EvIL: Evolution Strategies for Generalisable Imitation Learning

    Authors: Silvia Sapora, Gokul Swamy, Chris Lu, Yee Whye Teh, Jakob Nicolaus Foerster

    Abstract: Often times in imitation learning (IL), the environment we collect expert demonstrations in and the environment we want to deploy our learned policy in aren't exactly the same (e.g. demonstrations collected in simulation but deployment in the real world). Compared to policy-centric approaches to IL like behavioural cloning, reward-centric approaches like inverse reinforcement learning (IRL) often…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, ICML 2024

  2. arXiv:2406.04219  [pdf, other]

    cs.LG

    Multi-Agent Imitation Learning: Value is Easy, Regret is Hard

    Authors: Jingwu Tang, Gokul Swamy, Fei Fang, Zhiwei Steven Wu

    Abstract: We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents based on demonstrations of an expert doing so. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations. While doing so is sufficient to drive the value gap between the learner a…

    Submitted 25 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2406.01462  [pdf, other]

    cs.LG cs.AI cs.CL

    Understanding Preference Fine-Tuning Through the Lens of Coverage

    Authors: Yuda Song, Gokul Swamy, Aarti Singh, J. Andrew Bagnell, Wen Sun

    Abstract: Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to…

    Submitted 3 June, 2024; originally announced June 2024.

  4. arXiv:2404.16767  [pdf, other]

    cs.LG cs.CL cs.CV

    REBEL: Reinforcement Learning via Regressing Relative Rewards

    Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

    Abstract: While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise impleme…

    Submitted 29 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: New experimental results on general chat

  5. arXiv:2402.08848  [pdf, other]

    cs.LG cs.AI

    Hybrid Inverse Reinforcement Learning

    Authors: Juntao Ren, Gokul Swamy, Zhiwei Steven Wu, J. Andrew Bagnell, Sanjiban Choudhury

    Abstract: The inverse reinforcement learning approach to imitation learning is a double-edged sword. On the one hand, it can enable learning from a smaller number of expert demonstrations with more robustness to error compounding than behavioral cloning approaches. On the other hand, it requires that the learner repeatedly solve a computationally expensive reinforcement learning (RL) problem. Often, much of…

    Submitted 4 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  6. arXiv:2402.02616   

    cs.LG

    The Virtues of Pessimism in Inverse Reinforcement Learning

    Authors: David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury

    Abstract: Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations. However, it traditionally requires repeatedly solving a computationally expensive reinforcement learning (RL) problem in its inner loop. It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL. As an example, recent work resets th…

    Submitted 8 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: This paper has been withdrawn by the authors pending edits from other authors

  7. arXiv:2401.04056  [pdf, other]

    cs.LG

    A Minimaximalist Approach to Reinforcement Learning from Human Feedback

    Authors: Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal

    Abstract: We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training and is therefore rather simple to implement. Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust…

    Submitted 13 June, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  8. arXiv:2309.00711  [pdf, other]

    cs.LG

    Learning Shared Safety Constraints from Multi-task Demonstrations

    Authors: Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu

    Abstract: Regardless of the particular task we want them to perform in an environment, there are often shared safety constraints we want our agents to respect. For example, regardless of whether it is making a sandwich or clearing the table, a kitchen robot should not break a plate. Manually specifying such a constraint can be both time-consuming and error-prone. We show how to learn constraints from expert…

    Submitted 1 September, 2023; originally announced September 2023.

  9. arXiv:2303.14623  [pdf, other]

    cs.LG

    Inverse Reinforcement Learning without Reinforcement Learning

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: w…

    Submitted 29 January, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

  10. arXiv:2208.09551  [pdf, ps, other]

    cs.GT cs.LG

    Game-Theoretic Algorithms for Conditional Moment Matching

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: A variety of problems in econometrics and machine learning, including instrumental variable regression and Bellman residual minimization, can be formulated as satisfying a set of conditional moment restrictions (CMR). We derive a general, game-theoretic strategy for satisfying CMR that scales to nonlinear problems, is amenable to gradient-based optimization, and is able to account for finite sampl…

    Submitted 19 August, 2022; originally announced August 2022.

  11. arXiv:2208.02225  [pdf, other]

    cs.LG

    Sequence Model Imitation Learning with Unobserved Contexts

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We consider imitation learning problems where the learner's ability to mimic the expert increases throughout the course of an episode as more information is revealed. One example of this is when the expert has access to privileged information: while the learner might not be able to accurately reproduce expert behavior early on in an episode, by considering the entire history of states and actions,…

    Submitted 14 January, 2023; v1 submitted 3 August, 2022; originally announced August 2022.

  12. arXiv:2205.15397  [pdf, other]

    cs.LG stat.ML

    Minimax Optimal Online Imitation Learning via Replay Estimation

    Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

    Abstract: Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap tha…

    Submitted 14 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  13. arXiv:2202.01312  [pdf, other]

    cs.LG cs.RO

    Causal Imitation Learning under Temporally Correlated Noise

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the in…

    Submitted 2 February, 2022; originally announced February 2022.

  14. arXiv:2110.02063  [pdf, ps, other]

    cs.LG

    A Critique of Strictly Batch Imitation Learning

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: Recent work by Jarrett et al. attempts to frame the problem of offline imitation learning (IL) as one of learning a joint energy-based model, with the hope of out-performing standard behavioral cloning. We suggest that notational issues obscure how the pseudo-state visitation distribution the authors propose to optimize might be disconnected from the policy's $\textit{true}$ state visitation distr…

    Submitted 5 October, 2021; originally announced October 2021.

  15. arXiv:2103.03236  [pdf, other]

    cs.LG cs.RO stat.ML

    Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching. At its core, our classification scheme is based on whether the learner attempts to match (1) reward or (2) action-value moments of the expert's behavior, with each option leading to differing algorithmic approaches. By considering adversarially chosen divergences between lear…

    Submitted 10 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  16. arXiv:1910.02910  [pdf, other]

    cs.RO cs.LG stat.ML

    Scaled Autonomy: Enabling Human Operators to Control Robot Fleets

    Authors: Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan

    Abstract: Autonomous robots often encounter challenging situations where their control policies fail and an expert human operator must briefly intervene, e.g., through teleoperation. In settings where multiple robots act in separate environments, a single human operator can manage a fleet of robots by identifying and teleoperating one robot at any given time. The key challenge is that users have limited att…

    Submitted 8 March, 2020; v1 submitted 21 September, 2019; originally announced October 2019.

    Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2020

  17. arXiv:1901.01291  [pdf, other]

    cs.RO cs.LG stat.ML

    On the Utility of Model Learning in HRI

    Authors: Gokul Swamy, Jens Schulz, Rohan Choudhury, Dylan Hadfield-Menell, Anca Dragan

    Abstract: Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly? In the context of HRI, part of the world to be modeled is the human. One option is for the robot to treat the human as a black box and learn a policy for how they act directly. But it can also model the human as an agent, and rely on…

    Submitted 21 May, 2020; v1 submitted 4 January, 2019; originally announced January 2019.

  18. arXiv:1806.09070  [pdf, other]

    cs.GR cs.LG stat.ML

    Generative Models for Pose Transfer

    Authors: Patrick Chao, Alexander Li, Gokul Swamy

    Abstract: We investigate nearest neighbor and generative models for transferring pose between persons. We take in a video of one person performing a sequence of actions and attempt to generate a video of another person performing the same actions. Our generative model (pix2pix) outperforms k-NN at both generating corresponding frames and generalizing outside the demonstrated action set. Our most salient con…

    Submitted 23 June, 2018; originally announced June 2018.