
Showing 1–14 of 14 results for author: Sikchi, H

  1. arXiv:2406.08805  [pdf, other]

    cs.LG cs.AI cs.RO

    A Dual Approach to Imitation Learning from Observations with Offline Datasets

    Authors: Harshit Sikchi, Caleb Chuck, Amy Zhang, Scott Niekum

    Abstract: Demonstrations are an effective alternative to task specification for learning agents in settings where designing a reward function is difficult. However, demonstrating expert behavior in the action space of the agent becomes unwieldy when robots have complex, unintuitive morphologies. We consider the practical setting where an agent has a dataset of prior interactions with the environment and is…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Under submission. 23 pages

  2. arXiv:2406.02900  [pdf, other]

    cs.LG cs.AI cs.CL

    Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

    Authors: Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent human preferences, which is in turn used by an online reinforcement learning (RL) algorithm to optimize the LLM. A prominent issue with such methods…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.03113  [pdf, other]

    cs.RO cs.AI

    Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning

    Authors: Caleb Chuck, Carl Qi, Michael J. Munje, Shuozhe Li, Max Rudolph, Chang Shi, Siddhant Agarwal, Harshit Sikchi, Abhinav Peri, Sarthak Dayal, Evan Kuo, Kavan Mehta, Anthony Wang, Peter Stone, Amy Zhang, Scott Niekum

    Abstract: Reinforcement Learning is a promising tool for learning complex policies even in fast-moving and object-interactive domains where human teleoperation or hard-coded policies might fail. To effectively reflect this challenging category of tasks, we introduce a dynamic, interactive RL testbed based on robot air hockey. By augmenting air hockey with a large family of tasks ranging from easy tasks like…

    Submitted 5 May, 2024; originally announced May 2024.

  4. arXiv:2311.02013  [pdf, other]

    cs.LG cs.AI cs.RO

    SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning

    Authors: Harshit Sikchi, Rohan Chitnis, Ahmed Touati, Alborz Geramifard, Amy Zhang, Scott Niekum

    Abstract: Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions. Offline GCRL is pivotal for developing generalist agents capable of leveraging pre-existing datasets to learn diverse and reusable skills without hand-engineering reward functions. However, contemporary approaches to…

    Submitted 28 February, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Published at International Conference on Learning Representations (ICLR) 2024. 26 pages

  5. arXiv:2310.13639  [pdf, other]

    cs.LG cs.AI

    Contrastive Preference Learning: Learning from Human Feedback without RL

    Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to rewa…

    Submitted 30 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Code released at https://github.com/jhejna/cpl

  6. arXiv:2302.08560  [pdf, other]

    cs.LG cs.AI cs.RO

    Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

    Authors: Harshit Sikchi, Qinqing Zheng, Amy Zhang, Scott Niekum

    Abstract: The goal of reinforcement learning (RL) is to find a policy that maximizes the expected cumulative return. It has been shown that this objective can be represented as an optimization problem of state-action visitation distribution under linear constraints. The dual problem of this formulation, which we refer to as dual RL, is unconstrained and easier to optimize. In this work, we first cast severa…

    Submitted 26 January, 2024; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: Published as a conference paper (spotlight) at ICLR 2024. 48 pages

  7. arXiv:2203.08371  [pdf, other]

    cs.RO

    Real Robot Challenge 2021: Cartesian Position Control with Triangle Grasp and Trajectory Interpolation

    Authors: Rishabh Madan, Harshit Sikchi, Ethan K. Gordon, Tapomayukh Bhattacharjee

    Abstract: We present our runner-up approach for the Real Robot Challenge 2021. We build upon our previous approach used in Real Robot Challenge 2020. To solve the task of sequential goal-reaching we focus on two aspects to achieving near-optimal trajectory: Grasp stability and Controller performance. In the RRC 2021 simulated challenge, our method relied on a hand-designed Pinch grasp combined with Trajecto…

    Submitted 19 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

  8. arXiv:2202.03481  [pdf, other]

    cs.LG cs.AI cs.RO

    A Ranking Game for Imitation Learning

    Authors: Harshit Sikchi, Akanksha Saran, Wonjoon Goo, Scott Niekum

    Abstract: We propose a new framework for imitation learning -- treating imitation as a two-player ranking-based game between a policy and a reward. In this game, the reward agent learns to satisfy pairwise performance rankings between behaviors, while the policy agent learns to maximize this reward. In imitation learning, near-optimal expert data can be difficult to obtain, and even in the limit of infinite…

    Submitted 16 January, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Published in Transactions on Machine Learning Research 2023. 38 pages

  9. arXiv:2109.10957  [pdf, other]

    cs.RO stat.AP

    Real Robot Challenge: A Robotics Competition in the Cloud

    Authors: Stefan Bauer, Felix Widmaier, Manuel Wüthrich, Annika Buchholz, Sebastian Stark, Anirudh Goyal, Thomas Steinbrenner, Joel Akpo, Shruti Joshi, Vincent Berenz, Vaibhav Agrawal, Niklas Funk, Julen Urain De Jesus, Jan Peters, Joe Watson, Claire Chen, Krishnan Srinivasan, Junwu Zhang, Jeffrey Zhang, Matthew R. Walter, Rishabh Madan, Charles Schaff, Takahiro Maeda, Takuma Yoneda, Denis Yarats, et al. (17 additional authors not shown)

    Abstract: Dexterous manipulation remains an open problem in robotics. To coordinate efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers that are capable of dexterous object manipulation. Users are able…

    Submitted 10 June, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

  10. arXiv:2103.09230  [pdf, other]

    cs.LG cs.AI cs.RO

    Lyapunov Barrier Policy Optimization

    Authors: Harshit Sikchi, Wenxuan Zhou, David Held

    Abstract: Deploying Reinforcement Learning (RL) agents in the real world requires that the agents satisfy safety constraints. Current RL agents explore the environment without considering these constraints, which can lead to damage to the hardware or even other agents in the environment. We propose a new method, LBPO, that uses a Lyapunov-based barrier function to restrict the policy update to a safe set for…

    Submitted 16 March, 2021; originally announced March 2021.

  11. arXiv:2011.04709  [pdf, other]

    cs.LG cs.RO

    f-IRL: Inverse Reinforcement Learning via State Marginal Matching

    Authors: Tianwei Ni, Harshit Sikchi, Yufei Wang, Tejus Gupta, Lisa Lee, Benjamin Eysenbach

    Abstract: Imitation learning is well-suited for robotic tasks where it is difficult to directly program the behavior or specify a cost for optimal control. In this work, we propose a method for learning the reward function (and the corresponding policy) to match the expert state density. Our main result is the analytic gradient of any f-divergence between the agent and expert state distribution w.r.t. rewar…

    Submitted 29 December, 2020; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: The first four authors have equal contribution (orders determined by dice rolling), and the last two authors have equal advising. The paper is accepted by Conference on Robot Learning (CoRL) 2020. Project videos and code link are available at https://sites.google.com/view/f-irl/home

  12. arXiv:2008.10066  [pdf, other]

    cs.LG cs.AI cs.RO

    Learning Off-Policy with Online Planning

    Authors: Harshit Sikchi, Wenxuan Zhou, David Held

    Abstract: Reinforcement learning (RL) in low-data and risk-sensitive domains requires performant and flexible deployment policies that can readily incorporate constraints during deployment. One such class of policies are the semi-parametric H-step lookahead policies, which select actions using trajectory optimization over a dynamics model for a fixed horizon with a terminal value function. In this work, we…

    Submitted 5 October, 2021; v1 submitted 23 August, 2020; originally announced August 2020.

    Comments: 30 pages, Conference on Robot Learning (CoRL) 2021

  13. arXiv:2007.16162  [pdf, other]

    cs.RO cs.AI cs.LG

    Imitative Planning using Conditional Normalizing Flow

    Authors: Shubhankar Agarwal, Harshit Sikchi, Cole Gulino, Eric Wilkinson, Shivam Gautam

    Abstract: A popular way to plan trajectories in dynamic urban scenarios for Autonomous Vehicles is to rely on explicitly specified and hand crafted cost functions, coupled with random sampling in the trajectory space to find the minimum cost trajectory. Such methods require a high number of samples to find a low-cost trajectory and might end up with a highly suboptimal trajectory given the planning time bud…

    Submitted 13 October, 2022; v1 submitted 31 July, 2020; originally announced July 2020.

  14. arXiv:1902.08802  [pdf, other]

    cs.CV

    Illumination-invariant Face recognition by fusing thermal and visual images via gradient transfer

    Authors: Sumit Agarwal, Harshit S. Sikchi, Suparna Rooj, Shubhobrata Bhattacharya, Aurobinda Routray

    Abstract: Face recognition in real life situations like low illumination condition is still an open challenge in biometric security. It is well established that the state-of-the-art methods in face recognition provide low accuracy in the case of poor illumination. In this work, we propose an algorithm for a more robust illumination invariant face recognition using a multi-modal approach. We propose a new da…

    Submitted 23 February, 2019; originally announced February 2019.