Showing 1–14 of 14 results for author: Hejna, J

  1. arXiv:2406.02900  [pdf, other]

    cs.LG cs.AI cs.CL

    Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms

    Authors: Rafael Rafailov, Yaswanth Chittepu, Ryan Park, Harshit Sikchi, Joey Hejna, Bradley Knox, Chelsea Finn, Scott Niekum

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs); however, it is often a complex and brittle process. In the classical RLHF framework, a reward model is first trained to represent human preferences, which is in turn used by an online reinforcement learning (RL) algorithm to optimize the LLM. A prominent issue with such methods…

    Submitted 4 June, 2024; originally announced June 2024.

  2. arXiv:2406.00888  [pdf, other]

    cs.CL cs.HC

    Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

    Authors: Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, Diyi Yang

    Abstract: Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number ($<10$) o…

    Submitted 2 June, 2024; originally announced June 2024.

  3. arXiv:2405.12213  [pdf, other]

    cs.RO cs.LG

    Octo: An Open-Source Generalist Robot Policy

    Authors: Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

    Abstract: Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sen…

    Submitted 26 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Project website: https://octo-models.github.io

  4. arXiv:2404.12358  [pdf, other]

    cs.LG

    From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

    Authors: Rafael Rafailov, Joey Hejna, Ryan Park, Chelsea Finn

    Abstract: Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex nature of the classical RLHF pipeline, direct alignment algorithms such as Direct Preference Optimization (DPO) have emerged as an alternative approach. Although DPO solves the same objective as the standard RLHF setup, there is a mismatc…

    Submitted 18 April, 2024; originally announced April 2024.

  5. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  6. arXiv:2310.13639  [pdf, other]

    cs.LG cs.AI

    Contrastive Preference Learning: Learning from Human Feedback without RL

    Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to rewa…

    Submitted 30 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Code released at https://github.com/jhejna/cpl

  7. arXiv:2310.08864  [pdf, other]

    cs.RO

    Open X-Embodiment: Robotic Learning Datasets and RT-X Models

    Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

    Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning method…

    Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Project website: https://robotics-transformer-x.github.io

  8. arXiv:2306.12554  [pdf, other]

    cs.LG cs.AI

    Improving Long-Horizon Imitation Through Instruction Prediction

    Authors: Joey Hejna, Pieter Abbeel, Lerrel Pinto

    Abstract: Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based m…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Published at AAAI 2023

  9. arXiv:2305.15363  [pdf, other]

    cs.LG

    Inverse Preference Learning: Preference-based RL without a Reward Function

    Authors: Joey Hejna, Dorsa Sadigh

    Abstract: Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward functions from human feedback. However, the majority of preference-based RL methods naïvely combine supervised reward models with off-the-shelf RL algorithms. Contemporary approaches have sought to improve performance…

    Submitted 24 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Updated for NeurIPS 2023 Acceptance

  10. arXiv:2304.13774  [pdf, other]

    cs.LG

    Distance Weighted Supervised Learning for Offline Interaction Data

    Authors: Joey Hejna, Jensen Gao, Dorsa Sadigh

    Abstract: Sequential decision making algorithms often struggle to leverage different sources of unstructured offline interaction data. Imitation learning (IL) methods based on supervised learning are robust, but require optimal demonstrations, which are hard to collect. Offline goal-conditioned reinforcement learning (RL) algorithms promise to learn from sub-optimal data, but face optimization challenges es…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: ICML 2023

  11. arXiv:2301.02328  [pdf, other]

    cs.LG cs.AI cs.RO

    Extreme Q-Learning: MaxEnt RL without Entropy

    Authors: Divyansh Garg, Joey Hejna, Matthieu Geist, Stefano Ermon

    Abstract: Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an infinite number of possible actions. In this work, we introduce a new update rule for online and offline RL which directly models the maximal value using Extreme Value Theory (EVT), drawing inspiration from economics. By doing so, we avoid comput…

    Submitted 28 February, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: ICLR 2023 Oral

  12. arXiv:2212.03363  [pdf, other]

    cs.RO cs.AI cs.LG

    Few-Shot Preference Learning for Human-in-the-Loop RL

    Authors: Joey Hejna, Dorsa Sadigh

    Abstract: While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has proven to be extremely difficult due to their inability to capture human intent and policy exploitation. Preference-based RL algorithms seek to overcome these challenges by directly learning reward functions from human feedback. Unfortunately, pr…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: 6th Annual Conference on Robot Learning (CoRL) 2022

  13. arXiv:2102.13100  [pdf, other]

    cs.LG cs.AI cs.RO

    Task-Agnostic Morphology Evolution

    Authors: Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

    Abstract: Deep reinforcement learning primarily focuses on learning behavior, usually overlooking the fact that an agent's function is largely determined by form. So, how should one go about finding a morphology fit for solving tasks in a given environment? Current approaches that co-adapt morphology and behavior use a specific task's reward as a signal for morphology optimization. However, this often requi…

    Submitted 25 February, 2021; originally announced February 2021.

    Comments: ICLR 2021

  14. arXiv:2003.01709  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Hierarchically Decoupled Imitation for Morphological Transfer

    Authors: Donald J. Hejna III, Pieter Abbeel, Lerrel Pinto

    Abstract: Learning long-range behaviors on complex high-dimensional agents is a fundamental problem in robot learning. For such tasks, we argue that transferring learned information from a morphologically simpler agent can massively improve the sample efficiency of a more complex one. To this end, we propose a hierarchical decoupling of policies into two parts: an independently learned low-level policy and…

    Submitted 31 August, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

    Comments: International Conference on Machine Learning (ICML) 2020 camera ready submission