
Showing 1–1 of 1 results for author: Gumastate, R

  1. arXiv:2406.10445  [pdf, other]

    cs.LG

    Optimal Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning

    Authors: Yinglun Xu, David Zhu, Rohan Gumastate, Gagandeep Singh

    Abstract: Offline reinforcement learning has become one of the most practical RL settings. A recent success story has been RLHF, offline preference-based RL (PBRL) with preference from humans. However, most existing works on offline RL focus on the standard setting with scalar reward feedback. It remains unknown how to universally transfer the existing rich understanding of offline RL from the reward-based…

    Submitted 14 June, 2024; originally announced June 2024.