Showing 1–2 of 2 results for author: Das, V

  1. arXiv:2312.00267  [pdf, other]

    cs.LG cs.AI stat.ML

    Sample Efficient Reinforcement Learning from Human Feedback via Active Exploration

    Authors: Viraj Mehta, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Jeff Schneider, Willie Neiswanger

    Abstract: Preference-based feedback is important for many applications in reinforcement learning where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback (RLHF) on large language models. For many applications of RLHF, the cost of acquiring the human feedback can be substantial. In this work, we take advantage of the fact that…

    Submitted 30 November, 2023; originally announced December 2023.

  2. arXiv:2307.11288  [pdf, other]

    cs.LG cs.AI stat.ML

    Kernelized Offline Contextual Dueling Bandits

    Authors: Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger

    Abstract: Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the a…

    Submitted 20 July, 2023; originally announced July 2023.