Showing 1–5 of 5 results for author: Knox, W B

  1. arXiv:2310.13639  [pdf, other]

    cs.LG cs.AI

    Contrastive Preference Learning: Learning from Human Feedback without RL

    Authors: Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in two phases: first, use human preferences to learn a reward function and second, align the model by optimizing the learned reward via reinforcement learning (RL). This paradigm assumes that human preferences are distributed according to rewa…

    Submitted 30 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Code released at https://github.com/jhejna/cpl

  2. arXiv:2310.02456  [pdf, other]

    cs.LG cs.AI

    Learning Optimal Advantage from Preferences and Mistaking it for Reward

    Authors: W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum

    Abstract: We consider algorithms for learning reward functions from human preferences over pairs of trajectory segments, as used in reinforcement learning from human feedback (RLHF). Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return. Recent work casts doubt on the validity of this assumption, proposing an alternati…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 8 pages (16 pages with references and appendix), 11 figures

    ACM Class: I.2.6; I.2.8

  3. arXiv:2206.02231  [pdf, other]

    cs.LG cs.AI eess.SY

    Models of human preference for learning reward functions

    Authors: W. Bradley Knox, Stephane Hatgis-Kessell, Serena Booth, Scott Niekum, Peter Stone, Alessandro Allievi

    Abstract: The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments, a type of reinforcement learning from human feedback (RLHF). These human preferences are typically assumed to be informed solely by pa…

    Submitted 6 September, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

    Comments: 16 pages (40 pages with references and appendix), 23 figures

    ACM Class: I.2.6; I.2.8

  4. arXiv:2104.13906  [pdf, other]

    cs.LG

    Reward (Mis)design for Autonomous Driving

    Authors: W. Bradley Knox, Alessandro Allievi, Holger Banzhaf, Felix Schmitt, Peter Stone

    Abstract: This article considers the problem of diagnosing certain common errors in reward design. Its insights are also applicable to the design of cost functions and performance metrics more generally. To diagnose common errors, we develop 8 simple sanity checks for identifying flaws in reward functions. These sanity checks are applied to reward functions from past work on reinforcement learning (RL) for…

    Submitted 11 March, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: 14 pages (27 pages with appendix), 4 figures

    MSC Class: 91B16 ACM Class: I.2.6; I.2.8; I.2.9

  5. arXiv:2009.13649  [pdf, other]

    cs.HC cs.RO

    The EMPATHIC Framework for Task Learning from Implicit Human Feedback

    Authors: Yuchen Cui, Qiping Zhang, Alessandro Allievi, Peter Stone, Scott Niekum, W. Bradley Knox

    Abstract: Reactions such as gestures, facial expressions, and vocalizations are an abundant, naturally occurring channel of information that humans provide during interactions. A robot or other agent could leverage an understanding of such implicit human feedback to improve its task performance at no cost to the human. This approach contrasts with common agent teaching methods based on demonstrations, criti…

    Submitted 7 December, 2020; v1 submitted 28 September, 2020; originally announced September 2020.

    Comments: Conference on Robot Learning 2020