Showing 1–6 of 6 results for author: Siththaranjan, A

  1. arXiv:2405.17713

    cs.AI cs.LG

    AI Alignment with Changing and Influenceable Reward Functions

    Authors: Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan

    Abstract: Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI's influence on them.…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2402.10182

    eess.SY

    Intent Demonstration in General-Sum Dynamic Games via Iterative Linear-Quadratic Approximations

    Authors: Jingqi Li, Anand Siththaranjan, Somayeh Sojoudi, Claire Tomlin, Andrea Bajcsy

    Abstract: Autonomous agents should be able to coordinate with other agents without knowing their intents ahead of time. While prior work has studied how agents can gather information about the intent of others, in this work we study the inverse problem: how agents can demonstrate their intent to others, within the framework of general-sum dynamic games. We first present a model of this intent demonstration…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Under review by L4DC 2024

  3. arXiv:2312.08358

    cs.LG cs.AI stat.ML

    Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

    Authors: Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell

    Abstract: In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This captures common issues of data collection, such as having human annotators with varied preferences, cognitive processes that result in seemingly irration…

    Submitted 16 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Presented at ICLR 2024

  4. arXiv:2307.15217

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  5. arXiv:2204.01986

    eess.SY math.OC

    On the Computational Consequences of Cost Function Design in Nonlinear Optimal Control

    Authors: Tyler Westenbroek, Anand Siththaranjan, Mohsin Sarwari, Claire J. Tomlin, Shankar S. Sastry

    Abstract: Optimal control is an essential tool for stabilizing complex nonlinear systems. However, despite the extensive impacts of methods such as receding horizon control, dynamic programming and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error. In this paper we seek to gain insights into how the choice of cost functio…

    Submitted 17 November, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

  6. arXiv:2103.05746

    cs.RO cs.AI cs.HC eess.SY

    Analyzing Human Models that Adapt Online

    Authors: Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan

    Abstract: Predictive human models often need to adapt their parameters online from human data. This raises previously ignored safety-related questions for robots relying on these models such as what the model could learn online and how quickly could it learn it. For instance, when will the robot have a confident estimate in a nearby human's goal? Or, what parameter initializations guarantee that the robot c…

    Submitted 30 September, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

    Comments: ICRA 2021