Showing 1–12 of 12 results for author: Laidlaw, C

  1. arXiv:2403.03185  [pdf, other]

    cs.LG cs.AI

    Preventing Reward Hacking with Occupancy Measure Regularization

    Authors: Cassidy Laidlaw, Shivam Singhal, Anca Dragan

    Abstract: Reward hacking occurs when an agent performs very well with respect to a "proxy" reward function (which may be hand-specified or learned), but poorly with respect to the unknown true reward. Since ensuring good alignment between the proxy and true reward is extremely difficult, one approach to prevent reward hacking is optimizing the proxy conservatively. Prior work has particularly focused on enf…

    Submitted 5 March, 2024; originally announced March 2024.

  2. arXiv:2312.09983  [pdf, other]

    cs.LG cs.AI stat.ML

    Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping

    Authors: Lauren H. Cooke, Harvey Klyne, Edwin Zhang, Cassidy Laidlaw, Milind Tambe, Finale Doshi-Velez

    Abstract: Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems. This work motivates the use of potential-based reward shaping to reduce the computational burden of each RL sub-problem. This work serves as a proof-of-concept and we hope will inspire future developments towards computationally ef…

    Submitted 18 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

  3. arXiv:2312.08369  [pdf, other]

    stat.ML cs.AI cs.LG

    The Effective Horizon Explains Deep RL Performance in Stochastic Environments

    Authors: Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan

    Abstract: Reinforcement learning (RL) theory has largely focused on proving minimax sample complexity bounds. These require strategic exploration algorithms that use relatively limited function classes for representing the policy or value function. Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neu…

    Submitted 12 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Journal ref: ICLR 2024 (Spotlight)

  4. arXiv:2312.08358  [pdf, other]

    cs.LG cs.AI stat.ML

    Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF

    Authors: Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell

    Abstract: In practice, preference learning from human feedback depends on incomplete data with hidden context. Hidden context refers to data that affects the feedback received, but which is not represented in the data used to train a preference model. This captures common issues of data collection, such as having human annotators with varied preferences, cognitive processes that result in seemingly irration…

    Submitted 16 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: Presented at ICLR 2024

  5. arXiv:2309.03812  [pdf, other]

    cs.CV cs.AI cs.LG

    AnthroNet: Conditional Generation of Humans via Anthropometrics

    Authors: Francesco Picetti, Shrinath Deshpande, Jonathan Leban, Soroosh Shahtalebi, Jay Patel, Peifeng Jing, Chunpu Wang, Charles Metze III, Cameron Sun, Cera Laidlaw, James Warren, Kathy Huynh, River Page, Jonathan Hogins, Adam Crespi, Sujoy Ganguly, Salehe Erfanian Ebadi

    Abstract: We present a novel human body model formulated by an extensive set of anthropocentric measurements, which is capable of generating a wide range of human body shapes and poses. The proposed model enables direct modeling of specific human identities through a deep generative architecture, which can produce humans in any arbitrary pose. It is the first of its kind to have been trained end-to-end usin…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: AnthroNet's Unity data generator source code is available at: https://unity-technologies.github.io/AnthroNet/

  6. arXiv:2304.09853  [pdf, other]

    cs.LG stat.ML

    Bridging RL Theory and Practice with the Effective Horizon

    Authors: Cassidy Laidlaw, Stuart Russell, Anca Dragan

    Abstract: Deep reinforcement learning (RL) works impressively in some environments and fails catastrophically in others. Ideally, RL theory should be able to provide an understanding of why this is, i.e. bounds predictive of practical performance. Unfortunately, current theory does not quite have this ability. We compare standard deep RL algorithms to prior sample complexity bounds by introducing a new data…

    Submitted 11 January, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

    Journal ref: NeurIPS 2023 (Oral)

  7. arXiv:2204.10759  [pdf, other]

    cs.AI cs.LG cs.MA cs.RO

    The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models

    Authors: Cassidy Laidlaw, Anca Dragan

    Abstract: Models of human behavior for prediction and collaboration tend to fall into two categories: ones that learn from large amounts of data via imitation learning, and ones that assume human behavior to be noisily-optimal for some reward function. The former are very useful, but only when it is possible to gather a lot of human data in the target environment and distribution. The advantage of the latte…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Published at ICLR 2022

  8. arXiv:2106.10394  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Uncertain Decisions Facilitate Better Preference Learning

    Authors: Cassidy Laidlaw, Stuart Russell

    Abstract: Existing observational approaches for learning human preferences, such as inverse reinforcement learning, usually make strong assumptions about the observability of the human's environment. However, in reality, people make many important decisions under uncertainty. To better understand preference learning in these cases, we study the setting of inverse decision theory (IDT), a previously proposed…

    Submitted 28 October, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted at NeurIPS 2021 (Spotlight)

  9. arXiv:2006.12655  [pdf, other]

    cs.LG cs.CV stat.ML

    Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

    Authors: Cassidy Laidlaw, Sahil Singla, Soheil Feizi

    Abstract: A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, used in the very definition of adversarial attacks that are imperceptible to human eyes. Most current attacks and defenses try to avoid this issue by considering restrictive adversarial threat models such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc.…

    Submitted 4 July, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Published in ICLR 2021. Code and data are available at https://github.com/cassidylaidlaw/perceptual-advex

  10. arXiv:1911.11253  [pdf, other]

    cs.LG cs.AI stat.ML

    Playing it Safe: Adversarial Robustness with an Abstain Option

    Authors: Cassidy Laidlaw, Soheil Feizi

    Abstract: We explore adversarial robustness in the setting in which it is acceptable for a classifier to abstain---that is, output no class---on adversarial examples. Adversarial examples are small perturbations of normal inputs to a classifier that cause the classifier to give incorrect output; they present security and safety challenges for machine learning systems. In many safety-critical applications, i…

    Submitted 25 November, 2019; originally announced November 2019.

  11. arXiv:1906.00001  [pdf, other]

    cs.LG cs.CV

    Functional Adversarial Attacks

    Authors: Cassidy Laidlaw, Soheil Feizi

    Abstract: We propose functional adversarial attacks, a novel class of threat models for crafting adversarial examples to fool machine learning models. Unlike a standard $\ell_p$-ball threat model, a functional adversarial threat model allows only a single function to be used to perturb input features to produce an adversarial example. For example, a functional adversarial attack applied on colors of an imag…

    Submitted 29 October, 2019; v1 submitted 29 May, 2019; originally announced June 2019.

    Comments: Accepted to NeurIPS 2019

  12. arXiv:1905.03079  [pdf, other]

    cs.CV

    Capture, Learning, and Synthesis of 3D Speaking Styles

    Authors: Daniel Cudeiro, Timo Bolkart, Cassidy Laidlaw, Anurag Ranjan, Michael J. Black

    Abstract: Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on…

    Submitted 8 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR 2019