Showing 1–50 of 54 results for author: Brown, D S

  1. arXiv:2404.07185  [pdf, other]

    cs.RO cs.AI cs.LG

    Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

    Authors: Zohre Karimi, Shing-Hei Ho, Bao Thach, Alan Kuntz, Daniel S. Brown

    Abstract: Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This pap…

    Submitted 15 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: In proceedings of the International Symposium on Medical Robotics (ISMR) 2024. Equal contribution from two first authors

  2. arXiv:2404.04241  [pdf, other]

    cs.RO

    Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks

    Authors: Jordan Thompson, Brian Y. Cho, Daniel S. Brown, Alan Kuntz

    Abstract: Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a…

    Submitted 5 April, 2024; originally announced April 2024.

  3. arXiv:2403.02431  [pdf, other]

    cs.RO

    Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models

    Authors: Dimitris Papadimitriou, Daniel S. Brown

    Abstract: It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian…

    Submitted 4 March, 2024; originally announced March 2024.

  4. arXiv:2310.16941  [pdf, other]

    cs.RO cs.LG cs.MA

    Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

    Authors: Connor Mattson, Jeremy C. Clark, Daniel S. Brown

    Abstract: We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 11 pages, 9 figures, To be published in Proceedings IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS 2023)

  5. arXiv:2310.10610  [pdf, other]

    cs.AI cs.LG cs.RO

    Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

    Authors: Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan

    Abstract: Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to…

    Submitted 16 October, 2023; originally announced October 2023.

  6. arXiv:2309.11408  [pdf, other]

    cs.RO eess.SY

    Indirect Swarm Control: Characterization and Analysis of Emergent Swarm Behaviors

    Authors: Ricardo Vega, Connor Mattson, Daniel S. Brown, Cameron Nowzari

    Abstract: Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively change what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is in presenting a new framework for ch…

    Submitted 28 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: 8 pages, 13 figures, submitted to IROS 2024 conference

  7. arXiv:2307.10026  [pdf, other]

    cs.LG

    Contextual Reliability: When Different Features Matter in Different Contexts

    Authors: Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan

    Abstract: Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we canno…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: ICML 2023 Camera Ready Version

  8. arXiv:2306.13004  [pdf, other]

    cs.LG cs.AI

    Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?

    Authors: Akansha Kalra, Daniel S. Brown

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting the reward values. Despite the increasing interest in RLHF, most works learn black box reward functions that while expressive are difficult to interpret and often require running the whole costly process of RL before we can even decipher if the…

    Submitted 24 June, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

    Comments: Accepted at RLC 2024

  9. arXiv:2305.16148  [pdf, other]

    cs.MA cs.LG cs.RO

    Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms

    Authors: Connor Mattson, Daniel S. Brown

    Abstract: Robot swarms often exhibit emergent behaviors that are fascinating to observe; however, it is often difficult to predict what swarm behaviors can emerge under a given set of agent capabilities. We seek to efficiently leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system, without requiring the human to know beforehand…

    Submitted 16 July, 2023; v1 submitted 25 April, 2023; originally announced May 2023.

    Comments: 13 pages, 10 figures, To be published in Proceedings Genetic and Evolutionary Computation Conference (GECCO 2023)

  10. arXiv:2301.04741  [pdf, other]

    cs.LG

    Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

    Authors: Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

    Abstract: Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we stu…

    Submitted 9 February, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: In proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)

  11. arXiv:2301.01392  [pdf, other]

    cs.LG cs.AI

    Benchmarks and Algorithms for Offline Preference-Based Reward Learning

    Authors: Daniel Shin, Anca D. Dragan, Daniel S. Brown

    Abstract: Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: Transactions on Machine Learning Research. arXiv admin note: text overlap with arXiv:2107.09251

  12. arXiv:2301.00810  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    SIRL: Similarity-based Implicit Representation Learning

    Authors: Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan

    Abstract: When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task "features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that co…

    Submitted 17 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: 12 pages, 6 figures, HRI 2023

  13. arXiv:2212.03175  [pdf, other]

    cs.LG cs.AI cs.RO

    Learning Representations that Enable Generalization in Assistive Tasks

    Authors: Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan

    Abstract: Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Su…

    Submitted 5 December, 2022; originally announced December 2022.

  14. Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning

    Authors: Tu Trinh, Haoyu Chen, Daniel S. Brown

    Abstract: We examine the problem of determining demonstration sufficiency: how can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem, we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk, enabling learning-from-demonstration ("LfD") robots to compute high…

    Submitted 2 January, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: Prior version appears in proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)". Current version appears in proceedings of HRI '24, March 11-14, 2024, Boulder, CO, USA

  15. arXiv:2210.07432  [pdf, other]

    cs.LG cs.AI

    Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

    Authors: Albert Wilcox, Ashwin Balakrishna, Jules Dedieu, Wyame Benslimane, Daniel S. Brown, Ken Goldberg

    Abstract: Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is using demonstrations to provide initial signal about regions of the state space with high rewards. However, p…

    Submitted 20 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: To be published in the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). 19 pages. 11 figures

  16. arXiv:2208.10687  [pdf, other]

    cs.LG cs.AI

    The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types

    Authors: Gaurav R. Ghosal, Matthew Zurek, Daniel S. Brown, Anca D. Dragan

    Abstract: When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, o…

    Submitted 9 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

    Comments: Published at AAAI 2023; 10 pages, 5 figures plus appendices

  17. arXiv:2207.00911  [pdf, other]

    cs.RO

    Learning Switching Criteria for Sim2Real Transfer of Robotic Fabric Manipulation Policies

    Authors: Satvik Sharma, Ellen Novoseller, Vainavi Viswanath, Zaynah Javed, Rishi Parikh, Ryan Hoque, Ashwin Balakrishna, Daniel S. Brown, Ken Goldberg

    Abstract: Simulation-to-reality transfer has emerged as a popular and highly successful method to train robotic control policies for a wide variety of tasks. However, it is often challenging to determine when policies trained in simulation are ready to be transferred to the physical world. Deploying policies that have been trained with very little simulation data can result in unreliable and dangerous behav…

    Submitted 2 July, 2022; originally announced July 2022.

    Comments: CASE 2022. The first two authors contributed equally. 9 pages; 5 figures; 1 table

  18. arXiv:2204.06601  [pdf, other]

    cs.LG cs.RO

    Causal Confusion and Reward Misidentification in Preference-Based Reward Learning

    Authors: Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown

    Abstract: Learning policies via preference-based reward learning is an increasingly popular method for customizing agent behavior, but has been shown anecdotally to be prone to spurious correlations and reward hacking behaviors. While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification w…

    Submitted 18 March, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: In the proceedings of the Eleventh International Conference on Learning Representations (ICLR 2023). https://iclr.cc/virtual/2023/poster/10822

  19. arXiv:2203.02091  [pdf, other]

    cs.RO cs.AI

    Teaching Robots to Span the Space of Functional Expressive Motion

    Authors: Arjun Sripathy, Andreea Bobu, Zhongyu Li, Koushil Sreenath, Daniel S. Brown, Anca D. Dragan

    Abstract: Our goal is to enable robots to perform functional tasks in emotive ways, be it in response to their users' emotional states, or expressive of their confidence levels. Prior work has proposed learning independent cost functions from user feedback for each target emotion, so that the robot may optimize it alongside task and environment specific objectives for any situation it encounters. However, t…

    Submitted 2 August, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

  20. arXiv:2111.15002  [pdf, other]

    cs.RO

    LEGS: Learning Efficient Grasp Sets for Exploratory Grasping

    Authors: Letian Fu, Michael Danielczuk, Ashwin Balakrishna, Daniel S. Brown, Jeffrey Ichnowski, Eugen Solowjow, Ken Goldberg

    Abstract: While deep learning has enabled significant progress in designing general purpose robot grasping systems, there remain objects which still pose challenges for these systems. Recent work on Exploratory Grasping has formalized the problem of systematically exploring grasps on these adversarial objects and explored a multi-armed bandit model for identifying high-quality grasps on each object stable p…

    Submitted 1 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: Proceedings of 2022 IEEE International Conference on Robotics and Automation. Philadelphia, PA. May, 2022

  21. arXiv:2109.08273  [pdf, other]

    cs.RO cs.AI

    ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning

    Authors: Ryan Hoque, Ashwin Balakrishna, Ellen Novoseller, Albert Wilcox, Daniel S. Brown, Ken Goldberg

    Abstract: Effective robot learning often requires online human feedback and interventions that can cost significant human time, giving rise to the central challenge in interactive imitation learning: is it possible to control the timing and length of interventions to both facilitate learning and limit burden on the human supervisor? This paper presents ThriftyDAgger, an algorithm for actively querying a hum…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: CoRL 2021 Oral

  22. arXiv:2107.09251  [pdf, other]

    cs.LG

    Offline Preference-Based Apprenticeship Learning

    Authors: Daniel Shin, Daniel S. Brown, Anca D. Dragan

    Abstract: Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization…

    Submitted 16 February, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: ICML Workshop on Human-AI Collaboration in Sequential Decision-Making, 2021

  23. arXiv:2107.05789  [pdf, other]

    cs.RO cs.AI cs.CV

    Kit-Net: Self-Supervised Learning to Kit Novel 3D Objects into Novel 3D Cavities

    Authors: Shivin Devgon, Jeffrey Ichnowski, Michael Danielczuk, Daniel S. Brown, Ashwin Balakrishna, Shirin Joshi, Eduardo M. C. Rocha, Eugen Solowjow, Ken Goldberg

    Abstract: In industrial part kitting, 3D objects are inserted into cavities for transportation or subsequent assembly. Kitting is a critical step as it can decrease downstream processing and handling times and enable lower storage and shipping costs. We present Kit-Net, a framework for kitting previously unseen 3D objects into cavities given depth images of both the target cavity and an object held by a gri…

    Submitted 12 July, 2021; originally announced July 2021.

    Journal ref: Conference on Automation Science and Engineering (CASE) 2021

  24. arXiv:2106.06499  [pdf, other]

    cs.LG cs.AI

    Policy Gradient Bayesian Robust Optimization for Imitation Learning

    Authors: Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg

    Abstract: The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizin…

    Submitted 21 June, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: In proceedings of the International Conference on Machine Learning (ICML) 2021

  25. arXiv:2104.11353  [pdf, other]

    cs.RO cs.LG eess.SY

    Optimal Cost Design for Model Predictive Control

    Authors: Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan

    Abstract: Many robotics domains use some form of nonconvex model predictive control (MPC) for planning, which sets a reduced time horizon, performs trajectory optimization, and replans at every step. The actual task typically requires a much longer horizon than is computationally tractable, and is specified via a cost function that cumulates over that full horizon. For instance, an autonomous car may have a…

    Submitted 9 June, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Comments: In proceedings of 3rd Annual Learning for Dynamics & Control Conference (L4DC) 2021

  26. arXiv:2104.06556  [pdf, other]

    cs.RO cs.HC cs.LG

    Situational Confidence Assistance for Lifelong Shared Autonomy

    Authors: Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

    Abstract: Shared autonomy enables robots to infer user intent and assist in accomplishing it. But when the user wants to do a new task that the robot does not know about, shared autonomy will hinder their performance by attempting to assist them with something that is not their intent. Our key idea is that the robot can detect when its repertoire of intents is insufficient to explain the user's input, and g…

    Submitted 13 April, 2021; originally announced April 2021.

    Comments: In proceedings ICRA 2021

  27. arXiv:2104.00053  [pdf, other]

    cs.RO cs.AI

    LazyDAgger: Reducing Context Switching in Interactive Imitation Learning

    Authors: Ryan Hoque, Ashwin Balakrishna, Carl Putterman, Michael Luo, Daniel S. Brown, Daniel Seita, Brijen Thananjeyan, Ellen Novoseller, Ken Goldberg

    Abstract: Corrective interventions while a robot is learning to automate a task provide an intuitive method for a human supervisor to assist the robot and convey information about desired behavior. However, these interventions can impose significant burden on a human supervisor, as each intervention interrupts other work the human is doing, incurs latency with each context switch between supervisor and auto…

    Submitted 20 July, 2021; v1 submitted 31 March, 2021; originally announced April 2021.

    Comments: IEEE CASE 2021

  28. arXiv:2103.07815  [pdf, other]

    cs.AI cs.RO

    Dynamically Switching Human Prediction Models for Efficient Planning

    Authors: Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

    Abstract: As environments involving both robots and humans become increasingly common, so does the need to account for people during planning. To plan effectively, robots must be able to respond to and sometimes influence what humans do. This requires a human model which predicts future human actions. A simple model may assume the human will continue what they did previously; a more complex one might predic…

    Submitted 13 March, 2021; originally announced March 2021.

    Comments: ICRA '21

  29. arXiv:2012.01557  [pdf, other]

    cs.LG

    Value Alignment Verification

    Authors: Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum

    Abstract: As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent's performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values. T…

    Submitted 11 June, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: In proceedings International Conference on Machine Learning (ICML) 2021

  30. Topology of Coronal Magnetic Fields: Extending the Magnetic Skeleton Using Null-like Points

    Authors: D. T. Lee, D. S. Brown

    Abstract: Many phenomena in the Sun's atmosphere are magnetic in nature and study of the atmospheric magnetic field plays an important part in understanding these phenomena. Tools to study solar magnetic fields include magnetic topology and features such as magnetic null points, separatrix surfaces, and separators. The theory of these has most robustly been developed under magnetic charge topology, where th…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: 21 pages, 11 figures, Accepted for publication in SolPhys

  31. arXiv:2011.05632  [pdf, other]

    cs.RO cs.AI cs.LG

    Exploratory Grasping: Asymptotically Optimal Algorithms for Grasping Challenging Polyhedral Objects

    Authors: Michael Danielczuk, Ashwin Balakrishna, Daniel S. Brown, Shivin Devgon, Ken Goldberg

    Abstract: There has been significant recent work on data-driven algorithms for learning general-purpose grasping policies. However, these policies can consistently fail to grasp challenging objects which are significantly out of the distribution of objects in the training data or which have very few high quality grasps. Motivated by such objects, we propose a novel problem setting, Exploratory Grasping, for…

    Submitted 11 November, 2020; v1 submitted 11 November, 2020; originally announced November 2020.

    Comments: Conference on Robot Learning (CoRL) 2020. First two authors contributed equally

  32. arXiv:2007.12315  [pdf, other]

    cs.LG stat.ML

    Bayesian Robust Optimization for Imitation Learning

    Authors: Daniel S. Brown, Scott Niekum, Marek Petrik

    Abstract: One of the main challenges in imitation learning is determining what action an agent should take when outside the state distribution of the demonstrations. Inverse reinforcement learning (IRL) can enable generalization to new states by learning a parameterized reward function, but these approaches still face uncertainty over the true reward function and corresponding optimal policy. Existing safe…

    Submitted 29 February, 2024; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: In proceedings NeurIPS 2020

  33. arXiv:2002.09089  [pdf, other]

    cs.LG stat.ML

    Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

    Authors: Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum

    Abstract: Bayesian reward learning from demonstrations enables rigorous safety and uncertainty analysis when performing imitation learning. However, Bayesian reward learning methods are typically computationally intractable for complex control problems. We propose Bayesian Reward Extrapolation (Bayesian REX), a highly efficient Bayesian reward learning algorithm that scales to high-dimensional imitation lea…

    Submitted 17 December, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: In proceedings ICML 2020

  34. arXiv:1912.04472  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Bayesian Reward Learning from Preferences

    Authors: Daniel S. Brown, Scott Niekum

    Abstract: Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy. However, Bayesian IRL is computationally intractable for high-dimensional problems because each sample from the posterior requires solving an entire Markov Decision Process (MDP). While there exist non-Bay…

    Submitted 9 December, 2019; originally announced December 2019.

    Comments: Workshop on Safety and Robustness in Decision Making at the 33rd Conference on Neural Information Processing Systems (NeurIPS) 2019

  35. arXiv:1907.03976  [pdf, other]

    cs.LG stat.ML

    Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations

    Authors: Daniel S. Brown, Wonjoon Goo, Scott Niekum

    Abstract: The performance of imitation learning is typically upper-bounded by the performance of the demonstrator. While recent empirical results demonstrate that ranked demonstrations allow for better-than-demonstrator performance, preferences over demonstrations may be difficult to obtain, and little is known theoretically about when such methods can be expected to successfully extrapolate beyond the perf…

    Submitted 14 October, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: In proceedings of 3rd Conference on Robot Learning (CoRL) 2019

  36. arXiv:1904.06387  [pdf, other]

    cs.LG stat.ML

    Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

    Authors: Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum

    Abstract: A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator. This is because IRL typically seeks a reward function that makes the demonstrator appear near-optimal, rather than inferring the underlying intentions of the demonstrator that may have been poorly executed in practice. In this paper, we introduce a novel reward-…

    Submitted 8 July, 2019; v1 submitted 12 April, 2019; originally announced April 2019.

    Comments: In proceedings of Thirty-sixth International Conference on Machine Learning (ICML 2019)

  37. arXiv:1901.02161  [pdf, other]

    cs.LG stat.ML

    Risk-Aware Active Inverse Reinforcement Learning

    Authors: Daniel S. Brown, Yuchen Cui, Scott Niekum

    Abstract: Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning. Existing work has explored a variety of active query strategies; however, to our knowledge, none of these strategies directly minimize the performance risk of the policy the robot is learning. Utilizing recent advances in performance bounds for inverse reinforcement learnin…

    Submitted 3 June, 2019; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: In proceedings of the 2nd Conference on Robot Learning (CoRL) 2018

  38. arXiv:1811.03563  [pdf, other]

    cs.RO

    LAAIR: A Layered Architecture for Autonomous Interactive Robots

    Authors: Yuqian Jiang, Nick Walker, Minkyu Kim, Nicolas Brissonneau, Daniel S. Brown, Justin W. Hart, Scott Niekum, Luis Sentis, Peter Stone

    Abstract: When developing general purpose robots, the overarching software architecture can greatly affect the ease of accomplishing various tasks. Initial efforts to create unified robot systems in the 1990s led to hybrid architectures, emphasizing a hierarchy in which deliberative plans direct the use of reactive skills. However, since that time there has been significant progress in the low-level skills…

    Submitted 8 November, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: Presented at LTA AAAI-FSS, 2018

  39. arXiv:1805.07687  [pdf, other]

    cs.LG stat.ML

    Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

    Authors: Daniel S. Brown, Scott Niekum

    Abstract: Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization. However, despite much recent interest in IRL, little work has been done to understand the minimum set of demonstrations needed to teach a specific sequential decision-making task. We formalize the problem of finding maximally informative demonstrations for IRL as a…

    Submitted 16 August, 2019; v1 submitted 19 May, 2018; originally announced May 2018.

    Comments: In proceedings of the AAAI Conference on Artificial Intelligence, 2019

  40. Beam-energy and centrality dependence of direct-photon emission from ultra-relativistic heavy-ion collisions

    Authors: A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Bataineh, J. Alexander, M. Alfred, A. Al-Jamel, H. Al-Ta'ani, A. Angerami, K. Aoki, N. Apadula, L. Aphecetche, Y. Aramaki, R. Armendariz, S. H. Aronson, J. Asai, H. Asano, E. C. Aschenauer, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, et al. (648 additional authors not shown)

    Abstract: The PHENIX collaboration presents first measurements of low-momentum ($0.4<p_T<3$ GeV/$c$) direct-photon yields from Au$+$Au collisions at $\sqrt{s_{_{NN}}}$=39 and 62.4 GeV. For both beam energies the direct-photon yields are substantially enhanced with respect to expectations from prompt processes, similar to the yields observed in Au$+$Au collisions at $\sqrt{s_{_{NN}}}$=200. Analyzing the phot…

    Submitted 5 June, 2019; v1 submitted 10 May, 2018; originally announced May 2018.

    Comments: 673 authors from 82 institutions, 10 pages, 4 figures. v2 is version accepted for publication in Physical Review Letters. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. Lett. 123, 022301 (2019)

  41. arXiv:1707.00724  [pdf, other

    cs.AI cs.LG stat.ML

    Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

    Authors: Daniel S. Brown, Scott Niekum

    Abstract: In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sam… ▽ More

    Submitted 22 June, 2018; v1 submitted 3 July, 2017; originally announced July 2017.

    Comments: In proceedings AAAI-18

  42. Transverse energy production and charged-particle multiplicity at midrapidity in various systems from $\sqrt{s_{NN}}=7.7$ to 200 GeV

    Authors: A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Bataineh, J. Alexander, M. Alfred, A. Al-Jamel, H. Al-Ta'ani, A. Angerami, K. Aoki, N. Apadula, L. Aphecetche, Y. Aramaki, R. Armendariz, S. H. Aronson, J. Asai, H. Asano, E. C. Aschenauer, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun , et al. (681 additional authors not shown)

    Abstract: Measurements of midrapidity charged particle multiplicity distributions, $dN_{\rm ch}/dη$, and midrapidity transverse-energy distributions, $dE_T/dη$, are presented for a variety of collision systems and energies. Included are distributions for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$, 130, 62.4, 39, 27, 19.6, 14.5, and 7.7 GeV, Cu$+$Cu collisions at $\sqrt{s_{_{NN}}}=200$ and 62.4 GeV, Cu$+$A… ▽ More

    Submitted 23 February, 2016; v1 submitted 22 September, 2015; originally announced September 2015.

    Comments: 706 authors, 32 pages, 20 figures, 34 tables, 2004, 2005, 2008, 2010, 2011, and 2012 data. v2 is version accepted for publication in Phys. Rev. C

    Journal ref: Phys. Rev. C 93, 024901 (2016)

  43. Systematic Study of Azimuthal Anisotropy in Cu$+$Cu and Au$+$Au Collisions at $\sqrt{s_{_{NN}}} = 62.4$ and 200 GeV

    Authors: A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, A. Al-Jamel, J. Alexander, K. Aoki, L. Aphecetche, R. Armendariz, S. H. Aronson, J. Asai, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, G. Baksay, L. Baksay, A. Baldisseri, K. N. Barish, P. D. Barnes, B. Bassalleck, S. Bathe , et al. (399 additional authors not shown)

    Abstract: We have studied the dependence of azimuthal anisotropy $v_2$ for inclusive and identified charged hadrons in Au$+$Au and Cu$+$Cu collisions on collision energy, species, and centrality. The values of $v_2$ as a function of transverse momentum $p_T$ and centrality in Au$+$Au collisions at $\sqrt{s_{_{NN}}}$=200 GeV and 62.4 GeV are the same within uncertainties. However, in Cu$+$Cu collisions we ob… ▽ More

    Submitted 18 September, 2015; v1 submitted 2 December, 2014; originally announced December 2014.

    Comments: 424 authors, 22 pages, 22 figures, 6 tables. v2 is the version accepted for publication in Phys. Rev. C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

  44. Transverse-energy distributions at midrapidity in $p$$+$$p$, $d$$+$Au, and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=62.4$--200~GeV and implications for particle-production models

    Authors: S. S. Adler, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, A. Al-Jamel, J. Alexander, K. Aoki, L. Aphecetche, R. Armendariz, S. H. Aronson, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, A. Baldisseri, K. N. Barish, P. D. Barnes, B. Bassalleck, S. Bathe, S. Batsouli, V. Baublis, F. Bauer, A. Bazilevsky, S. Belikov , et al. (366 additional authors not shown)

    Abstract: Measurements of the midrapidity transverse energy distribution, $dE_T/dη$, are presented for $p$$+$$p$, $d$$+$Au, and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV and additionally for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=62.4$ and 130 GeV. The $dE_T/dη$ distributions are first compared with the number of nucleon participants $N_{\rm part}$, number of binary collisions $N_{\rm coll}$, and nu… ▽ More

    Submitted 23 December, 2013; originally announced December 2013.

    Comments: 391 authors, 24 pages, 19 figures, and 15 Tables. Submitted to Phys. Rev. C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. C 89, 044905 (2014)

  45. arXiv:1304.5488  [pdf, ps, other

    astro-ph.IM astro-ph.SR

    On-Orbit Degradation of Solar Instruments

    Authors: A. BenMoussa, S. Gissot, U. Schühle, G. Del Zanna, F. Auchère, S. Mekaoui, A. R. Jones, D. Walton, C. J. Eyles, G. Thuillier, D. Seaton, I. E. Dammasch, G. Cessateur, M. Meftah, V. Andretta, D. Berghmans, D. Bewsher, D. Bolsée, L. Bradley, D. S. Brown, P. C. Chamberlin, S. Dewitte, L. V. Didkovsky, M. Dominique, F. G. Eparvier , et al. (16 additional authors not shown)

    Abstract: We present the lessons learned about the degradation observed in several space solar missions, based on contributions at the Workshop about On-Orbit Degradation of Solar and Space Weather Instruments that took place at the Solar Terrestrial Centre of Excellence (Royal Observatory of Belgium) in Brussels on 3 May 2012. The aim of this workshop was to open discussions related to the degradation obse… ▽ More

    Submitted 19 April, 2013; originally announced April 2013.

  46. Direct photon production in d+Au collisions at sqrt(s_NN)=200 GeV

    Authors: A. Adare, S. S. Adler, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, A. Al-Jamel, J. Alexander, A. Angerami, K. Aoki, N. Apadula, L. Aphecetche, Y. Aramaki, R. Armendariz, S. H. Aronson, J. Asai, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay , et al. (522 additional authors not shown)

    Abstract: Direct photons have been measured in sqrt(s_NN)=200 GeV d+Au collisions at midrapidity. A wide p_T range is covered by measurements of nearly-real virtual photons (1<p_T<6 GeV/c) and real photons (5<p_T<16 GeV/c). The invariant yield of the direct photons in d+Au collisions over the scaled p+p cross section is consistent with unity. Theoretical calculations assuming standard cold nuclear matter ef… ▽ More

    Submitted 6 August, 2012; originally announced August 2012.

    Comments: 547 authors, 7 pages, 4 figures. Submitted to Phys. Rev. Lett. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys. Rev. C 87, 054907 (2013)

  47. Measurement of Direct Photons in Au+Au Collisions at sqrt(s_NN) = 200 GeV

    Authors: S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, A. Al-Jamel, J. Alexander, K. Aoki, L. Aphecetche, R. Armendariz, S. H. Aronson, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, A. Baldisseri, K. N. Barish, P. D. Barnes, B. Bassalleck, S. Bathe, S. Batsouli, V. Baublis, F. Bauer, A. Bazilevsky, S. Belikov, R. Bennett , et al. (321 additional authors not shown)

    Abstract: We report the measurement of direct photons at midrapidity in Au+Au collisions at sqrt{s_NN} = 200 GeV. The direct photon signal was extracted for the transverse-momentum range of 4 GeV/c < p_T < 22 GeV/c, using a statistical method to subtract decay photons from the inclusive-photon sample. The direct-photon nuclear-modification factor R_AA was calculated as a function of p_T for different Au+Au… ▽ More

    Submitted 25 May, 2012; originally announced May 2012.

    Comments: PHENIX Collaboration, 346 authors, 8 pages, 4 figures, 1 table. Submitted to Phys. Rev. Lett. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

  48. Production of omega mesons in p+p, d+Au, Cu+Cu, and Au+Au collisions at sqrt(s_NN)=200 GeV

    Authors: A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, A. Al-Jamel, J. Alexander, A. Angerami, K. Aoki, N. Apadula, L. Aphecetche, Y. Aramaki, R. Armendariz, S. H. Aronson, J. Asai, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri , et al. (517 additional authors not shown)

    Abstract: The PHENIX experiment at the Relativistic Heavy Ion Collider (RHIC) has measured omega meson production via leptonic and hadronic decay channels in p+p, d+Au, Cu+Cu, and Au+Au collisions at sqrt(s_NN) = 200 GeV. The invariant transverse momentum spectra measured in different decay modes give consistent results. Measurements in the hadronic decay channel in Cu+Cu and Au+Au collisions show that omeg… ▽ More

    Submitted 17 May, 2011; originally announced May 2011.

    Comments: 542 authors, pages, 11 figures, 3 tables. Submitted to Phys. Rev. C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

  49. Heavy Quark Production in p+p and Energy Loss and Flow of Heavy Quarks in Au+Au Collisions at sqrt(s_NN)=200 GeV

    Authors: A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, A. Al-Jamel, K. Aoki, L. Aphecetche, R. Armendariz, S. H. Aronson, J. Asai, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, G. Baksay, L. Baksay, A. Baldisseri, K. N. Barish, P. D. Barnes, B. Bassalleck, S. Bathe , et al. (398 additional authors not shown)

    Abstract: Transverse momentum (p^e_T) spectra of electrons from semileptonic weak decays of heavy flavor mesons in the range of 0.3 < p^e_T < 9.0 GeV/c have been measured at mid-rapidity (|eta| < 0.35) by the PHENIX experiment at the Relativistic Heavy Ion Collider in p+p and Au+Au collisions at sqrt(s_NN)=200 GeV. The nuclear modification factor R_AA with respect to p+p collisions indicates substantial ene… ▽ More

    Submitted 17 May, 2010; v1 submitted 10 May, 2010; originally announced May 2010.

    Comments: 422 authors from 59 institutions, 48 pages, 46 figures, 18 tables. v2 removes line numbers and matches submission to PRC. Plain text data tables for points plotted in figures, but not in tables are at http://www.phenix.bnl.gov/papers.html

  50. Nuclear modification factors of phi mesons in d+Au, Cu+Cu and Au+Au collisions at sqrt(S_NN)=200 GeV

    Authors: PHENIX Collaboration, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, A. Al-Jamel, A. Angerami, K. Aoki, L. Aphecetche, Y. Aramaki, R. Armendariz, S. H. Aronson, J. Asai, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri , et al. (508 additional authors not shown)

    Abstract: The PHENIX experiment at the Relativistic Heavy Ion Collider (RHIC) has performed systematic measurements of phi meson production in the K+K- decay channel at midrapidity in p+p, d+Au, Cu+Cu and Au+Au collisions at sqrt(S_NN)=200 GeV. Results are presented on the phi invariant yield and the nuclear modification factor R_AA for Au+Au and Cu+Cu, and R_dA for d+Au collisions, studied as a function o… ▽ More

    Submitted 21 April, 2010; v1 submitted 20 April, 2010; originally announced April 2010.

    Comments: 532 authors, 11 pages text, RevTeX-4, 7 figures, 1 Table. Submitted to Physical Review C. v2 interchanged to proper .eps files for Figs. 6 and 7; fixed minor typos. Plain text data tables for points plotted in figures are at http://www.phenix.bnl.gov/papers.html

    Journal ref: Phys.Rev.C83:024909,2011