
Showing 1–50 of 63 results for author: Bagnell, J A

Searching in archive cs.
  1. arXiv:2406.07253

    cs.LG

    Hybrid Reinforcement Learning from Offline Observation Alone

    Authors: Yuda Song, J. Andrew Bagnell, Aarti Singh

    Abstract: We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access. While Reinforcement Learning (RL) research typically assumes offline data contains complete action, reward and transition information, datasets with only state information (also known as observation-only datasets) are more general, abundant and practical. This motiva…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 34 pages, 7 figures, published at ICML 2024

  2. arXiv:2406.01462

    cs.LG cs.AI cs.CL

    Understanding Preference Fine-Tuning Through the Lens of Coverage

    Authors: Yuda Song, Gokul Swamy, Aarti Singh, J. Andrew Bagnell, Wen Sun

    Abstract: Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work due to the fact that both have to…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2404.16767

    cs.LG cs.CL cs.CV

    REBEL: Reinforcement Learning via Regressing Relative Rewards

    Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

    Abstract: While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise impleme…

    Submitted 29 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: New experimental results on general chat

  4. arXiv:2402.08848

    cs.LG cs.AI

    Hybrid Inverse Reinforcement Learning

    Authors: Juntao Ren, Gokul Swamy, Zhiwei Steven Wu, J. Andrew Bagnell, Sanjiban Choudhury

    Abstract: The inverse reinforcement learning approach to imitation learning is a double-edged sword. On the one hand, it can enable learning from a smaller number of expert demonstrations with more robustness to error compounding than behavioral cloning approaches. On the other hand, it requires that the learner repeatedly solve a computationally expensive reinforcement learning (RL) problem. Often, much of…

    Submitted 4 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  5. arXiv:2402.02616   

    cs.LG

    The Virtues of Pessimism in Inverse Reinforcement Learning

    Authors: David Wu, Gokul Swamy, J. Andrew Bagnell, Zhiwei Steven Wu, Sanjiban Choudhury

    Abstract: Inverse Reinforcement Learning (IRL) is a powerful framework for learning complex behaviors from expert demonstrations. However, it traditionally requires repeatedly solving a computationally expensive reinforcement learning (RL) problem in its inner loop. It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL. As an example, recent work resets th…

    Submitted 8 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: This paper has been withdrawn by the authors pending edits from other authors

  6. arXiv:2303.14623

    cs.LG

    Inverse Reinforcement Learning without Reinforcement Learning

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: Inverse Reinforcement Learning (IRL) is a powerful set of techniques for imitation learning that aims to learn a reward function that rationalizes expert demonstrations. Unfortunately, traditional IRL methods suffer from a computational weakness: they require repeatedly solving a hard reinforcement learning (RL) problem as a subroutine. This is counter-intuitive from the viewpoint of reductions: w…

    Submitted 29 January, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

  7. arXiv:2303.00694

    cs.LG cs.RO eess.SY

    The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

    Authors: Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury

    Abstract: We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance…

    Submitted 1 March, 2023; originally announced March 2023.

  8. arXiv:2210.06718

    cs.LG

    Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

    Authors: Yuda Song, Yifei Zhou, Ayush Sekhari, J. Andrew Bagnell, Akshay Krishnamurthy, Wen Sun

    Abstract: We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction. The framework mitigates the challenges that arise in both pure offline and online RL settings, allowing for the design of simple and highly effective algorithms, in both theory and practice. We demonstrate these…

    Submitted 11 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: 42 pages, 6 figures. Published at ICLR 2023. Code available at https://github.com/yudasong/HyQ

  9. arXiv:2208.09551

    cs.GT cs.LG

    Game-Theoretic Algorithms for Conditional Moment Matching

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: A variety of problems in econometrics and machine learning, including instrumental variable regression and Bellman residual minimization, can be formulated as satisfying a set of conditional moment restrictions (CMR). We derive a general, game-theoretic strategy for satisfying CMR that scales to nonlinear problems, is amenable to gradient-based optimization, and is able to account for finite sampl…

    Submitted 19 August, 2022; originally announced August 2022.

  10. arXiv:2208.02225

    cs.LG

    Sequence Model Imitation Learning with Unobserved Contexts

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We consider imitation learning problems where the learner's ability to mimic the expert increases throughout the course of an episode as more information is revealed. One example of this is when the expert has access to privileged information: while the learner might not be able to accurately reproduce expert behavior early on in an episode, by considering the entire history of states and actions,…

    Submitted 14 January, 2023; v1 submitted 3 August, 2022; originally announced August 2022.

  11. arXiv:2205.15397

    cs.LG stat.ML

    Minimax Optimal Online Imitation Learning via Replay Estimation

    Authors: Gokul Swamy, Nived Rajaraman, Matthew Peng, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu, Jiantao Jiao, Kannan Ramchandran

    Abstract: Online imitation learning is the problem of how best to mimic expert demonstrations, given access to the environment or an accurate simulator. Prior work has shown that in the infinite sample regime, exact moment matching achieves value equivalence to the expert policy. However, in the finite sample regime, even if one has no optimization error, empirical variance can lead to a performance gap tha…

    Submitted 14 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  12. arXiv:2202.01312

    cs.LG cs.RO

    Causal Imitation Learning under Temporally Correlated Noise

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the in…

    Submitted 2 February, 2022; originally announced February 2022.

  13. arXiv:2111.09434

    cs.RO cs.LG eess.SY

    On the Effectiveness of Iterative Learning Control

    Authors: Anirudh Vemula, Wen Sun, Maxim Likhachev, J. Andrew Bagnell

    Abstract: Iterative learning control (ILC) is a powerful technique for high performance tracking in the presence of modeling errors for optimal control applications. There is extensive prior work showing its empirical effectiveness in applications such as chemical reactors, industrial robots and quadcopters. However, there is little prior theoretical work that explains the effectiveness of ILC even in the p…

    Submitted 8 December, 2021; v1 submitted 17 November, 2021; originally announced November 2021.

    Comments: Submitted to L4DC 2022

  14. arXiv:2110.02063

    cs.LG

    A Critique of Strictly Batch Imitation Learning

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: Recent work by Jarrett et al. attempts to frame the problem of offline imitation learning (IL) as one of learning a joint energy-based model, with the hope of out-performing standard behavioral cloning. We suggest that notational issues obscure how the pseudo-state visitation distribution the authors propose to optimize might be disconnected from the policy's true state visitation distr…

    Submitted 5 October, 2021; originally announced October 2021.

  15. arXiv:2103.03236

    cs.LG cs.RO stat.ML

    Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap

    Authors: Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

    Abstract: We provide a unifying view of a large family of previous imitation learning algorithms through the lens of moment matching. At its core, our classification scheme is based on whether the learner attempts to match (1) reward or (2) action-value moments of the expert's behavior, with each option leading to differing algorithmic approaches. By considering adversarially chosen divergences between lear…

    Submitted 10 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

  16. arXiv:2102.02872

    cs.LG cs.RO stat.ML

    Feedback in Imitation Learning: The Three Regimes of Covariate Shift

    Authors: Jonathan Spencer, Sanjiban Choudhury, Arun Venkatraman, Brian Ziebart, J. Andrew Bagnell

    Abstract: Imitation learning practitioners have often noted that conditioning policies on previous actions leads to a dramatic divergence between "held out" error and performance of the learner in situ. Interactive approaches can provably address this divergence but require repeated querying of a demonstrator. Recent work identifies this divergence as stemming from a "causal confound" in predicting the curr…

    Submitted 11 February, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

  17. arXiv:2009.09942

    cs.RO cs.AI cs.LG

    CMAX++ : Leveraging Experience in Planning and Execution using Inaccurate Models

    Authors: Anirudh Vemula, J. Andrew Bagnell, Maxim Likhachev

    Abstract: Given access to accurate dynamical models, modern planning approaches are effective in computing feasible and optimal plans for repetitive robotic tasks. However, it is difficult to model the true dynamics of the real world before execution, especially for tasks requiring interactions with objects whose parameters are unknown. A recent planning approach, CMAX, tackles this problem by adapting the…

    Submitted 15 October, 2020; v1 submitted 21 September, 2020; originally announced September 2020.

  18. arXiv:2004.00500

    cs.LG stat.ML

    Exploration in Action Space

    Authors: Anirudh Vemula, Wen Sun, J. Andrew Bagnell

    Abstract: Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains. In this paper, we examine reasons why these methods work better and the situations in which they are worse than traditional action space exploration methods. Through a simple theoretical analysis, we show that when…

    Submitted 30 March, 2020; originally announced April 2020.

    Comments: Presented at RSS 2018 in Learning and Inference in Robotics: Integrating Structure, Priors and Models workshop. arXiv admin note: text overlap with arXiv:1901.11503

  19. arXiv:2003.14393

    cs.RO eess.SY

    TRON: A Fast Solver for Trajectory Optimization with Non-Smooth Cost Functions

    Authors: Anirudh Vemula, J. Andrew Bagnell

    Abstract: Trajectory optimization is an important tool for control and planning of complex, underactuated robots, and has shown impressive results in real world robotic tasks. However, in applications where the cost function to be optimized is non-smooth, modern trajectory optimization methods have extremely slow convergence. In this work, we present TRON, an iterative solver that can be used for efficient…

    Submitted 31 March, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

    Comments: Submitted to CDC 2020

  20. arXiv:2003.04394

    cs.RO cs.LG

    Planning and Execution using Inaccurate Models with Provable Guarantees

    Authors: Anirudh Vemula, Yash Oza, J. Andrew Bagnell, Maxim Likhachev

    Abstract: Models used in modern planning problems to simulate outcomes of real world action executions are becoming increasingly complex, ranging from simulators that do physics-based reasoning to precomputed analytical motion primitives. However, robots operating in the real world often face situations not modeled by these models before execution. This imperfect modeling can lead to highly suboptimal or ev…

    Submitted 15 October, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: Accepted at RSS 2020. 12 pages, 5 figures. Code at https://github.com/vvanirudh/CMAX , video at https://youtu.be/eQmAeWIhjO8 and blog post at https://vvanirudh.github.io/blog/cmax

  21. arXiv:1905.10948

    cs.LG stat.ML

    Provably Efficient Imitation Learning from Observation Alone

    Authors: Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell

    Abstract: We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing a…

    Submitted 11 June, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  22. arXiv:1901.11503

    cs.LG cs.AI cs.RO stat.ML

    Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective

    Authors: Anirudh Vemula, Wen Sun, J. Andrew Bagnell

    Abstract: Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem. We examine these black-box methods closely to identify situations in which they are worse than action space exploration methods and those in which they are superior. Through simple theoretical ana…

    Submitted 31 January, 2019; originally announced January 2019.

    Comments: Accepted at AISTATS 2019
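Entry 22 views parameter-space exploration as zeroth-order optimization: the objective is treated as a black box and its gradient is estimated from function evaluations alone. The sketch below shows one common estimator of this kind (antithetic Gaussian smoothing, as used in random-search methods) driving plain descent. It is a generic illustration under the assumption of a deterministic objective, not the paper's algorithm; the names `two_point_gradient` and `random_search` and all default parameters are hypothetical.

```python
import random
from typing import Callable, List


def two_point_gradient(f: Callable[[List[float]], float], theta: List[float],
                       sigma: float, num_dirs: int, rng: random.Random) -> List[float]:
    """Estimate grad f(theta) from paired evaluations f(theta +/- sigma * u).

    Generic antithetic Gaussian-smoothing estimator (an assumption here, not
    necessarily the estimator analyzed in the paper).
    """
    d = len(theta)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random direction in parameter space
        plus = f([t + sigma * ui for t, ui in zip(theta, u)])
        minus = f([t - sigma * ui for t, ui in zip(theta, u)])
        scale = (plus - minus) / (2.0 * sigma * num_dirs)  # finite difference, averaged over directions
        grad = [g + scale * ui for g, ui in zip(grad, u)]
    return grad


def random_search(f: Callable[[List[float]], float], theta: List[float], steps: int = 200,
                  lr: float = 0.05, sigma: float = 0.1, num_dirs: int = 10,
                  seed: int = 0) -> List[float]:
    """Zeroth-order descent: repeatedly step against the estimated gradient."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = two_point_gradient(f, theta, sigma, num_dirs, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta
```

For example, `random_search(lambda x: sum(v * v for v in x), [1.0] * 5)` drives the sphere objective toward zero using only function values. The variance of such estimators grows with the parameter dimension, which is the kind of dimension dependence the paper's comparison with action-space exploration concerns.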

  23. An Algorithmic Perspective on Imitation Learning

    Authors: Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters

    Abstract: As robots and other intelligent agents move from simple environments and problems to more complex, unstructured settings, manually programming their behavior has become increasingly challenging and expensive. Often, it is easier for a teacher to demonstrate a desired behavior rather than attempt to manually engineer it. This process of learning from demonstrations, and the study of algorithms to d…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: 187 pages. Published in Foundations and Trends in Robotics

  24. arXiv:1805.11240

    cs.LG stat.ML

    Truncated Horizon Policy Search: Combining Reinforcement Learning & Imitation Learning

    Authors: Wen Sun, J. Andrew Bagnell, Byron Boots

    Abstract: In this paper, we propose to combine imitation and reinforcement learning via the idea of reward shaping using an oracle. We study the effectiveness of the near-optimal cost-to-go oracle on the planning horizon and demonstrate that the cost-to-go oracle shortens the learner's planning horizon as a function of its accuracy: a globally optimal oracle can shorten the planning horizon to one, leading to…

    Submitted 29 May, 2018; originally announced May 2018.

    Comments: ICLR 2018
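Entry 24 combines imitation and reinforcement learning through reward shaping with a cost-to-go oracle. Assuming the standard potential-based form of shaping (the paper's exact construction may differ), with per-step cost $c$, discount $\gamma$, and oracle cost-to-go $\hat V$, the shaped cost summed over a truncated horizon of $k$ steps telescopes:

$$\tilde c(s_t, a_t, s_{t+1}) = c(s_t, a_t) + \gamma \hat V(s_{t+1}) - \hat V(s_t)$$

$$\sum_{t=0}^{k-1} \gamma^t \, \tilde c(s_t, a_t, s_{t+1}) = \sum_{t=0}^{k-1} \gamma^t c(s_t, a_t) + \gamma^k \hat V(s_k) - \hat V(s_0)$$

Because $\hat V(s_0)$ does not depend on the policy, minimizing the $k$-step shaped cost amounts to minimizing $k$ steps of true cost plus the oracle's estimate of what remains. If $\hat V$ equals the optimal cost-to-go, $k = 1$ already gives the greedy optimal action, which is one way to read the abstract's claim that a globally optimal oracle shortens the planning horizon to one.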

  25. arXiv:1805.10755

    cs.LG stat.ML

    Dual Policy Iteration

    Authors: Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

    Abstract: Recently, a novel class of Approximate Policy Iteration (API) algorithms has demonstrated impressive practical performance (e.g., ExIt from [2], AlphaGo-Zero from [27]). This new family of algorithms maintains, and alternately optimizes, two policies: a fast, reactive policy (e.g., a deep neural network) deployed at test time, and a slow, non-reactive policy (e.g., Tree Search), that can plan mul…

    Submitted 5 April, 2019; v1 submitted 27 May, 2018; originally announced May 2018.

    Comments: NeurIPS 2018; Additional related works

  26. arXiv:1711.00002

    cs.CV cs.LG

    Log-DenseNet: How to Sparsify a DenseNet

    Authors: Hanzhang Hu, Debadeepta Dey, Allison Del Giorno, Martial Hebert, J. Andrew Bagnell

    Abstract: Skip connections are increasingly utilized by deep neural networks to improve accuracy and cost-efficiency. In particular, the recent DenseNet is efficient in computation and parameters, and achieves state-of-the-art predictions by directly connecting each feature layer to all previous ones. However, DenseNet's extreme connectivity pattern may hinder its scalability to high depths, and in applicat…

    Submitted 30 October, 2017; originally announced November 2017.

  27. arXiv:1709.08520

    stat.ML cs.LG

    Predictive-State Decoders: Encoding the Future into Recurrent Networks

    Authors: Arun Venkatraman, Nicholas Rhinehart, Wen Sun, Lerrel Pinto, Martial Hebert, Byron Boots, Kris M. Kitani, J. Andrew Bagnell

    Abstract: Recurrent neural networks (RNNs) are a vital modeling technique that rely on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding its analytic representation inside an RNN. In the Predictive-State Representa…

    Submitted 25 September, 2017; originally announced September 2017.

    Comments: NIPS 2017

  28. arXiv:1709.04549

    cs.LG

    Ignoring Distractors in the Absence of Labels: Optimal Linear Projection to Remove False Positives During Anomaly Detection

    Authors: Allison Del Giorno, J. Andrew Bagnell, Martial Hebert

    Abstract: In the anomaly detection setting, the native feature embedding can be a crucial source of bias. We present a technique, Feature Omission using Context in Unsupervised Settings (FOCUS) to learn a feature mapping that is invariant to changes exemplified in training sets while retaining as much descriptive power as possible. While this method could apply to many unsupervised settings, we focus on app…

    Submitted 13 September, 2017; originally announced September 2017.

    Comments: 13 pages, 6 figures

  29. arXiv:1708.06832

    cs.LG cs.AI

    Learning Anytime Predictions in Neural Networks via Adaptive Loss Balancing

    Authors: Hanzhang Hu, Debadeepta Dey, Martial Hebert, J. Andrew Bagnell

    Abstract: This work considers the trade-off between accuracy and test-time computational cost of deep neural networks (DNNs) via anytime predictions from auxiliary predictions. Specifically, we optimize auxiliary losses jointly in an adaptive weighted sum, where the weights are inversely proportional to the average of each loss. Intuitively, this balances the losses to have the same scale. We demo…

    Submitted 25 May, 2018; v1 submitted 22 August, 2017; originally announced August 2017.
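The adaptive weighting described in entry 29 (each auxiliary loss weighted by the inverse of its average, so that all weighted terms share a common scale) can be sketched as follows. Assumptions: the "average" is an exponential moving average with a hypothetical `momentum` parameter, losses arrive as plain floats, and the class name `AdaptiveLossBalancer` is illustrative. In an autodiff framework the weights would be treated as constants rather than differentiated through. This is not the authors' implementation.

```python
from typing import List


class AdaptiveLossBalancer:
    """Weights each auxiliary loss by the inverse of its running average.

    Assumption: the running average is an exponential moving average with
    decay `momentum`; the paper's exact averaging scheme may differ.
    """

    def __init__(self, num_losses: int, momentum: float = 0.9, eps: float = 1e-8):
        self.momentum = momentum
        self.eps = eps  # guards against division by zero for vanishing losses
        self.avg: List[float] = [0.0] * num_losses
        self.initialized = False

    def update(self, losses: List[float]) -> None:
        """Fold the current losses into the running averages."""
        if not self.initialized:
            self.avg = list(losses)  # seed the averages with the first observation
            self.initialized = True
        else:
            m = self.momentum
            self.avg = [m * a + (1.0 - m) * l for a, l in zip(self.avg, losses)]

    def weights(self) -> List[float]:
        """Weights inversely proportional to each loss's running average."""
        return [1.0 / (a + self.eps) for a in self.avg]

    def combined_loss(self, losses: List[float]) -> float:
        """Update the averages, then return the adaptively weighted sum."""
        self.update(losses)
        return sum(w * l for w, l in zip(self.weights(), losses))
```

For example, on the first call `AdaptiveLossBalancer(3).combined_loss([10.0, 1.0, 0.1])` returns roughly 3.0: each loss is divided by its own average, so none of the three dominates the sum despite differing by two orders of magnitude.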

  30. arXiv:1706.00155

    cs.RO

    Shared Autonomy via Hindsight Optimization for Teleoperation and Teaming

    Authors: Shervin Javdani, Henny Admoni, Stefania Pellegrinelli, Siddhartha S. Srinivasa, J. Andrew Bagnell

    Abstract: In shared autonomy, a user and autonomous system work together to achieve shared goals. To collaborate effectively, the autonomous system must know the user's goal. As such, most prior works follow a predict-then-act model, first predicting the user's goal with high confidence, then assisting given that goal. Unfortunately, confidently predicting the user's goal may not be possible until they have…

    Submitted 31 May, 2017; originally announced June 2017.

  31. arXiv:1705.10664

    cs.RO

    A Fast Stochastic Contact Model for Planar Pushing and Grasping: Theory and Experimental Validation

    Authors: Jiaji Zhou, J. Andrew Bagnell, Matthew T. Mason

    Abstract: Based on the convex force-motion polynomial model for quasi-static sliding, we derive the kinematic contact model to determine the contact modes and instantaneous object motion on a supporting surface given a position controlled manipulator. The inherently stochastic object-to-surface friction distribution is modelled by sampling physically consistent parameters from appropriate distributions, wit…

    Submitted 30 May, 2017; originally announced May 2017.

    Comments: Robotics: Science and Systems 2017

  32. arXiv:1703.01030

    cs.LG

    Deeply AggreVaTeD: Differentiable Imitation Learning for Sequential Prediction

    Authors: Wen Sun, Arun Venkatraman, Geoffrey J. Gordon, Byron Boots, J. Andrew Bagnell

    Abstract: Researchers have demonstrated state-of-the-art performance in sequential decision making problems (e.g., robotics control, sequential prediction) with deep neural network models. One often has access to near-optimal oracles that achieve good performance on the task during training. We demonstrate that AggreVaTeD --- a policy gradient extension of the Imitation Learning (IL) approach of (Ross & Bag…

    Submitted 2 March, 2017; originally announced March 2017.

    Comments: 17 pages

  33. arXiv:1703.00377

    cs.LG

    Gradient Boosting on Stochastic Data Streams

    Authors: Hanzhang Hu, Wen Sun, Arun Venkatraman, Martial Hebert, J. Andrew Bagnell

    Abstract: Boosting is a popular ensemble algorithm that generates more powerful learners by linearly combining base models from a simpler hypothesis class. In this work, we investigate the problem of adapting batch gradient boosting for minimizing convex loss functions to the online setting where the loss at each iteration is sampled i.i.d. from an unknown distribution. To generalize from batch to online, we fir…

    Submitted 1 March, 2017; originally announced March 2017.

    Comments: To appear in AISTATS 2017

  34. arXiv:1609.08938

    cs.CV stat.ML

    A Discriminative Framework for Anomaly Detection in Large Videos

    Authors: Allison Del Giorno, J. Andrew Bagnell, Martial Hebert

    Abstract: We address an anomaly detection setting in which training sequences are unavailable and anomalies are scored independently of temporal ordering. Current algorithms in anomaly detection are based on the classical density estimation approach of learning high-dimensional models and finding low-probability events. These algorithms are sensitive to the order in which anomalies appear and require either…

    Submitted 28 September, 2016; originally announced September 2016.

    Comments: 14 pages without references, 16 pages with. 7 figures. Accepted to ECCV 2016

  35. arXiv:1608.00627

    cs.RO cs.AI cs.LG

    Learning Transferable Policies for Monocular Reactive MAV Control

    Authors: Shreyansh Daftry, J. Andrew Bagnell, Martial Hebert

    Abstract: The ability to transfer knowledge gained in previous tasks into new contexts is one of the most important mechanisms of human learning. Despite this, adapting autonomous behavior to be reused in partially similar settings is still an open problem in current robotics research. In this paper, we take a small step in this direction and propose a generic framework for learning transferable motion poli…

    Submitted 1 August, 2016; originally announced August 2016.

    Comments: International Symposium on Experimental Robotics (ISER 2016)

  36. arXiv:1607.08665

    cs.RO cs.AI cs.CV

    Introspective Perception: Learning to Predict Failures in Vision Systems

    Authors: Shreyansh Daftry, Sam Zeng, J. Andrew Bagnell, Martial Hebert

    Abstract: As robots aspire for long-term autonomous operations in complex dynamic environments, the ability to reliably take mission-critical decisions in ambiguous situations becomes critical. This motivates the need to build systems that have situational awareness to assess how qualified they are at that moment to make a decision. We call this self-evaluating capability introspection. In this paper, we…

    Submitted 28 July, 2016; originally announced July 2016.

    Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016)

  37. arXiv:1604.04779

    cs.RO

    Robust Monocular Flight in Cluttered Outdoor Environments

    Authors: Shreyansh Daftry, Sam Zeng, Arbaaz Khan, Debadeepta Dey, Narek Melik-Barkhudarov, J. Andrew Bagnell, Martial Hebert

    Abstract: Recently, there have been numerous advances in the development of biologically inspired lightweight Micro Aerial Vehicles (MAVs). While autonomous navigation is fairly straightforward for large UAVs as expensive sensors and monitoring devices can be employed, robust methods for obstacle avoidance remain a challenging task for MAVs which operate at low altitude in cluttered unstructured environme…

    Submitted 16 April, 2016; originally announced April 2016.

  38. arXiv:1602.06056

    cs.RO

    A Convex Polynomial Force-Motion Model for Planar Sliding: Identification and Application

    Authors: Jiaji Zhou, Robert Paolini, J. Andrew Bagnell, Matthew T. Mason

    Abstract: We propose a polynomial force-motion model for planar sliding. The set of generalized friction loads is the 1-sublevel set of a polynomial whose gradient directions correspond to generalized velocities. Additionally, the polynomial is confined to be convex even-degree homogeneous in order to obey the maximum work inequality, symmetry, shape invariance in scale, and fast invertibility. We present a…

    Submitted 15 June, 2016; v1 submitted 19 February, 2016; originally announced February 2016.

    Comments: 2016 IEEE International Conference on Robotics and Automation (ICRA)
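Entry 38 describes friction loads as the 1-sublevel set of a convex, even-degree homogeneous polynomial $H$ whose gradient direction gives the generalized velocity. The simplest member of that family is a diagonal quadratic, the classic ellipsoidal limit-surface approximation, which the sketch below uses to show both directions of the force-motion map. The diagonal quadratic form, the coefficient vector `a`, and the function names are assumptions for illustration; the paper's identified polynomial may be of higher degree and need not be diagonal.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]  # (fx, fy, torque) or (vx, vy, omega)


def limit_surface_value(load: Sequence[float], a: Sequence[float]) -> float:
    """H(F) = sum_k a_k * F_k^2: an assumed diagonal, degree-2 convex homogeneous polynomial."""
    return sum(ak * fk * fk for ak, fk in zip(a, load))


def velocity_direction(load: Sequence[float], a: Sequence[float]) -> Vec3:
    """Gradient of H at F; its direction is the generalized velocity."""
    return tuple(2.0 * ak * fk for ak, fk in zip(a, load))


def load_from_velocity(vel: Sequence[float], a: Sequence[float]) -> Vec3:
    """Inverse map: the load on the boundary H(F) = 1 whose gradient is parallel to vel.

    Requires a nonzero velocity. Since grad H = 2 * a * F, choosing F proportional
    to vel / a makes the gradient parallel to vel; H is homogeneous of degree 2,
    so dividing by sqrt(H) moves F onto the 1-level set.
    """
    f = [vk / ak for ak, vk in zip(a, vel)]
    scale = math.sqrt(limit_surface_value(f, a))
    return tuple(fk / scale for fk in f)
```

For example, with hypothetical limits of 2 N on force and 0.5 N·m on torque, `a = (0.25, 0.25, 4.0)` and a pure translation `(1.0, 0.0, 0.0)` maps to the boundary load `(2.0, 0.0, 0.0)`. The closed-form inverse illustrates, in this degree-2 case, the kind of fast invertibility the abstract lists as a design requirement.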

  39. arXiv:1512.08836

    cs.LG

    Learning to Filter with Predictive State Inference Machines

    Authors: Wen Sun, Arun Venkatraman, Byron Boots, J. Andrew Bagnell

    Abstract: Latent state space models are a fundamental and widely used tool for modeling dynamical systems. However, they are difficult to learn from data and learned models often lack performance guarantees on inference tasks such as filtering and prediction. In this work, we present the PREDICTIVE STATE INFERENCE MACHINE (PSIM), a data-driven method that considers the inference procedure on a dynamical sys…

    Submitted 30 May, 2016; v1 submitted 29 December, 2015; originally announced December 2015.

    Comments: ICML 2016

  40. arXiv:1503.07619

    cs.RO

    Shared Autonomy via Hindsight Optimization

    Authors: Shervin Javdani, Siddhartha S. Srinivasa, J. Andrew Bagnell

    Abstract: In shared autonomy, user input and robot autonomy are combined to control a robot to achieve a goal. Often, the robot does not know a priori which goal the user wants to achieve, and must both predict the user's intended goal, and assist in achieving that goal. We formulate the problem of shared autonomy as a Partially Observable Markov Decision Process with uncertainty over the user's goal. We ut…

    Submitted 17 April, 2015; v1 submitted 26 March, 2015; originally announced March 2015.

  41. arXiv:1503.05451

    cs.RO

    Autonomy Infused Teleoperation with Application to BCI Manipulation

    Authors: Katharina Muelling, Arun Venkatraman, Jean-Sebastien Valois, John Downey, Jeffrey Weiss, Shervin Javdani, Martial Hebert, Andrew B. Schwartz, Jennifer L. Collinger, J. Andrew Bagnell

    Abstract: Robot teleoperation systems face a common set of challenges including latency, low-dimensional user commands, and asymmetric control inputs. User control with Brain-Computer Interfaces (BCIs) exacerbates these problems through especially noisy and erratic low-dimensional motion commands due to the difficulty in decoding neural activity. We introduce a general framework to address these challenges…

    Submitted 7 June, 2015; v1 submitted 18 March, 2015; originally announced March 2015.

  42. arXiv:1411.7974

    cs.AI cs.GT cs.MA

    Solving Games with Functional Regret Estimation

    Authors: Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, Michael Bowling

    Abstract: We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function app…

    Submitted 31 December, 2014; v1 submitted 28 November, 2014; originally announced November 2014.

    Comments: AAAI Conference on Artificial Intelligence 2015
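In entry 42, a no-regret algorithm turns (estimated) regrets into a sequence of policies. The sketch below assumes regret matching as that no-regret rule and applies it in self-play on a two-player zero-sum matrix game. Exact cumulative regrets stand in for the learned function approximator, so this shows only the policy-from-regrets step, not the paper's estimator or its extensive-form setting. The names `policy_from_regrets` and `self_play` are hypothetical.

```python
from typing import List, Sequence, Tuple


def policy_from_regrets(regrets: Sequence[float]) -> List[float]:
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)  # no positive regret: play uniformly
    return [p / total for p in positive]


def self_play(payoff: List[List[float]], iters: int) -> Tuple[List[float], List[float]]:
    """Regret-matching self-play in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives its negation.
    Exact cumulative regrets are used here in place of learned regret estimates.
    Returns the time-averaged strategies of both players.
    """
    n, m = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n, [0.0] * m
    row_avg, col_avg = [0.0] * n, [0.0] * m
    for _ in range(iters):
        p = policy_from_regrets(row_regret)
        q = policy_from_regrets(col_regret)
        # Expected payoff of each pure action against the opponent's current mixed strategy.
        row_vals = [sum(payoff[i][j] * q[j] for j in range(m)) for i in range(n)]
        col_vals = [-sum(payoff[i][j] * p[i] for i in range(n)) for j in range(m)]
        row_now = sum(pi * v for pi, v in zip(p, row_vals))
        col_now = sum(qj * v for qj, v in zip(q, col_vals))
        # Accumulate regret for not having played each pure action, and the average policy.
        for i in range(n):
            row_regret[i] += row_vals[i] - row_now
            row_avg[i] += p[i] / iters
        for j in range(m):
            col_regret[j] += col_vals[j] - col_now
            col_avg[j] += q[j] / iters
    return row_avg, col_avg
```

For the game `[[2.0, -1.0], [-1.0, 1.0]]`, whose unique equilibrium puts probability 0.4 on the first action for both players, `self_play(..., 20000)` returns average strategies close to `[0.4, 0.6]`. In a zero-sum game the sum of both players' average regrets bounds how exploitable the average strategies are, which is why a no-regret rule driven by sufficiently accurate regret estimates approaches an equilibrium.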

  43. arXiv:1411.6326

    cs.RO cs.CV cs.LG

    Vision and Learning for Deliberative Monocular Cluttered Flight

    Authors: Debadeepta Dey, Kumar Shaurya Shankar, Sam Zeng, Rupesh Mehta, M. Talha Agcayazi, Christopher Eriksen, Shreyansh Daftry, Martial Hebert, J. Andrew Bagnell

    Abstract: Cameras provide a rich source of information while being passive, cheap and lightweight for small and medium Unmanned Aerial Vehicles (UAVs). In this work we present the first implementation of receding horizon control, which is widely used in ground vehicles, with monocular vision as the only sensing mode for autonomous UAV flight in dense clutter. We make it feasible on UAVs via a number of cont…

    Submitted 23 November, 2014; originally announced November 2014.

  44. arXiv:1411.5007  [pdf, ps, other]

    cs.AI cs.GT

    A Unified View of Large-scale Zero-sum Equilibrium Computation

    Authors: Kevin Waugh, J. Andrew Bagnell

    Abstract: The task of computing approximate Nash equilibria in large zero-sum extensive-form games has received a tremendous amount of attention due mainly to the Annual Computer Poker Competition. Immediately after its inception, two competing and seemingly different approaches emerged: one an application of no-regret online learning, the other a sophisticated gradient method applied to a convex-concave s…

    Submitted 18 November, 2014; originally announced November 2014.

    Comments: AAAI Workshop on Computer Poker and Imperfect Information

  45. arXiv:1410.7376  [pdf, other]

    cs.CV

    Visual Chunking: A List Prediction Framework for Region-Based Object Detection

    Authors: Nicholas Rhinehart, Jiaji Zhou, Martial Hebert, J. Andrew Bagnell

    Abstract: We consider detecting objects in an image by iteratively selecting from a set of arbitrarily shaped candidate regions. Our generic approach, which we term visual chunking, reasons about the locations of multiple object instances in an image while expressively describing object boundaries. We design an optimization criterion for measuring the performance of a list of such detections as a natural ex…

    Submitted 16 March, 2015; v1 submitted 27 October, 2014; originally announced October 2014.

    Comments: to appear at ICRA 2015

  46. arXiv:1409.5495  [pdf, other]

    cs.LG

    Efficient Feature Group Sequencing for Anytime Linear Prediction

    Authors: Hanzhang Hu, Alexander Grubb, J. Andrew Bagnell, Martial Hebert

    Abstract: We consider anytime linear prediction in the common machine learning setting, where features are in groups that have costs. We achieve anytime (or interruptible) predictions by sequencing the computation of feature groups and reporting results using the computed features at interruption. We extend Orthogonal Matching Pursuit (OMP) and Forward Regression (FR) to learn the sequencing greedi…

    Submitted 5 December, 2016; v1 submitted 18 September, 2014; originally announced September 2014.

    Comments: Published in UAI 2016, Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, UAI 2016

  47. arXiv:1406.5979  [pdf, ps, other]

    cs.LG stat.ML

    Reinforcement and Imitation Learning via Interactive No-Regret Learning

    Authors: Stephane Ross, J. Andrew Bagnell

    Abstract: Recent work has demonstrated that problems (particularly imitation learning and structured prediction) where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of acti…

    Submitted 23 June, 2014; originally announced June 2014.

    Comments: 14 pages. Under review for NIPS 2014 conference

  48. arXiv:1402.5886  [pdf, other]

    cs.LG cs.AI

    Near Optimal Bayesian Active Learning for Decision Making

    Authors: Shervin Javdani, Yuxin Chen, Amin Karbasi, Andreas Krause, J. Andrew Bagnell, Siddhartha Srinivasa

    Abstract: How should we gather information to make effective decisions? We address Bayesian active learning and experimental design problems, where we sequentially select tests to reduce uncertainty about a set of hypotheses. Instead of minimizing uncertainty per se, we consider a set of overlapping decision regions of these hypotheses. Our goal is to drive uncertainty into a single decision region as quick…

    Submitted 24 February, 2014; originally announced February 2014.

    Comments: Extended version of work appearing in the International conference on Artificial Intelligence and Statistics (AISTATS) 2014

  49. arXiv:1312.0579  [pdf, other]

    cs.LG

    SpeedMachines: Anytime Structured Prediction

    Authors: Alexander Grubb, Daniel Munoz, J. Andrew Bagnell, Martial Hebert

    Abstract: Structured prediction plays a central role in machine learning applications from computational biology to computer vision. These models require significantly more computation than unstructured models, and, in many applications, algorithms may need to make predictions within a computational budget or in an anytime fashion. In this work we propose an anytime technique for learning structured predict…

    Submitted 2 December, 2013; originally announced December 2013.

    Comments: 17 pages

  50. arXiv:1308.3541  [pdf, ps, other]

    cs.LG

    Knapsack Constrained Contextual Submodular List Prediction with Application to Multi-document Summarization

    Authors: Jiaji Zhou, Stephane Ross, Yisong Yue, Debadeepta Dey, J. Andrew Bagnell

    Abstract: We study the problem of predicting a set or list of options under a knapsack constraint. The quality of such lists is evaluated by a submodular reward function that measures both quality and diversity. Similar to DAgger (Ross et al., 2010), by a reduction to online learning, we show how to adapt two sequence prediction models to imitate greedy maximization under knapsack constraint problems: CONSEQ…

    Submitted 15 March, 2014; v1 submitted 15 August, 2013; originally announced August 2013.

    Comments: 8 pages, ICML 2013 Workshop on Inferning: Interactions between Inference and Learning