
The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms

Authors: Anirudh Vemula, Yuda Song, Aarti Singh, J. Andrew Bagnell, Sanjiban Choudhury

Abstract: We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL): the computational expense of repeatedly finding a good policy in the learned model, and the objective mismatch between model fitting and policy computation. Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy under the true dynamics. This objective demonstrates that optimizing the expected policy advantage in the learned model under an exploration distribution is sufficient for policy computation, resulting in a significant boost in computational efficiency compared to traditional planning methods. Additionally, the unified objective uses a value moment matching term for model fitting, which is aligned with the model's usage during policy computation. We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains compared to existing MBRL methods through simulated benchmarks.

Submitted 1 March, 2023; originally announced March 2023.

arXiv:2111.09434 [pdf, other]

On the Effectiveness of Iterative Learning Control

Authors: Anirudh Vemula, Wen Sun, Maxim Likhachev, J. Andrew Bagnell

Abstract: Iterative learning control (ILC) is a powerful technique for high-performance tracking in the presence of modeling errors for optimal control applications. There is extensive prior work showing its empirical effectiveness in applications such as chemical reactors, industrial robots and quadcopters. However, there is little prior theoretical work that explains the effectiveness of ILC even in the presence of large modeling errors, where optimal control methods using the misspecified model (MM) often perform poorly. Our work presents such a theoretical study of the performance of both ILC and MM on Linear Quadratic Regulator (LQR) problems with unknown transition dynamics. We show that the suboptimality gap, as measured with respect to the optimal LQR controller, for ILC is lower than that for MM by higher order terms that become significant in the regime of high modeling errors. A key part of our analysis is the perturbation bounds for the discrete Riccati equation in the finite horizon setting, where the solution is not a fixed point and requires tracking the error using recursive bounds. We back our theoretical findings with empirical experiments on a toy linear dynamical system with an approximate model, a nonlinear inverted pendulum system with misspecified mass, and a nonlinear planar quadrotor system in the presence of wind. Experiments show that ILC outperforms MM significantly, in terms of the cost of computed trajectories, when modeling errors are high.

Submitted 8 December, 2021; v1 submitted 17 November, 2021; originally announced November 2021.

Comments: Submitted to L4DC 2022

arXiv:2109.12427 [pdf, other]

Improved Soft Duplicate Detection in Search-Based Motion Planning

Authors: Nader Maray, Anirudh Vemula, Maxim Likhachev

Abstract: Search-based techniques have shown great success in motion planning problems such as robotic navigation by discretizing the state space and precomputing motion primitives. However, in domains with complex dynamic constraints, constructing motion primitives in a discretized state space is non-trivial. This requires operating in continuous space, which can be challenging for search-based planners as they can get stuck in local minima regions. Previous work on planning in continuous spaces introduced soft duplicate detection, which requires search to compute the duplicity of a state with respect to previously seen states to avoid exploring states that are likely to be duplicates, especially in local minima regions. They propose a simple metric utilizing the Euclidean distance between states and proximity to obstacles to compute the duplicity. In this paper, we improve upon this metric by introducing a kinodynamically informed metric, subtree overlap, between two states as the similarity between their successors that can be reached within a fixed time horizon using kinodynamic motion primitives. This captures the intuition that, due to robot dynamics, duplicate states can be far in Euclidean distance and result in very similar successor states, while non-duplicate states can be close and result in widely different successors.

Submitted 25 September, 2021; originally announced September 2021.

Comments: Submitted to ICRA 2022

ACM Class: I.2.9

arXiv:2105.05019 [pdf, other]

Learning Optimal Decision Making for an Industrial Truck Unloading Robot using Minimal Simulator Runs

Authors: Manash Pratim Das, Anirudh Vemula, Mayank Pathak, Sandip Aine, Maxim Likhachev

Abstract: Consider a truck filled with boxes of varying size and unknown mass and an industrial robot with end-effectors that can unload multiple boxes from any reachable location. In this work, we investigate how the robot can learn, with the help of a simulator, to maximize the number of boxes unloaded by each action. Most high-fidelity robotic simulators like ours are time-consuming. Therefore, we investigate the above learning problem with a focus on minimizing the number of simulation runs required. The optimal decision-making problem under this setting can be formulated as a multi-class classification problem. However, obtaining the outcome of any action requires us to run the time-consuming simulator, thereby restricting the amount of training data that can be collected. Thus, we need a data-efficient approach to learn the classifier and generalize it with a minimal amount of data. A high-fidelity physics-based simulator is common in general for complex manipulation tasks involving multi-body interactions. To this end, we train an optimal decision tree as the classifier, and for each branch of the decision tree, we reason about the confidence in the decision using a Probably Approximately Correct (PAC) framework to determine whether more simulator data will help reach a certain confidence level. This provides us with a mechanism to evaluate when simulation can be avoided for certain decisions, and when simulation will improve the decision making. For the truck unloading problem, our experiments show that a significant reduction in simulator runs can be achieved using the proposed method as compared to naively running the simulator to collect data to train equally performing decision trees.

Submitted 13 March, 2021; originally announced May 2021.

Comments: 8 pages, 8 figures, Pre-Print. This work has been submitted to the IEEE for possible publication

arXiv:2009.09942 [pdf, other]

CMAX++ : Leveraging Experience in Planning and Execution using Inaccurate Models

Authors: Anirudh Vemula, J. Andrew Bagnell, Maxim Likhachev

Abstract: Given access to accurate dynamical models, modern planning approaches are effective in computing feasible and optimal plans for repetitive robotic tasks. However, it is difficult to model the true dynamics of the real world before execution, especially for tasks requiring interactions with objects whose parameters are unknown. A recent planning approach, CMAX, tackles this problem by adapting the planner online during execution to bias the resulting plans away from inaccurately modeled regions. CMAX, while being provably guaranteed to reach the goal, requires strong assumptions on the accuracy of the model used for planning and fails to improve the quality of the solution over repetitions of the same task. In this paper we propose CMAX++, an approach that leverages real-world experience to improve the quality of resulting plans over successive repetitions of a robotic task. CMAX++ achieves this by integrating model-free learning using acquired experience with model-based planning using the potentially inaccurate model. We provide provable guarantees on the completeness and asymptotic convergence of CMAX++ to the optimal path cost as the number of repetitions increases. CMAX++ is also shown to outperform baselines in simulated robotic tasks including 3D mobile robot navigation where the track friction is incorrectly modeled, and a 7D pick-and-place task where the mass of the object is unknown leading to discrepancy between true and modeled dynamics.

Submitted 15 October, 2020; v1 submitted 21 September, 2020; originally announced September 2020.

arXiv:2004.00500 [pdf, other]

Exploration in Action Space

Authors: Anirudh Vemula, Wen Sun, J. Andrew Bagnell

Abstract: Parameter space exploration methods with black-box optimization have recently been shown to outperform state-of-the-art approaches in continuous control reinforcement learning domains. In this paper, we examine reasons why these methods work better and the situations in which they are worse than traditional action space exploration methods. Through a simple theoretical analysis, we show that when the parametric complexity required to solve the reinforcement learning problem is greater than the product of action space dimensionality and horizon length, exploration in action space is preferred. This is also shown empirically by comparing simple exploration methods on several toy problems.

Submitted 30 March, 2020; originally announced April 2020.

Comments: Presented at RSS 2018 in Learning and Inference in Robotics: Integrating Structure, Priors and Models workshop. arXiv admin note: text overlap with arXiv:1901.11503

arXiv:2003.14393 [pdf, other]

TRON: A Fast Solver for Trajectory Optimization with Non-Smooth Cost Functions

Authors: Anirudh Vemula, J. Andrew Bagnell

Abstract: Trajectory optimization is an important tool for control and planning of complex, underactuated robots, and has shown impressive results in real-world robotic tasks. However, in applications where the cost function to be optimized is non-smooth, modern trajectory optimization methods have extremely slow convergence. In this work, we present TRON, an iterative solver that can be used for efficient trajectory optimization in applications with non-smooth cost functions that are composed of smooth components. TRON achieves this by exploiting the structure of the objective to adaptively smooth the cost function, resulting in a sequence of objectives that can be efficiently optimized. TRON is provably guaranteed to converge to the global optimum of the non-smooth convex cost function when the dynamics are linear, and to a stationary point when the dynamics are nonlinear. Empirically, we show that TRON has faster convergence and lower final costs when compared to other trajectory optimization methods on a range of simulated tasks including collision-free motion planning for a mobile robot, sparse optimal control for a surgical needle, and a satellite rendezvous problem.

Submitted 31 March, 2020; v1 submitted 31 March, 2020; originally announced March 2020.

Comments: Submitted to CDC 2020

arXiv:2003.04394 [pdf, other]

Planning and Execution using Inaccurate Models with Provable Guarantees

Authors: Anirudh Vemula, Yash Oza, J. Andrew Bagnell, Maxim Likhachev

Abstract: Models used in modern planning problems to simulate outcomes of real-world action executions are becoming increasingly complex, ranging from simulators that do physics-based reasoning to precomputed analytical motion primitives. However, robots operating in the real world often face situations not modeled by these models before execution. This imperfect modeling can lead to highly suboptimal or even incomplete behavior during execution. In this paper, we propose CMAX, an approach for interleaving planning and execution. CMAX adapts its planning strategy online during real-world execution to account for any discrepancies in dynamics during planning, without requiring updates to the dynamics of the model. This is achieved by biasing the planner away from transitions whose dynamics are discovered to be inaccurately modeled, thereby leading to robot behavior that tries to complete the task despite having an inaccurate model. We provide provable guarantees on the completeness and efficiency of the proposed planning and execution framework under specific assumptions on the model, for both small and large state spaces. Our approach CMAX is shown to be efficient empirically in simulated robotic tasks including 4D planar pushing, and in real robotic experiments using PR2 involving a 3D pick-and-place task where the mass of the object is incorrectly modeled, and a 7D arm planning task where one of the joints is not operational leading to discrepancy in dynamics. The video of our physical robot experiments can be found at https://youtu.be/eQmAeWIhjO8

Submitted 15 October, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: Accepted at RSS 2020. 12 pages, 5 figures. Code at https://github.com/vvanirudh/CMAX , video at https://youtu.be/eQmAeWIhjO8 and blog post at https://vvanirudh.github.io/blog/cmax

arXiv:1910.12284 [pdf, other]

Task-Informed Fidelity Management for Speeding Up Robotics Simulation

Authors: Abhijeet Tallavajhula, Adrian Schoisengeier, Sung-Kyun Kim, Anirudh Vemula, Levi Lister, Oren Salzman

Abstract: Simulators are an important tool in robotics, used to develop robot software and generate synthetic data for machine learning algorithms. Faster simulation can result in better software validation and larger amounts of data. Previous efforts for speeding up simulators have been performed at the level of simulator building blocks and robot systems. Our key insight, motivating this work, is that further speedups can be obtained at the level of the robot task. Building on the observation that not all parts of a scene need to be simulated in high fidelity at all times, our approach is to toggle between high- and low-fidelity states for scene objects in a task-informed manner. Our contribution is a framework for speeding up robot simulation by exploiting task knowledge. The framework is agnostic to the underlying simulator, and preserves simulation fidelity. As a case study, we consider a complex material-handling task. For the associated simulation, which contains many of the characteristics that make robot simulation slow, we achieve a speedup of up to three times over high-fidelity simulation without compromising on the quality of the results. We also demonstrate that faster simulation allows us to train better policies for performing the task at hand in a short period of time. A video summarizing our contributions can be found at https://youtu.be/PEzypDyqc3o

Submitted 27 October, 2019; originally announced October 2019.

arXiv:1910.09453 [pdf, other]

Planning, Learning and Reasoning Framework for Robot Truck Unloading

Authors: Fahad Islam, Anirudh Vemula, Sung-Kyun Kim, Andrew Dornbush, Oren Salzman, Maxim Likhachev

Abstract: We consider the task of autonomously unloading boxes from trucks using an industrial manipulator robot. There are multiple challenges that arise: (1) real-time motion planning for a complex robotic system carrying two articulated mechanisms, an arm and a scooper, (2) decision-making in terms of what action to execute next given imperfect information about boxes such as their masses, (3) accounting for the sequential nature of the problem where current actions affect future state of the boxes, and (4) real-time execution that interleaves high-level decision-making with lower level motion planning. In this work, we propose a planning, learning, and reasoning framework to tackle these challenges, and describe its components including motion planning, belief space planning for offline learning, online decision-making based on offline learning, and an execution module to combine decision-making with motion planning. We analyze the performance of the framework on real-world scenarios. In particular, motion planning and execution modules are evaluated in simulation and on a real robot, while offline learning and online decision-making are evaluated in simulated real-world scenarios.

Submitted 18 June, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

arXiv:1905.10948 [pdf, other]

Provably Efficient Imitation Learning from Observation Alone

Authors: Wen Sun, Anirudh Vemula, Byron Boots, J. Andrew Bagnell

Abstract: We study Imitation Learning (IL) from Observations alone (ILFO) in large-scale MDPs. While most IL algorithms rely on an expert to directly provide actions to the learner, in this setting the expert only supplies sequences of observations. We design a new model-free algorithm for ILFO, Forward Adversarial Imitation Learning (FAIL), which learns a sequence of time-dependent policies by minimizing an Integral Probability Metric between the observation distributions of the expert policy and the learner. FAIL is the first provably efficient algorithm in the ILFO setting, which learns a near-optimal policy with a number of samples that is polynomial in all relevant parameters but independent of the number of unique observations. The resulting theory extends the domain of provably sample efficient learning algorithms beyond existing results, which typically only consider tabular reinforcement learning settings or settings that require access to a near-optimal reset distribution. We also investigate the extension of FAIL in a model-based setting. Finally, we demonstrate the efficacy of FAIL on multiple OpenAI Gym control tasks.

Submitted 11 June, 2019; v1 submitted 26 May, 2019; originally announced May 2019.

Comments: ICML 2019

arXiv:1901.11503 [pdf, other]

Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective

Authors: Anirudh Vemula, Wen Sun, J. Andrew Bagnell

Abstract: Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem. We examine these black-box methods closely to identify situations in which they are worse than action space exploration methods and those in which they are superior. Through simple theoretical analyses, we prove that complexity of exploration in parameter space depends on the dimensionality of parameter space, while complexity of exploration in action space depends on both the dimensionality of action space and horizon length. This is also demonstrated empirically by comparing simple exploration methods on several model problems, including Contextual Bandit, Linear Regression and Reinforcement Learning in continuous control.

Submitted 31 January, 2019; originally announced January 2019.

Comments: Accepted at AISTATS 2019

arXiv:1710.04689 [pdf, other]

Social Attention: Modeling Attention in Human Crowds

Authors: Anirudh Vemula, Katharina Muelling, Jean Oh

Abstract: Robots that navigate through human crowds need to be able to plan safe, efficient, and human predictable trajectories. This is a particularly challenging problem as it requires the robot to predict future human trajectories within a crowd where everyone implicitly cooperates with each other to avoid collisions. Previous approaches to human trajectory prediction have modeled the interactions between humans as a function of proximity. However, that is not necessarily true as some people in our immediate vicinity moving in the same direction might not be as important as other people that are further away, but that might collide with us in the future. In this work, we propose Social Attention, a novel trajectory prediction model that captures the relative importance of each person when navigating in the crowd, irrespective of their proximity. We demonstrate the performance of our method against a state-of-the-art approach on two publicly available crowd datasets and analyze the trained attention model to gain a better understanding of which surrounding agents humans attend to, when navigating in a crowd.

Submitted 29 October, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

arXiv:1705.06201 [pdf, other]

Modeling Cooperative Navigation in Dense Human Crowds

Authors: Anirudh Vemula, Katharina Muelling, Jean Oh

Abstract: For robots to be a part of our daily life, they need to be able to navigate among crowds not only safely but also in a socially compliant fashion. This is a challenging problem because humans tend to navigate by implicitly cooperating with one another to avoid collisions, while heading toward their respective destinations. Previous approaches have used hand-crafted functions based on proximity to model human-human and human-robot interactions. However, these approaches can only model simple interactions and fail to generalize for complex crowded settings. In this paper, we develop an approach that models the joint distribution over future trajectories of all interacting agents in the crowd, through a local interaction model that we train using real human trajectory data. The interaction model infers the velocity of each agent based on the spatial orientation of other agents in its vicinity. During prediction, our approach infers the goal of the agent from its past trajectory and uses the learned model to predict its future trajectory. We demonstrate the performance of our method against a state-of-the-art approach on a public dataset and show that our model outperforms it when predicting future trajectories for longer horizons.

Submitted 17 May, 2017; originally announced May 2017.

Comments: Accepted at ICRA 2017

arXiv:1605.06853 [pdf, other]

Path Planning in Dynamic Environments with Adaptive Dimensionality

Authors: Anirudh Vemula, Katharina Muelling, Jean Oh

Abstract: Path planning in the presence of dynamic obstacles is a challenging problem due to the added time dimension in search space. In approaches that ignore the time dimension and treat dynamic obstacles as static, frequent re-planning is unavoidable as the obstacles move, and their solutions are generally sub-optimal and can be incomplete. To achieve both optimality and completeness, it is necessary to consider the time dimension during planning. The notion of adaptive dimensionality has been successfully used in high-dimensional motion planning such as manipulation of robot arms, but has not been used in the context of path planning in dynamic environments. In this paper, we apply the idea of adaptive dimensionality to speed up path planning in dynamic environments for a robot with no assumptions on its dynamic model. Specifically, our approach considers the time dimension only in those regions of the environment where a potential collision may occur, and plans in a low-dimensional state-space elsewhere. We show that our approach is complete and is guaranteed to find a solution, if one exists, within a cost sub-optimality bound. We experimentally validate our method on the problem of 3D vehicle navigation (x, y, heading) in dynamic environments. Our results show that the presented approach achieves substantial speedups in planning time over 4D heuristic-based A*, especially when the resulting plan deviates significantly from the one suggested by the heuristic.

Submitted 22 May, 2016; originally announced May 2016.

Comments: Accepted in SoCS 2016

Showing 1–15 of 15 results for author: Vemula, A