Influencing Long-Term Behavior in Multiagent Reinforcement Learning

Authors: Dong-Ki Kim, Matthew Riemer, Miao Liu, Jakob N. Foerster, Michael Everett, Chuangchuang Sun, Gerald Tesauro, Jonathan P. How

Abstract: The main challenge of multiagent reinforcement learning is the difficulty of learning useful policies in the presence of other simultaneously learning agents whose changing behaviors jointly affect the environment's transition and reward dynamics. An effective approach that has recently emerged for addressing this non-stationarity is for each agent to anticipate the learning of other agents and influence the evolution of future policies towards desirable behavior for its own benefit. Unfortunately, previous approaches for achieving this suffer from myopic evaluation, considering only a finite number of policy updates. As such, these methods can only influence transient future policies rather than achieving the promise of scalable equilibrium selection approaches that influence the behavior at convergence. In this paper, we propose a principled framework for considering the limiting policies of other agents as time approaches infinity. Specifically, we develop a new optimization objective that maximizes each agent's average reward by directly accounting for the impact of its behavior on the limiting set of policies that other agents will converge to. Our paper characterizes desirable solution concepts within this problem setting and provides practical approaches for optimizing over possible outcomes. As a result of our farsighted objective, we demonstrate better long-term performance than state-of-the-art baselines across a suite of diverse multiagent benchmark domains.

Submitted 15 October, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: Accepted to NeurIPS 2022. The earlier version was presented at the Gamification and Multiagent Solutions Workshop (ICLR 2022) with a spotlight. Code at https://github.com/dkkim93/further and videos at https://sites.google.com/view/further-marl

arXiv:2203.00851 [pdf, other]

Distributed Riemannian Optimization with Lazy Communication for Collaborative Geometric Estimation

Authors: Yulun Tian, Amrit Singh Bedi, Alec Koppel, Miguel Calvo-Fullana, David M. Rosen, Jonathan P. How

Abstract: We present the first distributed optimization algorithm with lazy communication for collaborative geometric estimation, the backbone of modern collaborative simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) applications. Our method allows agents to cooperatively reconstruct a shared geometric model on a central server by fusing individual observations, but without the need to transmit potentially sensitive information about the agents themselves (such as their locations). Furthermore, to alleviate the burden of communication during iterative optimization, we design a set of communication triggering conditions that enable agents to selectively upload a targeted subset of local information that is useful to global optimization. Our approach thus achieves significant communication reduction with minimal impact on optimization performance. As our main theoretical contribution, we prove that our method converges to first-order critical points with a global sublinear convergence rate. Numerical evaluations on bundle adjustment problems from collaborative SLAM and SfM datasets show that our method performs competitively against existing distributed techniques, while achieving up to 78% total communication reduction.

Submitted 29 July, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: technical report (17 pages, 3 figures); to appear at IROS 2022

arXiv:2201.07372 [pdf, other]

Prospective Learning: Principled Extrapolation to the Future

Authors: Ashwin De Silva, Rahul Ramesh, Lyle Ungar, Marshall Hussain Shuler, Noah J. Cowan, Michael Platt, Chen Li, Leyla Isik, Seung-Eon Roh, Adam Charles, Archana Venkataraman, Brian Caffo, Javier J. How, Justus M Kebschull, John W. Krakauer, Maxim Bichuch, Kaleab Alemayehu Kinfu, Eva Yezerets, Dinesh Jayaraman, Jong M. Shin, Soledad Villar, Ian Phillips, Carey E. Priebe, Thomas Hartung, Michael I. Miller, et al. (18 additional authors not shown)

Abstract: Learning is a process which can update decision rules, based on past experience, such that future performance improves. Traditionally, machine learning is often evaluated under the assumption that the future will be identical to the past in distribution or change adversarially. But these assumptions can be either too optimistic or pessimistic for many problems in the real world. Real world scenarios evolve over multiple spatiotemporal scales with partially predictable dynamics. Here we reformulate the learning problem to one that centers around this idea of dynamic futures that are partially learnable. We conjecture that certain sequences of tasks are not retrospectively learnable (in which the data distribution is fixed), but are prospectively learnable (in which distributions may be dynamic), suggesting that prospective learning is more difficult in kind than retrospective learning. We argue that prospective learning more accurately characterizes many real world problems that (1) currently stymie existing artificial intelligence solutions and/or (2) lack adequate explanations for how natural intelligences solve them. Thus, studying prospective learning will lead to deeper insights and solutions to currently vexing challenges in both natural and artificial intelligences.

Submitted 13 July, 2023; v1 submitted 18 January, 2022; originally announced January 2022.

Comments: Accepted at the 2nd Conference on Lifelong Learning Agents (CoLLAs), 2023

arXiv:2111.14990 [pdf, other]

MIXER: A Principled Framework for Multimodal, Multiway Data Association

Authors: Parker C. Lusk, Ronak Roy, Kaveh Fathian, Jonathan P. How

Abstract: A fundamental problem in robotic perception is matching identical objects or data, with applications such as loop closure detection, place recognition, object tracking, and map fusion. While the problem becomes considerably more challenging when matching should be done jointly across multiple, multimodal sets of data, the robustness and accuracy of matching in the presence of noise and outliers can be greatly improved in this setting. At present, multimodal techniques do not leverage multiway information, and multiway techniques do not incorporate different modalities, leading to inferior results. In contrast, we present a principled mixed-integer quadratic framework to address this issue. We use a novel continuous relaxation in a projected gradient descent algorithm that guarantees feasible solutions of the integer program are obtained efficiently. We demonstrate experimentally that correspondences obtained from our approach are more stable to noise and errors than state-of-the-art techniques. Tested on a robotics dataset, our algorithm resulted in a 35% increase in F1 score when compared to the best alternative.

Submitted 29 November, 2021; originally announced November 2021.

Comments: presented in ICRA 2021 Workshop on Robust Perception for Autonomous Field Robots in Challenging Environments

arXiv:2110.00876 [pdf, other]

Incremental Non-Gaussian Inference for SLAM Using Normalizing Flows

Authors: Qiangqiang Huang, Can Pu, Kasra Khosoussi, David M. Rosen, Dehann Fourie, Jonathan P. How, John J. Leonard

Abstract: This paper presents normalizing flows for incremental smoothing and mapping (NF-iSAM), a novel algorithm for inferring the full posterior distribution in SLAM problems with nonlinear measurement models and non-Gaussian factors. NF-iSAM exploits the expressive power of neural networks, and trains normalizing flows to model and sample the full posterior. By leveraging the Bayes tree, NF-iSAM enables efficient incremental updates similar to iSAM2, albeit in the more challenging non-Gaussian setting. We demonstrate the advantages of NF-iSAM over state-of-the-art point and distribution estimation algorithms using range-only SLAM problems with data association ambiguity. NF-iSAM presents superior accuracy in describing the posterior beliefs of continuous variables (e.g., position) and discrete variables (e.g., data association).

Submitted 2 July, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

Comments: Extension of work published at arXiv:2105.05045

arXiv:2109.09910 [pdf, other]

Demonstration-Efficient Guided Policy Search via Imitation of Robust Tube MPC

Authors: Andrea Tagliabue, Dong-Ki Kim, Michael Everett, Jonathan P. How

Abstract: We propose a demonstration-efficient strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more computationally efficient representation based on a deep neural network and Imitation Learning (IL). By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency and is capable of compensating for the distribution shifts typically encountered in IL. Our approach opens the possibility of zero-shot transfer from a single demonstration collected in a nominal domain, such as a simulation or a robot in a lab/controlled environment, to a domain with bounded model errors/perturbations. Numerical and experimental evaluations performed on a trajectory tracking MPC for a quadrotor show that our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.

Submitted 23 September, 2021; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: Submitted to the 2022 IEEE Conference on Robotics and Automation (ICRA). Video: https://youtu.be/28zQFktJIqg

arXiv:2109.09876 [pdf, other]

Context-Specific Representation Abstraction for Deep Option Learning

Authors: Marwa Abdulhai, Dong-Ki Kim, Matthew Riemer, Miao Liu, Gerald Tesauro, Jonathan P. How

Abstract: Hierarchical reinforcement learning has focused on discovering temporally extended actions, such as options, that can provide benefits in problems requiring extensive exploration. One promising approach that learns these options end-to-end is the option-critic (OC) framework. We examine and show in this paper that OC does not decompose a problem into simpler sub-problems, but instead increases the size of the search over policy space with each option considering the entire state space during learning. This issue can result in practical limitations of this method, including sample inefficient learning. To address this problem, we introduce Context-Specific Representation Abstraction for Deep Option Learning (CRADOL), a new framework that considers both temporal abstraction and context-specific representation abstraction to effectively reduce the size of the search over policy space. Specifically, our method learns a factored belief state representation that enables each option to learn a policy over only a subsection of the state space. We test our method against hierarchical, non-hierarchical, and modular recurrent neural network baselines, demonstrating significant sample efficiency improvements in challenging partially observable environments.

Submitted 23 April, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: Accepted at AAAI 2022

arXiv:2109.06795 [pdf, other]

ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation

Authors: Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How

Abstract: In a multirobot system, a number of cyber-physical attacks (e.g., communication hijack, observation perturbations) can challenge the robustness of agents. This robustness issue worsens in multiagent reinforcement learning because there exists the non-stationarity of the environment caused by simultaneously learning agents whose changing policies affect the transition and reward functions. In this paper, we propose a minimax MARL approach to infer the worst-case policy update of other agents. As the minimax formulation is computationally intractable to solve, we apply the convex relaxation of neural networks to solve the inner minimization problem. Such convex relaxation enables robustness in interacting with peer agents that may have significantly different behaviors and also achieves a certified bound of the original optimization problem. We evaluate our approach on multiple mixed cooperative-competitive tasks and show that our method outperforms the previous state of the art approaches on this topic.

Submitted 14 September, 2021; originally announced September 2021.

arXiv:2108.04140 [pdf, other]

doi 10.1109/ACCESS.2021.3133370

Reachability Analysis of Neural Feedback Loops

Authors: Michael Everett, Golnaz Habibi, Chuangchuang Sun, Jonathan P. How

Abstract: Neural Networks (NNs) can provide major empirical performance improvements for closed-loop systems, but they also introduce challenges in formally analyzing those systems' safety properties. In particular, this work focuses on estimating the forward reachable set of \textit{neural feedback loops} (closed-loop systems with NN controllers). Recent work provides bounds on these reachable sets, but the computationally tractable approaches yield overly conservative bounds (thus cannot be used to verify useful properties), and the methods that yield tighter bounds are too intensive for online computation. This work bridges the gap by formulating a convex optimization problem for the reachability analysis of closed-loop systems with NN controllers. While the solutions are less tight than previous (semidefinite program-based) methods, they are substantially faster to compute, and some of those computational time savings can be used to refine the bounds through new input set partitioning techniques, which is shown to dramatically reduce the tightness gap. The new framework is developed for systems with uncertainty (e.g., measurement and process noise) and nonlinearities (e.g., polynomial dynamics), and thus is shown to be applicable to real-world systems. To inform the design of an initial state set when only the target state set is known/specified, a novel algorithm for backward reachability analysis is also provided, which computes the set of states that are guaranteed to lead to the target set. The numerical experiments show that our approach (based on linear relaxations and partitioning) gives a $5\times$ reduction in conservatism in $150\times$ less computation time compared to the state-of-the-art. Furthermore, experiments on quadrotor, 270-state, and polynomial systems demonstrate the method's ability to handle uncertainty sources, high dimensionality, and nonlinear dynamics, respectively.

Submitted 2 February, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

arXiv:2106.14386 [pdf, other]

Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems

Authors: Yulun Tian, Yun Chang, Fernando Herrera Arias, Carlos Nieto-Granda, Jonathan P. How, Luca Carlone

Abstract: This paper presents Kimera-Multi, the first multi-robot system that (i) is robust and capable of identifying and rejecting incorrect inter and intra-robot loop closures resulting from perceptual aliasing, (ii) is fully distributed and only relies on local (peer-to-peer) communication to achieve distributed localization and mapping, and (iii) builds a globally consistent metric-semantic 3D mesh model of the environment in real-time, where faces of the mesh are annotated with semantic labels. Kimera-Multi is implemented by a team of robots equipped with visual-inertial sensors. Each robot builds a local trajectory estimate and a local mesh using Kimera. When communication is available, robots initiate a distributed place recognition and robust pose graph optimization protocol based on a novel distributed graduated non-convexity algorithm. The proposed protocol allows the robots to improve their local trajectory estimates by leveraging inter-robot loop closures while being robust to outliers. Finally, each robot uses its improved trajectory estimate to correct the local mesh using mesh deformation techniques. We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking datasets, and challenging outdoor datasets collected using ground robots. Both real and simulated experiments involve long trajectories (e.g., up to 800 meters per robot). The experiments show that Kimera-Multi (i) outperforms the state of the art in terms of robustness and accuracy, (ii) achieves estimation errors comparable to a centralized SLAM system while being fully distributed, (iii) is parsimonious in terms of communication bandwidth, (iv) produces accurate metric-semantic 3D meshes, and (v) is modular and can be also used for standard 3D reconstruction (i.e., without semantic labels) or for trajectory estimation (i.e., without reconstructing a 3D mesh).

Submitted 17 December, 2021; v1 submitted 27 June, 2021; originally announced June 2021.

Comments: Accepted by IEEE Transactions on Robotics (18 pages, 15 figures)

arXiv:2105.13506 [pdf, other]

Airflow-Inertial Odometry for Resilient State Estimation on Multirotors

Authors: Andrea Tagliabue, Jonathan P. How

Abstract: We present a dead reckoning strategy for increased resilience to position estimation failures on multirotors, using only data from a low-cost IMU and novel, bio-inspired airflow sensors. The goal is challenging, since low-cost IMUs are subject to large noise and drift, while 3D airflow sensing is made difficult by the interference caused by the propellers and by the wind. Our approach relies on a deep-learning strategy to interpret the measurements of the bio-inspired sensors, a map of the wind speed to compensate for position-dependent wind, and a filter to fuse the information and generate a pose and velocity estimate. Our results show that the approach reduces the drift with respect to IMU-only dead reckoning by up to an order of magnitude over 30 seconds after a position sensor failure in non-windy environments, and it can compensate for the challenging effects of turbulent, and spatially varying wind.

Submitted 27 May, 2021; originally announced May 2021.

Comments: Accepted to the 2021 International Conference on Robotics and Automation (ICRA 2021). Contains minor updates in Fig. 2 and Section IV.b, VII.E, VII.F

arXiv:2105.05045 [pdf, other]

NF-iSAM: Incremental Smoothing and Mapping via Normalizing Flows

Authors: Qiangqiang Huang, Can Pu, Dehann Fourie, Kasra Khosoussi, Jonathan P. How, John J. Leonard

Abstract: This paper presents a novel non-Gaussian inference algorithm, Normalizing Flow iSAM (NF-iSAM), for solving SLAM problems with non-Gaussian factors and/or non-linear measurement models. NF-iSAM exploits the expressive power of neural networks, and trains normalizing flows to draw samples from the joint posterior of non-Gaussian factor graphs. By leveraging the Bayes tree, NF-iSAM is able to exploit the sparsity structure of SLAM, thus enabling efficient incremental updates similar to iSAM2, albeit in the more challenging non-Gaussian setting. We demonstrate the performance of NF-iSAM and compare it against the state-of-the-art algorithms such as iSAM2 (Gaussian) and mm-iSAM (non-Gaussian) in synthetic and real range-only SLAM datasets.

Submitted 11 May, 2021; originally announced May 2021.

Comments: 8 pages, 6 figures, to be published in IEEE International Conference on Robotics and Automation (ICRA) 2021

arXiv:2103.14805 [pdf, other]

Multi-Robot Distributed Semantic Mapping in Unfamiliar Environments through Online Matching of Learned Representations

Authors: Stewart Jamieson, Kaveh Fathian, Kasra Khosoussi, Jonathan P. How, Yogesh Girdhar

Abstract: We present a solution to multi-robot distributed semantic mapping of novel and unfamiliar environments. Most state-of-the-art semantic mapping systems are based on supervised learning algorithms that cannot classify novel observations online. While unsupervised learning algorithms can invent labels for novel observations, approaches to detect when multiple robots have independently developed their own labels for the same new class are prone to erroneous or inconsistent matches. These issues worsen as the number of robots in the system increases and prevent fusing the local maps produced by each robot into a consistent global map, which is crucial for cooperative planning and joint mission summarization. Our proposed solution overcomes these obstacles by having each robot learn an unsupervised semantic scene model online and use a multiway matching algorithm to identify consistent sets of matches between learned semantic labels belonging to different robots. Compared to the state of the art, the proposed solution produces 20-60% higher quality global maps that do not degrade even as many more local maps are fused.

Submitted 27 March, 2021; originally announced March 2021.

Comments: 7 pages, 6 figures, 1 table; accepted for presentation in IEEE Int. Conf. on Robotics and Automation, ICRA '21, Xi'an, China, June 2021

arXiv:2103.06372 [pdf, other]

doi 10.1109/ACCESS.2022.3154037

PANTHER: Perception-Aware Trajectory Planner in Dynamic Environments

Authors: Jesus Tordesillas, Jonathan P. How

Abstract: This paper presents PANTHER, a real-time perception-aware (PA) trajectory planner for multirotor-UAVs (Unmanned Aerial Vehicles) in dynamic environments. PANTHER plans trajectories that avoid dynamic obstacles while also keeping them in the sensor field of view (FOV) and minimizing the blur to aid in object tracking. The rotation and translation of the UAV are jointly optimized, which allows PANTHER to fully exploit the differential flatness of multirotors to maximize the PA objective. Real-time performance is achieved by implicitly imposing the underactuated dynamics of the UAV through the Hopf fibration. PANTHER is able to keep the obstacles inside the FOV 7.9 and 1.5 times more than non-PA approaches and PA approaches that decouple translation and yaw, respectively. The projected velocity (and hence the blur) is reduced by 18% and 34%, respectively. This leads to average success rates three times larger than state-of-the-art approaches in multi-obstacle avoidance scenarios. The MINVO basis is used to impose low-conservative collision avoidance constraints in position and velocity space. Finally, extensive hardware experiments in unknown dynamic environments with all the computation running onboard are presented, with velocities of up to 5.8 m/s, and with relative velocities (with respect to the obstacles) of up to 6.3 m/s. The only sensors used are an IMU, a forward-facing depth camera, and a downward-facing monocular camera.

Submitted 22 March, 2022; v1 submitted 10 March, 2021; originally announced March 2021.

Comments: 16 pages

arXiv:2102.13073 [pdf, other]

Where to go next: Learning a Subgoal Recommendation Policy for Navigation Among Pedestrians

Authors: Bruno Brito, Michael Everett, Jonathan P. How, Javier Alonso-Mora

Abstract: Robotic navigation in environments shared with other robots or humans remains challenging because the intentions of the surrounding agents are not directly observable and the environment conditions are continuously changing. Local trajectory optimization methods, such as model predictive control (MPC), can deal with those changes but require global guidance, which is not trivial to obtain in crowded scenarios. This paper proposes to learn, via deep Reinforcement Learning (RL), an interaction-aware policy that provides long-term guidance to the local planner. In particular, in simulations with cooperative and non-cooperative agents, we train a deep network to recommend a subgoal for the MPC planner. The recommended subgoal is expected to help the robot in making progress towards its goal and accounts for the expected interaction with other agents. Based on the recommended subgoal, the MPC planner then optimizes the inputs for the robot satisfying its kinodynamic and collision avoidance constraints. Our approach is shown to substantially improve the navigation performance in terms of number of collisions as compared to prior MPC frameworks, and in terms of both travel time and number of collisions compared to deep RL methods in cooperative, competitive and mixed multiagent scenarios.

Submitted 26 February, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: 8 pages, 6 figures

arXiv:2102.03443 [pdf, other]

LION: Lidar-Inertial Observability-Aware Navigator for Vision-Denied Environments

Authors: Andrea Tagliabue, Jesus Tordesillas, Xiaoyi Cai, Angel Santamaria-Navarro, Jonathan P. How, Luca Carlone, Ali-akbar Agha-mohammadi

Abstract: State estimation for robots navigating in GPS-denied and perceptually-degraded environments, such as underground tunnels, mines and planetary subsurface voids, remains challenging in robotics. Towards this goal, we present LION (Lidar-Inertial Observability-Aware Navigator), which is part of the state estimation framework developed by the team CoSTAR for the DARPA Subterranean Challenge, where the team achieved second and first places in the Tunnel and Urban circuits in August 2019 and February 2020, respectively. LION provides high-rate odometry estimates by fusing high-frequency inertial data from an IMU and low-rate relative pose estimates from a lidar via a fixed-lag sliding window smoother. LION does not require knowledge of relative positioning between lidar and IMU, as the extrinsic calibration is estimated online. In addition, LION is able to self-assess its performance using an observability metric that evaluates whether the pose estimate is geometrically ill-constrained. Odometry and confidence estimates are used by HeRO, a supervisory algorithm that provides robust estimates by switching between different odometry sources. In this paper we benchmark the performance of LION in perceptually-degraded subterranean environments, demonstrating its high technology readiness level for deployment in the field.

Submitted 5 February, 2021; originally announced February 2021.

Comments: 2020 International Symposium on Experimental Robotics (ISER 2020)

arXiv:2101.11093 [pdf, other]

Non-Monotone Energy-Aware Information Gathering for Heterogeneous Robot Teams

Authors: Xiaoyi Cai, Brent Schlotfeldt, Kasra Khosoussi, Nikolay Atanasov, George J. Pappas, Jonathan P. How

Abstract: This paper considers the problem of planning trajectories for a team of sensor-equipped robots to reduce uncertainty about a dynamical process. Optimizing the trade-off between information gain and energy cost (e.g., control effort, distance travelled) is desirable but leads to a non-monotone objective function in the set of robot trajectories. Therefore, common multi-robot planning algorithms based on techniques such as coordinate descent lose their performance guarantees. Methods based on local search provide performance guarantees for optimizing a non-monotone submodular function, but require access to all robots' trajectories, making them unsuitable for distributed execution. This work proposes a distributed planning approach based on local search and shows how lazy/greedy methods can be adopted to reduce the computation and communication of the approach. We demonstrate the efficacy of the proposed method by coordinating robot teams composed of both ground and aerial vehicles with different sensing/control profiles and evaluate the algorithm's performance in two target tracking scenarios. Compared to the naive distributed execution of local search, our approach saves up to 60% communication and 80-92% computation on average when coordinating up to 10 robots, while outperforming the coordinate descent based algorithm in achieving a desirable trade-off between sensing and energy cost.

Submitted 26 March, 2021; v1 submitted 26 January, 2021; originally announced January 2021.

Comments: To appear in ICRA 2021. Video: https://www.youtube.com/watch?v=xWgFi6fwex0

arXiv:2101.01815 [pdf, other]

Efficient Reachability Analysis of Closed-Loop Systems with Neural Network Controllers

Authors: Michael Everett, Golnaz Habibi, Jonathan P. How

Abstract: Neural Networks (NNs) can provide major empirical performance improvements for robotic systems, but they also introduce challenges in formally analyzing those systems' safety properties. In particular, this work focuses on estimating the forward reachable set of closed-loop systems with NN controllers. Recent work provides bounds on these reachable sets, yet the computationally efficient approaches provide overly conservative bounds (thus cannot be used to verify useful properties), whereas tighter methods are too intensive for online computation. This work bridges the gap by formulating a convex optimization problem for reachability analysis for closed-loop systems with NN controllers. While the solutions are less tight than prior semidefinite program-based methods, they are substantially faster to compute, and some of the available computation time can be used to refine the bounds through input set partitioning, which more than overcomes the tightness gap. The proposed framework further considers systems with measurement and process noise, thus being applicable to realistic systems with uncertainty. Finally, numerical comparisons show $10\times$ reduction in conservatism in $\frac{1}{2}$ of the computation time compared to the state-of-the-art, and the ability to handle various sources of uncertainty is highlighted on a quadrotor model.

Submitted 24 May, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

arXiv:2012.12403 [pdf, other]

Performance Analysis of Adaptive Dynamic Tube MPC

Authors: Savva Morozov, Parker C. Lusk, Brett T. Lopez, Jonathan P. How

Abstract: Model predictive control (MPC) is an effective method for control of constrained systems but is susceptible to the external disturbances and modeling error often encountered in real-world applications. To address these issues, techniques such as Tube MPC (TMPC) utilize an ancillary offline-generated robust controller to ensure that the system remains within an invariant set, referred to as a tube, around an online-generated trajectory. However, TMPC is unable to modify its tube and ancillary controller in response to changing state-dependent uncertainty, often resulting in overly-conservative solutions. Dynamic Tube MPC (DTMPC) addresses these problems by simultaneously optimizing the desired trajectory and tube geometry online. Building upon this framework, Adaptive DTMPC (ADTMPC) produces better model approximations by reducing model uncertainty, resulting in more accurate control policies. This work presents an experimental analysis and performance evaluation of TMPC, DTMPC, and ADTMPC for an uncertain nonlinear system. In particular, DTMPC is shown to outperform TMPC because it is able to dynamically adjust to changing environments, limiting aggressive control and conservative behavior to only the cases when the constraints and uncertainty require it. Applied to a pendulum testbed, this enables DTMPC to use up to 30% less control effort while achieving up to 80% higher speeds. This performance is further improved by ADTMPC, which reduces the feedback control effort by up to another 35%, while delivering up to 34% better trajectory tracking. This analysis establishes that the DTMPC and ADTMPC frameworks yield significantly more effective robust control policies for systems with changing uncertainty, goals, and operating conditions.

Submitted 22 December, 2020; originally announced December 2020.

Comments: 14 main pages, 2 additional pages, accepted to 2021 AIAA SciTech Forum

arXiv:2011.10202 [pdf, other]

CLIPPER: A Graph-Theoretic Framework for Robust Data Association

Authors: Parker C. Lusk, Kaveh Fathian, Jonathan P. How

Abstract: We present CLIPPER (Consistent LInking, Pruning, and Pairwise Error Rectification), a framework for robust data association in the presence of noise and outliers. We formulate the problem in a graph-theoretic framework using the notion of geometric consistency. State-of-the-art techniques that use this framework utilize either combinatorial optimization techniques that do not scale well to large-sized problems, or use heuristic approximations that yield low accuracy in high-noise, high-outlier regimes. In contrast, CLIPPER uses a relaxation of the combinatorial problem and returns solutions that are guaranteed to correspond to the optima of the original problem. Low time complexity is achieved with an efficient projected gradient ascent approach. Experiments indicate that CLIPPER maintains a consistently low runtime of 15 ms where exact methods can require up to 24 s at their peak, even on small-sized problems with 200 associations. When evaluated on noisy point cloud registration problems, CLIPPER achieves 100% precision and 98% recall in 90% outlier regimes while competing algorithms begin degrading by 70% outliers. In an instance of associating noisy points of the Stanford Bunny with 990 outlier associations and only 10 inlier associations, CLIPPER successfully returns 8 inlier associations with 100% precision in 138 ms. Code is available at https://mit-acl.github.io/clipper.

Submitted 9 April, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

Comments: accepted ICRA'21

arXiv:2011.04087 [pdf, other]

doi 10.1109/ICRA48506.2021.9561090

Kimera-Multi: a System for Distributed Multi-Robot Metric-Semantic Simultaneous Localization and Mapping

Authors: Yun Chang, Yulun Tian, Jonathan P. How, Luca Carlone

Abstract: We present the first fully distributed multi-robot system for dense metric-semantic Simultaneous Localization and Mapping (SLAM). Our system, dubbed Kimera-Multi, is implemented by a team of robots equipped with visual-inertial sensors, and builds a 3D mesh model of the environment in real-time, where each face of the mesh is annotated with a semantic label (e.g., building, road, objects). In Kimera-Multi, each robot builds a local trajectory estimate and a local mesh using Kimera. Then, when two robots are within communication range, they initiate a distributed place recognition and robust pose graph optimization protocol with a novel incremental maximum clique outlier rejection; the protocol allows the robots to improve their local trajectory estimates by leveraging inter-robot loop closures. Finally, each robot uses its improved trajectory estimate to correct the local mesh using mesh deformation techniques. We demonstrate Kimera-Multi in photo-realistic simulations and real data. Kimera-Multi (i) is able to build accurate 3D metric-semantic meshes, (ii) is robust to incorrect loop closures while requiring less computation than state-of-the-art distributed SLAM back-ends, and (iii) is efficient, both in terms of computation at each robot as well as communication bandwidth.

Submitted 8 November, 2020; originally announced November 2020.

Comments: 9 pages

arXiv:2011.00382 [pdf, other]

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

Authors: Dong-Ki Kim, Miao Liu, Matthew Riemer, Chuangchuang Sun, Marwa Abdulhai, Golnaz Habibi, Sebastian Lopez-Cot, Gerald Tesauro, Jonathan P. How

Abstract: A fundamental challenge in multiagent reinforcement learning is to learn beneficial behaviors in a shared environment with other simultaneously learning agents. In particular, each agent perceives the environment as effectively non-stationary due to the changing policies of other agents. Moreover, each agent is itself constantly learning, leading to natural non-stationarity in the distribution of experiences encountered. In this paper, we propose a novel meta-multiagent policy gradient theorem that directly accounts for the non-stationary policy dynamics inherent to multiagent learning settings. This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment. We show that our theoretically grounded approach provides a general solution to the multiagent learning problem, which inherently comprises all key aspects of previous state of the art approaches on this topic. We test our method on a diverse suite of multiagent benchmarks and demonstrate a more efficient ability to adapt to new agents as they learn than baseline methods across the full spectrum of mixed incentive, competitive, and cooperative domains.

Submitted 11 June, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

Comments: Accepted to ICML 2021. Code at https://github.com/dkkim93/meta-mapg and Videos at https://sites.google.com/view/meta-mapg/home

arXiv:2010.11061 [pdf, other]

MADER: Trajectory Planner in Multi-Agent and Dynamic Environments

Authors: Jesus Tordesillas, Jonathan P. How

Abstract: This paper presents MADER, a 3D decentralized and asynchronous trajectory planner for UAVs that generates collision-free trajectories in environments with static obstacles, dynamic obstacles, and other planning agents. Real-time collision avoidance with other dynamic obstacles or agents is done by performing outer polyhedral representations of every interval of the trajectories and then including the plane that separates each pair of polyhedra as a decision variable in the optimization problem. MADER uses our recently developed MINVO basis to obtain outer polyhedral representations with volumes 2.36 and 254.9 times, respectively, smaller than the Bernstein or B-Spline bases used extensively in the planning literature. Our decentralized and asynchronous algorithm guarantees safety with respect to other agents by including their committed trajectories as constraints in the optimization and then executing a collision check-recheck scheme. Finally, extensive simulations in challenging cluttered environments show up to a 33.9% reduction in the flight time, and an 88.8% reduction in the number of stops compared to the Bernstein and B-Spline bases, shorter flight distances than centralized approaches, and shorter total times on average than synchronous decentralized approaches.

Submitted 15 April, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: 15 pages, 15 figures

arXiv:2010.10726 [pdf, other]

MINVO Basis: Finding Simplexes with Minimum Volume Enclosing Polynomial Curves

Authors: Jesus Tordesillas, Jonathan P. How

Abstract: This paper studies the polynomial basis that generates the smallest $n$-simplex enclosing a given $n^{\text{th}}$-degree polynomial curve in $\mathbb{R}^n$. Although the Bernstein and B-Spline polynomial bases provide feasible solutions to this problem, the simplexes obtained by these bases are not the smallest possible, which leads to overly conservative results in many CAD (computer-aided design) applications. We first prove that the polynomial basis that solves this problem (MINVO basis) also solves for the $n^\text{th}$-degree polynomial curve with largest convex hull enclosed in a given $n$-simplex. Then, we present a formulation that is independent of the $n$-simplex or $n^{\text{th}}$-degree polynomial curve given. By using Sum-Of-Squares (SOS) programming, branch and bound, and moment relaxations, we obtain high-quality feasible solutions for any $n\in\mathbb{N}$, and prove (numerical) global optimality for $n=1,2,3$ and (numerical) local optimality for $n=4$. The results obtained for $n=3$ show that, for any given $3^{\text{rd}}$-degree polynomial curve in $\mathbb{R}^3$, the MINVO basis is able to obtain an enclosing simplex whose volume is $2.36$ and $254.9$ times smaller than the ones obtained by the Bernstein and B-Spline bases, respectively. When $n=7$, these ratios increase to $902.7$ and $2.997\cdot10^{21}$, respectively.

Submitted 26 September, 2022; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: 27 pages, 25 figures

arXiv:2010.00540 [pdf, other]

Robustness Analysis of Neural Networks via Efficient Partitioning with Applications in Control Systems

Authors: Michael Everett, Golnaz Habibi, Jonathan P. How

Abstract: Neural networks (NNs) are now routinely implemented on systems that must operate in uncertain environments, but the tools for formally analyzing how this uncertainty propagates to NN outputs are not yet commonplace. Computing tight bounds on NN output sets (given an input set) provides a measure of confidence associated with the NN decisions and is essential to deploy NNs on safety-critical systems. Recent works approximate the propagation of sets through nonlinear activations or partition the uncertainty set to provide a guaranteed outer bound on the set of possible NN outputs. However, the bound looseness causes excessive conservatism and/or the computation is too slow for online analysis. This paper unifies propagation and partition approaches to provide a family of robustness analysis algorithms that give tighter bounds than existing works for the same amount of computation time (or reduced computational effort for a desired accuracy level). Moreover, we provide new partitioning techniques that are aware of their current bound estimates and desired boundary shape (e.g., lower bounds, weighted $\ell_\infty$-ball, convex hull), leading to further improvements in the computation-tightness tradeoff. The paper demonstrates the tighter bounds and reduced conservatism of the proposed robustness analysis framework with examples from model-free RL and forward kinematics learning.

Submitted 7 December, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

arXiv:2007.07702 [pdf, other]

Lunar Terrain Relative Navigation Using a Convolutional Neural Network for Visual Crater Detection

Authors: Lena M. Downes, Ted J. Steiner, Jonathan P. How

Abstract: Terrain relative navigation can improve the precision of a spacecraft's position estimate by detecting global features that act as supplementary measurements to correct for drift in the inertial navigation system. This paper presents a system that uses a convolutional neural network (CNN) and image processing methods to track the location of a simulated spacecraft with an extended Kalman filter (EKF). The CNN, called LunaNet, visually detects craters in the simulated camera frame and those detections are matched to known lunar craters in the region of the current estimated spacecraft position. These matched craters are treated as features that are tracked using the EKF. LunaNet enables more reliable position tracking over a simulated trajectory due to its greater robustness to changes in image brightness and more repeatable crater detections from frame to frame throughout a trajectory. LunaNet combined with an EKF produces a decrease of 60% in the average final position estimation error and a decrease of 25% in average final velocity estimation error compared to an EKF using an image processing-based crater detection method when tested on trajectories using images of standard brightness.

Submitted 15 July, 2020; originally announced July 2020.

Comments: 6 pages, 4 figures. This work was accepted by the 2020 American Control Conference

arXiv:2006.11419 [pdf, other]

FISAR: Forward Invariant Safe Reinforcement Learning with a Deep Neural Network-Based Optimizer

Authors: Chuangchuang Sun, Dong-Ki Kim, Jonathan P. How

Abstract: This paper investigates reinforcement learning with constraints, which are indispensable in safety-critical environments. To drive the constraint violation to decrease monotonically, we take the constraints as Lyapunov functions and impose new linear constraints on the policy parameters' updating dynamics. As a result, the original safety set can be forward-invariant. However, because the new guaranteed-feasible constraints are imposed on the updating dynamics instead of the original policy parameters, classic optimization algorithms are no longer applicable. To address this, we propose to learn a generic deep neural network (DNN)-based optimizer to optimize the objective while satisfying the linear constraints. The constraint-satisfaction is achieved via projection onto a polytope formulated by multiple linear inequality constraints, which can be solved analytically with our newly designed metric. To the best of our knowledge, this is the first DNN-based optimizer for constrained optimization with the forward invariance guarantee. We show that our optimizer trains a policy to decrease the constraint violation and maximize the cumulative reward monotonically. Results on numerical constrained optimization and obstacle-avoidance navigation validate the theoretical findings.

Submitted 5 May, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

Comments: Accepted to ICML 2020 Workshop Theoretical Foundations of RL; Accepted to ICRA 2021

arXiv:2006.01109 [pdf, other]

Collision Probabilities for Continuous-Time Systems Without Sampling [with Appendices]

Authors: Kristoffer M. Frey, Ted J. Steiner, Jonathan P. How

Abstract: Demand for high-performance, robust, and safe autonomous systems has grown substantially in recent years. These objectives motivate the desire for efficient safety-theoretic reasoning that can be embedded in core decision-making tasks such as motion planning, particularly in constrained environments. On one hand, Monte-Carlo (MC) and other sampling-based techniques provide accurate collision probability estimates for a wide variety of motion models but are cumbersome in the context of continuous optimization. On the other, "direct" approximations aim to compute (or upper-bound) the failure probability as a smooth function of the decision variables, and thus are convenient for optimization. However, existing direct approaches fundamentally assume discrete-time dynamics and can perform unpredictably when applied to continuous-time systems ubiquitous in the real world, often manifesting as severe conservatism. State-of-the-art attempts to address this within a conventional discrete-time framework require additional Gaussianity approximations that ultimately produce inconsistency of their own. In this paper we take a fundamentally different approach, deriving a risk approximation framework directly in continuous time and producing a lightweight estimate that actually converges as the underlying discretization is refined. Our approximation is shown to significantly outperform state-of-the-art techniques in replicating the MC estimate while maintaining the functional and computational benefits of a direct method. This enables robust, risk-aware, continuous motion-planning for a broad class of nonlinear and/or partially-observable systems.

Submitted 24 December, 2022; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: Presented at RSS 2020. Updated version contains restructured proofs and analysis, as well as a number of notational tweaks throughout

arXiv:2004.06496 [pdf, other]

doi 10.1109/TNNLS.2021.3056046

Certifiable Robustness to Adversarial State Uncertainty in Deep Reinforcement Learning

Authors: Michael Everett, Bjorn Lutjens, Jonathan P. How

Abstract: Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, which was recently shown to cause an autonomous vehicle to swerve into another lane. In light of these dangers, numerous algorithms have been developed as defensive mechanisms against these adversarial inputs, some of which provide formal robustness guarantees or certificates. This work leverages research on certified adversarial robustness to develop an online certifiably robust defense for deep reinforcement learning algorithms. The proposed defense computes guaranteed lower bounds on state-action values during execution to identify and choose a robust action under a worst-case deviation in input space due to possible adversaries or noise. Moreover, the resulting policy comes with a certificate of solution quality, even though the true state and optimal action are unknown to the certifier due to the perturbations. The approach is demonstrated on a Deep Q-Network policy and is shown to increase robustness to noise and adversaries in pedestrian collision avoidance scenarios and a classic control task. This work extends one of our prior works with new performance guarantees, extensions to other RL algorithms, expanded results aggregated across more scenarios, an extension into scenarios with adversarial behavior, comparisons with a more computationally expensive method, and visualizations that provide intuition about the robustness algorithm.

Submitted 2 February, 2022; v1 submitted 11 April, 2020; originally announced April 2020.

Comments: arXiv admin note: text overlap with arXiv:1910.12908

arXiv:2003.10028 [pdf, other]

Robust Adaptive Control Barrier Functions: An Adaptive & Data-Driven Approach to Safety (Extended Version)

Authors: Brett T. Lopez, Jean-Jacques E. Slotine, Jonathan P. How

Abstract: A new framework is developed for control of constrained nonlinear systems with structured parametric uncertainties. Forward invariance of a safe set is achieved through online parameter adaptation and data-driven model estimation. The new adaptive data-driven safety paradigm is merged with a recent adaptive control algorithm for systems nominally contracting in closed-loop. This unification is more general than other safety controllers as closed-loop contraction does not require the system be invertible or in a particular form. Additionally, the approach is less expensive than nonlinear model predictive control as it does not require a full desired trajectory, but rather only a desired terminal state. The approach is illustrated on the pitch dynamics of an aircraft with uncertain nonlinear aerodynamics.

Submitted 28 May, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

Comments: Added aCBF non-Lipschitz example and discussion on approach implementation

arXiv:2003.05016 [pdf, other]

doi 10.1109/ICRA40945.2020.9196922

Active Reward Learning for Co-Robotic Vision Based Exploration in Bandwidth Limited Environments

Authors: Stewart Jamieson, Jonathan P. How, Yogesh Girdhar

Abstract: We present a novel POMDP problem formulation for a robot that must autonomously decide where to go to collect new and scientifically relevant images given a limited ability to communicate with its human operator. From this formulation we derive constraints and design principles for the observation model, reward model, and communication strategy of such a robot, exploring techniques to deal with the very high-dimensional observation space and scarcity of relevant training data. We introduce a novel active reward learning strategy based on making queries to help the robot minimize path "regret" online, and evaluate it for suitability in autonomous visual exploration through simulations. We demonstrate that, in some bandwidth-limited environments, this novel regret-based criterion enables the robotic explorer to collect up to 17% more reward per mission than the next-best criterion.

Submitted 10 March, 2020; originally announced March 2020.

Comments: 7 pages, 4 figures; accepted for presentation in IEEE Int. Conf. on Robotics and Automation, ICRA '20, Paris, France, June 2020

Journal ref: 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020, pp. 1806-1812

arXiv:2003.03281 [pdf, ps, other]

doi 10.1109/LRA.2020.3010216

Asynchronous and Parallel Distributed Pose Graph Optimization

Authors: Yulun Tian, Alec Koppel, Amrit Singh Bedi, Jonathan P. How

Abstract: We present Asynchronous Stochastic Parallel Pose Graph Optimization (ASAPP), the first asynchronous algorithm for distributed pose graph optimization (PGO) in multi-robot simultaneous localization and mapping. By enabling robots to optimize their local trajectory estimates without synchronization, ASAPP offers resiliency against communication delays and alleviates the need to wait for stragglers in the network. Furthermore, ASAPP can be applied on the rank-restricted relaxations of PGO, a crucial class of non-convex Riemannian optimization problems that underlies recent breakthroughs on globally optimal PGO. Under bounded delay, we establish the global first-order convergence of ASAPP using a sufficiently small stepsize. The derived stepsize depends on the worst-case delay and inherent problem sparsity, and furthermore matches known results for synchronous algorithms when there is no delay. Numerical evaluations on simulated and real-world datasets demonstrate favorable performance compared to the state-of-the-art synchronous approach, and show ASAPP's resilience against a wide range of delays in practice.

Submitted 30 June, 2023; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: full paper with appendices

arXiv:2003.02305 [pdf, other]

Touch the Wind: Simultaneous Airflow, Drag and Interaction Sensing on a Multirotor

Authors: Andrea Tagliabue, Aleix Paris, Suhan Kim, Regan Kubicek, Sarah Bergbreiter, Jonathan P. How

Abstract: Disturbance estimation for Micro Aerial Vehicles (MAVs) is crucial for robustness and safety. In this paper, we use novel, bio-inspired airflow sensors to measure the airflow acting on a MAV, and we fuse this information in an Unscented Kalman Filter (UKF) to simultaneously estimate the three-dimensional wind vector, the drag force, and other interaction forces (e.g. due to collisions, interaction with a human) acting on the robot. To this end, we present and compare a fully model-based and a deep learning-based strategy. The model-based approach considers the MAV and airflow sensor dynamics and its interaction with the wind, while the deep learning-based strategy uses a Long Short-Term Memory (LSTM) neural network to obtain an estimate of the relative airflow, which is then fused in the proposed filter. We validate our methods in hardware experiments, showing that we can accurately estimate relative airflow of up to 4 m/s, and we can differentiate drag and interaction force.

Submitted 4 March, 2020; originally announced March 2020.

Comments: The first two authors contributed equally

arXiv:2003.01851 [pdf, other]

doi 10.1109/LRA.2020.3006823

A Distributed Pipeline for Scalable, Deconflicted Formation Flying

Authors: Parker C. Lusk, Xiaoyi Cai, Samir Wadhwania, Aleix Paris, Kaveh Fathian, Jonathan P. How

Abstract: Reliance on external localization infrastructure and centralized coordination are main limiting factors for formation flying of vehicles in large numbers and in unprepared environments. While solutions using onboard localization address the dependency on external infrastructure, the associated coordination strategies typically lack collision avoidance and scalability. To address these shortcomings, we present a unified pipeline with onboard localization and a distributed, collision-free motion planning strategy that scales to a large number of vehicles. Since distributed collision avoidance strategies are known to result in gridlock, we also present a decentralized task assignment solution to deconflict vehicles. We experimentally validate our pipeline in simulation and hardware. The results show that our approach for solving the optimization problem associated with motion planning gives solutions within seconds in cases where general purpose solvers fail due to high complexity. In addition, our lightweight assignment strategy leads to successful and quicker formation convergence in 96-100% of all trials, whereas indefinite gridlocks occur without it for 33-50% of trials. By enabling large-scale, deconflicted coordination, this pipeline should help pave the way for anytime, anywhere deployment of aerial swarms.

Submitted 3 July, 2020; v1 submitted 3 March, 2020; originally announced March 2020.

Comments: 8 main pages, 1 additional page, accepted to RA-L and IROS'20

arXiv:2003.01040 [pdf, other]

Scaling Up Multiagent Reinforcement Learning for Robotic Systems: Learn an Adaptive Sparse Communication Graph

Authors: Chuangchuang Sun, Macheng Shen, Jonathan P. How

Abstract: The complexity of multiagent reinforcement learning (MARL) in multiagent systems increases exponentially with respect to the agent number. This scalability issue prevents MARL from being applied in large-scale multiagent systems. However, one critical feature in MARL that is often neglected is that the interactions between agents are quite sparse. Without exploiting this sparsity structure, existing works aggregate information from all of the agents and thus have a high sample complexity. To address this issue, we propose an adaptive sparse attention mechanism by generalizing a sparsity-inducing activation function. Then a sparse communication graph in MARL is learned by graph neural networks based on this new attention mechanism. Through this sparsity structure, the agents can communicate in an effective as well as efficient way via only selectively attending to agents that matter the most and thus the scale of the MARL problem is reduced with little optimality compromised. Comparative results show that our algorithm can learn an interpretable sparse structure and outperforms previous works by a significant margin on applications involving a large-scale multiagent system.

Submitted 3 March, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

arXiv:2002.06684 [pdf, other]

R-MADDPG for Partially Observable Environments and Limited Communication

Authors: Rose E. Wang, Michael Everett, Jonathan P. How

Abstract: There are several real-world tasks that would benefit from applying multiagent reinforcement learning (MARL) algorithms, including the coordination among self-driving cars. The real world has challenging conditions for multiagent learning systems, such as its partially observable and nonstationary nature. Moreover, if agents must share a limited resource (e.g. network bandwidth) they must all learn how to coordinate resource use. This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication. We investigate recurrency effects on performance and communication use of a team of agents. We demonstrate that the resulting framework learns time dependencies for sharing missing observations, handling resource limitations, and developing different communication patterns among agents.

Submitted 17 February, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

Comments: Reinforcement Learning for Real Life (RL4RealLife) Workshop in the 36th International Conference on Machine Learning, Long Beach, California, USA, 2019

Journal ref: Reinforcement Learning for Real Life (RL4RealLife) Workshop in the 36th International Conference on Machine Learning, Long Beach, California, USA, 2019

arXiv:2001.06627 [pdf, other]

Multi-agent Motion Planning for Dense and Dynamic Environments via Deep Reinforcement Learning

Authors: Samaneh Hosseini Semnani, Hugh Liu, Michael Everett, Anton de Ruiter, Jonathan P. How

Abstract: This paper introduces a hybrid algorithm of deep reinforcement learning (RL) and Force-based motion planning (FMP) to solve the distributed motion planning problem in dense and dynamic environments. Individually, RL and FMP algorithms each have their own limitations. FMP is not able to produce time-optimal paths and existing RL solutions are not able to produce collision-free paths in dense environments. Therefore, we first tried improving the performance of recent RL approaches by introducing a new reward function that not only eliminates the requirement of a pre supervised learning (SL) step but also decreases the chance of collision in crowded environments. That improved things, but there were still a lot of failure cases. So, we developed a hybrid approach to leverage the simpler FMP approach in stuck, simple and high-risk cases, and continue using RL for normal cases in which FMP can't produce an optimal path. Also, we extend the GA3C-CADRL algorithm to 3D environments. Simulation results show that the proposed algorithm outperforms both deep RL and FMP algorithms and produces up to 50% more successful scenarios than deep RL and up to 75% less extra time to reach the goal than FMP.

Submitted 18 January, 2020; originally announced January 2020.

Comments: IEEE Robotics and Automation Letters (2020)

arXiv:2001.04420 [pdf, other]

FASTER: Fast and Safe Trajectory Planner for Navigation in Unknown Environments

Authors: Jesus Tordesillas, Brett T. Lopez, Michael Everett, Jonathan P. How

Abstract: Planning high-speed trajectories for UAVs in unknown environments requires algorithmic techniques that enable fast reaction times to guarantee safety as more information about the environment becomes available. The standard approaches that ensure safety by enforcing a "stop" condition in the free-known space can severely limit the speed of the vehicle, especially in situations where much of the world is unknown. Moreover, the ad-hoc time and interval allocation scheme usually imposed on the trajectory also leads to conservative and slower trajectories. This work proposes FASTER (Fast and Safe Trajectory Planner) to ensure safety without sacrificing speed. FASTER obtains high-speed trajectories by enabling the local planner to optimize in both the free-known and unknown spaces. Safety is ensured by always having a safe back-up trajectory in the free-known space. The MIQP formulation proposed also allows the solver to choose the trajectory interval allocation. FASTER is tested extensively in simulation and in real hardware, showing flights in unknown cluttered environments with velocities up to 7.8m/s, and experiments at the maximum speed of a skid-steer ground robot (2m/s).

Submitted 30 August, 2021; v1 submitted 9 January, 2020; originally announced January 2020.

Comments: This paper has been accepted for publication in IEEE Transactions on Robotics. arXiv admin note: text overlap with arXiv:1903.03558

arXiv:1911.09476 [pdf, other]

Incremental Learning of Motion Primitives for Pedestrian Trajectory Prediction at Intersections

Authors: Golnaz Habibi, Nikita Japuria, Jonathan P. How

Abstract: This paper presents a novel incremental learning algorithm for pedestrian motion prediction, with the ability to improve the learned model over time when data is incrementally available. In this setup, trajectories are modeled as simple segments called motion primitives. Transitions between motion primitives are modeled as Gaussian Processes. When new data is available, the motion primitives learned from the new data are compared with the previous ones by measuring the inner product of the motion primitive vectors. Similar motion primitives and transitions are fused and novel motion primitives are added to capture newly observed behaviors. The proposed approach is tested and compared with other baselines in intersection scenarios where the data is incrementally available either from a single intersection or from multiple intersections with different geometries. In both cases, our method incrementally learns motion patterns and outperforms the offline learning approach in terms of prediction errors. The results also show that the model size in our algorithm grows at a much lower rate than standard incremental learning, where newly learned motion primitives and transitions are simply accumulated over time.

Submitted 21 November, 2019; originally announced November 2019.

arXiv:1911.03721 [pdf, other]

Distributed Certifiably Correct Pose-Graph Optimization

Authors: Yulun Tian, Kasra Khosoussi, David M. Rosen, Jonathan P. How

Abstract: This paper presents the first certifiably correct algorithm for distributed pose-graph optimization (PGO), the backbone of modern collaborative simultaneous localization and mapping (CSLAM) and camera network localization (CNL) systems. Our method is based upon a sparse semidefinite relaxation that we prove provides globally-optimal PGO solutions under moderate measurement noise (matching the guarantees enjoyed by state-of-the-art centralized methods), but is amenable to distributed optimization using the low-rank Riemannian Staircase framework. To implement the Riemannian Staircase in the distributed setting, we develop Riemannian block coordinate descent (RBCD), a novel method for (locally) minimizing a function over a product of Riemannian manifolds. We also propose the first distributed solution verification and saddle escape methods to certify the global optimality of critical points recovered via RBCD, and to descend from suboptimal critical points (if necessary). All components of our approach are inherently decentralized: they require only local communication, provide privacy protection, and are easily parallelizable. Extensive evaluations on synthetic and real-world datasets demonstrate that the proposed method correctly recovers globally optimal solutions under moderate noise, and outperforms alternative distributed techniques in terms of solution precision and convergence speed.

Submitted 18 May, 2021; v1 submitted 9 November, 2019; originally announced November 2019.

Comments: Updated convergence proofs. Paper accepted at T-RO

arXiv:1910.12908 [pdf, other]

Certified Adversarial Robustness for Deep Reinforcement Learning

Authors: Björn Lütjens, Michael Everett, Jonathan P. How

Abstract: Deep Neural Network-based systems are now the state-of-the-art in many robotics tasks, but their application in safety-critical domains remains dangerous without formal guarantees on network robustness. Small perturbations to sensor inputs (from noise or adversarial examples) are often enough to change network-based decisions, which was already shown to cause an autonomous vehicle to swerve into oncoming traffic. In light of these dangers, numerous algorithms have been developed as defensive mechanisms from these adversarial inputs, some of which provide formal robustness guarantees or certificates. This work leverages research on certified adversarial robustness to develop an online certified defense for deep reinforcement learning algorithms. The proposed defense computes guaranteed lower bounds on state-action values during execution to identify and choose the optimal action under a worst-case deviation in input space due to possible adversaries or noise. The approach is demonstrated on a Deep Q-Network policy and is shown to increase robustness to noise and adversaries in pedestrian collision avoidance scenarios and a classic control task.

Submitted 6 March, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: Published at Conference on Robot Learning (CoRL) 2019; (v2) contains minor updates to related works; (v3) acknowledged AWS

Journal ref: Proceedings of Machine Learning Research (PMLR) Vol. 100, 2019

arXiv:1910.11689 [pdf, other]

doi 10.1109/ACCESS.2021.3050338

Collision Avoidance in Pedestrian-Rich Environments with Deep Reinforcement Learning

Authors: Michael Everett, Yu Fan Chen, Jonathan P. How

Abstract: Collision avoidance algorithms are essential for safe and efficient robot operation among pedestrians. This work proposes using deep reinforcement learning (RL) as a framework to model the complex interactions and cooperation with nearby, decision-making agents, such as pedestrians and other robots. Existing RL-based works assume homogeneity of agent properties, use specific motion models over short timescales, or lack a principled method to handle a large, possibly varying number of agents. Therefore, this work develops an algorithm that learns collision avoidance among a variety of heterogeneous, non-communicating, dynamic agents without assuming they follow any particular behavior rules. It extends our previous work by introducing a strategy using Long Short-Term Memory (LSTM) that enables the algorithm to use observations of an arbitrary number of other agents, instead of a small, fixed number of neighbors. The proposed algorithm is shown to outperform a classical collision avoidance algorithm and another deep RL-based algorithm, and to scale with the number of agents better (fewer collisions, shorter time to goal) than our previously published learning-based approach. Analysis of the LSTM provides insights into how observations of nearby agents affect the hidden state and quantifies the performance impact of various agent ordering heuristics. The learned policy generalizes to several applications beyond the training scenarios: formation control (arrangement into letters), demonstrations on a fleet of four multirotors and on a fully autonomous robotic vehicle capable of traveling at human walking speed among pedestrians.

Submitted 25 January, 2021; v1 submitted 24 October, 2019; originally announced October 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1805.01956

arXiv:1910.10763 [pdf, ps, other]

Representation Learning in Heterogeneous Professional Social Networks with Ambiguous Social Connections

Authors: Baoxu Shi, Jaewon Yang, Tim Weninger, **g How, Qi He

Abstract: Network representations have been shown to improve performance within a variety of tasks, including classification, clustering, and link prediction. However, most models either focus on moderate-sized, homogeneous networks or require a significant amount of auxiliary input to be provided by the user. Moreover, few works have studied network representations in real-world heterogeneous social networks that have ambiguous social connections and are often incomplete. In the present work, we investigate the problem of learning low-dimensional node representations in heterogeneous professional social networks (HPSNs), which are incomplete and have ambiguous social connections. We present a general heterogeneous network representation learning model called Star2Vec that learns entity and person embeddings jointly using a social connection strength-aware biased random walk combined with a node-structure expansion function. Experiments on LinkedIn's Economic Graph and publicly available snapshots of Facebook's network show that Star2Vec outperforms existing methods on members' industry and social circle classification, skill and title clustering, and member-entity link predictions. We also conducted large-scale case studies to demonstrate practical applications of the Star2Vec embeddings trained on LinkedIn's Economic Graph such as next career move, alternative career suggestions, and general entity similarity searches.

Submitted 23 October, 2019; originally announced October 2019.

Comments: 10 pages, accepted at IEEE BigData 2019

arXiv:1909.11071 [pdf, other]

Dynamic Landing of an Autonomous Quadrotor on a Moving Platform in Turbulent Wind Conditions

Authors: Aleix Paris, Brett T. Lopez, Jonathan P. How

Abstract: Autonomous landing on a moving platform presents unique challenges for multirotor vehicles, including the need to accurately localize the platform, fast trajectory planning, and precise/robust control. Previous works studied this problem but most lack explicit consideration of the wind disturbance, which typically leads to slow descents onto the platform. This work presents a fully autonomous vision-based system that addresses these limitations by tightly coupling the localization, planning, and control, thereby enabling fast and accurate landing on a moving platform. The platform's position, orientation, and velocity are estimated by an extended Kalman filter using simulated GPS measurements when the quadrotor-platform distance is large, and by a visual fiducial system when the platform is nearby. The landing trajectory is computed online using receding horizon control and is followed by a boundary layer sliding controller that provides tracking performance guarantees in the presence of unknown, but bounded, disturbances. To improve the performance, the characteristics of the turbulent conditions are accounted for in the controller. The landing trajectory is fast, direct, and does not require hovering over the platform, as is typical of most state-of-the-art approaches. Simulations and hardware experiments are presented to validate the robustness of the approach.

Submitted 13 March, 2020; v1 submitted 24 September, 2019; originally announced September 2019.

Comments: 7 pages, 8 figures, ICRA2020 accepted paper

arXiv:1909.08735 [pdf, other]

Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games

Authors: Macheng Shen, Jonathan P. How

Abstract: This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward could depend on the uncertain opponent type (a private information known only to the opponent itself and its ally). In order to maximize the reward, the protagonist agent has to infer the opponent type through agent modeling. We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategy interaction and reasoning between agents. However, agent policies learned from self-play can suffer from mutual overfitting. Ensemble training methods can be used to improve the robustness of agent policy against different opponents, but it also significantly increases the computational overhead. In order to achieve a good trade-off between the robustness of the learned policy and the computation complexity, we propose to train a separate opponent policy against the protagonist agent for evaluation purposes. The reward achieved by this opponent is a noisy measure of the robustness of the protagonist agent policy due to the intrinsic stochastic nature of a reinforcement learner. To handle this stochasticity, we apply a stochastic optimization scheme to dynamically update the opponent ensemble to optimize an objective function that strikes a balance between robustness and computation complexity. We empirically show that, under the same limited computational budget, the proposed method results in more robust policy learning than standard ensemble training.

Submitted 3 March, 2020; v1 submitted 18 September, 2019; originally announced September 2019.

arXiv:1909.05004 [pdf, other]

Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning

Authors: Arpan Kusari, Jonathan P. How

Abstract: A common approach for defining a reward function for Multi-objective Reinforcement Learning (MORL) problems is the weighted sum of the multiple objectives. The weights are then treated as design parameters dependent on the expertise (and preference) of the person performing the learning, with the typical result that a new solution is required for any change in these settings. This paper investigates the relationship between the reward function and the optimal value function for MORL; specifically addressing the question of how to approximate the optimal value function well beyond the set of weights for which the optimization problem was actually solved, thereby avoiding the need to recompute for any particular choice. We prove that the value function transforms smoothly given a transformation of weights of the reward function (and thus a smooth interpolation in the policy space). A Gaussian process is used to obtain a smooth interpolation over the reward function weights of the optimal value function for three well-known examples: GridWorld, Objectworld and Pendulum. The results show that the interpolation can provide very robust values for sample states and action space in discrete and continuous domain problems. Significant advantages arise from utilizing this interpolation technique in the domain of autonomous vehicles: easy, instant adaptation of user preferences while driving and true randomization of obstacle vehicle behavior preferences during training.

Submitted 3 March, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

Comments: Accepted at ICRA 2020

arXiv:1908.10541 [pdf, other]

Search and Rescue under the Forest Canopy using Multiple UAVs

Authors: Yulun Tian, Katherine Liu, Kyel Ok, Loc Tran, Danette Allen, Nicholas Roy, Jonathan P. How

Abstract: We present a multi-robot system for GPS-denied search and rescue under the forest canopy. Forests are particularly challenging environments for collaborative exploration and mapping, in large part due to the existence of severe perceptual aliasing which hinders reliable loop closure detection for mutual localization and map fusion. Our proposed system features unmanned aerial vehicles (UAVs) that perform onboard sensing, estimation, and planning. When communication is available, each UAV transmits compressed tree-based submaps to a central ground station for collaborative simultaneous localization and mapping (CSLAM). To overcome high measurement noise and perceptual aliasing, we use the local configuration of a group of trees as a distinctive feature for robust loop closure detection. Furthermore, we propose a novel procedure based on cycle consistent multiway matching to recover from incorrect pairwise data associations. The returned global data association is guaranteed to be cycle consistent, and is shown to improve both precision and recall compared to the input pairwise associations. The proposed multi-UAV system is validated both in simulation and during real-world collaborative exploration missions at NASA Langley Research Center.

Submitted 7 June, 2020; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: IJRR revision

arXiv:1908.09171 [pdf, other]

Planning Beyond the Sensing Horizon Using a Learned Context

Authors: Michael Everett, Justin Miller, Jonathan P. How

Abstract: Last-mile delivery systems commonly propose the use of autonomous robotic vehicles to increase scalability and efficiency. The economic inefficiency of collecting accurate prior maps for navigation motivates the use of planning algorithms that operate in unmapped environments. However, these algorithms typically waste time exploring regions that are unlikely to contain the delivery destination. Context is key information about structured environments that could guide exploration toward the unknown goal location, but the abstract idea is difficult to quantify for use in a planning algorithm. Some approaches specifically consider contextual relationships between objects, but would perform poorly in object-sparse environments like outdoors. Recent deep learning-based approaches consider context too generally, making training/transferability difficult. Therefore, this work proposes a novel formulation of utilizing context for planning as an image-to-image translation problem, which is shown to extract terrain context from semantic gridmaps, into a metric that an exploration-based planner can use. The proposed framework has the benefit of training on a static dataset instead of requiring a time-consuming simulator. Across 42 test houses with layouts from satellite images, the trained algorithm enables a robot to reach its goal 189% faster than with a context-unaware planner, and within 63% of the optimal path computed with a prior map. The proposed algorithm is also implemented on a vehicle with a forward-facing camera in a high-fidelity, Unreal simulation of neighborhood houses.

Submitted 1 June, 2020; v1 submitted 24 August, 2019; originally announced August 2019.

arXiv:1908.03790 [pdf, other]

Towards Online Observability-Aware Trajectory Optimization for Landmark-based Estimators

Authors: Kristoffer M. Frey, Ted J. Steiner, Jonathan P. How

Abstract: As autonomous systems increasingly rely on onboard sensing for localization and perception, the parallel tasks of motion planning and state estimation become more strongly coupled. This coupling is well-captured by augmenting the planning objective with a posterior-covariance penalty -- however, prediction of the estimator covariance is challenging when the observation model depends on unknown landmarks, as is the case in Simultaneous Localization and Mapping (SLAM). This paper addresses these challenges in the case of landmark- and SLAM-based estimators, enabling efficient prediction (and ultimately minimization) of this performance metric. First, we provide an interval-based filtering approximation of the SLAM inference process which allows for recursive propagation of the ego-covariance while avoiding the quadratic complexity of explicitly tracking landmark uncertainty. Secondly, we introduce a Lie-derivative measurement bundling scheme that simplifies the recursive "bundled" update, representing significant computational savings for high-rate sensors such as cameras. Finally, we identify a large class of measurement models (which includes orthographic camera projection) for which the contributions from each landmark can be directly combined, making evaluation of the information gained at each timestep (nearly) independent of the number of landmarks. This also enables the generalization from finite sets of landmarks $\{\ell^{(n)} \}$ to distributions, foregoing the need for fully-specified linearization points at planning time and allowing for new landmarks to be anticipated. Taken together, these contributions allow SLAM performance to be accurately and efficiently predicted, paving the way for online, observability-aware trajectory optimization in unknown space.

Submitted 10 September, 2020; v1 submitted 10 August, 2019; originally announced August 2019.

Comments: Preprint; 25 pages

arXiv:1907.06553 [pdf, other]

Dynamic Tube MPC for Nonlinear Systems

Authors: Brett T. Lopez, Jean-Jacques E. Slotine, Jonathan P. How

Abstract: Modeling error or external disturbances can severely degrade the performance of Model Predictive Control (MPC) in real-world scenarios. Robust MPC (RMPC) addresses this limitation by optimizing over feedback policies but at the expense of increased computational complexity. Tube MPC is an approximate solution strategy in which a robust controller, designed offline, keeps the system in an invariant tube around a desired nominal trajectory, generated online. Naturally, this decomposition is suboptimal, especially for systems with changing objectives or operating conditions. In addition, many tube MPC approaches are unable to capture state-dependent uncertainty due to the complexity of calculating invariant tubes, resulting in overly-conservative approximations. This work presents the Dynamic Tube MPC (DTMPC) framework for nonlinear systems where both the tube geometry and open-loop trajectory are optimized simultaneously. By using boundary layer sliding control, the tube geometry can be expressed as a simple relation between control parameters and uncertainty bound; enabling the tube geometry dynamics to be added to the nominal MPC optimization with minimal increase in computational complexity. In addition, DTMPC is able to leverage state-dependent uncertainty to reduce conservativeness and improve optimization feasibility. DTMPC is demonstrated to robustly perform obstacle avoidance and modify the tube geometry in response to obstacle proximity.

Submitted 15 July, 2019; originally announced July 2019.

Showing 51–100 of 152 results for author: How, J