Control of Microrobots Using Model Predictive Control and Gaussian Processes for Disturbance Estimation

Authors: Mehdi Kermanshah, Logan E. Beaver, Max Sokolich, Sambeeta Das, Ron Weiss, Roberto Tron, Calin Belta

Abstract: This paper presents a control framework for magnetically actuated micron-scale robots ($μ$bots) designed to mitigate disturbances and improve trajectory tracking. To address the challenges posed by unmodeled dynamics and environmental variability, we combine data-driven modeling with model-based control to accurately track desired trajectories using a relatively small amount of data. The system is represented with a simple linear model, and Gaussian Processes (GP) are employed to capture and estimate disturbances. This disturbance-enhanced model is then integrated into a Model Predictive Controller (MPC). Our approach demonstrates promising performance in both simulation and experimental setups, showcasing its potential for precise and reliable microrobot control in complex environments.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2403.15621 [pdf, other]

Global Games with Negative Feedback for Autonomous Colony Maintenance using Robot Teams

Authors: Logan E. Beaver

Abstract: In this article we address the colony maintenance problem, where a team of robots are tasked with continuously maintaining the energy supply of an autonomous colony. We model this as a global game, where robots measure the energy level of a central nest to determine whether or not to forage for energy sources. We design a mechanism that avoids the trivial equilibrium where all robots always forage. Furthermore, we demonstrate that when the game is played iteratively a negative feedback term stabilizes the number of foraging robots at a non-trivial Nash equilibrium. We compare our approach qualitatively to existing global games, where a positive feedback term admits threshold-based decision making, and encourages many robots to forage simultaneously. We discuss how positive feedback can lead to a cascading failure in the presence of a human who recruits robots for external tasks, and we demonstrate the performance of our approach in simulation.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 6 pages, 5 figures

arXiv:2212.00188 [pdf, other]

Learning a Tracking Controller for Rolling $μ$bots

Authors: Logan E Beaver, Max Sokolich, Suhail Alsalehi, Ron Weiss, Sambeeta Das, Calin Belta

Abstract: Micron-scale robots ($μ$bots) have recently shown great promise for emerging medical applications. Accurately controlling $μ$bots, while critical to their successful deployment, is challenging. In this work, we consider the problem of tracking a reference trajectory using a $μ$bot in the presence of disturbances and uncertainty. The disturbances primarily come from Brownian motion and other environmental phenomena, while the uncertainty originates from errors in the model parameters. We model the $μ$bot as an uncertain unicycle that is controlled by a global magnetic field. To compensate for disturbances and uncertainties, we develop a nonlinear mismatch controller. We define the model mismatch error as the difference between our model's predicted velocity and the actual velocity of the $μ$bot. We employ a Gaussian Process to learn the model mismatch error as a function of the applied control input. Then we use a least-squares minimization to select a control action that minimizes the difference between the actual velocity of the $μ$bot and a reference velocity. We demonstrate the online performance of our joint learning and control algorithm in simulation, where our approach accurately learns the model mismatch and improves tracking performance. We also validate our approach in an experiment and show that certain error metrics are reduced by up to $40\%$.

Submitted 13 August, 2023; v1 submitted 30 November, 2022; originally announced December 2022.

Comments: 8 pages, 9 figures

arXiv:2211.05251 [pdf, other]

A Graph-Based Approach to Generate Energy-Optimal Robot Trajectories in Polygonal Environments

Authors: Logan E. Beaver, Roberto Tron, Christos G. Cassandras

Abstract: As robotic systems continue to address emerging issues in areas such as logistics, mobility, manufacturing, and disaster response, it is increasingly important to rapidly generate safe and energy-efficient trajectories. In this article, we present a new approach to plan energy-optimal trajectories through cluttered environments containing polygonal obstacles. In particular, we develop a method to quickly generate optimal trajectories for a double-integrator system, and we show that optimal path planning reduces to an integer program. To find an efficient solution, we present a distance-informed prefix search to efficiently generate optimal trajectories for a large class of environments. We demonstrate that our approach, while matching the performance of RRT* and Probabilistic Road Maps in terms of path length, outperforms both in terms of energy cost and computational time by up to an order of magnitude. We also demonstrate that our approach yields implementable trajectories in an experiment with a Crazyflie quadrotor.

Submitted 11 November, 2022; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: 9 pages, 7 figures

arXiv:2209.11664 [pdf, other]

A Constraint-Driven Approach to Line Flocking: The V Formation as an Energy-Saving Strategy

Authors: Logan E. Beaver, Christopher Kroninger, Michael Dorothy, Andreas A. Malikopoulos

Abstract: The study of robotic flocking has received significant attention in the past twenty years. In this article, we present a constraint-driven control algorithm that minimizes the energy consumption of individual agents and yields an emergent V formation. As the formation emerges from the decentralized interaction between agents, our approach is robust to the spontaneous addition or removal of agents to the system. First, we present an analytical model for the trailing upwash behind a fixed-wing UAV, and we derive the optimal air speed for trailing UAVs to maximize their travel endurance. Next, we prove that simply flying at the optimal airspeed will never lead to emergent flocking behavior, and we propose a new decentralized "anseroid" behavior that yields emergent V formations. We encode these behaviors in a constraint-driven control algorithm that minimizes the locomotive power of each UAV. Finally, we prove that UAVs initialized in an approximate V or echelon formation will converge under our proposed control law, and we demonstrate this emergence occurs in real-time in simulation and in physical experiments with a fleet of Crazyflie quadrotors.

Submitted 23 September, 2022; originally announced September 2022.

Comments: 12 pages, 7 figures

arXiv:2111.03232 [pdf, other]

doi 10.1109/MARSS55884.2022.9870476

A First-Order Approach to Model Simultaneous Control of Multiple Microrobots

Authors: Logan E. Beaver, Sambeeta Das, Andreas A. Malikopoulos

Abstract: The control of swarm systems is relatively well understood for simple robotic platforms at the macro scale. However, there are still several unanswered questions about how similar results can be achieved for microrobots. In this paper, we propose a modeling framework based on a dynamic model of magnetized self-propelling Janus microrobots under a global magnetic field. We verify our model experimentally and provide methods that can aim at accurately describing the behavior of microrobots while modeling their simultaneous control. The model can be generalized to other microrobotic platforms in low Reynolds number environments.

Submitted 4 November, 2021; originally announced November 2021.

Comments: 7 pages, 2 figures

arXiv:2109.05995 [pdf, other]

doi 10.1109/ITSC55140.2022.9921797

A Scalable Last-Mile Delivery Service: From Simulation to Scaled Experiment

Authors: Meera Ratnagiri, Clare O'Dwyer, Logan E. Beaver, Heeseung Bang, Behdad Chalaki, Andreas A. Malikopoulos

Abstract: In this paper, we investigate the problem of a last-mile delivery service that selects up to $N$ available vehicles to deliver $M$ packages from a centralized depot to $M$ delivery locations. The objective of the last-mile delivery service is to jointly maximize customer satisfaction (minimize delivery time) and minimize operating cost (minimize total travel time) by selecting the optimal number of vehicles to perform the deliveries. We model this as an assignment (vehicles to packages) and path planning (determining the delivery order and route) problem, which is equivalent to the NP-hard multiple traveling salesperson problem. We propose a scalable heuristic algorithm, which sacrifices some optimality to achieve a reasonable computational cost for a high number of packages. The algorithm combines hierarchical clustering with a greedy search. To validate our approach, we compare the results of our simulation to experiments in a $1$:$25$ scale robotic testbed for future mobility systems.

Submitted 13 September, 2021; originally announced September 2021.

Comments: 7 pages, 8 figures

Journal ref: Proceedings of the 25th IEEE International Conference on Intelligent Transportation Systems (ITSC), 2022

arXiv:2109.05988 [pdf, other]

doi 10.1109/LCSYS.2021.3133801

Constraint-Driven Optimal Control of Multi-Agent Systems: A Highway Platooning Case Study

Authors: Logan E. Beaver, Andreas A. Malikopoulos

Abstract: Platooning has been exploited as a method for vehicles to minimize energy consumption. In this article, we present a constraint-driven optimal control framework that yields emergent platooning behavior for connected and automated vehicles operating in an open transportation system. Our approach combines recent insights in constraint-driven optimal control with the physical aerodynamic interactions between vehicles in a highway setting. The result is a set of equations that describes when platooning is an appropriate strategy, as well as a descriptive optimal control law that yields emergent platooning behavior. Finally, we demonstrate these properties in simulation.

Submitted 6 December, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: 6 pages, 3 figures

Journal ref: IEEE Control Systems Letters, vol. 6, pp. 1754-1759, 2021

arXiv:2109.02811 [pdf, other]

doi 10.1109/DTPI55838.2022.9998963

A Digital Smart City for Emerging Mobility Systems

Authors: Raymond M. Zayas, Logan E. Beaver, Behdad Chalaki, Heeseung Bang, Andreas A. Malikopoulos

Abstract: The increasing demand for emerging mobility systems with connected and automated vehicles has imposed the necessity for quality testing environments to support their development. In this paper, we introduce a Unity-based virtual simulation environment for emerging mobility systems, called the Information and Decision Science Lab's Scaled Smart Digital City (IDS 3D City), intended to operate alongside its physical peer and its established control framework. By utilizing the Robot Operating System, AirSim, and Unity, we constructed a simulation environment capable of iteratively designing experiments significantly faster than is possible in a physical testbed. This environment provides an intermediate step to validate the effectiveness of our control algorithms prior to their implementation in the physical testbed. The IDS 3D City also enables us to demonstrate that our control algorithms work independently of the underlying vehicle dynamics, as the vehicle dynamics introduced by AirSim operate at a different scale than our scaled smart city. Finally, we demonstrate the behavior of our digital environment by performing an experiment in both the virtual and physical environments and comparing their outputs.

Submitted 11 January, 2023; v1 submitted 6 September, 2021; originally announced September 2021.

Comments: 6 pages, 8 figures

Journal ref: IEEE 2nd International Conference on Digital Twins and Parallel Intelligence (DTPI), 2022

arXiv:2103.03339 [pdf, other]

Optimal Control of Differentially Flat Systems is Surprisingly Easy

Authors: Logan E. Beaver, Andreas A. Malikopoulos

Abstract: As we move to increasingly complex cyber-physical systems (CPS), new approaches are needed to plan efficient state trajectories in real-time. In this paper, we propose an approach to significantly reduce the complexity of solving optimal control problems for a class of CPS with nonlinear dynamics. We exploit the property of differential flatness to simplify the Euler-Lagrange equations that arise during optimization, and this simplification eliminates the numerical instabilities that plague optimal control in general. We also present an explicit differential equation that describes the evolution of the optimal state trajectory, and we extend our results to consider both the unconstrained and constrained cases. Furthermore, we demonstrate the performance of our approach by generating the optimal trajectory for a planar manipulator with two revolute joints. We show in simulation that our approach is able to generate the constrained optimal trajectory in $4.5$ ms while respecting workspace constraints and switching between a 'left' and 'right' bend in the elbow joint.

Submitted 26 August, 2023; v1 submitted 4 March, 2021; originally announced March 2021.

Comments: 14 pages, 4 figures

arXiv:2101.06288 [pdf, other]

doi 10.1109/MED51440.2021.9480318

Energy-Optimal Goal Assignment of Multi-Agent System with Goal Trajectories in Polynomials

Authors: Heeseung Bang, Logan Beaver, Andreas A. Malikopoulos

Abstract: In this paper, we propose an approach for solving an energy-optimal goal assignment problem to generate the desired formation in multi-agent systems. Each agent solves a decentralized optimization problem with only local information about its neighboring agents and the goals. The optimization problem consists of two sub-problems. The first problem seeks to minimize the energy for each agent to reach certain goals, while the second problem entreats an optimal combination of goal and agent pairs that minimizes the energy cost. By assuming the goal trajectories are given in a polynomial form, we prove the solution to the formulated problem exists globally. Finally, the effectiveness of the proposed approach is validated through the simulation.

Submitted 15 January, 2021; originally announced January 2021.

Comments: 7 pages, 4 figures

arXiv:2009.14279 [pdf, other]

doi 10.1016/j.arcontrol.2021.03.004

An Overview on Optimal Flocking

Authors: Logan E. Beaver, Andreas A. Malikopoulos

Abstract: The study of robotic flocking has received considerable attention in the past twenty years. As we begin to deploy flocking control algorithms on physical multi-agent and swarm systems, there is an increasing necessity for rigorous promises on safety and performance. In this paper, we present an overview of the literature focusing on optimization approaches to achieve flocking behavior that provide strong safety guarantees. We separate the literature into cluster and line flocking, and categorize cluster flocking with respect to the system-level objective, which may be realized by a reactive or planning control algorithm. We also categorize the line flocking literature by the energy-saving mechanism that is exploited by the agents. We present several approaches aimed at minimizing the communication and computational requirements in real systems via neighbor filtering and event-driven planning, and conclude with our perspective on the outlook and future research direction of optimal flocking as a field.

Submitted 22 January, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

Comments: 21 pages, 5 figures

Journal ref: Annual Reviews in Control, April 2021

arXiv:2009.00588 [pdf, ps, other]

Energy-Optimal Motion Planning for Agents: Barycentric Motion and Collision Avoidance Constraints

Authors: Logan E. Beaver, Michael Dorothy, Christopher Kroninger, Andreas A. Malikopoulos

Abstract: As robotic swarm systems emerge, it is increasingly important to provide strong guarantees on energy consumption and safety to maximize system performance. One approach to achieve these guarantees is through constraint-driven control, where agents seek to minimize energy consumption subject to a set of safety and task constraints. In this paper, we provide a sufficient and necessary condition for an energy-minimizing agent with integrator dynamics to have a continuous control input at the transition between unconstrained and constrained trajectories. In addition, we present and analyze barycentric motion and collision avoidance constraints to be used in constraint-driven control of swarms.

Submitted 1 September, 2020; originally announced September 2020.

Comments: 6 pages, no figures

Journal ref: 2021 American Control Conference (ACC), pp 1037-1042

arXiv:2003.07310 [pdf, ps, other]

doi 10.1109/CDC42340.2020.9304333

Beyond Reynolds: A Constraint-Driven Approach to Cluster Flocking

Authors: Logan E. Beaver, Andreas A. Malikopoulos

Abstract: In this paper, we present an original set of flocking rules using an ecologically-inspired paradigm for control of multi-robot systems. We translate these rules into a constraint-driven optimal control problem where the agents minimize energy consumption subject to safety and task constraints. We prove several properties about the feasible space of the optimal control problem and show that velocity consensus is an optimal solution. We also motivate the inclusion of slack variables in constraint-driven problems when the global state is only partially observable by each agent. Finally, we analyze the case where the communication topology is fixed and connected, and prove that our proposed flocking rules achieve velocity consensus.

Submitted 5 May, 2020; v1 submitted 16 March, 2020; originally announced March 2020.

Comments: 6 pages

Journal ref: 2020 59th IEEE Conference on Decision and Control (CDC), 2020, pp 208-213

arXiv:2001.11176 [pdf, other]

doi 10.1109/IV47402.2020.9304531

Experimental Validation of a Real-Time Optimal Controller for Coordination of CAVs in a Multi-Lane Roundabout

Authors: Behdad Chalaki, Logan E. Beaver, Andreas A. Malikopoulos

Abstract: Roundabouts in conjunction with other traffic scenarios, e.g., intersections, merging roadways, speed reduction zones, can induce congestion in a transportation network due to driver responses to various disturbances. Research efforts have shown that smoothing traffic flow and eliminating stop-and-go driving can both improve fuel efficiency of the vehicles and the throughput of a roundabout. In this paper, we validate an optimal control framework developed earlier in a multi-lane roundabout scenario using the University of Delaware's scaled smart city (UDSSC). We first provide conditions where the solution is optimal. Then, we demonstrate the feasibility of the solution using experiments at UDSSC, and show that the optimal solution completely eliminates stop-and-go driving while preserving safety.

Submitted 18 May, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

Comments: 6 pages, 4 figures, 1 table

Journal ref: IEEE Intelligent Vehicles Symposium (IV), (2020), 504-509

arXiv:1909.10033 [pdf, other]

doi 10.1109/ITSC45102.2020.9294570

A Game-Theoretic Analysis of the Social Impact of Connected and Automated Vehicles

Authors: Ioannis Vasileios Chremos, Logan Beaver, Andreas Malikopoulos

Abstract: In this paper, we address the much-anticipated deployment of connected and automated vehicles (CAVs) in society by modeling and analyzing the social-mobility dilemma in a game-theoretic approach. We formulate this dilemma as a normal-form game of players making a binary decision: whether to travel with a CAV (CAV travel) or not (non-CAV travel) and by constructing an intuitive payoff function inspired by the socially beneficial outcomes of a mobility system consisting of CAVs. We show that the game is equivalent to the Prisoner's dilemma, which implies that the rational collective decision is the opposite of the social optimum. We present two different solutions to tackle this phenomenon: one with a preference structure and the other with institutional arrangements. In the first approach, we implement a social mechanism that incentivizes players to non-CAV travel and derive a lower bound on the players that ensures an equilibrium of non-CAV travel. In the second approach, we investigate the possibility of players bargaining to create an institution that enforces non-CAV travel and show that as the number of players increases, the incentive ratio of non-CAV travel over CAV travel tends to zero. We conclude by showcasing the last result with a numerical study.

Submitted 2 June, 2020; v1 submitted 22 September, 2019; originally announced September 2019.

arXiv:1903.01632 [pdf, other]

doi 10.1080/00423114.2020.1730412

Demonstration of a Time-Efficient Mobility System Using a Scaled Smart City

Authors: Logan E. Beaver, Behdad Chalaki, AM Ishtiaque Mahbub, Liuhui Zhao, Ray Zayas, Andreas A. Malikopoulos

Abstract: The implementation of connected and automated vehicle (CAV) technologies enables a novel computational framework to deliver real-time control actions that optimize travel time, energy, and safety. Hardware is an integral part of any practical implementation of CAVs, and as such, it should be incorporated in any validation method. However, high costs associated with full scale, field testing of CAVs have proven to be a significant barrier. In this paper, we present the implementation of a decentralized control framework, which was developed previously, in a scaled-city using robotic CAVs, and discuss the implications of CAVs on travel time. Supplemental information and videos can be found at https://sites.google.com/view/ud-ids-lab/tfms.

Submitted 21 November, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

Journal ref: Vehicle System Dynamics 58 (2020) 787-804

arXiv:1812.06120 [pdf, other]

Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles

Authors: Kathy Jang, Eugene Vinitsky, Behdad Chalaki, Ben Remer, Logan Beaver, Andreas Malikopoulos, Alexandre Bayen

Abstract: Using deep reinforcement learning, we train control policies for autonomous vehicles leading a platoon of vehicles onto a roundabout. Using Flow, a library for deep reinforcement learning in micro-simulators, we train two policies, one policy with noise injected into the state and action space and one without any injected noise. In simulation, the autonomous vehicle learns an emergent metering behavior for both policies in which it slows to allow for smoother merging. We then directly transfer this policy without any tuning to the University of Delaware Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles. We characterize the performance of both policies on the scaled city. We show that the noise-free policy winds up crashing and only occasionally metering. However, the noise-injected policy consistently performs the metering behavior and remains collision-free, suggesting that the noise helps with the zero-shot policy transfer. Additionally, the transferred, noise-injected policy leads to a 5% reduction of average travel time and a reduction of 22% in maximum travel time in the UDSSC. Videos of the controllers can be found at https://sites.google.com/view/iccps-policy-transfer.

Submitted 22 February, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: To be published at the International Conference on Cyber Physical Systems (ICCPS) 2019. 10 pages, 9 figures

ACM Class: I.2.1; I.2.4; I.2.6; I.2.10; I.6.5

Showing 1–18 of 18 results for author: Beaver, L