-
QuadSwarm: A Modular Multi-Quadrotor Simulator for Deep Reinforcement Learning with Direct Thrust Control
Authors:
Zhehui Huang,
Sumeet Batra,
Tao Chen,
Rahul Krupani,
Tushar Kumar,
Artem Molchanov,
Aleksei Petrenko,
James A. Preiss,
Zhaojing Yang,
Gaurav S. Sukhatme
Abstract:
Reinforcement learning (RL) has shown promise in creating robust policies for robotics tasks. However, contemporary RL algorithms are data-hungry, often requiring billions of environment transitions to train successful policies. This necessitates the use of fast and highly parallelizable simulators. In addition to speed, such simulators need to model the physics of the robots and their interaction with the environment to a level acceptable for transferring policies learned in simulation to reality. We present QuadSwarm, a fast, reliable simulator for research in single- and multi-robot RL for quadrotors that addresses both issues. QuadSwarm, with fast forward-dynamics propagation decoupled from rendering, is designed to be highly parallelizable such that throughput scales linearly with additional compute. It provides multiple components tailored toward multi-robot RL, including diverse training scenarios, and provides domain randomization to facilitate the development and sim2real transfer of multi-quadrotor control policies. Initial experiments suggest that QuadSwarm achieves over 48,500 simulation samples per second (SPS) on a single quadrotor and over 62,000 SPS on eight quadrotors on a 16-core CPU. The code can be found at https://github.com/Zhehui-Huang/quad-swarm-rl.
Submitted 15 June, 2023;
originally announced June 2023.
-
Resilient Coverage: Exploring the Local-to-Global Trade-off
Authors:
Ragesh K. Ramachandran,
Lifeng Zhou,
James A. Preiss,
Gaurav S. Sukhatme
Abstract:
We propose a centralized control framework to select suitable robots from a heterogeneous pool and place them at appropriate locations to monitor a region for events of interest. In the event of a robot failure, the framework repositions robots in a user-defined local neighborhood of the failed robot to compensate for the coverage loss. The central controller augments the team with additional robots from the robot pool when simply repositioning robots fails to attain a user-specified level of desired coverage. The size of the local neighborhood around the failed robot and the desired coverage over the region are two objectives that can be manipulated to achieve a user-specified balance. We investigate the trade-off between the coverage compensation achieved through local repositioning and the computation required to plan the new robot locations. We also study the relationship between the size of the local neighborhood and the number of additional robots added to the team for a given user-specified level of desired coverage. We use extensive simulations and an experiment with a team of seven quadrotors to verify the effectiveness of our framework. Additionally, we show that to reach a high level of coverage in a neighborhood with a large robot population, it is more efficient to enlarge the neighborhood size, instead of adding additional robots and repositioning them.
Submitted 15 April, 2020; v1 submitted 3 October, 2019;
originally announced October 2019.
-
Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator
Authors:
James A. Preiss,
Sébastien M. R. Arnold,
Chen-Yu Wei,
Marius Kloft
Abstract:
We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.
Submitted 2 October, 2019;
originally announced October 2019.
-
Resilience by Reconfiguration: Exploiting Heterogeneity in Robot Teams
Authors:
Ragesh K. Ramachandran,
James A. Preiss,
Gaurav S. Sukhatme
Abstract:
We propose a method to maintain high resource availability in a networked heterogeneous multi-robot system subject to resource failures. In our model, resources such as sensing and computation are available on robots. The robots are engaged in a joint task using these pooled resources. When a resource on a particular robot becomes unavailable (e.g., a sensor ceases to function due to a failure), the system reconfigures so that the robot continues to have access to this resource by communicating with other robots. Specifically, we consider the problem of selecting edges to be modified in the system's communication graph after a resource failure has occurred. We define a metric that allows us to characterize the quality of the resource distribution in the network represented by the communication graph. Upon a resource becoming unavailable due to failure, we reconfigure the network so that the resource distribution is brought as close to the ideal resource distribution as possible without a big change in the communication cost. Our approach uses integer semi-definite programming to achieve this goal. We also provide a simulated annealing method to compute a formation that satisfies the inter-robot distances imposed by the topology, along with other constraints. Our method can compute a communication topology, spatial formation, and formation change motion planning in a few seconds. We validate our method in simulation and real-robot experiments with a team of seven quadrotors.
Submitted 14 May, 2019; v1 submitted 12 March, 2019;
originally announced March 2019.
-
Sim-to-(Multi)-Real: Transfer of Low-Level Robust Control Policies to Multiple Quadrotors
Authors:
Artem Molchanov,
Tao Chen,
Wolfgang Hönig,
James A. Preiss,
Nora Ayanian,
Gaurav S. Sukhatme
Abstract:
Quadrotor stabilizing controllers often require careful, model-specific tuning for safe operation. We use reinforcement learning to train policies in simulation that transfer remarkably well to multiple different physical quadrotors. Our policies are low-level, i.e., we map the rotorcrafts' state directly to the motor outputs. The trained control policies are very robust to external disturbances and can withstand harsh initial conditions such as throws. We show how different training methodologies (change of the cost function, modeling of noise, use of domain randomization) might affect flight performance. To the best of our knowledge, this is the first work that demonstrates that a simple neural network can learn a robust stabilizing low-level quadrotor controller (without the use of a stabilizing PD controller) that is shown to generalize to multiple quadrotors.
Submitted 16 April, 2019; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Downwash-Aware Trajectory Planning for Large Quadrotor Teams
Authors:
James A. Preiss,
Wolfgang Hönig,
Nora Ayanian,
Gaurav S. Sukhatme
Abstract:
We describe a method for formation-change trajectory planning for large quadrotor teams in obstacle-rich environments. Our method decomposes the planning problem into two stages: a discrete planner operating on a graph representation of the workspace, and a continuous refinement that converts the non-smooth graph plan into a set of C^k-continuous trajectories, locally optimizing an integral-squared-derivative cost. We account for the downwash effect, allowing safe flight in dense formations. We demonstrate the computational efficiency in simulation with up to 200 robots and the physical plausibility with an experiment with 32 nano-quadrotors. Our approach can compute safe and smooth trajectories for hundreds of quadrotors in dense environments with obstacles in a few minutes.
Submitted 23 July, 2017; v1 submitted 16 April, 2017;
originally announced April 2017.