-
CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning
Authors:
Luke Rowe,
Roger Girgis,
Anthony Gosselin,
Bruno Carrez,
Florian Golemo,
Felix Heide,
Liam Paull,
Christopher Pal
Abstract:
Evaluating autonomous vehicle stacks (AVs) in simulation typically involves replaying driving logs from real-world recorded traffic. However, agents replayed from offline data are not reactive and are hard to control intuitively. Existing approaches address these challenges by proposing methods that rely on heuristics or generative models of real-world data, but these approaches either lack realism or necessitate costly iterative sampling procedures to control the generated behaviours. In this work, we take an alternative approach and propose CtRL-Sim, a method that leverages return-conditioned offline reinforcement learning to efficiently generate reactive and controllable traffic agents. Specifically, we process real-world driving data through a physics-enhanced Nocturne simulator to generate a diverse offline reinforcement learning dataset, annotated with various reward terms. With this dataset, we train a return-conditioned multi-agent behaviour model that allows for fine-grained manipulation of agent behaviours by modifying the desired returns for the various reward components. This capability enables the generation of a wide range of driving behaviours beyond the scope of the initial dataset, including adversarial behaviours. We demonstrate that CtRL-Sim can generate diverse and realistic safety-critical scenarios while providing fine-grained control over agent behaviours.
Submitted 14 June, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
-
Direct Behavior Specification via Constrained Reinforcement Learning
Authors:
Julien Roy,
Roger Girgis,
Joshua Romoff,
Pierre-Luc Bacon,
Christopher Pal
Abstract:
The standard formulation of Reinforcement Learning lacks a practical way of specifying which behaviors are admissible and which are forbidden. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent on reward specification in applied RL projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods to automatically weigh each of these behavioral constraints. Specifically, we investigate how CMDPs can be adapted to solve goal-based tasks while adhering to several constraints simultaneously. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.
Submitted 18 June, 2022; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Latent Variable Sequential Set Transformers For Joint Multi-Agent Motion Prediction
Authors:
Roger Girgis,
Florian Golemo,
Felipe Codevilla,
Martin Weiss,
Jim Aldon D'Souza,
Samira Ebrahimi Kahou,
Felix Heide,
Christopher Pal
Abstract:
Robust multi-agent trajectory prediction is essential for the safe control of robotic systems. A major challenge is to efficiently learn a representation that approximates the true joint distribution of contextual, social, and temporal information to enable planning. We propose Latent Variable Sequential Set Transformers, which are encoder-decoder architectures that generate scene-consistent multi-agent trajectories. We refer to these architectures as "AutoBots". The encoder is a stack of interleaved temporal and social multi-head self-attention (MHSA) modules which alternately perform equivariant processing across the temporal and social dimensions. The decoder employs learnable seed parameters in combination with temporal and social MHSA modules, allowing it to efficiently perform inference over the entire future scene in a single forward pass. AutoBots can produce either the trajectory of one ego-agent or a distribution over the future trajectories for all agents in the scene. For the single-agent prediction case, our model achieves top results on the global nuScenes vehicle motion prediction leaderboard, and produces strong results on the Argoverse vehicle prediction challenge. In the multi-agent setting, we evaluate on the synthetic partition of the TrajNet++ dataset to showcase the model's socially-consistent predictions. We also demonstrate our model on general sequences of sets and provide illustrative experiments modelling the sequential structure of the multiple strokes that make up symbols in the Omniglot data. A distinguishing feature of AutoBots is that all models are trainable on a single desktop GPU (1080 Ti) in under 48h.
Submitted 10 February, 2022; v1 submitted 19 February, 2021;
originally announced April 2021.
-
Navigation Agents for the Visually Impaired: A Sidewalk Simulator and Experiments
Authors:
Martin Weiss,
Simon Chamorro,
Roger Girgis,
Margaux Luck,
Samira E. Kahou,
Joseph P. Cohen,
Derek Nowrouzezahrai,
Doina Precup,
Florian Golemo,
Chris Pal
Abstract:
Millions of blind and visually-impaired (BVI) people navigate urban environments every day, using smartphones for high-level path-planning and white canes or guide dogs for local information. However, many BVI people still struggle to travel to new places. In our endeavor to create a navigation assistant for the BVI, we found that existing Reinforcement Learning (RL) environments were unsuitable for the task. This work introduces SEVN, a sidewalk simulation environment and a neural network-based approach to creating a navigation agent. SEVN contains panoramic images with labels for house numbers, doors, and street name signs, and formulations for several navigation tasks. We study the performance of an RL algorithm (PPO) in this setting. Our policy model fuses multi-modal observations in the form of variable resolution images, visible text, and simulated GPS data to navigate to a goal door. We hope that this dataset, simulator, and experimental results will provide a foundation for further research into the creation of agents that can assist members of the BVI community with outdoor navigation.
Submitted 29 October, 2019;
originally announced October 2019.
-
A Survey of Mobile Computing for the Visually Impaired
Authors:
Martin Weiss,
Margaux Luck,
Roger Girgis,
Chris Pal,
Joseph Paul Cohen
Abstract:
The number of visually impaired or blind (VIB) people in the world is estimated at several hundred million. Based on a series of interviews with the VIB and developers of assistive technology, this paper provides a survey of machine-learning based mobile applications and identifies the most relevant applications. We discuss the functionality of these apps, how they align with the needs and requirements of the VIB users, and how they can be improved with techniques such as federated learning and model compression. As a result of this study we identify promising future directions of research in mobile perception, micro-navigation, and content-summarization.
Submitted 27 November, 2018; v1 submitted 25 November, 2018;
originally announced November 2018.
-
Performance evaluation of a new route optimization technique for mobile IP
Authors:
Moheb R Girgis,
Tarek M Mahmoud,
Youssef S Takroni,
Hassan S Hassan
Abstract:
Mobile IP (MIP) is an Internet protocol that allows mobile nodes to maintain continuous network connectivity to the Internet without changing their IP addresses while moving to other networks. Packets sent from a correspondent node (CN) to a mobile node (MN) go first through the mobile node's home agent (HA), which then tunnels them to the MN's foreign network. One of the main problems in the original MIP is the triangle routing problem, which appears when the indirect path between the CN and the MN through the HA is longer than the direct path. This paper proposes a new technique to improve the performance of the original MIP during handoff. The proposed technique reduces the delay, the packet loss, and the registration time for all packets transferred between the CN and the MN. In this technique, tunneling occurs at two levels above the HA in a hierarchical network. To show the effectiveness of the proposed technique, it is compared with the original MIP and with another technique for solving the same problem in which tunneling occurs at one level above the HA. The simulation results presented in this paper are based on the ns2 mobility software on a Linux platform. The results show that the proposed technique achieves better performance than the others in terms of packet delay, packet loss during handoffs, and registration time, in different scenarios for the location of the MN with respect to the HA and the foreign agents (FAs).
Submitted 6 April, 2010;
originally announced April 2010.