Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research

Authors: Cole Gulino, Justin Fu, Wenjie Luo, George Tucker, Eli Bronstein, Yiren Lu, Jean Harb, Xinlei Pan, Yan Wang, Xiangyu Chen, John D. Co-Reyes, Rishabh Agarwal, Rebecca Roelofs, Yao Lu, Nico Montali, Paul Mougin, Zoey Yang, Brandyn White, Aleksandra Faust, Rowan McAllister, Dragomir Anguelov, Benjamin Sapp

Abstract: Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To address these challenges, we introduce Waymax, a new data-driven simulator for autonomous driving in multi-agent scenes, designed for large-scale simulation and testing. Waymax uses publicly-released, real-world driving data (e.g., the Waymo Open Motion Dataset) to initialize or play back a diverse set of multi-agent simulated scenarios. It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training, making it suitable for modern large-scale, distributed machine learning workflows. To support online training and evaluation, Waymax includes several learned and hard-coded behavior models that allow for realistic interaction within simulation. To supplement Waymax, we benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions, where we highlight the effectiveness of routes as guidance for planning agents and the ability of RL to overfit against simulated agents.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.02812 [pdf]

Agent-based simulation of pedestrians' earthquake evacuation; application to Beirut, Lebanon

Authors: Rouba Iskandar, Kamel Allaw, Julie Dugdale, Elise Beck, Jocelyne Adjizian-Gérard, Cécile Cornou, Jacques Harb, Pascal Lacroix, Nada Badaro-Saliba, Stéphane Cartier, Rita Zaarour

Abstract: Most seismic risk assessment methods focus on estimating the damages to the built environment and the consequent socioeconomic losses without fully taking into account the social aspect of risk. Yet human behaviour is a key element in predicting the human impact of an earthquake; it is therefore important to include it in quantitative risk assessment studies. In this study, an interdisciplinary approach simulating pedestrians' evacuation during earthquakes at the city scale is developed using an agent-based model. The model integrates the seismic hazard, the physical vulnerability as well as individuals' behaviours and mobility. The simulator is applied to the case of Beirut, Lebanon. Lebanon is at the heart of the Levant fault system that has generated several Mw>7 earthquakes, the latest being in 1759. It is one of the countries with the highest seismic risk in the Mediterranean region. This is due to the high seismic vulnerability of the buildings (a consequence of the absence of mandatory seismic regulation until 2012), the high level of urbanization, and the lack of adequate spatial planning and risk prevention policies. Beirut, as the main residential, economic and institutional hub of Lebanon, is densely populated. To accommodate the growing need for urban development, constructions have taken over almost all of the green areas of the city; squares and gardens are disappearing to make way for skyscrapers. However, open spaces are safe places to shelter, away from debris, and therefore play an essential role in earthquake evacuation.
Despite the massive urbanization, a few open spaces remain, but locked gates and other types of anthropogenic barriers often limit access to them. To simulate this complex context, pedestrians' evacuation simulations are run in a highly realistic spatial environment implemented in GAMA [1]. Previous data concerning soil and buildings in Beirut [2, 3] are complemented by new geographic data extracted from high-resolution Pleiades satellite images. The seismic loading is defined as a peak ground acceleration of 0.3g, as stated in Lebanese seismic regulations. Building damages are estimated using an artificial neural network trained to predict the mean damage [4] based on the seismic loading as well as the soil and building vibrational properties [5]. Moreover, the quantity and the footprint of the generated debris around each building are also estimated and included in the model. We simulate how topography, buildings, debris, and access to open spaces affect individuals' mobility. Two city configurations are implemented: 1. Open spaces are accessible without any barriers; 2. Access to some open spaces is blocked. The first simulation results show that while 52% of the population is able to reach an open space within 5 minutes after an earthquake, this number is reduced to 39% when one of the open spaces is locked. These results show that the presence of accessible open spaces in a city and their proximity to the residential buildings is a crucial factor for ensuring people's safety when an earthquake occurs.

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2207.01566 [pdf, other]

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Authors: Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

Abstract: Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function for many policies. Here we combine the actor-critic architecture of Parameter-Based Value Functions and the policy embedding of Policy Evaluation Networks to learn a single value function for evaluating (and thus helping to improve) any policy represented by a deep neural network (NN). The method yields competitive experimental results. In continuous control problems with infinitely many states, our value function minimizes its prediction error by simultaneously learning a small set of 'probing states' and a mapping from actions produced in probing states to the policy's return. The method extracts crucial abstract knowledge about the environment in the form of very few states sufficient to fully specify the behavior of many policies. A policy improves solely by changing actions in probing states, following the gradient of the value function's predictions. Surprisingly, it is possible to clone the behavior of a near-optimal policy in Swimmer-v3 and Hopper-v3 environments only by knowing how to act in 3 and 5 such learned states, respectively. Remarkably, our value function trained to evaluate NN policies is also invariant to changes of the policy architecture: we show that it allows for zero-shot learning of linear policies competitive with the best policy seen during training. Our code is public.

Submitted 4 July, 2022; originally announced July 2022.

Comments: Preprint. Under review

arXiv:2002.11833 [pdf, other]

Policy Evaluation Networks

Authors: Jean Harb, Tom Schaul, Doina Precup, Pierre-Luc Bacon

Abstract: Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and estimate the value of many policies, for a single set of states. This approach opens up the possibility of performing direct gradient ascent in policy space without seeing any new data. The main challenge for this approach is finding a way to represent complex policies that facilitates learning and generalization. To address this problem, we introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding. Our empirical results demonstrate that combining these three elements (learned Policy Evaluation Network, policy fingerprints, gradient ascent) can produce policies that outperform those that generated the training data, in a zero-shot manner.

Submitted 26 February, 2020; originally announced February 2020.

Comments: 12 pages, 11 figures

arXiv:2002.05665 [pdf, other]

FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains

Authors: Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri

Abstract: In the realm of autonomous transportation, there have been many initiatives for open-sourcing self-driving car datasets, but far fewer for alternative methods of transportation such as trains. In this paper, we aim to bridge the gap by introducing FRSign, a large-scale and accurate dataset for vision-based railway traffic light detection and recognition. Our recordings were made on selected running trains in France and benefited from carefully hand-labeled annotations. An illustrative dataset which corresponds to ten percent of the acquired data to date is published in open source with the paper. It contains more than 100,000 images illustrating six types of French railway traffic lights and their possible color combinations, together with the relevant information regarding their acquisition such as date, time, sensor parameters, and bounding boxes. This dataset is published in open source at https://frsign.irt-systemx.fr. We compare and analyze various properties of the dataset and provide metrics to express its variability. We also discuss specific challenges and particularities related to autonomous trains in comparison to autonomous cars.

Submitted 5 February, 2020; originally announced February 2020.

arXiv:1811.07004 [pdf, ps, other]

The Barbados 2018 List of Open Issues in Continual Learning

Authors: Tom Schaul, Hado van Hasselt, Joseph Modayil, Martha White, Adam White, Pierre-Luc Bacon, Jean Harb, Shibl Mourad, Marc Bellemare, Doina Precup

Abstract: We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most important open issues we are facing, and stimulate further discussion in the community. The content is based on some of our discussions during a week-long workshop held in Barbados in February 2018.

Submitted 16 November, 2018; originally announced November 2018.

Comments: NIPS Continual Learning Workshop 2018

arXiv:1712.00004 [pdf, other]

Learnings Options End-to-End for Continuous Action Tasks

Authors: Martin Klissarov, Pierre-Luc Bacon, Jean Harb, Doina Precup

Abstract: We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a deliberation cost and train it with proximal policy optimization (Schulman et al. [2017]) instead of vanilla policy gradient. Results on Mujoco domains are promising, but lead to interesting questions about when a given option should be used, an issue directly connected to the use of initiation sets.

Submitted 29 November, 2017; originally announced December 2017.

arXiv:1709.04571 [pdf, other]

When Waiting is not an Option : Learning Options with a Deliberation Cost

Authors: Jean Harb, Pierre-Luc Bacon, Martin Klissarov, Doina Precup

Abstract: Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good options should be has remained elusive. We formulate our answer to what "good" options should be in the bounded rationality framework (Simon, 1957) through the notion of deliberation cost. We then derive practical gradient-based learning algorithms to implement this objective. Our results in the Arcade Learning Environment (ALE) show increased performance and interpretability.

Submitted 13 September, 2017; originally announced September 2017.

arXiv:1706.02275 [pdf, other]

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Authors: Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

Abstract: We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.

Submitted 14 March, 2020; v1 submitted 7 June, 2017; originally announced June 2017.

arXiv:1704.05495 [pdf, other]

Investigating Recurrence and Eligibility Traces in Deep Q-Networks

Authors: Jean Harb, Doina Precup

Abstract: Eligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update. We investigate the use of eligibility traces in combination with recurrent networks in the Atari domain. We illustrate the benefits of both recurrent nets and eligibility traces in some Atari games, and also highlight the importance of the optimization used in the training.

Submitted 18 April, 2017; originally announced April 2017.

Comments: 8 pages, 3 figures, NIPS 2016 Deep Reinforcement Learning Workshop

arXiv:1609.05140 [pdf, other]

The Option-Critic Architecture

Authors: Pierre-Luc Bacon, Jean Harb, Doina Precup

Abstract: Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this problem in the framework of options [Sutton, Precup & Singh, 1999; Precup, 2000]. We derive policy gradient theorems for options and propose a new option-critic architecture capable of learning both the internal policies and the termination conditions of options, in tandem with the policy over options, and without the need to provide any additional rewards or subgoals. Experimental results in both discrete and continuous environments showcase the flexibility and efficiency of the framework.

Submitted 2 December, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

Comments: Accepted to the Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017

Showing 1–11 of 11 results for author: Harb, J