-
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Authors:
Amandine Brunetto,
Sascha Hornauer,
Fabien Moutarde
Abstract:
Sound plays a major role in human perception, providing essential scene information alongside vision for understanding our environment. Despite progress in neural implicit representations, learning acoustics that match a visual scene is still challenging. We propose NeRAF, a method that jointly learns acoustic and radiance fields. NeRAF is designed as a Nerfstudio module for convenient access to realistic audio-visual generation. It synthesizes both novel views and spatialized audio at new positions, leveraging radiance field capabilities to condition the acoustic field with 3D scene information. At inference, each modality can be rendered independently and at spatially separated positions, providing greater versatility. We demonstrate the advantages of our method on the SoundSpaces dataset. NeRAF achieves substantial performance improvements over previous works while being more data-efficient. Furthermore, NeRAF enhances novel view synthesis of complex scenes trained with sparse data through cross-modal learning.
Submitted 28 May, 2024;
originally announced May 2024.
-
Mesoscale Traffic Forecasting for Real-Time Bottleneck and Shockwave Prediction
Authors:
Raphael Chekroun,
Han Wang,
Jonathan Lee,
Marin Toromanoff,
Sascha Hornauer,
Fabien Moutarde,
Maria Laura Delle Monache
Abstract:
Accurate real-time traffic state forecasting plays a pivotal role in traffic control research. In particular, the CIRCLES consortium project necessitates predictive techniques to mitigate the impact of data source delays. After the success of the MegaVanderTest experiment, this paper aims to overcome the current system limitations and develop a better-suited approach to improve the real-time traffic state estimation for the next iterations of the experiment. In this paper, we introduce the SA-LSTM, a deep forecasting method integrating Self-Attention (SA) on the spatial dimension with Long Short-Term Memory (LSTM), yielding state-of-the-art results in real-time mesoscale traffic forecasting. We extend this approach to multi-step forecasting with the n-step SA-LSTM, which outperforms traditional multi-step forecasting methods in the trade-off between short-term and long-term predictions, all while operating in real-time.
Submitted 4 March, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents
Authors:
Dániel Horváth,
Jesús Bujalance Martín,
Ferenc Gábor Erdős,
Zoltán Istenes,
Fabien Moutarde
Abstract:
Even though reinforcement-learning-based algorithms achieved superhuman performance in many domains, the field of robotics poses significant challenges as the state and action spaces are continuous, and the reward function is predominantly sparse. Furthermore, on many occasions, the agent is devoid of access to any form of demonstration. Inspired by human learning, in this work, we propose a method named highlight experience replay (HiER) that creates a secondary highlight replay buffer for the most relevant experiences. For the weights update, the transitions are sampled from both the standard and the highlight experience replay buffer. It can be applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves the performance of the state-of-the-art, validated on 8 tasks of three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+ in which HiER is enhanced with an arbitrary data collection curriculum learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/.
Submitted 9 July, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
MBAPPE: MCTS-Built-Around Prediction for Planning Explicitly
Authors:
Raphael Chekroun,
Thomas Gilles,
Marin Toromanoff,
Sascha Hornauer,
Fabien Moutarde
Abstract:
We present MBAPPE, a novel approach to motion planning for autonomous driving combining tree search with a partially-learned model of the environment. Leveraging the inherently explainable exploration and optimization capabilities of Monte-Carlo Tree Search (MCTS), our method addresses complex decision-making in a dynamic environment. We propose a framework that combines MCTS with supervised learning, enabling the autonomous vehicle to effectively navigate through diverse scenarios. Experimental results demonstrate the effectiveness and adaptability of our approach, showcasing improved real-time decision-making and collision avoidance. This paper contributes to the field by providing a robust solution for motion planning in autonomous driving systems, enhancing their explainability and reliability.
Submitted 15 September, 2023;
originally announced September 2023.
-
TSGN: Temporal Scene Graph Neural Networks with Projected Vectorized Representation for Multi-Agent Motion Prediction
Authors:
Yunong Wu,
Thomas Gilles,
Bogdan Stanciulescu,
Fabien Moutarde
Abstract:
Predicting future motions of nearby agents is essential for an autonomous vehicle to take safe and effective actions. In this paper, we propose TSGN, a framework using Temporal Scene Graph Neural Networks with projected vectorized representations for multi-agent trajectory prediction. Projected vectorized representation models the traffic scene as a graph which is constructed by a set of vectors. These vectors represent agents, road network, and their spatial relative relationships. All relative features under this representation are both translation- and rotation-invariant. Based on this representation, TSGN captures the spatial-temporal features across agents, road network, interactions among them, and temporal dependencies of temporal traffic scenes. TSGN can predict multimodal future trajectories for all agents simultaneously, plausibly, and accurately. Meanwhile, we propose a Hierarchical Lane Transformer for capturing interactions between agents and road network, which filters the surrounding road network and only keeps the most probable lane segments which could have an impact on the future behavior of the target agent. Without sacrificing the prediction performance, this greatly reduces the computational burden. Experiments show TSGN achieves state-of-the-art performance on the Argoverse motion forecasting benchmark.
Submitted 14 May, 2023;
originally announced May 2023.
-
The Audio-Visual BatVision Dataset for Research on Sight and Sound
Authors:
Amandine Brunetto,
Sascha Hornauer,
Stella X. Yu,
Fabien Moutarde
Abstract:
Vision research showed remarkable success in understanding our world, propelled by datasets of images and videos. Sensor data from radar, LiDAR and cameras has supported research in robotics and autonomous driving for at least a decade. However, while visual sensors may fail in some conditions, sound has recently shown potential to complement sensor data. Simulated room impulse responses (RIR) in 3D apartment-models became a benchmark dataset for the community, fostering a range of audiovisual research. In simulation, depth is predictable from sound, by learning bat-like perception with a neural network. Concurrently, the same was achieved in reality by using RGB-D images and echoes of chirping sounds. Biomimicking bat perception is an exciting new direction but needs dedicated datasets to explore the potential. Therefore, we collected the BatVision dataset to provide large-scale echoes in complex real-world scenes to the community. We equipped a robot with a speaker to emit chirps and a binaural microphone to record their echoes. Synchronized RGB-D images from the same perspective provide visual labels of traversed spaces. We sampled modern US office spaces to historic French university grounds, indoor and outdoor with large architectural variety. This dataset will allow research on robot echolocation, general audio-visual tasks and sound phænomena unavailable in simulated data. We show promising results for audio-only depth prediction and show how state-of-the-art work developed for simulated data can also succeed on our dataset. Project page: https://amandinebtto.github.io/Batvision-Dataset/
Submitted 1 March, 2024; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Uncertainty estimation for Cross-dataset performance in Trajectory prediction
Authors:
Thomas Gilles,
Stefano Sabatini,
Dzmitry Tsishkou,
Bogdan Stanciulescu,
Fabien Moutarde
Abstract:
While a lot of work has been carried out on developing trajectory prediction methods, and various datasets have been proposed for benchmarking this task, little study has been done so far on the generalizability and the transferability of these methods across datasets. In this paper, we observe the performance of two of the latest state-of-the-art trajectory prediction methods across four different datasets (Argoverse, NuScenes, Interaction, Shifts). This analysis allows us to gain some insights on the generalizability properties of most recent trajectory prediction models and to analyze which dataset is more representative of real driving scenes and therefore enables better transferability. Furthermore, we present a novel method to estimate prediction uncertainty and show how it could be used to achieve better performance across datasets.
Submitted 12 July, 2022; v1 submitted 15 May, 2022;
originally announced May 2022.
-
Assessing Cross-dataset Generalization of Pedestrian Crossing Predictors
Authors:
Joseph Gesnouin,
Steve Pechberti,
Bogdan Stanciulescu,
Fabien Moutarde
Abstract:
Pedestrian crossing prediction has been a topic of active research, resulting in many new algorithmic solutions. While measuring the overall progress of those solutions over time tends to be more and more established due to the new publicly available benchmark and standardized evaluation procedures, knowing how well existing predictors react to unseen data remains an unanswered question. This evaluation is imperative as serviceable crossing behavior predictors should be set to work in various scenarios without compromising pedestrian safety due to misprediction. To this end, we conduct a study based on direct cross-dataset evaluation. Our experiments show that current state-of-the-art pedestrian behavior predictors generalize poorly in cross-dataset evaluation scenarios, regardless of their robustness during a direct training-test set evaluation setting. In the light of what we observe, we argue that the future of pedestrian crossing prediction, e.g. reliable and generalizable implementations, should not be about tailoring models, trained with very little available data, and tested in a classical train-test scenario with the will to infer anything about their behavior in real life. It should be about evaluating models in a cross-dataset setting while considering their uncertainty estimates under domain shift.
Submitted 29 January, 2022;
originally announced January 2022.
-
STIR$^2$: Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks
Authors:
Jesus Bujalance Martin,
Fabien Moutarde
Abstract:
In the search for more sample-efficient reinforcement-learning (RL) algorithms, a promising direction is to leverage as much external off-policy data as possible, for instance expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method, able to leverage both demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes (via relabeling), encouraging expert imitation and self-imitation. Our experiments focus on several robotic-manipulation tasks across two different simulation environments. We show that our method based on reward relabeling improves the performance of the base algorithm (SAC and DDPG) on these tasks. Finally, our best algorithm STIR$^2$ (Self and Teacher Imitation by Reward Relabeling), which integrates into our method multiple improvements from previous works, is more data-efficient than all baselines.
Submitted 28 February, 2023; v1 submitted 11 January, 2022;
originally announced January 2022.
-
GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving
Authors:
Raphael Chekroun,
Marin Toromanoff,
Sascha Hornauer,
Fabien Moutarde
Abstract:
Deep reinforcement learning (DRL) has been demonstrated to be effective for several complex decision-making applications such as autonomous driving and robotics. However, DRL is notoriously limited by its high sample complexity and its lack of stability. Prior knowledge, e.g. as expert demonstrations, is often available but challenging to leverage to mitigate these issues. In this paper, we propose General Reinforced Imitation (GRI), a novel method which combines benefits from exploration and expert data and is straightforward to implement over any off-policy RL algorithm. We make one simplifying hypothesis: expert demonstrations can be seen as perfect data whose underlying policy gets a constant high reward. Based on this assumption, GRI introduces the notion of an offline demonstration agent. This agent sends expert data which are processed both concurrently and indistinguishably with the experiences coming from the online RL exploration agent. We show that our approach enables major improvements on vision-based autonomous driving in urban environments. We further validate the GRI method on MuJoCo continuous control tasks with different off-policy RL algorithms. Our method ranked first on the CARLA Leaderboard and outperforms World on Rails, the previous state-of-the-art, by 17%.
Submitted 17 May, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
-
Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling
Authors:
Jesus Bujalance Martin,
Raphael Chekroun,
Fabien Moutarde
Abstract:
During recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving or video games. Off-policy algorithms tend to be more sample-efficient than their on-policy counterparts, and can additionally benefit from any off-policy data stored in the replay buffer. Expert demonstrations are a popular source for such data: the agent is exposed to successful states and actions early on, which can accelerate the learning process and improve performance. In the past, multiple ideas have been proposed to make good use of the demonstrations in the buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We carry out a study to evaluate several of these ideas in isolation, to see which of them have the most significant impact. We also present a new method for sparse-reward tasks, based on a reward bonus given to demonstrations and successful episodes. First, we give a reward bonus to the transitions coming from demonstrations to encourage the agent to match the demonstrated behaviour. Then, upon collecting a successful episode, we relabel its transitions with the same bonus before adding them to the replay buffer, encouraging the agent to also match its previous successes. The base algorithm for our experiments is the popular Soft Actor-Critic (SAC), a state-of-the-art off-policy algorithm for continuous action spaces. Our experiments focus on manipulation robotics, specifically on a 3D reaching task for a robotic arm in simulation. We show that our method SACR2 based on reward relabeling improves the performance on this task, even in the absence of demonstrations.
Submitted 3 December, 2021; v1 submitted 27 October, 2021;
originally announced October 2021.
-
THOMAS: Trajectory Heatmap Output with learned Multi-Agent Sampling
Authors:
Thomas Gilles,
Stefano Sabatini,
Dzmitry Tsishkou,
Bogdan Stanciulescu,
Fabien Moutarde
Abstract:
In this paper, we propose THOMAS, a joint multi-agent trajectory prediction framework allowing for an efficient and consistent prediction of multi-agent multi-modal trajectories. We present a unified model architecture for simultaneous agent future heatmap estimation, in which we leverage hierarchical and sparse image generation for fast and memory-efficient inference. We propose a learnable trajectory recombination model that takes as input a set of predicted trajectories for each agent and outputs its consistent reordered recombination. This recombination module is able to realign the initially independent modalities so that they do not collide and are coherent with each other. We report our results on the Interaction multi-agent prediction challenge and rank $1^{st}$ on the online test leaderboard.
Submitted 21 January, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
GOHOME: Graph-Oriented Heatmap Output for future Motion Estimation
Authors:
Thomas Gilles,
Stefano Sabatini,
Dzmitry Tsishkou,
Bogdan Stanciulescu,
Fabien Moutarde
Abstract:
In this paper, we propose GOHOME, a method leveraging graph representations of the High Definition Map and sparse projections to generate a heatmap output representing the future position probability distribution for a given agent in a traffic scene. This heatmap output yields an unconstrained 2D grid representation of agent future possible locations, allowing inherent multimodality and a measure of the uncertainty of the prediction. Our graph-oriented model avoids the high computation burden of representing the surrounding context as squared images and processing it with classical CNNs, but focuses instead only on the most probable lanes where the agent could end up in the immediate future. GOHOME reaches 2$^{nd}$ on Argoverse Motion Forecasting Benchmark on the MissRate$_6$ metric while achieving significant speed-up and memory burden diminution compared to Argoverse 1$^{st}$ place method HOME. We also highlight that heatmap output enables multimodal ensembling and improves 1$^{st}$ place MissRate$_6$ by more than 15$\%$ with our best ensemble on Argoverse. Finally, we evaluate and reach state-of-the-art performance on the other trajectory prediction datasets nuScenes and Interaction, demonstrating the generalizability of our method.
Submitted 21 September, 2021; v1 submitted 4 September, 2021;
originally announced September 2021.
-
TrouSPI-Net: Spatio-temporal attention on parallel atrous convolutions and U-GRUs for skeletal pedestrian crossing prediction
Authors:
Joseph Gesnouin,
Steve Pechberti,
Bogdan Stanciulescu,
Fabien Moutarde
Abstract:
Understanding the behaviors and intentions of pedestrians is still one of the main challenges for vehicle autonomy, as accurate predictions of their intentions can guarantee their safety and the driving comfort of vehicles. In this paper, we address pedestrian crossing prediction in urban traffic environments by linking the dynamics of a pedestrian's skeleton to a binary crossing intention. We introduce TrouSPI-Net: a context-free, lightweight, multi-branch predictor. TrouSPI-Net extracts spatio-temporal features for different time resolutions by encoding pseudo-image sequences of skeletal joints' positions and processes them with parallel attention modules and atrous convolutions. The proposed approach is then enhanced by processing features such as relative distances of skeletal joints, bounding box positions, or ego-vehicle speed with U-GRUs. Using the newly proposed evaluation procedures for two large public naturalistic data sets for studying pedestrian behavior in traffic, JAAD and PIE, we evaluate TrouSPI-Net and analyze its performance. Experimental results show that TrouSPI-Net achieved 0.76 F1 score on JAAD and 0.80 F1 score on PIE, therefore outperforming current state-of-the-art while being lightweight and context-free.
Submitted 7 September, 2021; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Asymmetrical Bi-RNN for pedestrian trajectory encoding
Authors:
Raphaël Rozenberg,
Joseph Gesnouin,
Fabien Moutarde
Abstract:
Pedestrian motion behavior involves a combination of individual goals and social interactions with other agents. In this article, we present an asymmetrical bidirectional recurrent neural network architecture called U-RNN to encode pedestrian trajectories and evaluate its relevance to replace LSTMs for various forecasting models. Experimental results on the Trajnet++ benchmark show that the U-LSTM variant yields better results regarding every available metric (ADE, FDE, Collision rate) than common trajectory encoders for a variety of approaches and interaction modules, suggesting that the proposed approach is a viable alternative to the de facto sequence encoding RNNs.
Our implementation of the asymmetrical Bi-RNNs for the Trajnet++ benchmark is available at: github.com/JosephGesnouin/Asymmetrical-Bi-RNNs-to-encode-pedestrian-trajectories
Submitted 19 June, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
HOME: Heatmap Output for future Motion Estimation
Authors:
Thomas Gilles,
Stefano Sabatini,
Dzmitry Tsishkou,
Bogdan Stanciulescu,
Fabien Moutarde
Abstract:
In this paper, we propose HOME, a framework tackling the motion forecasting problem with an image output representing the probability distribution of the agent's future location. This method allows for a simple architecture with classic convolution networks coupled with attention mechanism for agent interactions, and outputs an unconstrained 2D top-view representation of the agent's possible future. Based on this output, we design two methods to sample a finite set of agent's future locations. These methods allow us to control the optimization trade-off between miss rate and final displacement error for multiple modalities without having to retrain any part of the model. We apply our method to the Argoverse Motion Forecasting Benchmark and achieve 1st place on the online leaderboard.
Submitted 2 June, 2021; v1 submitted 23 May, 2021;
originally announced May 2021.
-
End-to-End Model-Free Reinforcement Learning for Urban Driving using Implicit Affordances
Authors:
Marin Toromanoff,
Emilie Wirbel,
Fabien Moutarde
Abstract:
Reinforcement Learning (RL) aims at learning an optimal behavior policy from its own experiments and not rule-based control methods. However, there is no RL algorithm yet capable of handling a task as difficult as urban driving. We present a novel technique, coined implicit affordances, to effectively leverage RL for urban driving, thus including lane keeping, pedestrian and vehicle avoidance, and traffic light detection. To our knowledge we are the first to present a successful RL agent handling such a complex task, especially regarding the traffic light detection. Furthermore, we have demonstrated the effectiveness of our method by winning the Camera Only track of the CARLA challenge.
Submitted 16 March, 2020; v1 submitted 25 November, 2019;
originally announced November 2019.
-
Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field
Authors:
Marin Toromanoff,
Emilie Wirbel,
Fabien Moutarde
Abstract:
Consistent and reproducible evaluation of Deep Reinforcement Learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters such as stochasticity or the maximum allowed play time can lead to very different performance. In this work, we discuss the difficulties of comparing different agents trained on ALE. In order to take a step further towards reproducible and comparable DRL, we introduce SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms. Our methodology extends previous recommendations and contains a complete set of environment parameters as well as train and test procedures. We then use SABER to evaluate the current state of the art, Rainbow. Furthermore, we introduce a human world records baseline, and argue that previous claims of expert or superhuman performance of DRL might not be accurate. Finally, we propose Rainbow-IQN by extending Rainbow with Implicit Quantile Networks (IQN) leading to new state-of-the-art performance. Source code is available for reproducibility.
Submitted 8 November, 2019; v1 submitted 13 August, 2019;
originally announced August 2019.
-
Multiview Based 3D Scene Understanding On Partial Point Sets
Authors:
Ye Zhu,
Sven Ewan Shepstone,
Pablo Martínez-Nuevo,
Miklas Strøm Kristoffersen,
Fabien Moutarde,
Zhuang Fu
Abstract:
Deep learning within the context of point clouds has gained much research interest in recent years, mostly due to the promising results that have been achieved on a number of challenging benchmarks, such as 3D shape recognition and scene semantic segmentation. In many realistic settings, however, snapshots of the environment are often taken from a single view, which only contains a partial set of the scene due to the field of view restriction of commodity cameras. 3D scene semantic understanding on partial point clouds is considered a challenging task. In this work, we propose a processing approach for 3D point cloud data based on a multiview representation of the existing 360° point clouds. By fusing the original 360° point clouds and their corresponding 3D multiview representations as input data, a neural network is able to recognize partial point sets while improving the general performance on complete point sets, resulting in an overall increase of 31.9% and 4.3% in segmentation accuracy for partial and complete scene semantic understanding, respectively. This method can also be applied in a wider 3D recognition context such as 3D part segmentation.
Submitted 30 November, 2018;
originally announced December 2018.
-
Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning
Authors:
Guillaume Devineau,
Philip Polack,
Florent Altché,
Fabien Moutarde
Abstract:
This paper explores the capability of deep neural networks to capture key characteristics of vehicle dynamics, and their ability to perform coupled longitudinal and lateral control of a vehicle. To this end, two different artificial neural networks are trained to compute vehicle controls corresponding to a reference trajectory, using a dataset based on high-fidelity simulations of vehicle dynamics. In this study, control inputs are chosen as the steering angle of the front wheels, and the applied torque on each wheel. The performance of both models, namely a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), is evaluated based on their ability to drive the vehicle on a challenging test track, shifting between long straight lines and tight curves. A comparison to conventional decoupled controllers on the same track is also provided.
Submitted 22 October, 2018;
originally announced October 2018.
-
End to End Vehicle Lateral Control Using a Single Fisheye Camera
Authors:
Marin Toromanoff,
Emilie Wirbel,
Frédéric Wilhelm,
Camilo Vejarano,
Xavier Perrotton,
Fabien Moutarde
Abstract:
Convolutional neural networks are commonly used to control the steering angle for autonomous cars. Most of the time, multiple long range cameras are used to generate lateral failure cases. In this paper we present a novel model to generate this data and label augmentation using only one short range fisheye camera. We present our simulator and how it can be used as a consistent metric for lateral end-to-end control evaluation. Experiments are conducted on a custom dataset corresponding to more than 10000 km and 200 hours of open road driving. Finally, we evaluate this model on real world driving scenarios, open road and a custom test track with challenging obstacle avoidance and sharp turns. In our simulator based on real-world videos, the final model was capable of more than 99% autonomy on urban roads.
Submitted 20 August, 2018;
originally announced August 2018.
-
Monocular Urban Localization using Street View
Authors:
Li Yu,
Cyril Joly,
Guillaume Bresson,
Fabien Moutarde
Abstract:
This paper presents a metric global localization in the urban environment using only a monocular camera and the Google Street View database. We fully leverage the abundant sources from Street View and benefit from its topo-metric structure to build a coarse-to-fine positioning, namely a topological place recognition process and then a metric pose estimation by local bundle adjustment. Our method is tested on a 3 km urban environment and demonstrates both sub-meter accuracy and robustness to viewpoint changes, illumination and occlusion. To our knowledge, this is the first work that studies global urban localization simply with a single camera and Street View.
Submitted 16 June, 2016; v1 submitted 17 May, 2016;
originally announced May 2016.
-
A Distributed Model Predictive Control Framework for Road-Following Formation Control of Car-like Vehicles (Extended Version)
Authors:
Xiangjun Qian,
Florent Altché,
Arnaud de La Fortelle,
Fabien Moutarde
Abstract:
This work presents a novel framework for the formation control of multiple autonomous ground vehicles in an on-road environment. Unique challenges of this problem lie in 1) the design of collision avoidance strategies with obstacles and with other vehicles in a highly structured environment, 2) dynamic reconfiguration of the formation to handle different task specifications. In this paper, we design a local MPC-based tracking controller for each individual vehicle to follow a reference trajectory while satisfying various constraints (kinematics and dynamics, collision avoidance, etc.). The reference trajectory of a vehicle is computed from its leader's trajectory, based on a pre-defined formation tree. We use logic rules to organize the collision avoidance behaviors of member vehicles. Moreover, we propose a methodology to safely reconfigure the formation on-the-fly. The proposed framework has been validated using high-fidelity simulations.
Submitted 29 April, 2016;
originally announced May 2016.
-
Priority-based coordination of autonomous and legacy vehicles at intersection
Authors:
Xiangjun Qian,
Jean Gregoire,
Fabien Moutarde,
Arnaud De La Fortelle
Abstract:
Recently, researchers have proposed various autonomous intersection management techniques that enable autonomous vehicles to cross the intersection without traffic lights or stop signs. In particular, a priority-based coordination system with provable collision-free and deadlock-free features has been presented. In this paper, we extend the priority-based approach to support legacy vehicles without compromising the above-mentioned features. We make the hypothesis that legacy vehicles are able to keep a safe distance from their leading vehicles. Then we explore some special configurations of the system that ensure the safe crossing of legacy vehicles. We implement the extended system in the realistic traffic simulator SUMO. Simulations are performed to demonstrate the safety of the system.
Submitted 26 September, 2014; v1 submitted 22 July, 2014;
originally announced July 2014.
-
Statistical Traffic State Analysis in Large-scale Transportation Networks Using Locality-Preserving Non-negative Matrix Factorization
Authors:
Yufei Han,
Fabien Moutarde
Abstract:
Statistical traffic data analysis is a hot topic in traffic management and control. In this field, current research focuses on analyzing traffic flows of individual links or local regions in a transportation network. Less attention is paid to the global view of traffic states over the entire network, which is important for modeling large-scale traffic scenes. Our aim is precisely to propose a new methodology for extracting spatio-temporal traffic patterns, ultimately for modeling large-scale traffic dynamics and long-term traffic forecasting. We attack this issue by utilizing Locality-Preserving Non-negative Matrix Factorization (LPNMF) to derive a low-dimensional representation of network-level traffic states. Clustering is performed on the compact LPNMF projections to unveil typical spatial patterns and temporal dynamics of network-level traffic states. We have tested the proposed method on simulated traffic data generated for a large-scale road network, and the reported experimental results validate the ability of our approach to extract meaningful large-scale space-time traffic patterns. Furthermore, the derived clustering results provide an intuitive understanding of spatial-temporal characteristics of traffic flows in the large-scale network, and a basis for potential long-term forecasting.
Submitted 20 December, 2012;
originally announced December 2012.
-
Analysis of Large-scale Traffic Dynamics using Non-negative Tensor Factorization
Authors:
Yufei Han,
Fabien Moutarde
Abstract:
In this paper, we present our work on clustering and prediction of temporal dynamics of global congestion configurations in large-scale road networks. Instead of looking into temporal traffic state variation of individual links, or of small areas, we focus on spatial congestion configurations of the whole network. In our work, we aim at describing the typical temporal dynamic patterns of this network-level traffic state and achieving long-term prediction of the large-scale traffic dynamics, in a unified data-mining framework. To this end, we formulate this joint task using Non-negative Tensor Factorization (NTF), which has been shown to be a useful decomposition tool for multivariate data sequences. Clustering and prediction are performed based on the compact tensor factorization results. Experiments on large-scale simulated data illustrate the interest of our method with promising results for long-term forecast of traffic evolution.
Submitted 18 December, 2012;
originally announced December 2012.
-
Joint interpretation of on-board vision and static GPS cartography for determination of correct speed limit
Authors:
Alexandre Bargeton,
Fabien Moutarde,
Fawzi Nashashibi,
Anne-Sophie Puthon
Abstract:
We present here a first prototype of a "Speed Limit Support" Advanced Driving Assistance System (ADAS) producing permanent reliable information on the current speed limit applicable to the vehicle. Such a module can be used either for information of the driver, or could even serve for automatic setting of the maximum speed of a smart Adaptive Cruise Control (ACC). Our system is based on a joint interpretation of cartographic information (for static reference information) with on-board vision, used for traffic sign detection and recognition (including supplementary sub-signs) and visual road lines localization (for detection of lane changes). The visual traffic sign detection part is quite robust (90% global correct detection and recognition for main speed signs, and 80% for exit-lane sub-signs detection). Our approach for joint interpretation with cartography is original, and logic-based rather than probability-based, which allows correct behaviour even in cases, which do happen, when both vision and cartography may provide the same erroneous information.
Submitted 19 October, 2010;
originally announced October 2010.
-
Modular Traffic Sign Recognition applied to on-vehicle real-time visual detection of American and European speed limit signs
Authors:
Fabien Moutarde,
Alexandre Bargeton,
Anne Herbin,
Lowik Chanussot
Abstract:
We present a new modular traffic sign recognition system, successfully applied to both American and European speed limit signs. Our sign detection step is based only on shape detection (rectangles or circles). This enables it to work on grayscale images, contrary to most European competitors, which improves robustness to illumination conditions (notably night operation). Speed sign candidates are classified (or rejected) by segmenting potential digits inside them (which is rather original and has several advantages), and then applying a neural digit recognition. The global detection rate is ~90% for both (standard) U.S. and E.U. speed signs, with a misclassification rate <1%, and no validated false alarm in >150 minutes of video. The system processes in real-time ~20 frames/s on a standard high-end laptop.
Submitted 7 October, 2009;
originally announced October 2009.
-
Visual object categorization with new keypoint-based adaBoost features
Authors:
Taoufik Bdiri,
Fabien Moutarde,
Bruno Steux
Abstract:
We present promising results for visual object categorization, obtained with adaBoost using new original "keypoints-based features". These weak-classifiers produce a boolean response based on presence or absence in the tested image of a "keypoint" (a kind of SURF interest point) with a descriptor sufficiently similar (i.e. within a given distance) to a reference descriptor characterizing the feature. A first experiment was conducted on a public image dataset containing lateral-viewed cars, yielding 95% recall with 95% precision on test set. Preliminary tests on a small subset of a pedestrians database also give promising 97% recall with 92% precision, which shows the generality of our new family of features. Moreover, analysis of the positions of adaBoost-selected keypoints shows that they correspond to a specific part of the object category (such as "wheel" or "side skirt" in the case of lateral-cars) and thus have a "semantic" meaning. We also made a first test on video for detecting vehicles from adaBoost-selected keypoints filtered in real-time from all detected keypoints.
Submitted 7 October, 2009;
originally announced October 2009.
-
Introducing New AdaBoost Features for Real-Time Vehicle Detection
Authors:
Bogdan Stanciulescu,
Amaury Breheret,
Fabien Moutarde
Abstract:
This paper shows how to improve real-time object detection in complex robotics applications, by exploring new visual features as AdaBoost weak classifiers. These new features are symmetric Haar filters (enforcing global horizontal and vertical symmetry) and N-connexity control points. Experimental evaluation on a car database shows that the latter appear to provide the best results for the vehicle-detection problem.
Submitted 7 October, 2009;
originally announced October 2009.
-
Adaboost with "Keypoint Presence Features" for Real-Time Vehicle Visual Detection
Authors:
Taoufik Bdiri,
Fabien Moutarde,
Nicolas Bourdis,
Bruno Steux
Abstract:
We present promising results for real-time vehicle visual detection, obtained with adaBoost using new original "keypoints presence features". These weak-classifiers produce a boolean response based on presence or absence in the tested image of a "keypoint" (~ a SURF interest point) with a descriptor sufficiently similar (i.e. within a given distance) to a reference descriptor characterizing the feature. A first experiment was conducted on a public image dataset containing lateral-viewed cars, yielding 95% recall with 95% precision on test set. Moreover, analysis of the positions of adaBoost-selected keypoints shows that they correspond to a specific part of the object category (such as "wheel" or "side skirt") and thus have a "semantic" meaning.
Submitted 7 October, 2009;
originally announced October 2009.