NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Authors: Amandine Brunetto, Sascha Hornauer, Fabien Moutarde

Abstract: Sound plays a major role in human perception, providing essential scene information alongside vision for understanding our environment. Despite progress in neural implicit representations, learning acoustics that match a visual scene is still challenging. We propose NeRAF, a method that jointly learns acoustic and radiance fields. NeRAF is designed as a Nerfstudio module for convenient access to realistic audio-visual generation. It synthesizes both novel views and spatialized audio at new positions, leveraging radiance field capabilities to condition the acoustic field with 3D scene information. At inference, each modality can be rendered independently and at spatially separated positions, providing greater versatility. We demonstrate the advantages of our method on the SoundSpaces dataset. NeRAF achieves substantial performance improvements over previous works while being more data-efficient. Furthermore, NeRAF enhances novel view synthesis of complex scenes trained with sparse data through cross-modal learning.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Project Page: https://amandinebtto.github.io/NeRAF

arXiv:2402.05663

Mesoscale Traffic Forecasting for Real-Time Bottleneck and Shockwave Prediction

Authors: Raphael Chekroun, Han Wang, Jonathan Lee, Marin Toromanoff, Sascha Hornauer, Fabien Moutarde, Maria Laura Delle Monache

Abstract: Accurate real-time traffic state forecasting plays a pivotal role in traffic control research. In particular, the CIRCLES consortium project requires predictive techniques to mitigate the impact of data source delays. Following the success of the MegaVanderTest experiment, this paper aims to overcome the current system's limitations and to develop a better-suited approach for improving real-time traffic state estimation in the next iterations of the experiment. We introduce the SA-LSTM, a deep forecasting method that integrates Self-Attention (SA) on the spatial dimension with Long Short-Term Memory (LSTM), yielding state-of-the-art results in real-time mesoscale traffic forecasting. We extend this approach to multi-step forecasting with the n-step SA-LSTM, which outperforms traditional multi-step forecasting methods in the trade-off between short-term and long-term predictions, all while operating in real time.

Submitted 4 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.
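The SA-LSTM abstract applies self-attention on the spatial dimension before temporal modeling. A minimal sketch of that spatial step follows, under loud assumptions: each road segment is described by one feature vector at a single time step, the raw features serve directly as queries, keys and values (no learned projections), and the function name `spatial_self_attention` is hypothetical. It is plain scaled dot-product attention, not the paper's actual layer.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_self_attention(features):
    """Hypothetical scaled dot-product self-attention across road segments.

    features: list of N segment feature vectors, each of length d (one time step).
    Assumption: queries, keys and values are the raw features (no learned weights).
    Returns (attended features, attention weights).
    """
    d = len(features[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in features:
        # Similarity of this segment to every segment, including itself
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in features]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all segments' features, channel by channel
        outputs.append([sum(w[j] * features[j][c] for j in range(len(features))) for c in range(d)])
    return outputs, weights
```

In an SA-LSTM-style model, the attended per-segment features of each time step would then be fed to an LSTM over time; that part is not shown.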

arXiv:2312.09394

HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents

Authors: Dániel Horváth, Jesús Bujalance Martín, Ferenc Gábor Erdős, Zoltán Istenes, Fabien Moutarde

Abstract: Even though reinforcement-learning-based algorithms have achieved superhuman performance in many domains, robotics poses significant challenges because the state and action spaces are continuous and the reward function is predominantly sparse. Furthermore, the agent often has no access to any form of demonstration. Inspired by human learning, we propose a method named highlight experience replay (HiER) that creates a secondary highlight replay buffer for the most relevant experiences. For the weight updates, transitions are sampled from both the standard and the highlight experience replay buffers. HiER can be applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves on the performance of the state of the art, as validated on 8 tasks from three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+, in which HiER is enhanced with an arbitrary data collection curriculum learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/.

Submitted 9 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Access
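HiER keeps a secondary highlight buffer next to the standard replay buffer and samples each batch from both. A minimal sketch of that structure, assuming (not taken from the paper) that an episode is copied into the highlight buffer when its undiscounted return reaches a fixed `return_threshold`, and that a fixed `highlight_ratio` of every batch comes from the highlight buffer. The class name `DualReplayBuffer` is hypothetical.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Hypothetical standard + highlight replay buffer pair.

    Assumptions: the highlight criterion is a return threshold and the
    batch split is a fixed ratio; HiER's actual rules may differ.
    """

    def __init__(self, capacity, highlight_capacity, return_threshold,
                 highlight_ratio=0.25, seed=0):
        self.standard = deque(maxlen=capacity)
        self.highlight = deque(maxlen=highlight_capacity)
        self.return_threshold = return_threshold
        self.highlight_ratio = highlight_ratio
        self.rng = random.Random(seed)

    def add_episode(self, transitions):
        # transitions: list of (state, action, reward, next_state, done) tuples
        self.standard.extend(transitions)
        episode_return = sum(t[2] for t in transitions)
        if episode_return >= self.return_threshold:
            # Relevant episodes are additionally stored in the highlight buffer
            self.highlight.extend(transitions)

    def sample(self, batch_size):
        # Part of the batch comes from the highlight buffer (if it has data),
        # the rest from the standard buffer, sampled with replacement
        n_high = min(int(batch_size * self.highlight_ratio), len(self.highlight))
        n_std = batch_size - n_high
        batch = self.rng.choices(self.highlight, k=n_high) if n_high else []
        batch += self.rng.choices(self.standard, k=n_std)
        return batch
```

Keeping the two buffers separate lets the highlight ratio be tuned without touching how ordinary experience is stored.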

arXiv:2309.08452

MBAPPE: MCTS-Built-Around Prediction for Planning Explicitly

Authors: Raphael Chekroun, Thomas Gilles, Marin Toromanoff, Sascha Hornauer, Fabien Moutarde

Abstract: We present MBAPPE, a novel approach to motion planning for autonomous driving that combines tree search with a partially learned model of the environment. Leveraging the inherently explainable exploration and optimization capabilities of Monte-Carlo Tree Search (MCTS), our method addresses complex decision-making in a dynamic environment. We propose a framework that combines MCTS with supervised learning, enabling the autonomous vehicle to navigate effectively through diverse scenarios. Experimental results demonstrate the effectiveness and adaptability of our approach, showing improved real-time decision-making and collision avoidance. This paper contributes to the field by providing a robust solution for motion planning in autonomous driving systems, enhancing their explainability and reliability.

Submitted 15 September, 2023; originally announced September 2023.
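MBAPPE builds on Monte-Carlo Tree Search. The abstract does not state its selection rule, so the sketch below shows only the generic UCT (UCB1 applied to trees) selection and backpropagation steps of standard MCTS, not MBAPPE's planner. The `Node` class, the exploration constant `c = sqrt(2)` and the action names in the usage note are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node


def uct_score(parent, child, c=math.sqrt(2)):
    # Unvisited children get priority so every action is tried at least once
    if child.visits == 0:
        return math.inf
    exploit = child.value_sum / child.visits
    # Assumes parent.visits >= 1 whenever a child has been visited
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_action(node, c=math.sqrt(2)):
    # Pick the child action with the highest UCT score
    return max(node.children, key=lambda a: uct_score(node, node.children[a], c))


def backpropagate(path, value):
    # Update every node visited on the way down with the simulation result
    for n in path:
        n.visits += 1
        n.value_sum += value
```

For example, with a root visited 10 times and hypothetical children `"keep_lane"` (6 visits, value sum 3.0) and `"brake"` (4 visits, value sum 3.0), `"brake"` wins: it has both the higher mean value and the larger exploration bonus. A learned model, as in MBAPPE, would typically plug in as a prior or value estimate inside this loop.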

arXiv:2305.08190

TSGN: Temporal Scene Graph Neural Networks with Projected Vectorized Representation for Multi-Agent Motion Prediction

Authors: Yunong Wu, Thomas Gilles, Bogdan Stanciulescu, Fabien Moutarde

Abstract: Predicting the future motion of nearby agents is essential for an autonomous vehicle to take safe and effective actions. In this paper, we propose TSGN, a framework using Temporal Scene Graph Neural Networks with projected vectorized representations for multi-agent trajectory prediction. The projected vectorized representation models the traffic scene as a graph constructed from a set of vectors. These vectors represent agents, the road network, and their relative spatial relationships. All relative features under this representation are both translation- and rotation-invariant. Based on this representation, TSGN captures spatial-temporal features across agents and the road network, the interactions among them, and the temporal dependencies of traffic scenes. TSGN can predict multimodal future trajectories for all agents simultaneously, plausibly, and accurately. We also propose a Hierarchical Lane Transformer for capturing interactions between agents and the road network; it filters the surrounding road network and keeps only the lane segments most likely to affect the future behavior of the target agent. This greatly reduces the computational burden without sacrificing prediction performance. Experiments show that TSGN achieves state-of-the-art performance on the Argoverse motion forecasting benchmark.

Submitted 14 May, 2023; originally announced May 2023.

Comments: 8 pages
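The TSGN abstract relies on relative features that are invariant to translation and rotation of the whole scene. A minimal sketch of the general idea, assuming each entity is a 2D pose `(x, y, heading)`: express one entity in another's local frame, then check that a global rigid motion leaves the feature unchanged. The function names are hypothetical; this is not TSGN's projected vectorized representation itself.

```python
import math


def to_agent_frame(point, agent_pos, agent_heading):
    """Express `point` in the frame of an agent at `agent_pos` facing `agent_heading` (radians)."""
    dx = point[0] - agent_pos[0]
    dy = point[1] - agent_pos[1]
    cos_h, sin_h = math.cos(agent_heading), math.sin(agent_heading)
    # Rotate by -heading so the agent's heading becomes the +x axis
    return (cos_h * dx + sin_h * dy, -sin_h * dx + cos_h * dy)


def relative_vector(src_pos, src_heading, dst_pos, dst_heading):
    """Relative position and heading of `dst` as seen from `src`."""
    x, y = to_agent_frame(dst_pos, src_pos, src_heading)
    # Wrap the heading difference into [-pi, pi)
    dtheta = (dst_heading - src_heading + math.pi) % (2 * math.pi) - math.pi
    return (x, y, dtheta)


def transform_global(pos, heading, shift, angle):
    """Apply a global rigid motion: rotate by `angle`, then translate by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = pos
    return ((c * x - s * y + shift[0], s * x + c * y + shift[1]), heading + angle)
```

Because both poses move together under `transform_global`, the rotation cancels inside `to_agent_frame` and the relative feature stays the same, which is the invariance the abstract describes.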

arXiv:2303.07257

doi 10.1109/IROS55552.2023.10341715

The Audio-Visual BatVision Dataset for Research on Sight and Sound

Authors: Amandine Brunetto, Sascha Hornauer, Stella X. Yu, Fabien Moutarde

Abstract: Vision research has shown remarkable success in understanding our world, propelled by datasets of images and videos. Sensor data from radar, LiDAR and cameras has supported research in robotics and autonomous driving for at least a decade. However, while visual sensors may fail in some conditions, sound has recently shown potential to complement sensor data. Simulated room impulse responses (RIR) in 3D apartment models became a benchmark dataset for the community, fostering a range of audio-visual research. In simulation, depth is predictable from sound by learning bat-like perception with a neural network. Concurrently, the same was achieved in reality by using RGB-D images and echoes of chirp sounds. Biomimicking bat perception is an exciting new direction but needs dedicated datasets to explore its potential. Therefore, we collected the BatVision dataset to provide large-scale echoes in complex real-world scenes to the community. We equipped a robot with a speaker to emit chirps and a binaural microphone to record their echoes. Synchronized RGB-D images from the same perspective provide visual labels of traversed spaces. We sampled locations ranging from modern US office spaces to historic French university grounds, indoor and outdoor, with large architectural variety. This dataset will allow research on robot echolocation, general audio-visual tasks and sound phænomena unavailable in simulated data. We show promising results for audio-only depth prediction and show how state-of-the-art work developed for simulated data can also succeed on our dataset.
Project page: https://amandinebtto.github.io/Batvision-Dataset/

Submitted 1 March, 2024; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: Project page: https://amandinebtto.github.io/Batvision-Dataset/ (this version contains the camera-ready paper)

Journal ref: 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2205.07310

Uncertainty estimation for Cross-dataset performance in Trajectory prediction

Authors: Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, Fabien Moutarde

Abstract: While a lot of work has been carried out on developing trajectory prediction methods, and various datasets have been proposed for benchmarking this task, little study has been done so far on the generalizability and transferability of these methods across datasets. In this paper, we observe the performance of two of the latest state-of-the-art trajectory prediction methods across four different datasets (Argoverse, NuScenes, Interaction, Shifts). This analysis provides insights into the generalizability properties of the most recent trajectory prediction models and into which dataset is more representative of real driving scenes and therefore enables better transferability. Furthermore, we present a novel method to estimate prediction uncertainty and show how it could be used to achieve better performance across datasets.

Submitted 12 July, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

Comments: Workshop on Fresh Perspectives on the Future of Autonomous Driving, ICRA 2022

arXiv:2201.12626

Assessing Cross-dataset Generalization of Pedestrian Crossing Predictors

Authors: Joseph Gesnouin, Steve Pechberti, Bogdan Stanciulescu, Fabien Moutarde

Abstract: Pedestrian crossing prediction has been a topic of active research, resulting in many new algorithmic solutions. While measuring the overall progress of those solutions over time is becoming increasingly established thanks to the new publicly available benchmark and standardized evaluation procedures, how well existing predictors react to unseen data remains an open question. This evaluation is imperative, as serviceable crossing behavior predictors should work in various scenarios without compromising pedestrian safety through misprediction. To this end, we conduct a study based on direct cross-dataset evaluation. Our experiments show that current state-of-the-art pedestrian behavior predictors generalize poorly in cross-dataset evaluation scenarios, regardless of their robustness in a direct training-test set evaluation setting. In light of these observations, we argue that the future of pedestrian crossing prediction, i.e. reliable and generalizable implementations, should not be about tailoring models trained with very little available data and tested in a classical train-test scenario with the intent of inferring anything about their behavior in real life. It should be about evaluating models in a cross-dataset setting while considering their uncertainty estimates under domain shift.

Submitted 29 January, 2022; originally announced January 2022.

Comments: Submitted to the 33rd IEEE Intelligent Vehicles Symposium

arXiv:2201.03834

STIR$^2$: Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks

Authors: Jesus Bujalance Martin, Fabien Moutarde

Abstract: In the search for more sample-efficient reinforcement learning (RL) algorithms, a promising direction is to leverage as much external off-policy data as possible, for instance expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method able to leverage both demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes (via relabeling), encouraging expert imitation and self-imitation. Our experiments focus on several robotic manipulation tasks across two different simulation environments. We show that our method based on reward relabeling improves the performance of the base algorithms (SAC and DDPG) on these tasks. Finally, our best algorithm, STIR$^2$ (Self and Teacher Imitation by Reward Relabeling), which integrates multiple improvements from previous works into our method, is more data-efficient than all baselines.

Submitted 28 February, 2023; v1 submitted 11 January, 2022; originally announced January 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2110.14464

arXiv:2111.08575

GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving

Authors: Raphael Chekroun, Marin Toromanoff, Sascha Hornauer, Fabien Moutarde

Abstract: Deep reinforcement learning (DRL) has been demonstrated to be effective for several complex decision-making applications such as autonomous driving and robotics. However, DRL is notoriously limited by its high sample complexity and its lack of stability. Prior knowledge, e.g. in the form of expert demonstrations, is often available but challenging to leverage to mitigate these issues. In this paper, we propose General Reinforced Imitation (GRI), a novel method that combines the benefits of exploration and expert data and is straightforward to implement over any off-policy RL algorithm. We make one simplifying hypothesis: expert demonstrations can be seen as perfect data whose underlying policy gets a constant high reward. Based on this assumption, GRI introduces the notion of offline demonstration agents. These agents send expert data that are processed both concurrently and indistinguishably with the experiences coming from the online RL exploration agent. We show that our approach enables major improvements on vision-based autonomous driving in urban environments. We further validate the GRI method on Mujoco continuous control tasks with different off-policy RL algorithms. Our method ranked first on the CARLA Leaderboard and outperforms World on Rails, the previous state of the art, by 17%.

Submitted 17 May, 2022; v1 submitted 16 November, 2021; originally announced November 2021.
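GRI's stated hypothesis is that expert transitions come from a policy receiving a constant high reward, and that they are processed indistinguishably from online experience. A minimal sketch under stated assumptions: the constant `R_DEMO = 1.0`, the `(state, action, reward, next_state, done)` tuple layout and the fixed `demo_fraction` per batch (standing in for the paper's mix of demonstration and exploration agents) are all hypothetical, as are the function names.

```python
import random

R_DEMO = 1.0  # hypothetical constant reward assigned to every expert transition


def relabel_demonstration(demo_transitions, r_demo=R_DEMO):
    # Treat expert data as "perfect": overwrite its reward with a constant high value
    return [(s, a, r_demo, s2, done) for (s, a, _r, s2, done) in demo_transitions]


def mixed_batch(online_buffer, demo_buffer, batch_size, demo_fraction, rng):
    # Assumption: a fixed share of each batch comes from demonstrations;
    # no flag marks a transition as expert or online
    n_demo = round(batch_size * demo_fraction)
    batch = rng.sample(demo_buffer, n_demo) + rng.sample(online_buffer, batch_size - n_demo)
    rng.shuffle(batch)
    return batch
```

Since the learner only ever sees ordinary transitions, any off-policy algorithm can consume the batch unchanged, which matches the abstract's claim that GRI is straightforward to implement on top of such algorithms.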

arXiv:2110.14464

Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling

Authors: Jesus Bujalance Martin, Raphael Chekroun, Fabien Moutarde

Abstract: In recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving or video games. Off-policy algorithms tend to be more sample-efficient than their on-policy counterparts, and can additionally benefit from any off-policy data stored in the replay buffer. Expert demonstrations are a popular source for such data: the agent is exposed to successful states and actions early on, which can accelerate the learning process and improve performance. In the past, multiple ideas have been proposed to make good use of the demonstrations in the buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We carry out a study to evaluate several of these ideas in isolation, to see which of them have the most significant impact. We also present a new method for sparse-reward tasks, based on a reward bonus given to demonstrations and successful episodes. First, we give a reward bonus to the transitions coming from demonstrations to encourage the agent to match the demonstrated behaviour. Then, upon collecting a successful episode, we relabel its transitions with the same bonus before adding them to the replay buffer, encouraging the agent to also match its previous successes. The base algorithm for our experiments is the popular Soft Actor-Critic (SAC), a state-of-the-art off-policy algorithm for continuous action spaces. Our experiments focus on manipulation robotics, specifically on a 3D reaching task for a robotic arm in simulation. We show that our method SACR2, based on reward relabeling, improves the performance on this task, even in the absence of demonstrations.

Submitted 3 December, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: Presented at Deep RL Workshop, NeurIPS 2021
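The SACR2 abstract describes two relabeling rules: demonstration transitions always receive a reward bonus, and online episodes receive the same bonus only when they succeed. A minimal sketch of those rules, assuming the bonus is added to the environment reward (the abstract does not say whether it is added or substituted) and using a hypothetical `RelabelingBuffer` class with `(state, action, reward, next_state, done)` tuples.

```python
def add_bonus(transitions, bonus):
    """Return copies of the transitions with `bonus` added to each reward (hypothetical helper)."""
    return [(s, a, r + bonus, s2, done) for (s, a, r, s2, done) in transitions]


class RelabelingBuffer:
    """Hypothetical replay buffer applying reward-relabeling rules before storage."""

    def __init__(self, bonus):
        self.bonus = bonus
        self.data = []

    def add_demonstrations(self, demo_transitions):
        # Rule 1: demonstration transitions always get the bonus (expert imitation)
        self.data.extend(add_bonus(demo_transitions, self.bonus))

    def add_episode(self, transitions, success):
        # Rule 2: a finished online episode gets the same bonus only if it
        # succeeded (self-imitation); failed episodes are stored unchanged
        self.data.extend(add_bonus(transitions, self.bonus) if success else transitions)
```

Because relabeling happens when an episode is stored, an off-policy learner such as SAC can sample from `data` without any change to its update rule.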

arXiv:2110.06607

THOMAS: Trajectory Heatmap Output with learned Multi-Agent Sampling

Authors: Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, Fabien Moutarde

Abstract: In this paper, we propose THOMAS, a joint multi-agent trajectory prediction framework allowing for an efficient and consistent prediction of multi-agent multi-modal trajectories. We present a unified model architecture for simultaneous agent future heatmap estimation, in which we leverage hierarchical and sparse image generation for fast and memory-efficient inference. We propose a learnable trajectory recombination model that takes as input a set of predicted trajectories for each agent and outputs a consistent, reordered recombination of them. This recombination module is able to realign the initially independent modalities so that they do not collide and are coherent with each other. We report our results on the Interaction multi-agent prediction challenge and rank $1^{st}$ on the online test leaderboard.

Submitted 21 January, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

arXiv:2109.01827

GOHOME: Graph-Oriented Heatmap Output for future Motion Estimation

Authors: Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, Fabien Moutarde

Abstract: In this paper, we propose GOHOME, a method leveraging graph representations of the High Definition Map and sparse projections to generate a heatmap output representing the future position probability distribution for a given agent in a traffic scene. This heatmap output yields an unconstrained 2D grid representation of the agent's possible future locations, allowing inherent multimodality and a measure of the uncertainty of the prediction. Our graph-oriented model avoids the high computational burden of representing the surrounding context as square images and processing it with classical CNNs, and instead focuses only on the most probable lanes where the agent could end up in the immediate future. GOHOME ranks 2$^{nd}$ on the Argoverse Motion Forecasting Benchmark on the MissRate$_6$ metric while achieving a significant speed-up and memory reduction compared to HOME, the Argoverse 1$^{st}$-place method. We also highlight that the heatmap output enables multimodal ensembling, and our best ensemble improves on the 1$^{st}$-place MissRate$_6$ on Argoverse by more than 15$\%$. Finally, we evaluate on the other trajectory prediction datasets nuScenes and Interaction and reach state-of-the-art performance, demonstrating the generalizability of our method.

Submitted 21 September, 2021; v1 submitted 4 September, 2021; originally announced September 2021.
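The GOHOME abstract reports results on MissRate$_6$. A minimal sketch of a miss-rate computation, assuming the common motion-forecasting definition: an agent counts as a miss when even the best of its K predicted endpoints (K = 6 here) lies farther than a threshold from the ground-truth endpoint. The 2 m default matches our understanding of the Argoverse setting but should be treated as an assumption; the function names are hypothetical.

```python
import math


def min_fde(predictions, ground_truth_endpoint):
    """Smallest final displacement error over the K predicted endpoints of one agent."""
    gx, gy = ground_truth_endpoint
    return min(math.hypot(px - gx, py - gy) for px, py in predictions)


def miss_rate(all_predictions, all_endpoints, threshold=2.0):
    """Fraction of agents whose best predicted endpoint misses by more than `threshold` meters.

    all_predictions: one list of K (x, y) endpoints per agent.
    all_endpoints: one ground-truth (x, y) endpoint per agent.
    """
    misses = sum(min_fde(p, g) > threshold for p, g in zip(all_predictions, all_endpoints))
    return misses / len(all_endpoints)
```

Only the closest of the K endpoints matters, which is why a heatmap output that spreads its samples over distinct plausible goals tends to help this metric.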

arXiv:2109.00953

TrouSPI-Net: Spatio-temporal attention on parallel atrous convolutions and U-GRUs for skeletal pedestrian crossing prediction

Authors: Joseph Gesnouin, Steve Pechberti, Bogdan Stanciulescu, Fabien Moutarde

Abstract: Understanding the behaviors and intentions of pedestrians is still one of the main challenges for vehicle autonomy, as accurate predictions of their intentions can guarantee their safety and the driving comfort of vehicles. In this paper, we address pedestrian crossing prediction in urban traffic environments by linking the dynamics of a pedestrian's skeleton to a binary crossing intention. We introduce TrouSPI-Net: a context-free, lightweight, multi-branch predictor. TrouSPI-Net extracts spatio-temporal features for different time resolutions by encoding sequences of pseudo-images of skeletal joints' positions and processes them with parallel attention modules and atrous convolutions. The proposed approach is then enhanced by processing features such as relative distances of skeletal joints, bounding box positions, or ego-vehicle speed with U-GRUs. Using the newly proposed evaluation procedures for two large public naturalistic datasets for studying pedestrian behavior in traffic, JAAD and PIE, we evaluate TrouSPI-Net and analyze its performance. Experimental results show that TrouSPI-Net achieves an F1 score of 0.76 on JAAD and 0.80 on PIE, outperforming the current state of the art while being lightweight and context-free.

Submitted 7 September, 2021; v1 submitted 2 September, 2021; originally announced September 2021.

Comments: Accepted to IEEE International Conference on Automatic Face & Gesture Recognition 2021 (December 15-18, 2021). 7 pages, 2 figures

arXiv:2106.04419

Asymmetrical Bi-RNN for pedestrian trajectory encoding

Authors: Raphaël Rozenberg, Joseph Gesnouin, Fabien Moutarde

Abstract: Pedestrian motion behavior involves a combination of individual goals and social interactions with other agents. In this article, we present an asymmetrical bidirectional recurrent neural network architecture called U-RNN to encode pedestrian trajectories, and evaluate its relevance as a replacement for LSTMs in various forecasting models. Experimental results on the Trajnet++ benchmark show that the U-LSTM variant yields better results on every available metric (ADE, FDE, Collision rate) than common trajectory encoders for a variety of approaches and interaction modules, suggesting that the proposed approach is a viable alternative to the de facto sequence-encoding RNNs. Our implementation of the asymmetrical Bi-RNNs for the Trajnet++ benchmark is available at: github.com/JosephGesnouin/Asymmetrical-Bi-RNNs-to-encode-pedestrian-trajectories

Submitted 19 June, 2021; v1 submitted 1 June, 2021; originally announced June 2021.

Comments: 7 pages

MSC Class: 68T45 ACM Class: I.2.9; I.2.10
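The U-RNN abstract evaluates encoders with ADE and FDE. A minimal sketch of both metrics under their usual definitions for a single predicted trajectory: ADE averages the Euclidean error over all predicted time steps, FDE takes the error at the last step. Trajectories are assumed to be equal-length lists of `(x, y)` points; the benchmark's own evaluation code may differ in details such as multi-modal handling.

```python
import math


def ade(pred, gt):
    """Average displacement error: mean Euclidean distance over all time steps."""
    assert len(pred) == len(gt), "trajectories must have the same length"
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def fde(pred, gt):
    """Final displacement error: Euclidean distance at the last time step."""
    return math.dist(pred[-1], gt[-1])
```

For a prediction `[(0, 0), (1, 0), (2, 0)]` against ground truth `[(0, 0), (1, 1), (2, 2)]`, the per-step errors are 0, 1 and 2, so ADE is 1.0 and FDE is 2.0.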

arXiv:2105.10968

HOME: Heatmap Output for future Motion Estimation

Authors: Thomas Gilles, Stefano Sabatini, Dzmitry Tsishkou, Bogdan Stanciulescu, Fabien Moutarde

Abstract: In this paper, we propose HOME, a framework tackling the motion forecasting problem with an image output representing the probability distribution of the agent's future location. This method allows for a simple architecture with classic convolutional networks coupled with an attention mechanism for agent interactions, and outputs an unconstrained 2D top-view representation of the agent's possible future. Based on this output, we design two methods to sample a finite set of future locations for the agent. These methods allow us to control the optimization trade-off between miss rate and final displacement error for multiple modalities without having to retrain any part of the model. We apply our method to the Argoverse Motion Forecasting Benchmark and achieve 1st place on the online leaderboard.

Submitted 2 June, 2021; v1 submitted 23 May, 2021; originally announced May 2021.
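HOME samples a finite set of future locations from its heatmap output, but the abstract does not detail either of its two samplers. As a swapped-in illustration only, the sketch below uses a generic greedy sampler with non-maximum suppression: take the most probable cell, zero a square neighbourhood around it, and repeat. Spreading picks apart this way favours coverage, and hence a lower miss rate. The grid layout, the `radius` in cells and the name `greedy_sample` are hypothetical.

```python
def greedy_sample(heatmap, k, radius):
    """Pick up to k (row, col) cells from a 2D probability heatmap.

    After each pick, a (2*radius+1)^2 square around it is suppressed so the
    next pick lands elsewhere. Assumption: a generic NMS-style sampler, not
    the exact procedure from the HOME paper.
    """
    h = [row[:] for row in heatmap]  # work on a copy, leave the input untouched
    picks = []
    for _ in range(k):
        value, i, j = max((v, r, c) for r, row in enumerate(h) for c, v in enumerate(row))
        if value <= 0:
            break  # nothing left with non-zero probability
        picks.append((i, j))
        for di in range(-radius, radius + 1):
            for dj in range(-radius, radius + 1):
                ii, jj = i + di, j + dj
                if 0 <= ii < len(h) and 0 <= jj < len(h[0]):
                    h[ii][jj] = 0.0
    return picks
```

On a 5 x 5 grid with probabilities 0.5 at (1, 1), 0.4 at (1, 2) and 0.1 at (4, 4), a naive top-2 would return two adjacent cells, while `greedy_sample(grid, k=2, radius=1)` returns `[(1, 1), (4, 4)]`.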

arXiv:1911.10868

End-to-End Model-Free Reinforcement Learning for Urban Driving using Implicit Affordances

Authors: Marin Toromanoff, Emilie Wirbel, Fabien Moutarde

Abstract: Reinforcement Learning (RL) aims at learning an optimal behavior policy from its own experience rather than from rule-based control methods. However, no RL algorithm is yet capable of handling a task as difficult as urban driving. We present a novel technique, coined implicit affordances, to effectively leverage RL for urban driving, including lane keeping, pedestrian and vehicle avoidance, and traffic light detection. To our knowledge, we are the first to present a successful RL agent handling such a complex task, especially regarding traffic light detection. Furthermore, we demonstrated the effectiveness of our method by winning the Camera Only track of the CARLA challenge.

Submitted 16 March, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

Comments: Accepted at main conference of CVPR 2020

arXiv:1908.04683

Is Deep Reinforcement Learning Really Superhuman on Atari? Leveling the playing field

Authors: Marin Toromanoff, Emilie Wirbel, Fabien Moutarde

Abstract: Consistent and reproducible evaluation of Deep Reinforcement Learning (DRL) is not straightforward. In the Arcade Learning Environment (ALE), small changes in environment parameters such as stochasticity or the maximum allowed play time can lead to very different performance. In this work, we discuss the difficulties of comparing different agents trained on ALE. To take a step further towards reproducible and comparable DRL, we introduce SABER, a Standardized Atari BEnchmark for general Reinforcement learning algorithms. Our methodology extends previous recommendations and contains a complete set of environment parameters as well as train and test procedures. We then use SABER to evaluate the current state of the art, Rainbow. Furthermore, we introduce a human world records baseline and argue that previous claims of expert or superhuman performance of DRL might not be accurate. Finally, we propose Rainbow-IQN, which extends Rainbow with Implicit Quantile Networks (IQN) and achieves new state-of-the-art performance. Source code is available for reproducibility.
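The abstract does not give the normalization formula. A common way to express an agent's score against a human reference, assumed here with the human world record as that reference, is the percentage of the gap between a random policy and the reference, summarized across games with a median. The helpers and the toy scores below are hypothetical and do not reproduce any result from the paper.

```python
from statistics import median
from typing import Dict, Tuple


def record_normalized(agent: float, random_score: float, record: float) -> float:
    """Score as a percentage of the random-to-world-record gap (100 = world record).

    Assumed normalization: 100 * (agent - random) / (record - random).
    """
    if record == random_score:
        raise ValueError("record and random scores must differ")
    return 100.0 * (agent - random_score) / (record - random_score)


def median_normalized(scores: Dict[str, Tuple[float, float, float]]) -> float:
    """Median over games of record-normalized scores; values are (agent, random, record)."""
    return median(record_normalized(*triple) for triple in scores.values())


# Toy numbers for three hypothetical games, not real Atari results.
toy = {
    "game_a": (550.0, 50.0, 1050.0),
    "game_b": (30.0, 10.0, 110.0),
    "game_c": (900.0, 0.0, 600.0),
}
print(median_normalized(toy))  # 50.0
```

Under this normalization, a value above 100 means the agent beat the world record on that game, a much stricter bar than beating an average human player.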

Submitted 8 November, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: Paper currently in review

arXiv:1812.01712 [pdf, other]

Multiview Based 3D Scene Understanding On Partial Point Sets

Authors: Ye Zhu, Sven Ewan Shepstone, Pablo Martínez-Nuevo, Miklas Strøm Kristoffersen, Fabien Moutarde, Zhuang Fu

Abstract: Deep learning on point clouds has gained much research interest in recent years, mostly due to the promising results achieved on a number of challenging benchmarks, such as 3D shape recognition and scene semantic segmentation. In many realistic settings, however, snapshots of the environment are taken from a single view, which contains only a partial set of the scene due to the limited field of view of commodity cameras. 3D scene semantic understanding on partial point clouds is considered a challenging task. In this work, we propose a processing approach for 3D point cloud data based on a multiview representation of existing 360° point clouds. By fusing the original 360° point clouds and their corresponding 3D multiview representations as input data, a neural network is able to recognize partial point sets while improving the general performance on complete point sets, resulting in an overall increase of 31.9% and 4.3% in segmentation accuracy for partial and complete scene semantic understanding, respectively. This method can also be applied in a wider 3D recognition context, such as 3D part segmentation.

Submitted 30 November, 2018; originally announced December 2018.

Comments: This paper has been submitted to IEEE Transactions on Neural Networks and Learning Systems

arXiv:1810.09365 [pdf, other]

Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning

Authors: Guillaume Devineau, Philip Polack, Florent Altché, Fabien Moutarde

Abstract: This paper explores the capability of deep neural networks to capture key characteristics of vehicle dynamics and their ability to perform coupled longitudinal and lateral control of a vehicle. To this end, two different artificial neural networks are trained to compute vehicle controls corresponding to a reference trajectory, using a dataset based on high-fidelity simulations of vehicle dynamics. In this study, the control inputs are the steering angle of the front wheels and the torque applied to each wheel. The performance of both models, namely a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN), is evaluated based on their ability to drive the vehicle on a challenging test track that alternates between long straight lines and tight curves. A comparison with conventional decoupled controllers on the same track is also provided.

Submitted 22 October, 2018; originally announced October 2018.

Comments: Published in the IEEE 2018 International Conference on Intelligent Transportation Systems (ITSC 2018)

arXiv:1808.06940 [pdf, other]

End to End Vehicle Lateral Control Using a Single Fisheye Camera

Authors: Marin Toromanoff, Emilie Wirbel, Frédéric Wilhelm, Camilo Vejarano, Xavier Perrotton, Fabien Moutarde

Abstract: Convolutional neural networks are commonly used to control the steering angle of autonomous cars. Most of the time, multiple long-range cameras are used to generate lateral failure cases. In this paper, we present a novel model that performs this data and label augmentation using only one short-range fisheye camera. We present our simulator and show how it can be used as a consistent metric for lateral end-to-end control evaluation. Experiments are conducted on a custom dataset corresponding to more than 10,000 km and 200 hours of open road driving. Finally, we evaluate this model on real-world driving scenarios, on open road and on a custom test track with challenging obstacle avoidance and sharp turns. In our simulator based on real-world videos, the final model was capable of more than 99% autonomy on urban roads.

Submitted 20 August, 2018; originally announced August 2018.

Comments: 7-page paper, accepted at IROS 2018

arXiv:1605.05157 [pdf, ps, other]

Monocular Urban Localization using Street View

Authors: Li Yu, Cyril Joly, Guillaume Bresson, Fabien Moutarde

Abstract: This paper presents a metric global localization method for urban environments that uses only a monocular camera and the Google Street View database. We fully leverage the abundant data from Street View and benefit from its topo-metric structure to build a coarse-to-fine positioning pipeline: a topological place recognition step followed by metric pose estimation through local bundle adjustment. Our method is tested in a 3 km urban environment and demonstrates both sub-meter accuracy and robustness to viewpoint changes, illumination and occlusion. To our knowledge, this is the first work that studies global urban localization with only a single camera and Street View.

Submitted 16 June, 2016; v1 submitted 17 May, 2016; originally announced May 2016.

Comments: 6 pages, 6 figures, submitted to ICARCV2016

arXiv:1605.00026 [pdf, other]

A Distributed Model Predictive Control Framework for Road-Following Formation Control of Car-like Vehicles (Extended Version)

Authors: Xiangjun Qian, Florent Altché, Arnaud de La Fortelle, Fabien Moutarde

Abstract: This work presents a novel framework for the formation control of multiple autonomous ground vehicles in an on-road environment. The unique challenges of this problem lie in 1) the design of collision avoidance strategies with obstacles and with other vehicles in a highly structured environment, and 2) the dynamic reconfiguration of the formation to handle different task specifications. In this paper, we design a local MPC-based tracking controller for each individual vehicle to follow a reference trajectory while satisfying various constraints (kinematics and dynamics, collision avoidance, etc.). The reference trajectory of a vehicle is computed from its leader's trajectory, based on a pre-defined formation tree. We use logic rules to organize the collision avoidance behaviors of member vehicles. Moreover, we propose a methodology to safely reconfigure the formation on the fly. The proposed framework has been validated using high-fidelity simulations.
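The abstract states only that each vehicle's reference trajectory is derived from its leader's trajectory through a pre-defined formation tree. A minimal sketch, assuming road-aligned (curvilinear) coordinates and a constant longitudinal and lateral offset per tree edge, is shown below. The coordinate choice, the offsets and the function name `formation_references` are assumptions, not the paper's formulation, and the MPC tracking layer is not shown.

```python
from typing import Dict, List, Tuple

Pose = Tuple[float, float]  # (s, d): arc length along the road, lateral offset


def formation_references(
    root: str,
    root_traj: List[Pose],
    tree: Dict[str, Tuple[str, Pose]],
) -> Dict[str, List[Pose]]:
    """Derive each vehicle's reference from its leader's (hypothetical helper).

    `tree` maps a follower to (leader, (ds, dd)); the tree must be acyclic
    and rooted at `root`, whose trajectory is given explicitly.
    """
    refs: Dict[str, List[Pose]] = {root: root_traj}

    def resolve(vehicle: str) -> List[Pose]:
        if vehicle not in refs:
            leader, (ds, dd) = tree[vehicle]
            # Shift every waypoint of the leader's reference by the edge offset.
            refs[vehicle] = [(s + ds, d + dd) for s, d in resolve(leader)]
        return refs[vehicle]

    for vehicle in tree:
        resolve(vehicle)
    return refs


# v2 follows 5 m behind v1; v3 drives beside v2, offset laterally by 3.5 m.
leader_traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]
tree = {"v2": ("v1", (-5.0, 0.0)), "v3": ("v2", (0.0, 3.5))}
print(formation_references("v1", leader_traj, tree)["v3"])
# [(-5.0, 3.5), (-4.0, 3.5), (-3.0, 4.0)]
```

In this sketch, reconfiguring the formation amounts to editing `tree` (a leader or an offset) and recomputing the references; doing so safely, as the paper proposes, would require additional checks not shown here.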

Submitted 29 April, 2016; originally announced May 2016.

Comments: Extended version of the conference paper submitted to ICARCV'16

arXiv:1407.5813

Priority-based coordination of autonomous and legacy vehicles at intersection

Authors: Xiangjun Qian, Jean Gregoire, Fabien Moutarde, Arnaud De La Fortelle

Abstract: Recently, researchers have proposed various autonomous intersection management techniques that enable autonomous vehicles to cross intersections without traffic lights or stop signs. In particular, a priority-based coordination system with provably collision-free and deadlock-free properties has been presented. In this paper, we extend the priority-based approach to support legacy vehicles without compromising the above-mentioned properties. We make the hypothesis that legacy vehicles are able to keep a safe distance from their leading vehicles. We then explore special configurations of the system that ensure the safe crossing of legacy vehicles. We implement the extended system in SUMO, a realistic traffic simulator. Simulations are performed to demonstrate the safety of the system.

Submitted 26 September, 2014; v1 submitted 22 July, 2014; originally announced July 2014.

Comments: put in other preprint server

arXiv:1212.5264 [pdf]

Statistical Traffic State Analysis in Large-scale Transportation Networks Using Locality-Preserving Non-negative Matrix Factorization

Authors: Yufei Han, Fabien Moutarde

Abstract: Statistical traffic data analysis is a hot topic in traffic management and control. In this field, current research focuses on analyzing traffic flows of individual links or local regions in a transportation network. Less attention is paid to the global view of traffic states over the entire network, which is important for modeling large-scale traffic scenes. Our aim is precisely to propose a new methodology for extracting spatio-temporal traffic patterns, ultimately for modeling large-scale traffic dynamics and long-term traffic forecasting. We attack this issue by using Locality-Preserving Non-negative Matrix Factorization (LPNMF) to derive a low-dimensional representation of network-level traffic states. Clustering is performed on the compact LPNMF projections to unveil typical spatial patterns and temporal dynamics of network-level traffic states. We have tested the proposed method on simulated traffic data generated for a large-scale road network, and the reported experimental results validate the ability of our approach to extract meaningful large-scale space-time traffic patterns. Furthermore, the derived clustering results provide an intuitive understanding of the spatio-temporal characteristics of traffic flows in the large-scale network, and a basis for potential long-term forecasting.
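LPNMF augments standard non-negative matrix factorization with a locality-preserving term; that term is not specified in the abstract and is omitted here. The sketch below swaps in plain NMF with the Lee and Seung multiplicative updates, then labels each time snapshot (a column of link-level traffic states) with its dominant basis pattern as a crude stand-in for the clustering step. The helper names and the toy matrix are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def sq_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, iters: int = 300, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Plain NMF, V (links x snapshots) ~ W H, with multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        ht = transpose(h)
        # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return w, h


def dominant_pattern(h: Matrix) -> List[int]:
    """Label each snapshot (column of H) with the index of its largest coefficient."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


# Toy data: links 0-1 congested in snapshots 0-1, links 2-3 in snapshots 2-3.
V = [
    [1.0, 2.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
]
W, H = nmf(V, rank=2)
print(dominant_pattern(H))
```

Because plain NMF ignores the neighbourhood structure that LPNMF's extra term encodes, this sketch only illustrates the factorize-then-cluster pipeline, not the paper's projections.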

Submitted 20 December, 2012; originally announced December 2012.

Comments: IET Intelligent Transport Systems (2013)

arXiv:1212.4675 [pdf]

Analysis of Large-scale Traffic Dynamics using Non-negative Tensor Factorization

Authors: Yufei Han, Fabien Moutarde

Abstract: In this paper, we present our work on clustering and prediction of the temporal dynamics of global congestion configurations in large-scale road networks. Instead of looking into temporal traffic state variation of individual links or small areas, we focus on spatial congestion configurations of the whole network. We aim to describe the typical temporal dynamic patterns of this network-level traffic state and to achieve long-term prediction of the large-scale traffic dynamics, in a unified data-mining framework. To this end, we formulate this joint task using Non-negative Tensor Factorization (NTF), which has been shown to be a useful decomposition tool for multivariate data sequences. Clustering and prediction are performed based on the compact tensor factorization results. Experiments on large-scale simulated data illustrate the interest of our method, with promising results for long-term forecasting of traffic evolution.

Submitted 18 December, 2012; originally announced December 2012.

Comments: ITS World Congress 2012 (2012)

arXiv:1010.3867 [pdf]

Joint interpretation of on-board vision and static GPS cartography for determination of correct speed limit

Authors: Alexandre Bargeton, Fabien Moutarde, Fawzi Nashashibi, Anne-Sophie Puthon

Abstract: We present here a first prototype of a "Speed Limit Support" Advanced Driver Assistance System (ADAS) that continuously provides reliable information on the current speed limit applicable to the vehicle. Such a module can be used either to inform the driver or even to automatically set the maximum speed of a smart Adaptive Cruise Control (ACC). Our system is based on a joint interpretation of cartographic information (for static reference information) with on-board vision, used for traffic sign detection and recognition (including supplementary sub-signs) and visual localization of road lines (for detection of lane changes). The visual traffic sign detection part is quite robust (90% global correct detection and recognition for main speed signs, and 80% for exit-lane sub-sign detection). Our approach to joint interpretation with cartography is original, and logic-based rather than probability-based, which allows correct behaviour even in cases, which do happen, where vision and cartography provide the same erroneous information.

Submitted 19 October, 2010; originally announced October 2010.

Journal ref: 17th ITS world congress (ITSwc'2010), Busan : Korea, Republic Of (2010)

arXiv:0910.1295 [pdf]

Modular Traffic Sign Recognition applied to on-vehicle real-time visual detection of American and European speed limit signs

Authors: Fabien Moutarde, Alexandre Bargeton, Anne Herbin, Lowik Chanussot

Abstract: We present a new modular traffic sign recognition system, successfully applied to both American and European speed limit signs. Our sign detection step is based only on shape detection (rectangles or circles). This enables it to work on grayscale images, unlike most European competitors, which improves robustness to illumination conditions (notably night operation). Speed sign candidates are classified (or rejected) by segmenting potential digits inside them (which is rather original and has several advantages), and then applying neural digit recognition. The global detection rate is ~90% for both (standard) U.S. and E.U. speed signs, with a misclassification rate <1%, and no validated false alarm in >150 minutes of video. The system runs in real time at ~20 frames/s on a standard high-end laptop.

Submitted 7 October, 2009; originally announced October 2009.

Journal ref: 14th World congress on Intelligent Transportation Systems (ITS'2007), Beijing : China (2007)

arXiv:0910.1294 [pdf]

Visual object categorization with new keypoint-based adaBoost features

Authors: Taoufik Bdiri, Fabien Moutarde, Bruno Steux

Abstract: We present promising results for visual object categorization, obtained with adaBoost using new original "keypoint-based features". These weak classifiers produce a boolean response based on the presence or absence, in the tested image, of a "keypoint" (a kind of SURF interest point) with a descriptor sufficiently similar (i.e. within a given distance) to a reference descriptor characterizing the feature. A first experiment was conducted on a public image dataset containing lateral-viewed cars, yielding 95% recall with 95% precision on the test set. Preliminary tests on a small subset of a pedestrian database also give a promising 97% recall with 92% precision, which shows the generality of our new family of features. Moreover, analysis of the positions of adaBoost-selected keypoints shows that they correspond to specific parts of the object category (such as "wheel" or "side skirt" in the case of lateral cars) and thus have a "semantic" meaning. We also made a first test on video, detecting vehicles from adaBoost-selected keypoints filtered in real time from all detected keypoints.
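The weak classifier described above can be sketched as follows, under assumptions: descriptors are plain float vectors compared with Euclidean distance (real SURF descriptors are 64-dimensional and would come from a detector not shown here), and the strong classifier is the usual AdaBoost weighted vote of ±1 responses. The class `KeypointPresenceFeature`, the function `boosted_score`, the toy 3-D descriptors and the weights are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


@dataclass(frozen=True)
class KeypointPresenceFeature:
    """Boolean weak classifier: is any keypoint close to a reference descriptor?"""
    reference: Tuple[float, ...]
    max_distance: float

    def __call__(self, descriptors: List[Descriptor]) -> bool:
        # True iff at least one descriptor lies within max_distance of the reference.
        return any(math.dist(d, self.reference) <= self.max_distance for d in descriptors)


def boosted_score(features: List[KeypointPresenceFeature], alphas: List[float],
                  descriptors: List[Descriptor]) -> float:
    """AdaBoost-style weighted vote; each feature votes +1 (present) or -1 (absent)."""
    return sum(a * (1.0 if f(descriptors) else -1.0) for f, a in zip(features, alphas))


# Toy example: two features standing in for "wheel" and "side skirt" parts.
wheel = KeypointPresenceFeature(reference=(1.0, 0.0, 0.0), max_distance=0.5)
skirt = KeypointPresenceFeature(reference=(0.0, 1.0, 0.0), max_distance=0.5)
image_descriptors = [(0.9, 0.1, 0.0), (0.0, 0.0, 5.0)]
score = boosted_score([wheel, skirt], [0.75, 0.25], image_descriptors)
print(score, "vehicle" if score > 0 else "background")  # 0.5 vehicle
```

Because each feature only asks whether a matching keypoint exists anywhere in the image, the response is independent of where that keypoint appears; the position analysis mentioned in the abstract would require keeping keypoint locations alongside the descriptors.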

Submitted 7 October, 2009; originally announced October 2009.

Journal ref: IEEE Symposium on Intelligent Vehicles (IV'2009), XiAn : China (2009)

arXiv:0910.1293 [pdf]

Introducing New AdaBoost Features for Real-Time Vehicle Detection

Authors: Bogdan Stanciulescu, Amaury Breheret, Fabien Moutarde

Abstract: This paper shows how to improve real-time object detection in complex robotics applications by exploring new visual features as AdaBoost weak classifiers. These new features are symmetric Haar filters (enforcing global horizontal and vertical symmetry) and N-connexity control points. Experimental evaluation on a car database shows that the latter appear to provide the best results for the vehicle-detection problem.

Submitted 7 October, 2009; originally announced October 2009.

Journal ref: COGIS'07 conference on COGnitive systems with Interactive Sensors, Stanford, Palo Alto : United States (2007)

arXiv:0910.1273 [pdf]

Adaboost with "Keypoint Presence Features" for Real-Time Vehicle Visual Detection

Authors: Taoufik Bdiri, Fabien Moutarde, Nicolas Bourdis, Bruno Steux

Abstract: We present promising results for real-time vehicle visual detection, obtained with adaBoost using new original "keypoint presence features". These weak classifiers produce a boolean response based on the presence or absence, in the tested image, of a "keypoint" (~ a SURF interest point) with a descriptor sufficiently similar (i.e. within a given distance) to a reference descriptor characterizing the feature. A first experiment was conducted on a public image dataset containing lateral-viewed cars, yielding 95% recall with 95% precision on the test set. Moreover, analysis of the positions of adaBoost-selected keypoints shows that they correspond to specific parts of the object category (such as "wheel" or "side skirt") and thus have a "semantic" meaning.

Submitted 7 October, 2009; originally announced October 2009.

Journal ref: 16th World Congress on Intelligent Transport Systems (ITSwc'2009), Sweden (2009)

Showing 1–31 of 31 results for author: Moutarde, F