Search | arXiv e-print repository

LatentFormer: Multi-Agent Transformer-Based Interaction Modeling and Trajectory Prediction

Authors: Elmira Amirloo, Amir Rasouli, Peter Lakner, Mohsen Rohani, Jun Luo

Abstract: Multi-agent trajectory prediction is a fundamental problem in autonomous driving. The key challenges in prediction are accurately anticipating the behavior of surrounding agents and understanding the scene context. To address these problems, we propose LatentFormer, a transformer-based model for predicting future vehicle trajectories. The proposed method leverages a novel technique for modeling interactions among dynamic objects in the scene. Contrary to many existing approaches which model cross-agent interactions during the observation time, our method additionally exploits the future states of the agents. This is accomplished using a hierarchical attention mechanism where the evolving states of the agents autoregressively control the contributions of past trajectories and scene encodings in the final prediction. Furthermore, we propose a multi-resolution map encoding scheme that relies on a vision transformer module to effectively capture both local and global scene context to guide the generation of more admissible future trajectories. We evaluate the proposed method on the nuScenes benchmark dataset and show that our approach achieves state-of-the-art performance and improves upon trajectory metrics by up to 40%. We further investigate the contributions of various components of the proposed technique via extensive ablation studies.

Submitted 3 March, 2022; originally announced March 2022.

arXiv:2103.01039

Self-Supervised Simultaneous Multi-Step Prediction of Road Dynamics and Cost Map

Authors: Elmira Amirloo, Mohsen Rohani, Ershad Banijamali, Jun Luo, Pascal Poupart

Abstract: While supervised learning is widely used for perception modules in conventional autonomous driving solutions, scalability is hindered by the huge amount of data labeling needed. In contrast, while end-to-end architectures do not require labeled data and are potentially more scalable, interpretability is sacrificed. We introduce a novel architecture that is trained in a fully self-supervised fashion for simultaneous multi-step prediction of space-time cost map and road dynamics. Our solution replaces the manually designed cost function for motion planning with a learned high-dimensional cost map that is naturally interpretable and allows diverse contextual information to be integrated without manual data labeling. Experiments on real-world driving data show that our solution leads to a lower number of collisions and road violations in long planning horizons in comparison to baselines, demonstrating the feasibility of fully self-supervised prediction without sacrificing either scalability or interpretability.

Submitted 29 March, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

Journal ref: CVPR 2021

arXiv:2012.13478

Prediction by Anticipation: An Action-Conditional Prediction Method based on Interaction Learning

Authors: Ershad Banijamali, Mohsen Rohani, Elmira Amirloo, Jun Luo, Pascal Poupart

Abstract: In autonomous driving (AD), accurately predicting changes in the environment can effectively improve safety and comfort. Due to complex interactions among traffic participants, however, it is very hard to achieve accurate prediction for a long horizon. To address this challenge, we propose prediction by anticipation, which views interaction in terms of a latent probabilistic generative process wherein some vehicles move partly in response to the anticipated motion of other vehicles. Under this view, consecutive data frames can be factorized into sequential samples from an action-conditional distribution that effectively generalizes to a wider range of actions and driving situations. Our proposed prediction model, variational Bayesian in nature, is trained to maximize the evidence lower bound (ELBO) of the log-likelihood of this conditional distribution. Evaluations of our approach with prominent AD datasets NGSIM I-80 and Argoverse show significant improvement over current state-of-the-art in both accuracy and generalization.

Submitted 24 December, 2020; originally announced December 2020.

arXiv:2012.07773

PePScenes: A Novel Dataset and Baseline for Pedestrian Action Prediction in 3D

Authors: Amir Rasouli, Tiffany Yau, Peter Lakner, Saber Malekmohammadi, Mohsen Rohani, Jun Luo

Abstract: Predicting the behavior of road users, particularly pedestrians, is vital for safe motion planning in the context of autonomous driving systems. Traditionally, pedestrian behavior prediction has been realized in terms of forecasting future trajectories. However, recent evidence suggests that predicting higher-level actions, such as crossing the road, can help improve trajectory forecasting and planning tasks accordingly. There are a number of existing datasets that cater to the development of pedestrian action prediction algorithms; however, they lack certain characteristics, such as bird's eye view semantic map information, 3D locations of objects in the scene, etc., which are crucial in the autonomous driving context. To this end, we propose a new pedestrian action prediction dataset created by adding per-frame 2D/3D bounding box and behavioral annotations to the popular autonomous driving dataset, nuScenes. In addition, we propose a hybrid neural network architecture that incorporates various data modalities for predicting pedestrian crossing action. By evaluating our model on the newly proposed dataset, the contribution of different data modalities to the prediction task is revealed. The dataset is available at https://github.com/huawei-noah/PePScenes.

Submitted 14 December, 2020; originally announced December 2020.

Comments: 1 figure, 2 tables. ML4AD at NeurIPS 2020

arXiv:2012.03298

Bifold and Semantic Reasoning for Pedestrian Behavior Prediction

Authors: Amir Rasouli, Mohsen Rohani, Jun Luo

Abstract: Pedestrian behavior prediction is one of the major challenges for intelligent driving systems. Pedestrians often exhibit complex behaviors influenced by various contextual elements. To address this problem, we propose BiPed, a multitask learning framework that simultaneously predicts trajectories and actions of pedestrians by relying on multimodal data. Our method benefits from 1) a bifold encoding approach where different data modalities are processed independently allowing them to develop their own representations, and jointly to produce a representation for all modalities using shared parameters; 2) a novel interaction modeling technique that relies on categorical semantic parsing of the scenes to capture interactions between target pedestrians and their surroundings; and 3) a bifold prediction mechanism that uses both independent and shared decoding of multimodal representations. Using public pedestrian behavior benchmark datasets for driving, PIE and JAAD, we highlight the benefits of the proposed method for behavior prediction and show that our model achieves state-of-the-art performance and improves trajectory and action prediction by up to 22% and 9% respectively. We further investigate the contributions of the proposed reasoning techniques via extensive ablation studies.

Submitted 9 August, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

Comments: ICCV 2021. 11 pages; 5 Figures; 5 tables

arXiv:2012.02148

Graph-SIM: A Graph-based Spatiotemporal Interaction Modelling for Pedestrian Action Prediction

Authors: Tiffany Yau, Saber Malekmohammadi, Amir Rasouli, Peter Lakner, Mohsen Rohani, Jun Luo

Abstract: One of the most crucial yet challenging tasks for autonomous vehicles in urban environments is predicting the future behaviour of nearby pedestrians, especially at points of crossing. Predicting behaviour depends on many social and environmental factors, particularly interactions between road users. Capturing such interactions requires a global view of the scene and dynamics of the road users in three-dimensional space. This information, however, is missing from the current pedestrian behaviour benchmark datasets. Motivated by these challenges, we propose 1) a novel graph-based model for predicting pedestrian crossing action. Our method models pedestrians' interactions with nearby road users through clustering and relative importance weighting of interactions using features obtained from the bird's-eye-view. 2) We introduce a new dataset that provides 3D bounding box and pedestrian behavioural annotations for the existing nuScenes dataset. On the new data, our approach achieves state-of-the-art performance by improving on various metrics by more than 15% in comparison to existing methods. The dataset is available at https://github.com/huawei-noah/datasets/PePScenes.

Submitted 25 March, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: 7 pages, 3 figures, 4 tables, accepted at ICRA 2021

arXiv:2012.00514

Multi-Modal Hybrid Architecture for Pedestrian Action Prediction

Authors: Amir Rasouli, Tiffany Yau, Mohsen Rohani, Jun Luo

Abstract: Pedestrian behavior prediction is one of the major challenges for intelligent driving systems in urban environments. Pedestrians often exhibit a wide range of behaviors and adequate interpretations of those depend on various sources of information such as pedestrian appearance, states of other road users, the environment layout, etc. To address this problem, we propose a novel multi-modal prediction algorithm that incorporates different sources of information captured from the environment to predict future crossing actions of pedestrians. The proposed model benefits from a hybrid learning architecture consisting of feedforward and recurrent networks for analyzing visual features of the environment and dynamics of the scene. Using the existing 2D pedestrian behavior benchmarks and a newly annotated 3D driving dataset, we show that our proposed model achieves state-of-the-art performance in pedestrian crossing prediction.

Submitted 16 November, 2020; originally announced December 2020.

Comments: 7 pages, 4 Figures, 3 tables, submitted to ICRA 2021

arXiv:2010.09776

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

Authors: Ming Zhou, Jun Luo, Julian Villella, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat Nguyen, Mohamed Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, et al. (12 additional authors not shown)

Abstract: Multi-agent interaction is a fundamental aspect of autonomous driving in the real world. Despite more than a decade of research and development, the problem of how to competently interact with diverse road users in diverse scenarios remains largely unsolved. Learning methods have much to offer towards solving this problem. But they require a realistic multi-agent simulator that generates diverse and competent driving interactions. To meet this need, we develop a dedicated simulation platform called SMARTS (Scalable Multi-Agent RL Training School). SMARTS supports the training, accumulation, and use of diverse behavior models of road users. These are in turn used to create increasingly more realistic and diverse interactions that enable deeper and broader research on multi-agent interaction. In this paper, we describe the design goals of SMARTS, explain its basic architecture and its key features, and illustrate its use through concrete multi-agent experiments on interactive scenarios. We open-source the SMARTS platform and the associated benchmark tasks and evaluation metrics to encourage and empower research on multi-agent learning for autonomous driving. Our code is available at https://github.com/huawei-noah/SMARTS.

Submitted 31 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: 20 pages, 11 figures. Paper accepted to CoRL 2020

arXiv:2007.09569

Understanding and Mitigating the Limitations of Prioritized Experience Replay

Authors: Yangchen Pan, Jincheng Mei, Amir-massoud Farahmand, Martha White, Hengshuai Yao, Mohsen Rohani, Jun Luo

Abstract: Prioritized Experience Replay (ER) has been empirically shown to improve sample efficiency across many domains and attracted great attention; however, there is little theoretical understanding of why such prioritized sampling helps and its limitations. In this work, we take a deep look at the prioritized ER. In a supervised learning setting, we show the equivalence between the error-based prioritized sampling method for mean squared error and uniform sampling for cubic power loss. We then provide theoretical insight into why it improves convergence rate upon uniform sampling during early learning. Based on the insight, we further point out two limitations of the prioritized ER method: 1) outdated priorities and 2) insufficient coverage of the sample space. To mitigate the limitations, we propose our model-based stochastic gradient Langevin dynamics sampling method. We show that our method does provide states distributed close to an ideal prioritized sampling distribution estimated by the brute-force method, which does not suffer from the two limitations. We conduct experiments on both discrete and continuous control problems to show our approach's efficacy and examine the practical implication of our method in an autonomous driving application.

Submitted 11 June, 2022; v1 submitted 18 July, 2020; originally announced July 2020.

Comments: Accepted to UAI 2022

arXiv:2006.09655

doi 10.1155/2021/2977954

Fairness-Oriented Semi-Chaotic Genetic Algorithm-Based Channel Assignment Technique for Nodes Starvation Problem in Wireless Mesh Network

Authors: Fuad A. Ghaleb, Bander Ali Saleh Al-rimy, Maznah Kamat, Mohd. Foad Rohani, Shukor Abd Razak

Abstract: Multi-Radio Multi-Channel Wireless Mesh Networks (WMNs) have emerged as a scalable, reliable, and agile wireless network that supports many types of innovative technologies such as the Internet of Things (IoT) and vehicular networks. Due to the limited number of orthogonal channels, interference between channels adversely affects the fair distribution of bandwidth among mesh clients, causing node starvation in terms of insufficient bandwidth, which impedes the adoption of WMN as an efficient access technology. Therefore, a fair channel assignment is crucial for the mesh clients to utilize the available resources. However, the node starvation problem due to unfair channel distribution has been vastly overlooked during channel assignment by the extant research. Instead, existing channel assignment algorithms either reduce the total network interference or maximize the total network throughput, which neither guarantees a fair distribution of the channels nor eliminates node starvation. To this end, the Fairness-Oriented Semi-Chaotic Genetic Algorithm-Based Channel Assignment Technique (FA-SCGA-CAA) is proposed in this paper for the node starvation problem in Wireless Mesh Networks. FA-SCGA-CAA optimizes fairness based on multiple criteria using a modified version of the Genetic Algorithm (GA). The modification includes proposing a semi-chaotic technique for creating the primary chromosome with powerful genes. Such a chromosome is used to create a strong population that directs the search towards the global minima in an effective and efficient way. The outcome is a nonlinear fairness-oriented fitness function that aims at maximizing the link fairness while minimizing the link interference. Comparison with related work shows that the proposed FA-SCGA-CAA reduced the potential node starvation by 22% and improved network capacity utilization by 23%.

Submitted 17 June, 2020; originally announced June 2020.

Comments: 18 pages, 10 Figures

arXiv:1812.09395

Multi-Step Prediction of Occupancy Grid Maps with Recurrent Neural Networks

Authors: Nima Mohajerin, Mohsen Rohani

Abstract: We investigate the multi-step prediction of the drivable space, represented by Occupancy Grid Maps (OGMs), for autonomous vehicles. Our motivation is that accurate multi-step prediction of the drivable space can efficiently improve path planning and navigation resulting in safe, comfortable and optimum paths in autonomous driving. We train a variety of Recurrent Neural Network (RNN) based architectures on the OGM sequences from the KITTI dataset. The results demonstrate significant improvement of the prediction accuracy using our proposed difference learning method, incorporating motion related features, over the state of the art. We remove the egomotion from the OGM sequences by transforming them into a common frame. Although in the transformed sequences the KITTI dataset is heavily biased toward static objects, by learning the difference between subsequent OGMs, our proposed method provides accurate prediction over both the static and moving objects.

Submitted 22 January, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

Showing 1–11 of 11 results for author: Rohani, M