Search | arXiv e-print repository

Gap Completion in Point Cloud Scene occluded by Vehicles using SGC-Net

Authors: Yu Feng, Yiming Xu, Yan Xia, Claus Brenner, Monika Sester

Abstract: Recent advances in mobile mapping systems have greatly enhanced the efficiency and convenience of acquiring urban 3D data. These systems utilize LiDAR sensors mounted on vehicles to capture vast cityscapes. However, a significant challenge arises due to occlusions caused by roadside parked vehicles, leading to the loss of scene information, particularly on the roads, sidewalks, curbs, and the lower sections of buildings. In this study, we present a novel approach that leverages deep neural networks to learn a model capable of filling gaps in urban scenes that are obscured by vehicle occlusion. We have developed an innovative technique where we place virtual vehicle models along road boundaries in the gap-free scene and utilize a ray-casting algorithm to create a new scene with occluded gaps. This allows us to generate diverse and realistic urban point cloud scenes with and without vehicle occlusion, surpassing the limitations of real-world training data collection and annotation. Furthermore, we introduce the Scene Gap Completion Network (SGC-Net), an end-to-end model that can generate well-defined shape boundaries and smooth surfaces within occluded gaps. The experiment results reveal that 97.66% of the filled points fall within a range of 5 centimeters relative to the high-density ground truth point cloud scene. These findings underscore the efficacy of our proposed model in gap completion and reconstructing urban scenes affected by vehicle occlusions.

Submitted 11 July, 2024; originally announced July 2024.
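
Note: the accuracy figure quoted in the abstract above (share of filled points within 5 centimeters of the ground truth) can be illustrated with a small sketch. The exact evaluation protocol is not given in this listing, so the nearest-neighbour definition below, the function name `fraction_within`, and the toy coordinates are all assumptions for illustration only.

```python
import math

def fraction_within(filled, reference, tol=0.05):
    """Share of filled points whose nearest reference point lies within tol (metres).

    Assumed metric: brute-force nearest-neighbour distance, which is fine
    for small clouds; a spatial index would be needed for real scenes.
    """
    hits = 0
    for p in filled:
        nearest = min(math.dist(p, q) for q in reference)
        if nearest <= tol:
            hits += 1
    return hits / len(filled)

# Hypothetical toy data: three ground-truth points and four filled points.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
filled = [(0.0, 0.0, 0.03), (1.0, 0.04, 0.0), (2.0, 0.0, 0.10), (0.5, 0.0, 0.0)]
print(fraction_within(filled, reference))  # 2 of the 4 points are within 5 cm -> 0.5
```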

arXiv:2407.03825 [pdf, other]

StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection

Authors: Yunshuang Yuan, Monika Sester

Abstract: Cooperative perception via communication among intelligent traffic agents has great potential to improve the safety of autonomous driving. However, limited communication bandwidth, localization errors and asynchronized capturing time of sensor data all introduce difficulties to the data fusion of different agents. To some extent, previous works have attempted to reduce the shared data size and to mitigate the spatial feature misalignment caused by localization errors and communication delay. However, none of them have considered the asynchronized sensor ticking times, which can lead to dynamic object misplacement of more than one meter during data fusion. In this work, we propose Time-Aligned COoperative Object Detection (TA-COOD), for which we adapt the widely used datasets OPV2V and DairV2X to account for asynchronous LiDAR sensor ticking times and build an efficient fully sparse framework that models the temporal information of individual objects with query-based techniques. The experiment results confirmed the superior efficiency of our fully sparse framework compared to the state-of-the-art dense models. More importantly, they show that the point-wise observation timestamps of the dynamic objects are crucial for accurately modeling the object temporal context and the predictability of their time-related locations.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.15110 [pdf]

doi 10.5194/isprs-archives-XLVIII-1-W1-2023-325-2023

Voxel-Based Point Cloud Localization for Smart Spaces Management

Authors: F. S. Mortazavi, O. Shkedova, U. Feuerhake, C. Brenner, M. Sester

Abstract: This paper proposes a voxel-based approach for creating a digital twin of an urban environment that is capable of efficiently managing smart spaces. The paper explains the registration and localization procedure of the point cloud dataset, which uses the KISS ICP for scan point cloud combination and the RANSAC method for the initial alignment of the combined point cloud. The mobile mapping point cloud acquired with a Riegl VMX-250 serves as the reference map, and Velodyne scans are used for localization purposes. The point-to-plane iterative closest-point method is then employed to refine the alignment. The paper evaluates the efficacy of the proposed method by calculating the errors between the estimated and ground truth positions. The results indicate that the voxel-based approach is capable of accurately estimating the position of the sensor platform, which is applicable to various use cases. A specific use case in this context is smart parking space management, which is described and for which initial visualization results are shown.

Submitted 21 June, 2024; originally announced June 2024.

Journal ref: XLVIII-1/W1-2023:325-332

arXiv:2404.18617 [pdf, other]

CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception

Authors: Yunshuang Yuan, Monika Sester

Abstract: Collective Perception has attracted significant attention in recent years due to its advantage for mitigating occlusion and expanding the field-of-view, thereby enhancing reliability, efficiency, and, most crucially, decision-making safety. However, developing collective perception models is highly resource demanding due to extensive requirements of processing input data for many agents, usually dozens of images and point clouds for a single frame. This not only slows down the model development process for collective perception but also impedes the utilization of larger models. In this paper, we propose an agent-based training framework that handles the deep learning modules and agent data separately to have a cleaner data flow structure. This framework not only provides an API for flexibly prototyping the data processing pipeline and defining the gradient calculation for each agent, but also provides the user interface for interactive training, testing and data visualization. Training experiment results of four collective object detection models on the prominent collective perception benchmark OPV2V show that the agent-based training can significantly reduce the GPU memory consumption and training time while retaining inference performance. The framework and model implementations are available at https://github.com/YuanYunshuang/CoSense3D.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2403.07223 [pdf, other]

3D Uncertain Implicit Surface Mapping using GMM and GP

Authors: Qianqian Zou, Monika Sester

Abstract: In this study, we address the challenge of constructing continuous three-dimensional (3D) models that accurately represent uncertain surfaces, derived from noisy and incomplete LiDAR scanning data. Building upon our prior work, which utilized the Gaussian Process (GP) and Gaussian Mixture Model (GMM) for structured building models, we introduce a more generalized approach tailored for complex surfaces in urban scenes, where GMM Regression and GP with derivative observations are applied. A Hierarchical GMM (HGMM) is employed to optimize the number of GMM components and speed up the GMM training. With the prior map obtained from HGMM, GP inference follows to refine the final map. Our approach models the implicit surface of the geo-object and enables the inference of the regions that are not completely covered by measurements. The integration of GMM and GP yields well-calibrated uncertainty estimates alongside the surface model, enhancing both accuracy and reliability. The proposed method is evaluated on real data collected by a mobile mapping system. Compared to other methods, such as Gaussian Process Implicit Surface map (GPIS) and log-Gaussian Process Implicit Surface map (Log-GPIS), in terms of mapping accuracy and uncertainty quantification, the proposed method achieves lower RMSEs, higher log-likelihood values and lower computational costs for the evaluated datasets.

Submitted 22 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

arXiv:2402.03981 [pdf, other]

Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting

Authors: Yiming Xu, Hao Cheng, Monika Sester

Abstract: In autonomous driving tasks, trajectory prediction in complex traffic environments requires adherence to real-world context conditions and behavior multimodalities. Existing methods predominantly rely on prior assumptions or generative models trained on curated data to learn road agents' stochastic behavior bounded by scene constraints. However, they often face mode averaging issues due to data imbalance and simplistic priors, and could even suffer from mode collapse due to unstable training and single ground truth supervision. These issues lead the existing methods to a loss of predictive diversity and adherence to the scene constraints. To address these challenges, we introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT), which integrates map information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left. Moreover, we incorporate the predicted endpoints as an alternative behavioral token into the CDT model to facilitate the prediction of accurate trajectories. Extensive experiments on the Argoverse 2 benchmark demonstrate that CDT excels in generating diverse and scene-compliant trajectories in complex urban settings.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2302.13933 [pdf, other]

LAformer: Trajectory Prediction for Autonomous Driving with Lane-Aware Scene Constraints

Authors: Mengmeng Liu, Hao Cheng, Lin Chen, Hellward Broszio, Jiangtao Li, Runjiang Zhao, Monika Sester, Michael Ying Yang

Abstract: Trajectory prediction for autonomous driving must continuously reason about the motion stochasticity of road agents and comply with scene constraints. Existing methods typically rely on one-stage trajectory prediction models, which condition future trajectories on observed trajectories combined with fused scene information. However, they often struggle with complex scene constraints, such as those encountered at intersections. To this end, we present a novel method, called LAformer. It uses a temporally dense lane-aware estimation module to select only the top highly potential lane segments in an HD map, which effectively and continuously aligns motion dynamics with scene information, reducing the representation requirements for the subsequent attention-based decoder by filtering out irrelevant lane segments. Additionally, unlike one-stage prediction models, LAformer utilizes predictions from the first stage as anchor trajectories and adds a second-stage motion refinement module to further explore temporal consistency across the complete time horizon. Extensive experiments on Argoverse 1 and nuScenes demonstrate that LAformer achieves excellent performance for multimodal trajectory prediction.

Submitted 27 February, 2023; originally announced February 2023.

arXiv:2302.07583 [pdf, other]

ForceFormer: Exploring Social Force and Transformer for Pedestrian Trajectory Prediction

Authors: Weicheng Zhang, Hao Cheng, Fatema T. Johora, Monika Sester

Abstract: Predicting trajectories of pedestrians based on goal information in highly interactive scenes is a crucial step toward Intelligent Transportation Systems and Autonomous Driving. The challenges of this task come from two key sources: (1) complex social interactions in high pedestrian density scenarios and (2) limited utilization of goal information to effectively associate with past motion information. To address these difficulties, we integrate social forces into a Transformer-based stochastic generative model backbone and propose a new goal-based trajectory predictor called ForceFormer. Unlike most prior works that simply use the destination position as an input feature, we leverage the driving force from the destination to efficiently simulate the guidance of a target on a pedestrian. Additionally, repulsive forces are used as another input feature to describe the avoidance action among neighboring pedestrians. Extensive experiments on widely used pedestrian datasets show that our proposed method achieves on-par performance measured by distance errors with the state-of-the-art models but evidently decreases collisions, especially in dense pedestrian scenarios.

Submitted 15 February, 2023; originally announced February 2023.
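
Note: the driving and repulsive forces mentioned in the abstract above can be sketched with the classic social force formulation. This is not taken from the paper; the exponential repulsion, the function names, and every parameter value (`desired_speed`, `tau`, `strength`, `falloff`) are assumptions chosen for illustration.

```python
import math

def driving_force(pos, vel, goal, desired_speed=1.3, tau=0.5):
    """Force pulling a pedestrian toward the goal at a desired speed.

    Assumed form: (desired_velocity - current_velocity) / relaxation_time.
    """
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        # Already at the goal: the force only damps the remaining velocity.
        return (-vel[0] / tau, -vel[1] / tau)
    ex, ey = dx / dist, dy / dist  # unit vector toward the goal
    return ((desired_speed * ex - vel[0]) / tau,
            (desired_speed * ey - vel[1]) / tau)

def repulsive_force(pos, other, strength=2.0, falloff=0.3):
    """Force pushing a pedestrian away from a neighbour, decaying with distance."""
    dx, dy = pos[0] - other[0], pos[1] - other[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)  # direction undefined when positions coincide
    magnitude = strength * math.exp(-dist / falloff)
    return (magnitude * dx / dist, magnitude * dy / dist)

# A pedestrian at rest, goal 10 m ahead, neighbour 1 m ahead on the same line.
print(driving_force((0.0, 0.0), (0.0, 0.0), (10.0, 0.0)))
print(repulsive_force((0.0, 0.0), (1.0, 0.0)))
```

In a learned predictor, vectors like these would serve as extra input features per pedestrian rather than being integrated directly as a physical simulation.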

arXiv:2302.02928 [pdf, other]

doi 10.1016/j.isprsjprs.2023.08.013

Generating Evidential BEV Maps in Continuous Driving Space

Authors: Yunshuang Yuan, Hao Cheng, Michael Ying Yang, Monika Sester

Abstract: Safety is critical for autonomous driving, and one aspect of improving safety is to accurately capture the uncertainties of the perception system, especially knowing the unknown. Unlike approaches that provide only deterministic or probabilistic results, e.g., probabilistic object detection, which give only partial information about the perception scenario, we propose a complete probabilistic model named GevBEV. It interprets the 2D driving space as a probabilistic Bird's Eye View (BEV) map with point-based spatial Gaussian distributions, from which one can draw evidence as the parameters for the categorical Dirichlet distribution of any new sample point in the continuous driving space. The experimental results show that GevBEV not only provides more reliable uncertainty quantification but also outperforms the previous works on the benchmarks OPV2V and V2V4Real of BEV map interpretation for cooperative perception in simulated and real-world driving scenarios, respectively. A critical factor in cooperative perception is the data transmission size through the communication channels. GevBEV helps reduce communication overhead by selecting only the most important information to share from the learned uncertainty, reducing the average information communicated by 87% with only a slight performance drop. Our code is published at https://github.com/YuanYunshuang/GevBEV.

Submitted 4 September, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Journal ref: ISPRS Journal of Photogrammetry and Remote Sensing, 2023
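
Note: the abstract above describes drawing evidence as the parameters of a categorical Dirichlet distribution. A minimal sketch of how evidence commonly maps to class probabilities and an uncertainty score is shown below. It follows the usual evidential deep learning convention (concentration = evidence + 1, uncertainty = number of classes / total concentration), which is an assumption here and may differ from what GevBEV actually implements; `dirichlet_summary` is a hypothetical name.

```python
def dirichlet_summary(evidence):
    """Turn per-class evidence into expected probabilities and an uncertainty score.

    Assumed convention: alpha_k = evidence_k + 1 (uniform prior),
    expected probability = alpha_k / sum(alpha),
    uncertainty = K / sum(alpha), which is 1.0 when there is no evidence.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alphas = [e + 1.0 for e in evidence]
    total = sum(alphas)
    probs = [a / total for a in alphas]
    uncertainty = len(alphas) / total
    return probs, uncertainty

# No evidence at all: uniform probabilities and maximal uncertainty.
print(dirichlet_summary([0.0, 0.0]))
# Strong evidence for the first class: confident prediction, low uncertainty.
print(dirichlet_summary([8.0, 0.0]))
```

Under this convention, a point far from any observation collects little evidence and is reported as unknown rather than being forced into a class, which is the kind of signal the abstract proposes using to decide what to share.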

arXiv:2212.07271 [pdf, other]

doi 10.1109/LRA.2023.3303694

Gaussian Process Mapping of Uncertain Building Models with GMM as Prior

Authors: Qianqian Zou, Claus Brenner, Monika Sester

Abstract: Mapping with uncertainty representation is required in many research domains, especially for localization. Although there are many investigations regarding the uncertainty of the pose estimation of an ego-robot with map information, the quality of the reference maps is often neglected. To avoid potential problems caused by the errors of maps and a lack of uncertainty quantification, an adequate uncertainty measure for the maps is required. In this letter, uncertain building models with abstract map surfaces using Gaussian Processes (GPs) are proposed to describe the map uncertainty in a probabilistic way. To reduce the redundant computation for simple planar objects, extracted facets from a Gaussian Mixture Model (GMM) are combined with an implicit GP map, also employing local GP-block techniques. The proposed method is evaluated on LiDAR point clouds of city buildings collected by a mobile mapping system. Compared to the performance of other methods such as OctoMap, GP Occupancy Map (GPOM), Bayesian Generalized Kernel OctoMap (BGKOctoMap), Local automatic relevance determination Hilbert map (LARD-HM) and Gaussian Implicit Surface map (GPIS), our method achieves a higher Precision-Recall AUC for the evaluated buildings.

Submitted 28 August, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

arXiv:2209.07857 [pdf, other]

GATraj: A Graph- and Attention-based Multi-Agent Trajectory Prediction Model

Authors: Hao Cheng, Mengmeng Liu, Lin Chen, Hellward Broszio, Monika Sester, Michael Ying Yang

Abstract: Trajectory prediction has been a long-standing problem in intelligent systems like autonomous driving and robot navigation. Models trained on large-scale benchmarks have made significant progress in improving prediction accuracy. However, the importance of efficiency for real-time applications has been less emphasized. This paper proposes an attention-based graph model, named GATraj, which achieves a good balance of prediction accuracy and inference speed. We use attention mechanisms to model the spatial-temporal dynamics of agents, such as pedestrians or vehicles, and a graph convolutional network to model their interactions. Additionally, a Laplacian mixture decoder is implemented to mitigate mode collapse and generate diverse multimodal predictions for each agent. GATraj achieves state-of-the-art prediction performance at a much higher speed when tested on the ETH/UCY datasets for pedestrian trajectories, and good performance at about 100 Hz inference speed when tested on the nuScenes dataset for autonomous driving. We conduct extensive experiments to analyze the probability estimation of the Laplacian mixture decoder and compare it with a Gaussian mixture decoder for predicting different multimodalities. Furthermore, comprehensive ablation studies demonstrate the effectiveness of each proposed module in GATraj. The code is released at https://github.com/mengmengliu1998/GATraj.

Submitted 19 June, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

arXiv:2205.09418 [pdf, other]

Leveraging Dynamic Objects for Relative Localization Correction in a Connected Autonomous Vehicle Network

Authors: Yunshuang Yuan, Monika Sester

Abstract: Highly accurate localization is crucial for the safety and reliability of autonomous driving, especially for the information fusion of collective perception that aims to further improve road safety by sharing information in a communication network of Connected Autonomous Vehicles (CAVs). In this scenario, small localization errors can impose additional difficulty on fusing the information from different CAVs. In this paper, we propose a RANSAC-based (RANdom SAmple Consensus) method to correct the relative localization errors between two CAVs in order to ease the information fusion among the CAVs. Different from previous LiDAR-based localization algorithms that only take the static environmental information into consideration, this method also leverages the dynamic objects for localization thanks to the real-time data sharing between CAVs. Specifically, in addition to the static objects like poles, fences, and facades, the object centers of the detected dynamic vehicles are also used as keypoints for the matching of two point sets. The experiments on the synthetic dataset COMAP show that the proposed method can greatly decrease the relative localization error between two CAVs to less than 20 cm, as long as enough vehicles and poles are correctly detected by both CAVs. Besides, our proposed method is also highly efficient in runtime and can be used in real-time scenarios of autonomous driving.

Submitted 30 May, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: ISPRS congress 2022
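
Note: the RANSAC-based correction described in the abstract above can be sketched for the simplified 2D case. The sketch assumes candidate keypoint correspondences between the two vehicles are already given (some of them wrong), which the abstract does not state; the function names, the inlier tolerance, and the iteration count are likewise illustrative assumptions rather than the paper's settings.

```python
import math
import random

def rigid_from_pairs(src, dst):
    """Least-squares 2D rotation and translation mapping paired src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    s_cos = s_sin = 0.0
    for (x, y), (u, v) in zip(src, dst):
        ax, ay = x - sx, y - sy          # centred source point
        bx, by = u - dx, v - dy          # centred target point
        s_cos += ax * bx + ay * by       # accumulates |a||b| cos(theta)
        s_sin += ax * by - ay * bx       # accumulates |a||b| sin(theta)
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def apply_rigid(transform, p):
    """Rotate p by theta, then translate by (tx, ty)."""
    theta, tx, ty = transform
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def ransac_rigid(src, dst, iters=200, tol=0.2, seed=0):
    """Estimate a rigid transform from noisy correspondences via RANSAC."""
    rng = random.Random(seed)
    indices = range(len(src))
    best = []
    for _ in range(iters):
        i, j = rng.sample(indices, 2)    # minimal sample: two correspondences
        candidate = rigid_from_pairs([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in indices
                   if math.dist(apply_rigid(candidate, src[k]), dst[k]) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("not enough consistent correspondences")
    # Refit on the full consensus set for the final estimate.
    refined = rigid_from_pairs([src[k] for k in best], [dst[k] for k in best])
    return refined, best
```

Keypoints from both static objects (poles) and detected vehicle centers would simply be pooled into `src` and `dst`; the consensus step is what discards mismatched or moving keypoints.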

arXiv:2205.08783 [pdf, other]

Improving Pedestrian Priority via Grouping and Virtual Lanes

Authors: Yao Li, Vinu Kamalasanan, Mariana Batista, Monika Sester

Abstract: The shared space design is applied in urban streets to support barrier-free movement and integrate traffic participants (such as pedestrians, cyclists and vehicles) into a common road space. Despite the low-speed environment, sharing space with motor vehicles can make vulnerable road users feel uneasy. Yet, walking in groups increases their confidence and influences the yielding behavior of drivers. Therefore, we propose an innovative approach to support the crossing of pedestrians via grouping and project the virtual lanes in shared spaces. This paper presents the important components of the crowd steering system, discusses the enablers and gaps in the current approach, and illustrates the proposed idea with concept diagrams.

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2203.09438 [pdf, other]

An Explainable Stacked Ensemble Model for Static Route-Free Estimation of Time of Arrival

Authors: Sören Schleibaum, Jörg P. Müller, Monika Sester

Abstract: To compare alternative taxi schedules and to compute them, as well as to provide insights into an upcoming taxi trip to drivers and passengers, the duration of a trip or its Estimated Time of Arrival (ETA) is predicted. To reach a high prediction precision, machine learning models for ETA are state of the art. One yet unexploited option to further increase prediction precision is to combine multiple ETA models into an ensemble. While an increase of prediction precision is likely, the main drawback is that the predictions made by such an ensemble become less transparent due to the sophisticated ensemble architecture. One option to remedy this drawback is to apply eXplainable Artificial Intelligence (XAI). The contribution of this paper is three-fold. First, we combine multiple machine learning models from our previous work for ETA into a two-level ensemble model - a stacked ensemble model - which on its own is novel; therefore, we can outperform previous state-of-the-art static route-free ETA approaches. Second, we apply existing XAI methods to explain the first- and second-level models of the ensemble. Third, we propose three joining methods for combining the first-level explanations with the second-level ones. Those joining methods enable us to explain stacked ensembles for regression tasks. An experimental evaluation shows that the ETA models correctly learned the importance of those input features driving the prediction.

Submitted 11 January, 2024; v1 submitted 17 March, 2022; originally announced March 2022.

arXiv:2201.05514 [pdf, other]

doi 10.1016/j.compenvurbsys.2022.101759

Determination of building flood risk maps from LiDAR mobile mapping data

Authors: Yu Feng, Qing Xiao, Claus Brenner, Aaron Peche, Juntao Yang, Udo Feuerhake, Monika Sester

Abstract: With increasing urbanization, flooding is a major challenge for many cities today. Based on forecast precipitation, topography, and pipe networks, flood simulations can provide early warnings for areas and buildings at risk of flooding. Basement windows, doors, and underground garage entrances are common places where floodwater can flow into a building. Some buildings have been prepared or designed considering the threat of flooding, but others have not. Therefore, knowing the heights of these facade openings helps to identify places that are more susceptible to water ingress. However, such data is not yet readily available in most cities. Traditional surveying of the desired targets may be used, but this is a very time-consuming and laborious process. This research presents a new process for the extraction of windows and doors from LiDAR mobile mapping data. Deep learning object detection models are trained to identify these objects. Usually, this requires large amounts of manual annotations. In this paper, we mitigate this problem by leveraging a rule-based method. In a first step, the rule-based method is used to generate pseudo-labels. A semi-supervised learning strategy is then applied with three different levels of supervision. The results show that using only automatically generated pseudo-labels, the learning-based model outperforms the rule-based approach by 14.6% in terms of F1-score. After five hours of human supervision, it is possible to improve the model by another 6.2%.
By comparing the detected facade openings' heights with the predicted water levels from a flood simulation model, a map can be produced which assigns per-building flood risk levels. This information can be combined with flood forecasting to provide a more targeted disaster prevention guide for the city's infrastructure and residential buildings.

Submitted 14 January, 2022; originally announced January 2022.

Journal ref: Computers, Environment and Urban Systems, Vol. 93, April 2022, 101759

arXiv:2109.11615 [pdf, other]

Keypoints-Based Deep Feature Fusion for Cooperative Vehicle Detection of Autonomous Driving

Authors: Yunshuang Yuan, Hao Cheng, Monika Sester

Abstract: Sharing collective perception messages (CPM) between vehicles is investigated to decrease occlusions so as to improve the perception accuracy and safety of autonomous driving. However, highly accurate data sharing and low communication overhead is a big challenge for collective perception, especially when real-time communication is required among connected and automated vehicles. In this paper, we propose an efficient and effective keypoints-based deep feature fusion framework built on the 3D object detector PV-RCNN, called Fusion PV-RCNN (FPV-RCNN for short), for collective perception. We introduce a high-performance bounding box proposal matching module and a keypoints selection strategy to compress the CPM size and solve the multi-vehicle data fusion problem. Besides, we also propose an effective localization error correction module based on the maximum consensus principle to increase the robustness of the data fusion. Compared to a bird's-eye view (BEV) keypoints feature fusion, FPV-RCNN achieves improved detection accuracy by about 9% at a high evaluation criterion (IoU 0.7) on the synthetic dataset COMAP dedicated to collective perception. In addition, its performance is comparable to two raw data fusion baselines that have no data loss in sharing. Moreover, our method also significantly decreases the CPM size to less than 0.3 KB, and is thus about 50 times smaller than the BEV feature map sharing used in previous works. Even with further decreased CPM feature channels, i.e., from 128 to 32, the detection performance does not show apparent drops.
The code of our method is available at https://github.com/YuanYunshuang/FPV_RCNN.

Submitted 15 February, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

arXiv:2106.06255 [pdf, other]

Improving Take-over Situation by Active Communication

Authors: Monika Sester, Mark Vollrath, Hao Cheng

Abstract: In this short paper an idea is sketched, how to support drivers of an autonomous vehicle in taking back control of the vehicle after a longer section of autonomous cruising. The hypothesis is that a clear communication about the location and behavior of relevant objects in the environment will help the driver to quickly grasp the situational context and thus support drivers in safely handling the ongoing driving situation manually after take-over. Based on this hypothesis, a research concept is sketched, which entails the necessary components as well as the disciplines involved.

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2105.03891 [pdf, other]

Interaction Detection Between Vehicles and Vulnerable Road Users: A Deep Generative Approach with Attention

Authors: Hao Cheng, Li Feng, Hailong Liu, Takatsugu Hirayama, Hiroshi Murase, Monika Sester

Abstract: Intersections where vehicles are permitted to turn and interact with vulnerable road users (VRUs) like pedestrians and cyclists are among some of the most challenging locations for automated and accurate recognition of road users' behavior. In this paper, we propose a deep conditional generative model for interaction detection at such locations. It aims to automatically analyze massive video data about the continuity of road users' behavior. This task is essential for many intelligent transportation systems such as traffic safety control and self-driving cars that depend on the understanding of road users' locomotion. A Conditional Variational Auto-Encoder based model with Gaussian latent variables is trained to encode road users' behavior and perform probabilistic and diverse predictions of interactions. The model takes as input the information of road users' type, position and motion automatically extracted by a deep learning object detector and optical flow from videos, and generates frame-wise probabilities that represent the dynamics of interactions between a turning vehicle and any VRUs involved. The model's efficacy was validated by testing on real-world datasets acquired from two different intersections. It achieved an F1-score above 0.96 at a right-turn intersection in Germany and 0.89 at a left-turn intersection in Japan, both with very busy traffic flows.

Submitted 9 May, 2021; originally announced May 2021.

arXiv:2104.06916 [pdf, other]

Autonomous Vehicles Drive into Shared Spaces: eHMI Design Concept Focusing on Vulnerable Road Users

Authors: Yang Li, Hao Cheng, Zhe Zeng, Hailong Liu, Monika Sester

Abstract: In comparison to conventional traffic designs, shared spaces promote a more pleasant urban environment with slower motorized movement, smoother traffic, and less congestion. In the foreseeable future, shared spaces will be populated with a mixture of autonomous vehicles (AVs) and vulnerable road users (VRUs) like pedestrians and cyclists. However, a driver-less AV lacks a way to communicate with the VRUs when they have to reach an agreement of a negotiation, which brings new challenges to the safety and smoothness of the traffic. To find a feasible solution to integrating AVs seamlessly into shared-space traffic, we first identified the possible issues that the shared-space designs have not considered for the role of AVs. Then an online questionnaire was used to ask participants about how they would like a driver of the manually driving vehicle to communicate with VRUs in a shared space. We found that when the driver wanted to give some suggestions to the VRUs in a negotiation, participants thought that the communications via the driver's body behaviors were necessary. Besides, when the driver conveyed information about her/his intentions and cautions to the VRUs, participants selected different communication methods with respect to their transport modes (as a driver, pedestrian, or cyclist). These results suggest that novel eHMIs might be useful for AV-VRU communication when the original drivers are not present. Hence, a potential eHMI design concept was proposed for different VRUs to meet their various expectations. In the end, we further discussed the effects of the eHMIs on improving the sociality in shared spaces and the autonomous driving systems.

Submitted 23 September, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

arXiv:2010.16267 [pdf, other]

Exploring Dynamic Context for Multi-path Trajectory Prediction

Authors: Hao Cheng, Wentong Liao, Xuejiao Tang, Michael Ying Yang, Monika Sester, Bodo Rosenhahn

Abstract: To accurately predict future positions of different agents in traffic scenarios is crucial for safely deploying intelligent autonomous systems in the real-world environment. However, it remains a challenge due to the behavior of a target agent being affected by other agents dynamically and there being more than one socially possible paths the agent could take. In this paper, we propose a novel framework, named Dynamic Context Encoder Network (DCENet). In our framework, first, the spatial context between agents is explored by using self-attention architectures. Then, the two-stream encoders are trained to learn temporal context between steps by taking the respective observed trajectories and the extracted dynamic spatial context as input. The spatial-temporal context is encoded into a latent space using a Conditional Variational Auto-Encoder (CVAE) module. Finally, a set of future trajectories for each agent is predicted conditioned on the learned spatial-temporal context by sampling from the latent space, repeatedly. DCENet is evaluated on one of the most popular challenging benchmarks for trajectory forecasting Trajnet and reports a new state-of-the-art performance. It also demonstrates superior performance evaluated on the benchmark inD for mixed traffic at intersections. A series of ablation studies is conducted to validate the effectiveness of each proposed module. Our code is available at https://github.com/wtliao/DCENet.

Submitted 24 March, 2021; v1 submitted 30 October, 2020; originally announced October 2020.

Comments: accepted by ICRA 2021, code available

arXiv:2006.11802 [pdf, other]

doi 10.1016/j.isprsjprs.2020.09.011

Flood severity mapping from Volunteered Geographic Information by interpreting water level from images containing people: a case study of Hurricane Harvey

Authors: Yu Feng, Claus Brenner, Monika Sester

Abstract: With increasing urbanization, in recent years there has been a growing interest and need in monitoring and analyzing urban flood events. Social media, as a new data source, can provide real-time information for flood monitoring. The social media posts with locations are often referred to as Volunteered Geographic Information (VGI), which can reveal the spatial pattern of such events. Since more images are shared on social media than ever before, recent research focused on the extraction of flood-related posts by analyzing images in addition to texts. Apart from merely classifying posts as flood relevant or not, more detailed information, e.g. the flood severity, can also be extracted based on image interpretation. However, it has been less tackled and has not yet been applied for flood severity mapping. In this paper, we propose a novel three-step process to extract and map flood severity information. First, flood relevant images are retrieved with the help of pre-trained convolutional neural networks as feature extractors. Second, the images containing people are further classified into four severity levels by observing the relationship between body parts and their partial inundation, i.e. images are classified according to the water level with respect to different body parts, namely ankle, knee, hip, and chest. Lastly, locations of the Tweets are used for generating a map of estimated flood extent and severity. This process was applied to an image dataset collected during Hurricane Harvey in 2017, as a proof of concept. The results show that VGI can be used as a supplement to remote sensing observations for flood extent mapping and is beneficial, especially for urban areas, where the infrastructure is often occluding water. Based on the extracted water level information, an integrated overview of flood severity can be provided for the early stages of emergency response.

Submitted 30 September, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

arXiv:2006.08264 [pdf, other]

AMENet: Attentive Maps Encoder Network for Trajectory Prediction

Authors: Hao Cheng, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn, Monika Sester

Abstract: Trajectory prediction is critical for applications of planning safe future movements and remains challenging even for the next few seconds in urban mixed traffic. How an agent moves is affected by the various behaviors of its neighboring agents in different environments. To predict movements, we propose an end-to-end generative model named Attentive Maps Encoder Network (AMENet) that encodes the agent's motion and interaction information for accurate and realistic multi-path trajectory prediction. A conditional variational auto-encoder module is trained to learn the latent space of possible future paths based on attentive dynamic maps for interaction modeling and then is used to predict multiple plausible future trajectories conditioned on the observed past trajectories. The efficacy of AMENet is validated using two public trajectory prediction benchmarks Trajnet and InD.

Submitted 13 January, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: Accepted by ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:2002.05966 [pdf, other]

MCENET: Multi-Context Encoder Network for Homogeneous Agent Trajectory Prediction in Mixed Traffic

Authors: Hao Cheng, Wentong Liao, Michael Ying Yang, Monika Sester, Bodo Rosenhahn

Abstract: Trajectory prediction in urban mixed-traffic zones (a.k.a. shared spaces) is critical for many intelligent transportation systems, such as intent detection for autonomous driving. However, there are many challenges to predict the trajectories of heterogeneous road agents (pedestrians, cyclists and vehicles) at a microscopical level. For example, an agent might be able to choose multiple plausible paths in complex interactions with other agents in varying environments. To this end, we propose an approach named Multi-Context Encoder Network (MCENET) that is trained by encoding both past and future scene context, interaction context and motion information to capture the patterns and variations of the future trajectories using a set of stochastic latent variables. In inference time, we combine the past context and motion information of the target agent with samplings of the latent variables to predict multiple realistic trajectories in the future. Through experiments on several datasets of varying scenes, our method outperforms some of the recent state-of-the-art methods for mixed traffic trajectory prediction by a large margin and more robust in a very challenging environment. The impact of each context is justified via ablation studies.

Submitted 23 June, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

Comments: 8 pages, 5 figures, code is available on https://github.com/haohao11/MCENET

arXiv:1908.00625 [pdf]

doi 10.1016/j.jclepro.2019.117732

Learning about spatial inequalities: Capturing the heterogeneity in the urban environment

Authors: J. Siqueira-Gay, M. A. Giannotti, M. Sester

Abstract: Transportation systems can be conceptualized as an instrument of spreading people and resources over the territory, playing an important role in developing sustainable cities. The current rationale of transport provision is based on population demand, disregarding land use and socioeconomic information. To meet the challenge to promote a more equitable resource distribution, this work aims at identifying and describing patterns of urban services supply, their accessibility, and household income. By using a multidimensional approach, the spatial inequalities of a large city of the global south reveal that the low-income population has low access mainly to hospitals and cultural centers. A low-income group presents an intermediate level of accessibility to public schools and sports centers, evidencing the diverse condition of citizens in the peripheries. These complex outcomes generated by the interaction of land use and public transportation emphasize the importance of comprehensive methodological approaches to support decisions of urban projects, plans and programs. Reducing spatial inequalities, especially providing services for deprived groups, is fundamental to promote the sustainable use of resources and optimize the daily commuting.

Submitted 24 July, 2019; originally announced August 2019.

arXiv:1709.08489 [pdf, other]

Location- and time-dependent meeting point recommendations for shared interurban rides

Authors: Paul Czioska, Aleksandar Trifunović, Sophie Dennisen, Monika Sester

Abstract: Drivers offering spare seats in their vehicles on long-distance (interurban) trips often have to pick up or drop off passengers in cities en route. In that case it is necessary to agree on a meeting point. Often, this is done by proposing well-known locations like train stations, which frequently induces unnecessary detours through the inner-city districts. In contrast, meeting points in the vicinity of motorways and arterial roads with good public transport connection can reduce driving time and mileage. This work proposes a location-based approach to enable a fast and automatic recommendation of suitable pick-up (and drop-off) points for drivers and passengers using a GIS workflow and comprehensive precomputation of travel times.

Submitted 21 September, 2017; originally announced September 2017.

Comments: 22 pages. Submitted to Journal of Location Based Services

arXiv:1709.08488 [pdf, other]

Real-world Meeting Points for Shared Demand-Responsive Transportation Systems

Authors: Paul Czioska, Ronny Kutadinata, Aleksandar Trifunović, Stephan Winter, Monika Sester, Bernhard Friedrich

Abstract: While conventional shared demand-responsive transportation (SDRT) systems mostly operate on a door-to-door policy, the usage of meeting points for the pick-up and drop-off of user groups can offer several advantages, like fewer stops and less total travelled mileage. Moreover, it offers the possibility to select only feasible and well-defined locations where a safe (de-)boarding is possible. This paper presents a three-step workflow for solving the SDRT problem with meeting points (SDRT-MP). Firstly, the customers are clustered into similar groups, then meeting (and divergence) points are determined for each cluster. Finally, a parallel neighbourhood search algorithm is applied to create the vehicle routes. Further, a simulation with realistic pick-up and drop-off locations based on map data is performed in order to demonstrate the impact of using meeting points for SDRT systems in contrast to the door-to-door service. Although the average passenger travel time is higher due to enhanced walking and waiting times, the experiment highlights a reduction of operator resources required to serve all customers.

Submitted 21 September, 2017; originally announced September 2017.

Comments: 28 pages. Submitted to Transportation Science

arXiv:1511.03010 [pdf]

doi 10.3390/ijgi5050055

Geospatial Big Data Handling Theory and Methods: A Review and Research Challenges

Authors: S. Li, S. Dragicevic, F. Anton, M. Sester, S. Winter, A. Coltekin, C. Pettit, B. Jiang, J. Haworth, A. Stein, T. Cheng

Abstract: Big data has now become a strong focus of global interest that is increasingly attracting the attention of academia, industry, government and other organizations. Big data can be situated in the disciplinary area of traditional geospatial data handling theory and methods. The increasing volume and varying format of collected geospatial big data presents challenges in storing, managing, processing, analyzing, visualizing and verifying the quality of data. This has implications for the quality of decisions made with big data. Consequently, this position paper of the International Society for Photogrammetry and Remote Sensing (ISPRS) Technical Commission II (TC II) revisits the existing geospatial data handling methods and theories to determine if they are still capable of handling emerging geospatial big data. Further, the paper synthesises problems, major issues and challenges with current developments as well as recommending what needs to be developed further in the near future. Keywords: Big data, Geospatial, Data handling, Analytics, Spatial Modeling, Review

Submitted 10 November, 2015; originally announced November 2015.

Comments: 25 pages, 3 figures

Journal ref: ISPRS International Journal of Geo-Information, 5(5), 55, 2016

Showing 1–27 of 27 results for author: Sester, M