Search | arXiv e-print repository

A Deep Automotive Radar Detector using the RaDelft Dataset

Authors: Ignacio Roldan, Andras Palffy, Julian F. P. Kooij, Dariu M. Gavrila, Francesco Fioranelli, Alexander Yarovoy

Abstract: The detection of multiple extended targets in complex environments using high-resolution automotive radar is considered. A data-driven approach is proposed where unlabeled synchronized lidar data is used as ground truth to train a neural network with only radar data as input. To this end, the novel, large-scale, real-life, and multi-sensor RaDelft dataset has been recorded using a demonstrator veh… ▽ More The detection of multiple extended targets in complex environments using high-resolution automotive radar is considered. A data-driven approach is proposed where unlabeled synchronized lidar data is used as ground truth to train a neural network with only radar data as input. To this end, the novel, large-scale, real-life, and multi-sensor RaDelft dataset has been recorded using a demonstrator vehicle in different locations in the city of Delft. The dataset, as well as the documentation and example code, is publicly available for those researchers in the field of automotive radar or machine perception. The proposed data-driven detector is able to generate lidar-like point clouds using only radar data from a high-resolution system, which preserves the shape and size of extended targets. The results are compared against conventional CFAR detectors as well as variations of the method to emulate the available approaches in the literature, using the probability of detection, the probability of false alarm, and the Chamfer distance as performance metrics. Moreover, an ablation study was carried out to assess the impact of Doppler and temporal information on detection performance. The proposed method outperforms the different baselines in terms of Chamfer distance, achieving a reduction of 75% against conventional CFAR detectors and 10% against the modified state-of-the-art deep learning-based approaches. △ Less

Submitted 27 June, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: Under review at IEEE Transaction on Radar Systems

arXiv:2405.15688 [pdf, other]

UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes

Authors: Ted Lentsch, Holger Caesar, Dariu M. Gavrila

Abstract: Unsupervised 3D object detection methods have emerged to leverage vast amounts of data efficiently without requiring manual labels for training. Recent approaches rely on dynamic objects for learning to detect objects but penalize the detections of static instances during training. Multiple rounds of (self) training are used in which detected static instances are added to the set of training targe… ▽ More Unsupervised 3D object detection methods have emerged to leverage vast amounts of data efficiently without requiring manual labels for training. Recent approaches rely on dynamic objects for learning to detect objects but penalize the detections of static instances during training. Multiple rounds of (self) training are used in which detected static instances are added to the set of training targets; this procedure to improve performance is computationally expensive. To address this, we propose the method UNION. We use spatial clustering and self-supervised scene flow to obtain a set of static and dynamic object proposals from LiDAR. Subsequently, object proposals' visual appearances are encoded to distinguish static objects in the foreground and background by selecting static instances that are visually similar to dynamic objects. As a result, static and dynamic foreground objects are obtained together, and existing detectors can be trained with a single training. In addition, we extend 3D object discovery to detection by using object appearance-based cluster labels as pseudo-class labels for training object classification. We conduct extensive experiments on the nuScenes dataset and increase the state-of-the-art performance for unsupervised object discovery, i.e. UNION more than doubles the average precision to 33.9. The code will be made publicly available. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Under review

MSC Class: 68T10; 62H35; 68T05; 68U10 ACM Class: I.2.10; I.4.8; I.5.1; I.5.4

arXiv:2402.12970 [pdf, other]

doi 10.1109/RadarConf2458775.2024.10548426

See Further Than CFAR: a Data-Driven Radar Detector Trained by Lidar

Authors: Ignacio Roldan, Andras Palffy, Julian F. P. Kooij, Dariu M. Gavrila, Francesco Fioranelli, Alexander Yarovoy

Abstract: In this paper, we address the limitations of traditional constant false alarm rate (CFAR) target detectors in automotive radars, particularly in complex urban environments with multiple objects that appear as extended targets. We propose a data-driven radar target detector exploiting a highly efficient 2D CNN backbone inspired by the computer vision domain. Our approach is distinguished by a uniqu… ▽ More In this paper, we address the limitations of traditional constant false alarm rate (CFAR) target detectors in automotive radars, particularly in complex urban environments with multiple objects that appear as extended targets. We propose a data-driven radar target detector exploiting a highly efficient 2D CNN backbone inspired by the computer vision domain. Our approach is distinguished by a unique cross sensor supervision pipeline, enabling it to learn exclusively from unlabeled synchronized radar and lidar data, thus eliminating the need for costly manual object annotations. Using a novel large-scale, real-life multi-sensor dataset recorded in various driving scenarios, we demonstrate that the proposed detector generates dense, lidar-like point clouds, achieving a lower Chamfer distance to the reference lidar point clouds than CFAR detectors. Overall, it significantly outperforms CFAR baselines detection accuracy. △ Less

Submitted 27 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted for lecture presentation at IEEE RadarConf'24, Denver, USA

Journal ref: 2024 IEEE Radar Conference (RadarConf24)

arXiv:2310.10353 [pdf, other]

Multimodal Object Query Initialization for 3D Object Detection

Authors: Mathijs R. van Geerenstein, Felicia Ruppel, Klaus Dietmayer, Dariu M. Gavrila

Abstract: 3D object detection models that exploit both LiDAR and camera sensor features are top performers in large-scale autonomous driving benchmarks. A transformer is a popular network architecture used for this task, in which so-called object queries act as candidate objects. Initializing these object queries based on current sensor inputs is a common practice. For this, existing methods strongly rely o… ▽ More 3D object detection models that exploit both LiDAR and camera sensor features are top performers in large-scale autonomous driving benchmarks. A transformer is a popular network architecture used for this task, in which so-called object queries act as candidate objects. Initializing these object queries based on current sensor inputs is a common practice. For this, existing methods strongly rely on LiDAR data however, and do not fully exploit image features. Besides, they introduce significant latency. To overcome these limitations we propose EfficientQ3M, an efficient, modular, and multimodal solution for object query initialization for transformer-based 3D object detection models. The proposed initialization method is combined with a "modality-balanced" transformer decoder where the queries can access all sensor modalities throughout the decoder. In experiments, we outperform the state of the art in transformer-based LiDAR object detection on the competitive nuScenes benchmark and showcase the benefits of input-dependent multimodal query initialization, while being more efficient than the available alternatives for LiDAR-camera initialization. The proposed method can be applied with any combination of sensor modalities as input, demonstrating its modularity. △ Less

Submitted 16 October, 2023; originally announced October 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2303.00462 [pdf, other]

Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision

Authors: Fangqiang Ding, Andras Palffy, Dariu M. Gavrila, Chris Xiaoxuan Lu

Abstract: This work proposes a novel approach to 4D radar-based scene flow estimation via cross-modal learning. Our approach is motivated by the co-located sensing redundancy in modern autonomous vehicles. Such redundancy implicitly provides various forms of supervision cues to the radar scene flow estimation. Specifically, we introduce a multi-task model architecture for the identified cross-modal learning… ▽ More This work proposes a novel approach to 4D radar-based scene flow estimation via cross-modal learning. Our approach is motivated by the co-located sensing redundancy in modern autonomous vehicles. Such redundancy implicitly provides various forms of supervision cues to the radar scene flow estimation. Specifically, we introduce a multi-task model architecture for the identified cross-modal learning problem and propose loss functions to opportunistically engage scene flow estimation using multiple cross-modal constraints for effective model training. Extensive experiments show the state-of-the-art performance of our method and demonstrate the effectiveness of cross-modal supervised learning to infer more accurate 4D radar scene flow. We also show its usefulness to two subtasks - motion segmentation and ego-motion estimation. Our source code will be available on https://github.com/Toytiny/CMFlow. △ Less

Submitted 17 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: 10 pages, 7 figures. Accepted by CVPR 2023. See our code at https://github.com/Toytiny/CMFlow. Supplementary materials can be found at https://drive.google.com/file/d/1Iewcqnjzecge2ePBM8k2tg-85LX5xs3N/view

arXiv:2211.13309 [pdf, other]

How do Cross-View and Cross-Modal Alignment Affect Representations in Contrastive Learning?

Authors: Thomas M. Hehn, Julian F. P. Kooij, Dariu M. Gavrila

Abstract: Various state-of-the-art self-supervised visual representation learning approaches take advantage of data from multiple sensors by aligning the feature representations across views and/or modalities. In this work, we investigate how aligning representations affects the visual features obtained from cross-view and cross-modal contrastive learning on images and point clouds. On five real-world datas… ▽ More Various state-of-the-art self-supervised visual representation learning approaches take advantage of data from multiple sensors by aligning the feature representations across views and/or modalities. In this work, we investigate how aligning representations affects the visual features obtained from cross-view and cross-modal contrastive learning on images and point clouds. On five real-world datasets and on five tasks, we train and evaluate 108 models based on four pretraining variations. We find that cross-modal representation alignment discards complementary visual information, such as color and texture, and instead emphasizes redundant depth cues. The depth cues obtained from pretraining improve downstream depth prediction performance. Also overall, cross-modal alignment leads to more robust encoders than pre-training by cross-view alignment, especially on depth prediction, instance segmentation, and object detection. △ Less

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.13133 [pdf, other]

Structural Knowledge Distillation for Object Detection

Authors: Philip de Rijk, Lukas Schneider, Marius Cordts, Dariu M. Gavrila

Abstract: Knowledge Distillation (KD) is a well-known training paradigm in deep neural networks where knowledge acquired by a large teacher model is transferred to a small student. KD has proven to be an effective technique to significantly improve the student's performance for various tasks including object detection. As such, KD techniques mostly rely on guidance at the intermediate feature level, which i… ▽ More Knowledge Distillation (KD) is a well-known training paradigm in deep neural networks where knowledge acquired by a large teacher model is transferred to a small student. KD has proven to be an effective technique to significantly improve the student's performance for various tasks including object detection. As such, KD techniques mostly rely on guidance at the intermediate feature level, which is typically implemented by minimizing an lp-norm distance between teacher and student activations during training. In this paper, we propose a replacement for the pixel-wise independent lp-norm based on the structural similarity (SSIM). By taking into account additional contrast and structural cues, feature importance, correlation and spatial dependence in the feature space are considered in the loss formulation. Extensive experiments on MSCOCO demonstrate the effectiveness of our method across different training schemes and architectures. Our method adds only little computational overhead, is straightforward to implement and at the same time it significantly outperforms the standard lp-norms. Moreover, more complex state-of-the-art KD methods using attention-based sampling mechanisms are outperformed, including a +3.5 AP gain using a Faster R-CNN R-50 compared to a vanilla model. △ Less

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2011.09141 [pdf, other]

Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data

Authors: Christoph B. Rist, David Emmerichs, Markus Enzweiler, Dariu M. Gavrila

Abstract: Semantic scene completion is the task of jointly estimating 3D geometry and semantics of objects and surfaces within a given extent. This is a particularly challenging task on real-world data that is sparse and occluded. We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion. Unlike previous work on scene completion, our… ▽ More Semantic scene completion is the task of jointly estimating 3D geometry and semantics of objects and surfaces within a given extent. This is a particularly challenging task on real-world data that is sparse and occluded. We propose a scene segmentation network based on local Deep Implicit Functions as a novel learning-based method for scene completion. Unlike previous work on scene completion, our method produces a continuous scene representation that is not based on voxelization. We encode raw point clouds into a latent space locally and at multiple spatial resolutions. A global scene completion function is subsequently assembled from the localized function patches. We show that this continuous representation is suitable to encode geometric and semantic properties of extensive outdoor scenes without the need for spatial discretization (thus avoiding the trade-off between level of scene detail and the scene extent that can be covered). We train and evaluate our method on semantically annotated LiDAR scans from the Semantic KITTI dataset. Our experiments verify that our method generates a powerful representation that can be decoded into a dense 3D description of a given scene. The performance of our method surpasses the state of the art on the Semantic KITTI Scene Completion Benchmark in terms of geometric completion intersection-over-union (IoU). △ Less

Submitted 12 April, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

ACM Class: I.4.8

arXiv:2004.12678 [pdf, other]

Diversity in Action: General-Sum Multi-Agent Continuous Inverse Optimal Control

Authors: Christian Muench, Frans A. Oliehoek, Dariu M. Gavrila

Abstract: Traffic scenarios are inherently interactive. Multiple decision-makers predict the actions of others and choose strategies that maximize their rewards. We view these interactions from the perspective of game theory which introduces various challenges. Humans are not entirely rational, their rewards need to be inferred from real-world data, and any prediction algorithm needs to be real-time capable… ▽ More Traffic scenarios are inherently interactive. Multiple decision-makers predict the actions of others and choose strategies that maximize their rewards. We view these interactions from the perspective of game theory which introduces various challenges. Humans are not entirely rational, their rewards need to be inferred from real-world data, and any prediction algorithm needs to be real-time capable so that we can use it in an autonomous vehicle (AV). In this work, we present a game-theoretic method that addresses all of the points above. Compared to many existing methods used for AVs, our approach does 1) not require perfect communication, and 2) allows for individual rewards per agent. Our experiments demonstrate that these more realistic assumptions lead to qualitatively and quantitatively different reward inference and prediction of future actions that match better with expected real-world behaviour. △ Less

Submitted 27 April, 2020; originally announced April 2020.

Comments: 16 pages, 6 figures

arXiv:2004.12165 [pdf, other]

doi 10.1109/LRA.2020.2967272

CNN based Road User Detection using the 3D Radar Cube

Authors: Andras Palffy, Jiaao Dong, Julian F. P. Kooij, Dariu M. Gavrila

Abstract: This letter presents a novel radar based, single-frame, multi-class detection method for moving road users (pedestrian, cyclist, car), which utilizes low-level radar cube data. The method provides class information both on the radar target- and object-level. Radar targets are classified individually after extending the target features with a cropped block of the 3D radar cube around their position… ▽ More This letter presents a novel radar based, single-frame, multi-class detection method for moving road users (pedestrian, cyclist, car), which utilizes low-level radar cube data. The method provides class information both on the radar target- and object-level. Radar targets are classified individually after extending the target features with a cropped block of the 3D radar cube around their positions, thereby capturing the motion of moving parts in the local velocity distribution. A Convolutional Neural Network (CNN) is proposed for this classification step. Afterwards, object proposals are generated with a clustering step, which not only considers the radar targets' positions and velocities, but their calculated class scores as well. In experiments on a real-life dataset we demonstrate that our method outperforms the state-of-the-art methods both target- and object-wise by reaching an average of 0.70 (baseline: 0.68) target-wise and 0.56 (baseline: 0.48) object-wise F1 score. Furthermore, we examine the importance of the used features in an ablation study. △ Less

Submitted 16 July, 2020; v1 submitted 25 April, 2020; originally announced April 2020.

Journal ref: IEEE Robotics and Automation Letters (RAL), vol. 5, nr. 2, pp. 1263-1270, 2020

arXiv:1905.06113 [pdf, other]

doi 10.1177/0278364920917446

Human Motion Trajectory Prediction: A Survey

Authors: Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M. Kitani, Dariu M. Gavrila, Kai O. Arras

Abstract: With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper prov… ▽ More With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning considering such predictions are key tasks for self-driving vehicles, service robots and advanced surveillance systems. This paper provides a survey of human motion trajectory prediction. We review, analyze and structure a large selection of work from different communities and propose a taxonomy that categorizes existing methods based on the motion modeling approach and level of contextual information used. We provide an overview of the existing datasets and performance metrics. We discuss limitations of the state of the art and outline directions for further research. △ Less

Submitted 17 December, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

Comments: Submitted to the International Journal of Robotics Research (IJRR), 37 pages

arXiv:1903.11532 [pdf, other]

Privacy Protection in Street-View Panoramas using Depth and Multi-View Imagery

Authors: Ries Uittenbogaard, Clint Sebastian, Julien Vijverberg, Bas Boom, Dariu M. Gavrila, Peter H. N. de With

Abstract: The current paradigm in privacy protection in street-view images is to detect and blur sensitive information. In this paper, we propose a framework that is an alternative to blurring, which automatically removes and inpaints moving objects (e.g. pedestrians, vehicles) in street-view imagery. We propose a novel moving object segmentation algorithm exploiting consistencies in depth across multiple s… ▽ More The current paradigm in privacy protection in street-view images is to detect and blur sensitive information. In this paper, we propose a framework that is an alternative to blurring, which automatically removes and inpaints moving objects (e.g. pedestrians, vehicles) in street-view imagery. We propose a novel moving object segmentation algorithm exploiting consistencies in depth across multiple street-view images that are later combined with the results of a segmentation network. The detected moving objects are removed and inpainted with information from other views, to obtain a realistic output image such that the moving object is not visible anymore. We evaluate our results on a dataset of 1000 images to obtain a peak noise-to-signal ratio (PSNR) and L1 loss of 27.2 dB and 2.5%, respectively. To ensure the subjective quality, To assess overall quality, we also report the results of a survey conducted on 35 professionals, asked to visually inspect the images whether object removal and inpainting had taken place. The inpainting dataset will be made publicly available for scientific benchmarking purposes at https://research.cyclomedia.com △ Less

Submitted 27 March, 2019; originally announced March 2019.

Comments: Accepted to CVPR 2019. Dataset (and provided link) will be made available before the CVPR

arXiv:1805.07193 [pdf, other]

doi 10.1109/TPAMI.2019.2897684

The EuroCity Persons Dataset: A Novel Benchmark for Object Detection

Authors: Markus Braun, Sebastian Krebs, Fabian Flohr, Dariu M. Gavrila

Abstract: Big data has had a great share in the success of deep learning in computer vision. Recent works suggest that there is significant further potential to increase object detection performance by utilizing even bigger datasets. In this paper, we introduce the EuroCity Persons dataset, which provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other… ▽ More Big data has had a great share in the success of deep learning in computer vision. Recent works suggest that there is significant further potential to increase object detection performance by utilizing even bigger datasets. In this paper, we introduce the EuroCity Persons dataset, which provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238200 person instances manually labeled in over 47300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211200). We optimize four state-of-the-art deep learning approaches (Faster R-CNN, R-FCN, SSD and YOLOv3) to serve as baselines for the new object detection benchmark. In experiments with previous datasets we analyze the generalization capabilities of these detectors when trained with the new dataset. We furthermore study the effect of the training set size, the dataset diversity (day- vs. night-time, geographical region), the dataset detail (i.e. availability of object orientation information) and the annotation quality on the detector performance. Finally, we analyze error sources and discuss the road ahead. △ Less

Submitted 5 June, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

Comments: Submitted to IEEE Trans. on Pattern Analysis and Machine Intelligence

Journal ref: Published in IEEE Trans. on Pattern Analysis and Machine Intelligence, 2019

Showing 1–13 of 13 results for author: Gavrila, D M