Search | arXiv e-print repository

Consistency Regularisation for Unsupervised Domain Adaptation in Monocular Depth Estimation

Authors: Amir El-Ghoussani, Julia Hornauer, Gustavo Carneiro, Vasileios Belagiannis

Abstract: In monocular depth estimation, unsupervised domain adaptation has recently been explored to relax the dependence on large annotated image-based depth datasets. However, this comes at the cost of training multiple models or requiring complex training protocols. We formulate unsupervised domain adaptation for monocular depth estimation as a consistency-based semi-supervised learning problem by assum… ▽ More In monocular depth estimation, unsupervised domain adaptation has recently been explored to relax the dependence on large annotated image-based depth datasets. However, this comes at the cost of training multiple models or requiring complex training protocols. We formulate unsupervised domain adaptation for monocular depth estimation as a consistency-based semi-supervised learning problem by assuming access only to the source domain ground truth labels. To this end, we introduce a pairwise loss function that regularises predictions on the source domain while enforcing perturbation consistency across multiple augmented views of the unlabelled target samples. Importantly, our approach is simple and effective, requiring only training of a single model in contrast to the prior work. In our experiments, we rely on the standard depth estimation benchmarks KITTI and NYUv2 to demonstrate state-of-the-art results compared to related approaches. Furthermore, we analyse the simplicity and effectiveness of our approach in a series of ablation studies. The code is available at \url{https://github.com/AmirMaEl/SemiSupMDE}. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted to Conference on Lifelong Learning Agents (CoLLAs) 2024

arXiv:2403.06020 [pdf, other]

Multi-conditioned Graph Diffusion for Neural Architecture Search

Authors: Rohan Asthana, Joschua Conrad, Youssef Dawoud, Maurits Ortmanns, Vasileios Belagiannis

Abstract: Neural architecture search automates the design of neural network architectures usually by exploring a large and thus complex architecture search space. To advance the architecture search, we present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures. We then propose a multi-conditioned classifier-f… ▽ More Neural architecture search automates the design of neural network architectures usually by exploring a large and thus complex architecture search space. To advance the architecture search, we present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures. We then propose a multi-conditioned classifier-free guidance approach applied to graph diffusion networks to jointly impose constraints such as high accuracy and low hardware latency. Unlike the related work, our method is completely differentiable and requires only a single model training. In our evaluations, we show promising results on six standard benchmarks, yielding novel and unique architectures at a fast speed, i.e. less than 0.2 seconds per architecture. Furthermore, we demonstrate the generalisability and efficiency of our method through experiments on ImageNet dataset. △ Less

Submitted 22 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted at Transactions on Machine Learning Research (TMLR)

arXiv:2308.09080 [pdf, other]

Pedestrian Environment Model for Automated Driving

Authors: Adrian Holzbock, Alexander Tsaregorodtsev, Vasileios Belagiannis

Abstract: Besides interacting correctly with other vehicles, automated vehicles should also be able to react in a safe manner to vulnerable road users like pedestrians or cyclists. For a safe interaction between pedestrians and automated vehicles, the vehicle must be able to interpret the pedestrian's behavior. Common environment models do not contain information like body poses used to understand the pedes… ▽ More Besides interacting correctly with other vehicles, automated vehicles should also be able to react in a safe manner to vulnerable road users like pedestrians or cyclists. For a safe interaction between pedestrians and automated vehicles, the vehicle must be able to interpret the pedestrian's behavior. Common environment models do not contain information like body poses used to understand the pedestrian's intent. In this work, we propose an environment model that includes the position of the pedestrians as well as their pose information. We only use images from a monocular camera and the vehicle's localization data as input to our pedestrian environment model. We extract the skeletal information with a neural network human pose estimator from the image. Furthermore, we track the skeletons with a simple tracking algorithm based on the Hungarian algorithm and an ego-motion compensation. To obtain the 3D information of the position, we aggregate the data from consecutive frames in conjunction with the vehicle position. We demonstrate our pedestrian environment model on data generated with the CARLA simulator and the nuScenes dataset. Overall, we reach a relative position error of around 16% on both datasets. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Accepted for presentation at the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC 2023), 24-28 September 2023, Bilbao, Bizkaia, Spain

arXiv:2308.06072 [pdf, other]

Out-of-Distribution Detection for Monocular Depth Estimation

Authors: Julia Hornauer, Adrian Holzbock, Vasileios Belagiannis

Abstract: In monocular depth estimation, uncertainty estimation approaches mainly target the data uncertainty introduced by image noise. In contrast to prior work, we address the uncertainty due to lack of knowledge, which is relevant for the detection of data not represented by the training distribution, the so-called out-of-distribution (OOD) data. Motivated by anomaly detection, we propose to detect OOD… ▽ More In monocular depth estimation, uncertainty estimation approaches mainly target the data uncertainty introduced by image noise. In contrast to prior work, we address the uncertainty due to lack of knowledge, which is relevant for the detection of data not represented by the training distribution, the so-called out-of-distribution (OOD) data. Motivated by anomaly detection, we propose to detect OOD images from an encoder-decoder depth estimation model based on the reconstruction error. Given the features extracted with the fixed depth encoder, we train an image decoder for image reconstruction using only in-distribution data. Consequently, OOD images result in a high reconstruction error, which we use to distinguish between in- and out-of-distribution samples. We built our experiments on the standard NYU Depth V2 and KITTI benchmarks as in-distribution data. Our post hoc method performs astonishingly well on different models and outperforms existing uncertainty estimation approaches without modifying the trained encoder-decoder depth estimation model. △ Less

Submitted 11 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023

arXiv:2308.04946 [pdf, other]

SelectNAdapt: Support Set Selection for Few-Shot Domain Adaptation

Authors: Youssef Dawoud, Gustavo Carneiro, Vasileios Belagiannis

Abstract: Generalisation of deep neural networks becomes vulnerable when distribution shifts are encountered between train (source) and test (target) domain data. Few-shot domain adaptation mitigates this issue by adapting deep neural networks pre-trained on the source domain to the target domain using a randomly selected and annotated support set from the target domain. This paper argues that randomly sele… ▽ More Generalisation of deep neural networks becomes vulnerable when distribution shifts are encountered between train (source) and test (target) domain data. Few-shot domain adaptation mitigates this issue by adapting deep neural networks pre-trained on the source domain to the target domain using a randomly selected and annotated support set from the target domain. This paper argues that randomly selecting the support set can be further improved for effectively adapting the pre-trained source models to the target domain. Alternatively, we propose SelectNAdapt, an algorithm to curate the selection of the target domain samples, which are then annotated and included in the support set. In particular, for the K-shot adaptation problem, we first leverage self-supervision to learn features of the target domain data. Then, we propose a per-class clustering scheme of the learned target domain features and select K representative target samples using a distance-based scoring function. Finally, we bring our selection setup towards a practical ground by relying on pseudo-labels for clustering semantically similar target domain samples. Our experiments show promising results on three few-shot domain adaptation benchmarks for image recognition compared to related approaches and the standard random selection. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV Workshop

arXiv:2308.01707 [pdf, other]

Joint Out-of-Distribution Detection and Uncertainty Estimation for Trajectory Prediction

Authors: Julian Wiederer, Julian Schmidt, Ulrich Kressel, Klaus Dietmayer, Vasileios Belagiannis

Abstract: Despite the significant research efforts on trajectory prediction for automated driving, limited work exists on assessing the prediction reliability. To address this limitation we propose an approach that covers two sources of error, namely novel situations with out-of-distribution (OOD) detection and the complexity in in-distribution (ID) situations with uncertainty estimation. We introduce two m… ▽ More Despite the significant research efforts on trajectory prediction for automated driving, limited work exists on assessing the prediction reliability. To address this limitation we propose an approach that covers two sources of error, namely novel situations with out-of-distribution (OOD) detection and the complexity in in-distribution (ID) situations with uncertainty estimation. We introduce two modules next to an encoder-decoder network for trajectory prediction. Firstly, a Gaussian mixture model learns the probability density function of the ID encoder features during training, and then it is used to detect the OOD samples in regions of the feature space with low likelihood. Secondly, an error regression network is applied to the encoder, which learns to estimate the trajectory prediction error in supervised training. During inference, the estimated prediction error is used as the uncertainty. In our experiments, the combination of both modules outperforms the prior work in OOD detection and uncertainty estimation, on the Shifts robust trajectory prediction dataset by $2.8 \%$ and $10.1 \%$, respectively. The code is publicly available. △ Less

Submitted 4 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted to the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

arXiv:2306.13323 [pdf, other]

Automated Automotive Radar Calibration With Intelligent Vehicles

Authors: Alexander Tsaregorodtsev, Michael Buchholz, Vasileios Belagiannis

Abstract: While automotive radar sensors are widely adopted and have been used for automatic cruise control and collision avoidance tasks, their application outside of vehicles is still limited. As they have the ability to resolve multiple targets in 3D space, radars can also be used for improving environment perception. This application, however, requires a precise calibration, which is usually a time-cons… ▽ More While automotive radar sensors are widely adopted and have been used for automatic cruise control and collision avoidance tasks, their application outside of vehicles is still limited. As they have the ability to resolve multiple targets in 3D space, radars can also be used for improving environment perception. This application, however, requires a precise calibration, which is usually a time-consuming and labor-intensive task. We, therefore, present an approach for automated and geo-referenced extrinsic calibration of automotive radar sensors that is based on a novel hypothesis filtering scheme. Our method does not require external modifications of a vehicle and instead uses the location data obtained from automated vehicles. This location data is then combined with filtered sensor data to create calibration hypotheses. Subsequent filtering and optimization recovers the correct calibration. Our evaluation on data from a real testing site shows that our method can correctly calibrate infrastructure sensors in an automated manner, thus enabling cooperative driving scenarios. △ Less

Submitted 23 June, 2023; originally announced June 2023.

Comments: 5 pages, 4 figures, accepted for presentation at the 31st European Signal Processing Conference (EUSIPCO), September 4 - September 8, 2023, Helsinki, Finland

arXiv:2306.12881 [pdf, other]

Data-Free Backbone Fine-Tuning for Pruned Neural Networks

Authors: Adrian Holzbock, Achyut Hegde, Klaus Dietmayer, Vasileios Belagiannis

Abstract: Model compression techniques reduce the computational load and memory consumption of deep neural networks. After the compression operation, e.g. parameter pruning, the model is normally fine-tuned on the original training dataset to recover from the performance drop caused by compression. However, the training data is not always available due to privacy issues or other factors. In this work, we pr… ▽ More Model compression techniques reduce the computational load and memory consumption of deep neural networks. After the compression operation, e.g. parameter pruning, the model is normally fine-tuned on the original training dataset to recover from the performance drop caused by compression. However, the training data is not always available due to privacy issues or other factors. In this work, we present a data-free fine-tuning approach for pruning the backbone of deep neural networks. In particular, the pruned network backbone is trained with synthetically generated images, and our proposed intermediate supervision to mimic the unpruned backbone's output feature map. Afterwards, the pruned backbone can be combined with the original network head to make predictions. We generate synthetic images by back-propagating gradients to noise images while relying on L1-pruning for the backbone pruning. In our experiments, we show that our approach is task-independent due to pruning only the backbone. By evaluating our approach on 2D human pose estimation, object detection, and image classification, we demonstrate promising performance compared to the unpruned model. Our code is available at https://github.com/holzbock/dfbf. △ Less

Submitted 22 June, 2023; originally announced June 2023.

Comments: Accpeted for presentation at the 31st European Signal Processing Conference (EUSIPCO) 2023, September 4-8, 2023, Helsinki, Finland

arXiv:2304.10814 [pdf, other]

Automated Static Camera Calibration with Intelligent Vehicles

Authors: Alexander Tsaregorodtsev, Adrian Holzbock, Jan Strohbeck, Michael Buchholz, Vasileios Belagiannis

Abstract: Connected and cooperative driving requires precise calibration of the roadside infrastructure for having a reliable perception system. To solve this requirement in an automated manner, we present a robust extrinsic calibration method for automated geo-referenced camera calibration. Our method requires a calibration vehicle equipped with a combined GNSS/RTK receiver and an inertial measurement unit… ▽ More Connected and cooperative driving requires precise calibration of the roadside infrastructure for having a reliable perception system. To solve this requirement in an automated manner, we present a robust extrinsic calibration method for automated geo-referenced camera calibration. Our method requires a calibration vehicle equipped with a combined GNSS/RTK receiver and an inertial measurement unit (IMU) for self-localization. In order to remove any requirements for the target's appearance and the local traffic conditions, we propose a novel approach using hypothesis filtering. Our method does not require any human interaction with the information recorded by both the infrastructure and the vehicle. Furthermore, we do not limit road access for other road users during calibration. We demonstrate the feasibility and accuracy of our approach by evaluating our approach on synthetic datasets as well as a real-world connected intersection, and deploying the calibration on real infrastructure. Our source code is publicly available. △ Less

Submitted 21 April, 2023; originally announced April 2023.

Comments: 7 pages, 3 figures, accepted for presentation at the 34th IEEE Intelligent Vehicles Symposium (IV 2023), June 4 - June 7, 2023, Anchorage, Alaska, United States of America

arXiv:2304.05856 [pdf, other]

RESET: Revisiting Trajectory Sets for Conditional Behavior Prediction

Authors: Julian Schmidt, Pascal Huissel, Julian Wiederer, Julian Jordan, Vasileios Belagiannis, Klaus Dietmayer

Abstract: It is desirable to predict the behavior of traffic participants conditioned on different planned trajectories of the autonomous vehicle. This allows the downstream planner to estimate the impact of its decisions. Recent approaches for conditional behavior prediction rely on a regression decoder, meaning that coordinates or polynomial coefficients are regressed. In this work we revisit set-based tr… ▽ More It is desirable to predict the behavior of traffic participants conditioned on different planned trajectories of the autonomous vehicle. This allows the downstream planner to estimate the impact of its decisions. Recent approaches for conditional behavior prediction rely on a regression decoder, meaning that coordinates or polynomial coefficients are regressed. In this work we revisit set-based trajectory prediction, where the probability of each trajectory in a predefined trajectory set is determined by a classification model, and first-time employ it to the task of conditional behavior prediction. We propose RESET, which combines a new metric-driven algorithm for trajectory set generation with a graph-based encoder. For unconditional prediction, RESET achieves comparable performance to a regression-based approach. Due to the nature of set-based approaches, it has the advantageous property of being able to predict a flexible number of trajectories without influencing runtime or complexity. For conditional prediction, RESET achieves reasonable results with late fusion of the planned trajectory, which was not observed for regression-based approaches before. This means that RESET is computationally lightweight to combine with a planner that proposes multiple future plans of the autonomous vehicle, as large parts of the forward pass can be reused. △ Less

Submitted 12 April, 2023; originally announced April 2023.

Comments: Accepted to the 2023 Intelligent Vehicles Symposium (IV 2023)

arXiv:2303.08052 [pdf, other]

Localizing Spatial Information in Neural Spatiospectral Filters

Authors: Annika Briegleb, Thomas Haubner, Vasileios Belagiannis, Walter Kellermann

Abstract: Beamforming for multichannel speech enhancement relies on the estimation of spatial characteristics of the acoustic scene. In its simplest form, the delay-and-sum beamformer (DSB) introduces a time delay to all channels to align the desired signal components for constructive superposition. Recent investigations of neural spatiospectral filtering revealed that these filters can be characterized by… ▽ More Beamforming for multichannel speech enhancement relies on the estimation of spatial characteristics of the acoustic scene. In its simplest form, the delay-and-sum beamformer (DSB) introduces a time delay to all channels to align the desired signal components for constructive superposition. Recent investigations of neural spatiospectral filtering revealed that these filters can be characterized by a beampattern similar to one of traditional beamformers, which shows that artificial neural networks can learn and explicitly represent spatial structure. Using the Complex-valued Spatial Autoencoder (COSPA) as an exemplary neural spatiospectral filter for multichannel speech enhancement, we investigate where and how such networks represent spatial information. We show via clustering that for COSPA the spatial information is represented by the features generated by a gated recurrent unit (GRU) layer that has access to all channels simultaneously and that these features are not source -- but only direction of arrival-dependent. △ Less

Submitted 3 July, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Comments: Accepted to the 31st European Signal Processing Conference (EUSIPCO 2023), Helsinki, Finland. 5 pages, 3 figures

arXiv:2302.09998 [pdf, other]

doi 10.1007/978-3-031-25056-9_36

Gesture Recognition with Keypoint and Radar Stream Fusion for Automated Vehicles

Authors: Adrian Holzbock, Nicolai Kern, Christian Waldschmidt, Klaus Dietmayer, Vasileios Belagiannis

Abstract: We present a joint camera and radar approach to enable autonomous vehicles to understand and react to human gestures in everyday traffic. Initially, we process the radar data with a PointNet followed by a spatio-temporal multilayer perceptron (stMLP). Independently, the human body pose is extracted from the camera frame and processed with a separate stMLP network. We propose a fusion neural networ… ▽ More We present a joint camera and radar approach to enable autonomous vehicles to understand and react to human gestures in everyday traffic. Initially, we process the radar data with a PointNet followed by a spatio-temporal multilayer perceptron (stMLP). Independently, the human body pose is extracted from the camera frame and processed with a separate stMLP network. We propose a fusion neural network for both modalities, including an auxiliary loss for each modality. In our experiments with a collected dataset, we show the advantages of gesture recognition with two modalities. Motivated by adverse weather conditions, we also demonstrate promising performance when one of the sensors lacks functionality. △ Less

Submitted 20 February, 2023; originally announced February 2023.

Comments: Accepted for presentation at the 3rd AVVision Workshop at ECCV 2022, October 23, 2022, Tel Aviv, Israel

Journal ref: In Computer Vision-ECCV 2022 Workshops: Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part I (pp. 570-584). Cham: Springer Nature Switzerland

arXiv:2212.02875 [pdf, other]

Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs

Authors: Osman Ülger, Julian Wiederer, Mohsen Ghafoorian, Vasileios Belagiannis, Pascal Mettes

Abstract: Graph neural networks have shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, while relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-tempor… ▽ More Graph neural networks have shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, while relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs that we obtain from videos through object detection and spatio-temporal linking. Experimental evaluations on ActionGenome and CLEVRER show that modeling multiple relations in our temporally-dynamic graph network can be mutually beneficial, outperforming existing static and spatio-temporal graph neural networks, as well as state-of-the-art predicate classification methods. △ Less

Submitted 6 December, 2022; originally announced December 2022.

Comments: BMVC2022

arXiv:2211.14512 [pdf, other]

Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation

Authors: Yuyuan Liu, Choubo Ding, Yu Tian, Guansong Pang, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: Semantic segmentation models classify pixels into a set of known (``in-distribution'') visual classes. When deployed in an open world, the reliability of these models depends on their ability not only to classify in-distribution pixels but also to detect out-of-distribution (OoD) pixels. Historically, the poor OoD detection performance of these models has motivated the design of methods based on m… ▽ More Semantic segmentation models classify pixels into a set of known (``in-distribution'') visual classes. When deployed in an open world, the reliability of these models depends on their ability not only to classify in-distribution pixels but also to detect out-of-distribution (OoD) pixels. Historically, the poor OoD detection performance of these models has motivated the design of methods based on model re-training using synthetic training images that include OoD visual objects. Although successful, these re-trained methods have two issues: 1) their in-distribution segmentation accuracy may drop during re-training, and 2) their OoD detection accuracy does not generalise well to new contexts (e.g., country surroundings) outside the training set (e.g., city surroundings). In this paper, we mitigate these issues with: (i) a new residual pattern learning (RPL) module that assists the segmentation model to detect OoD pixels without affecting the inlier segmentation performance; and (ii) a novel context-robust contrastive learning (CoroCL) that enforces RPL to robustly detect OoD pixels among various contexts. Our approach improves by around 10\% FPR and 7\% AuPRC the previous state-of-the-art in Fishyscapes, Segment-Me-If-You-Can, and RoadAnomaly datasets. Our code is available at: https://github.com/yyliu01/RPL. △ Less

Submitted 21 August, 2023; v1 submitted 26 November, 2022; originally announced November 2022.

Comments: The paper contains 16 pages and it is accepted by ICCV'23

arXiv:2211.10244 [pdf, other]

Knowing What to Label for Few Shot Microscopy Image Cell Segmentation

Authors: Youssef Dawoud, Arij Bouazizi, Katharina Ernst, Gustavo Carneiro, Vasileios Belagiannis

Abstract: In microscopy image cell segmentation, it is common to train a deep neural network on source data, containing different types of microscopy images, and then fine-tune it using a support set comprising a few randomly selected and annotated training target images. In this paper, we argue that the random selection of unlabelled training target images to be annotated and included in the support set ma… ▽ More In microscopy image cell segmentation, it is common to train a deep neural network on source data, containing different types of microscopy images, and then fine-tune it using a support set comprising a few randomly selected and annotated training target images. In this paper, we argue that the random selection of unlabelled training target images to be annotated and included in the support set may not enable an effective fine-tuning process, so we propose a new approach to optimise this image selection process. Our approach involves a new scoring function to find informative unlabelled target images. In particular, we propose to measure the consistency in the model predictions on target images against specific data augmentations. However, we observe that the model trained with source datasets does not reliably evaluate consistency on target images. To alleviate this problem, we propose novel self-supervised pretext tasks to compute the scores of unlabelled target images. Finally, the top few images with the least consistency scores are added to the support set for oracle (i.e., expert) annotation and later used to fine-tune the model to the target images. In our evaluations that involve the segmentation of five different types of cell images, we demonstrate promising results on several target test sets compared to the random selection approach as well as other selection approaches, such as Shannon's entropy and Monte-Carlo dropout. △ Less

Submitted 18 November, 2022; originally announced November 2022.

Comments: Accepted to WACV 2023

arXiv:2211.08115 [pdf, other]

doi 10.1109/WACV56688.2023.00263

Heatmap-based Out-of-Distribution Detection

Authors: Julia Hornauer, Vasileios Belagiannis

Abstract: Our work investigates out-of-distribution (OOD) detection as a neural network output explanation problem. We learn a heatmap representation for detecting OOD images while visualizing in- and out-of-distribution image regions at the same time. Given a trained and fixed classifier, we train a decoder neural network to produce heatmaps with zero response for in-distribution samples and high response… ▽ More Our work investigates out-of-distribution (OOD) detection as a neural network output explanation problem. We learn a heatmap representation for detecting OOD images while visualizing in- and out-of-distribution image regions at the same time. Given a trained and fixed classifier, we train a decoder neural network to produce heatmaps with zero response for in-distribution samples and high response heatmaps for OOD samples, based on the classifier features and the class prediction. Our main innovation lies in the heatmap definition for an OOD sample, as the normalized difference from the closest in-distribution sample. The heatmap serves as a margin to distinguish between in- and out-of-distribution samples. Our approach generates the heatmaps not only for OOD detection, but also to indicate in- and out-of-distribution regions of the input image. In our evaluations, our approach mostly outperforms the prior work on fixed classifiers, trained on CIFAR-10, CIFAR-100 and Tiny ImageNet. The code is publicly available at: https://github.com/jhornauer/heatmap_ood. △ Less

Submitted 11 August, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: Accepted to WACV 2023

arXiv:2209.01838 [pdf, other]

A Benchmark for Unsupervised Anomaly Detection in Multi-Agent Trajectories

Authors: Julian Wiederer, Julian Schmidt, Ulrich Kressel, Klaus Dietmayer, Vasileios Belagiannis

Abstract: Human intuition allows to detect abnormal driving scenarios in situations they never experienced before. Like humans detect those abnormal situations and take countermeasures to prevent collisions, self-driving cars need anomaly detection mechanisms. However, the literature lacks a standard benchmark for the comparison of anomaly detection algorithms. We fill the gap and propose the R-U-MAAD bench… ▽ More Human intuition allows to detect abnormal driving scenarios in situations they never experienced before. Like humans detect those abnormal situations and take countermeasures to prevent collisions, self-driving cars need anomaly detection mechanisms. However, the literature lacks a standard benchmark for the comparison of anomaly detection algorithms. We fill the gap and propose the R-U-MAAD benchmark for unsupervised anomaly detection in multi-agent trajectories. The goal is to learn a representation of the normal driving from the training sequences without labels, and afterwards detect anomalies. We use the Argoverse Motion Forecasting dataset for the training and propose a test dataset of 160 sequences with human-annotated anomalies in urban environments. To this end we combine a replay of real-world trajectories and scene-dependent abnormal driving in the simulation. In our experiments we compare 11 baselines including linear models, deep auto-encoders and one-class classification models using standard anomaly detection metrics. The deep reconstruction and end-to-end one-class methods show promising results. The benchmark and the baseline models will be publicly available. △ Less

Submitted 5 September, 2022; originally announced September 2022.

Comments: 8 pages, 4 figures, 2 tables, accepted by IEEE ITSC 2022

arXiv:2208.03949 [pdf, other]

Extrinsic Camera Calibration with Semantic Segmentation

Authors: Alexander Tsaregorodtsev, Johannes Müller, Jan Strohbeck, Martin Herrmann, Michael Buchholz, Vasileios Belagiannis

Abstract: Monocular camera sensors are vital to intelligent vehicle operation and automated driving assistance and are also heavily employed in traffic control infrastructure. Calibrating the monocular camera, though, is time-consuming and often requires significant manual intervention. In this work, we present an extrinsic camera calibration approach that automatizes the parameter estimation by utilizing s… ▽ More Monocular camera sensors are vital to intelligent vehicle operation and automated driving assistance and are also heavily employed in traffic control infrastructure. Calibrating the monocular camera, though, is time-consuming and often requires significant manual intervention. In this work, we present an extrinsic camera calibration approach that automatizes the parameter estimation by utilizing semantic segmentation information from images and point clouds. Our approach relies on a coarse initial measurement of the camera pose and builds on lidar sensors mounted on a vehicle with high-precision localization to capture a point cloud of the camera environment. Afterward, a map** between the camera and world coordinate spaces is obtained by performing a lidar-to-camera registration of the semantically segmented sensor data. We evaluate our method on simulated and real-world data to demonstrate low error measurements in the calibration results. Our approach is suitable for infrastructure sensors as well as vehicle sensors, while it does not require motion of the camera platform. △ Less

Submitted 8 August, 2022; originally announced August 2022.

Comments: 7 pages, 3 figures, accepted at the 25th International Conference on Intelligent Transportation Systems (ITSC) 2022

arXiv:2208.02105 [pdf, other]

Edge-Based Self-Supervision for Semi-Supervised Few-Shot Microscopy Image Cell Segmentation

Authors: Youssef Dawoud, Katharina Ernst, Gustavo Carneiro, Vasileios Belagiannis

Abstract: Deep neural networks currently deliver promising results for microscopy image cell segmentation, but they require large-scale labelled databases, which is a costly and time-consuming process. In this work, we relax the labelling requirement by combining self-supervised with semi-supervised learning. We propose the prediction of edge-based maps for self-supervising the training of the unlabelled im… ▽ More Deep neural networks currently deliver promising results for microscopy image cell segmentation, but they require large-scale labelled databases, which is a costly and time-consuming process. In this work, we relax the labelling requirement by combining self-supervised with semi-supervised learning. We propose the prediction of edge-based maps for self-supervising the training of the unlabelled images, which is combined with the supervised training of a small number of labelled images for learning the segmentation task. In our experiments, we evaluate on a few-shot microscopy image cell segmentation benchmark and show that only a small number of annotated images, e.g. 10% of the original training set, is enough for our approach to reach similar performance as with the fully annotated databases on 1- to 10-shots. Our code and trained models is made publicly available △ Less

Submitted 3 August, 2022; originally announced August 2022.

Comments: Accepted by MOVI 2022

arXiv:2208.02005 [pdf, other]

doi 10.1007/978-3-031-20044-1_35

Gradient-based Uncertainty for Monocular Depth Estimation

Authors: Julia Hornauer, Vasileios Belagiannis

Abstract: In monocular depth estimation, disturbances in the image context, like moving objects or reflecting materials, can easily lead to erroneous predictions. For that reason, uncertainty estimates for each pixel are necessary, in particular for safety-critical applications such as automated driving. We propose a post hoc uncertainty estimation approach for an already trained and thus fixed depth estima… ▽ More In monocular depth estimation, disturbances in the image context, like moving objects or reflecting materials, can easily lead to erroneous predictions. For that reason, uncertainty estimates for each pixel are necessary, in particular for safety-critical applications such as automated driving. We propose a post hoc uncertainty estimation approach for an already trained and thus fixed depth estimation model, represented by a deep neural network. The uncertainty is estimated with the gradients which are extracted with an auxiliary loss function. To avoid relying on ground-truth information for the loss definition, we present an auxiliary loss function based on the correspondence of the depth prediction for an image and its horizontally flipped counterpart. Our approach achieves state-of-the-art uncertainty estimation results on the KITTI and NYU Depth V2 benchmarks without the need to retrain the neural network. Models and code are publicly available at https://github.com/jhornauer/GrUMoDepth. △ Less

Submitted 3 August, 2022; originally announced August 2022.

Comments: Accepted to ECCV 2022

arXiv:2207.00499 [pdf, other]

MotionMixer: MLP-based 3D Human Body Pose Forecasting

Authors: Arij Bouazizi, Adrian Holzbock, Ulrich Kressel, Klaus Dietmayer, Vasileios Belagiannis

Abstract: In this work, we present MotionMixer, an efficient 3D human body pose forecasting model based solely on multi-layer perceptrons (MLPs). MotionMixer learns the spatial-temporal 3D body pose dependencies by sequentially mixing both modalities. Given a stacked sequence of 3D body poses, a spatial-MLP extracts fine grained spatial dependencies of the body joints. The interaction of the body joints ove… ▽ More In this work, we present MotionMixer, an efficient 3D human body pose forecasting model based solely on multi-layer perceptrons (MLPs). MotionMixer learns the spatial-temporal 3D body pose dependencies by sequentially mixing both modalities. Given a stacked sequence of 3D body poses, a spatial-MLP extracts fine grained spatial dependencies of the body joints. The interaction of the body joints over time is then modelled by a temporal MLP. The spatial-temporal mixed features are finally aggregated and decoded to obtain the future motion. To calibrate the influence of each time step in the pose sequence, we make use of squeeze-and-excitation (SE) blocks. We evaluate our approach on Human3.6M, AMASS, and 3DPW datasets using the standard evaluation protocols. For all evaluations, we demonstrate state-of-the-art performance, while having a model with a smaller number of parameters. Our code is available at: https://github.com/MotionMLP/MotionMixer △ Less

Submitted 1 July, 2022; originally announced July 2022.

Comments: Accepted by IJCAI-ECAI'22 (Oral-Long presentation)

arXiv:2204.11511 [pdf, other]

doi 10.1109/IV51971.2022.9827054

A Spatio-Temporal Multilayer Perceptron for Gesture Recognition

Authors: Adrian Holzbock, Alexander Tsaregorodtsev, Youssef Dawoud, Klaus Dietmayer, Vasileios Belagiannis

Abstract: Gesture recognition is essential for the interaction of autonomous vehicles with humans. While the current approaches focus on combining several modalities like image features, keypoints and bone vectors, we present neural network architecture that delivers state-of-the-art results only with body skeleton input data. We propose the spatio-temporal multilayer perceptron for gesture recognition in t… ▽ More Gesture recognition is essential for the interaction of autonomous vehicles with humans. While the current approaches focus on combining several modalities like image features, keypoints and bone vectors, we present neural network architecture that delivers state-of-the-art results only with body skeleton input data. We propose the spatio-temporal multilayer perceptron for gesture recognition in the context of autonomous vehicles. Given 3D body poses over time, we define temporal and spatial mixing operations to extract features in both domains. Additionally, the importance of each time step is re-weighted with Squeeze-and-Excitation layers. An extensive evaluation of the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach. Furthermore, we deploy our model to our autonomous vehicle to show its real-time capability and stable execution. △ Less

Submitted 18 August, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: Accepted for presentation at the 33rd IEEE Intelligent Vehicles Symposium (IV 2022), June 5 - June 9, 2022, Aachen, Germany

Journal ref: 2022 IEEE Intelligent Vehicles Symposium (IV), June 5th - 9th, 2022, Aachen, Germany, pp. 1099-1106

arXiv:2203.14523 [pdf, other]

Translation Consistent Semi-supervised Segmentation for 3D Medical Images

Authors: Yuyuan Liu, Yu Tian, Chong Wang, Yuanhong Chen, Fengbei Liu, Vasileios Belagiannis, Gustavo Carneiro

Abstract: 3D medical image segmentation methods have been successful, but their dependence on large amounts of voxel-level annotated data is a disadvantage that needs to be addressed given the high cost to obtain such annotation. Semi-supervised learning (SSL) solve this issue by training models with a large unlabelled and a small labelled dataset. The most successful SSL approaches are based on consistency… ▽ More 3D medical image segmentation methods have been successful, but their dependence on large amounts of voxel-level annotated data is a disadvantage that needs to be addressed given the high cost to obtain such annotation. Semi-supervised learning (SSL) solve this issue by training models with a large unlabelled and a small labelled dataset. The most successful SSL approaches are based on consistency learning that minimises the distance between model responses obtained from perturbed views of the unlabelled data. These perturbations usually keep the spatial input context between views fairly consistent, which may cause the model to learn segmentation patterns from the spatial input contexts instead of the segmented objects. In this paper, we introduce the Translation Consistent Co-training (TraCoCo) which is a consistency learning SSL method that perturbs the input data views by varying their spatial input context, allowing the model to learn segmentation patterns from visual objects. Furthermore, we propose the replacement of the commonly used mean squared error (MSE) semi-supervised loss by a new Cross-model confident Binary Cross entropy (CBC) loss, which improves training convergence and keeps the robustness to co-training pseudo-labelling mistakes. We also extend CutMix augmentation to 3D SSL to further improve generalisation. Our TraCoCo shows state-of-the-art results for the Left Atrium (LA) and Brain Tumor Segmentation (BRaTS19) datasets with different backbones. Our code is available at https://github.com/yyliu01/TraCoCo. △ Less

Submitted 21 April, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2203.04206 [pdf, other]

Lightweight Monocular Depth Estimation through Guided Decoding

Authors: Michael Rudolph, Youssef Dawoud, Ronja Güldenring, Lazaros Nalpantidis, Vasileios Belagiannis

Abstract: We present a lightweight encoder-decoder architecture for monocular depth estimation, specifically designed for embedded platforms. Our main contribution is the Guided Upsampling Block (GUB) for building the decoder of our model. Motivated by the concept of guided image filtering, GUB relies on the image to guide the decoder on upsampling the feature representation and the depth map reconstruction… ▽ More We present a lightweight encoder-decoder architecture for monocular depth estimation, specifically designed for embedded platforms. Our main contribution is the Guided Upsampling Block (GUB) for building the decoder of our model. Motivated by the concept of guided image filtering, GUB relies on the image to guide the decoder on upsampling the feature representation and the depth map reconstruction, achieving high resolution results with fine-grained details. Based on multiple GUBs, our model outperforms the related methods on the NYU Depth V2 dataset in terms of accuracy while delivering up to 35.1 fps on the NVIDIA Jetson Nano and up to 144.5 fps on the NVIDIA Xavier NX. Similarly, on the KITTI dataset, inference is possible with up to 23.7 fps on the Jetson Nano and 102.9 fps on the Xavier NX. Our code and models are made publicly available. △ Less

Submitted 8 March, 2022; originally announced March 2022.

Comments: Accepted to ICRA 2022

arXiv:2202.04461 [pdf, other]

A Multi-Task Recurrent Neural Network for End-to-End Dynamic Occupancy Grid Map**

Authors: Marcel Schreiber, Vasileios Belagiannis, Claudius Gläser, Klaus Dietmayer

Abstract: A common approach for modeling the environment of an autonomous vehicle are dynamic occupancy grid maps, in which the surrounding is divided into cells, each containing the occupancy and velocity state of its location. Despite the advantage of modeling arbitrary shaped objects, the used algorithms rely on hand-designed inverse sensor models and semantic information is missing. Therefore, we introd… ▽ More A common approach for modeling the environment of an autonomous vehicle are dynamic occupancy grid maps, in which the surrounding is divided into cells, each containing the occupancy and velocity state of its location. Despite the advantage of modeling arbitrary shaped objects, the used algorithms rely on hand-designed inverse sensor models and semantic information is missing. Therefore, we introduce a multi-task recurrent neural network to predict grid maps providing occupancies, velocity estimates, semantic information and the driveable area. During training, our network architecture, which is a combination of convolutional and recurrent layers, processes sequences of raw lidar data, that is represented as bird's eye view images with several height channels. The multi-task network is trained in an end-to-end fashion to predict occupancy grid maps without the usual preprocessing steps consisting of removing ground points and applying an inverse sensor model. In our evaluations, we show that our learned inverse sensor model is able to overcome some limitations of a geometric inverse sensor model in terms of representing object shapes and modeling freespace. Moreover, we report a better runtime performance and more accurate semantic predictions for our end-to-end approach, compared to our network relying on measurement grid maps as input data. △ Less

Submitted 5 May, 2022; v1 submitted 9 February, 2022; originally announced February 2022.

Comments: Accepted for presentation at the 2022 33rd IEEE Intelligent Vehicles Symposium (IV) (IV 2022), June 5-9, 2022, in Aachen, Germany

arXiv:2111.12918 [pdf, other]

ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification

Authors: Fengbei Liu, Yu Tian, Yuanhong Chen, Yuyuan Liu, Vasileios Belagiannis, Gustavo Carneiro

Abstract: Effective semi-supervised learning (SSL) in medical image analysis (MIA) must address two challenges: 1) work effectively on both multi-class (e.g., lesion classification) and multi-label (e.g., multiple-disease diagnosis) problems, and 2) handle imbalanced learning (because of the high variance in disease prevalence). One strategy to explore in SSL MIA is based on the pseudo labelling strategy, b… ▽ More Effective semi-supervised learning (SSL) in medical image analysis (MIA) must address two challenges: 1) work effectively on both multi-class (e.g., lesion classification) and multi-label (e.g., multiple-disease diagnosis) problems, and 2) handle imbalanced learning (because of the high variance in disease prevalence). One strategy to explore in SSL MIA is based on the pseudo labelling strategy, but it has a few shortcomings. Pseudo-labelling has in general lower accuracy than consistency learning, it is not specifically designed for both multi-class and multi-label problems, and it can be challenged by imbalanced learning. In this paper, unlike traditional methods that select confident pseudo label by threshold, we propose a new SSL algorithm, called anti-curriculum pseudo-labelling (ACPL), which introduces novel techniques to select informative unlabelled samples, improving training balance and allowing the model to work for both multi-label and multi-class problems, and to estimate pseudo labels by an accurate ensemble of classifiers (improving pseudo label accuracy). We run extensive experiments to evaluate ACPL on two public medical image classification benchmarks: Chest X-Ray14 for thorax disease multi-label classification and ISIC2018 for skin lesion multi-class classification. Our method outperforms previous SOTA SSL methods on both datasets △ Less

Submitted 21 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

Comments: CVPR 2022

arXiv:2111.12903 [pdf, other]

Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

Authors: Yuyuan Liu, Yu Tian, Yuanhong Chen, Fengbei Liu, Vasileios Belagiannis, Gustavo Carneiro

Abstract: Consistency learning using input image, feature, or network perturbations has shown remarkable results in semi-supervised semantic segmentation, but this approach can be seriously affected by inaccurate predictions of unlabelled training images. There are two consequences of these inaccurate predictions: 1) the training based on the "strict" cross-entropy (CE) loss can easily overfit prediction mi… ▽ More Consistency learning using input image, feature, or network perturbations has shown remarkable results in semi-supervised semantic segmentation, but this approach can be seriously affected by inaccurate predictions of unlabelled training images. There are two consequences of these inaccurate predictions: 1) the training based on the "strict" cross-entropy (CE) loss can easily overfit prediction mistakes, leading to confirmation bias; and 2) the perturbations applied to these inaccurate predictions will use potentially erroneous predictions as training signals, degrading consistency learning. In this paper, we address the prediction accuracy problem of consistency learning methods with novel extensions of the mean-teacher (MT) model, which include a new auxiliary teacher, and the replacement of MT's mean square error (MSE) by a stricter confidence-weighted cross-entropy (Conf-CE) loss. The accurate prediction by this model allows us to use a challenging combination of network, input data and feature perturbations to improve the consistency learning generalisation, where the feature perturbations consist of a new adversarial perturbation. Results on public benchmarks show that our approach achieves remarkable improvements over the previous SOTA methods in the field. Our code is available at https://github.com/yyliu01/PS-MT. △ Less

Submitted 26 March, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: CVPR 2022 camera-ready

arXiv:2110.11809 [pdf, other]

PropMix: Hard Sample Filtering and Proportional MixUp for Learning with Noisy Labels

Authors: Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The most competitive noisy label learning methods rely on an unsupervised classification of clean and noisy samples, where samples classified as noisy are re-labelled and "MixMatched" with the clean samples. These methods have two issues in large noise rate problems: 1) the noisy set is more likely to contain hard samples that are in-correctly re-labelled, and 2) the number of samples produced by… ▽ More The most competitive noisy label learning methods rely on an unsupervised classification of clean and noisy samples, where samples classified as noisy are re-labelled and "MixMatched" with the clean samples. These methods have two issues in large noise rate problems: 1) the noisy set is more likely to contain hard samples that are in-correctly re-labelled, and 2) the number of samples produced by MixMatch tends to be reduced because it is constrained by the small clean set size. In this paper, we introduce the learning algorithm PropMix to handle the issues above. PropMix filters out hard noisy samples, with the goal of increasing the likelihood of correctly re-labelling the easy noisy samples. Also, PropMix places clean and re-labelled easy noisy samples in a training set that is augmented with MixUp, removing the clean set size constraint and including a large proportion of correctly re-labelled easy noisy samples. We also include self-supervised pre-training to improve robustness to high noisy label scenarios. Our experiments show that PropMix has state-of-the-art (SOTA) results on CIFAR-10/-100(with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision. In severe label noise bench-marks, our results are substantially better than other methods. The code is available athttps://github.com/filipe-research/PropMix. △ Less

Submitted 22 October, 2021; originally announced October 2021.

Comments: Paper accepted at BMVC'21: The 32nd British Machine Vision Conference

arXiv:2110.07922 [pdf, other]

Anomaly Detection in Multi-Agent Trajectories for Automated Driving

Authors: Julian Wiederer, Arij Bouazizi, Marco Troina, Ulrich Kressel, Vasileios Belagiannis

Abstract: Human drivers can recognise fast abnormal driving situations to avoid accidents. Similar to humans, automated vehicles are supposed to perform anomaly detection. In this work, we propose the spatio-temporal graph auto-encoder for learning normal driving behaviours. Our innovation is the ability to jointly learn multiple trajectories of a dynamic number of agents. To perform anomaly detection, we f… ▽ More Human drivers can recognise fast abnormal driving situations to avoid accidents. Similar to humans, automated vehicles are supposed to perform anomaly detection. In this work, we propose the spatio-temporal graph auto-encoder for learning normal driving behaviours. Our innovation is the ability to jointly learn multiple trajectories of a dynamic number of agents. To perform anomaly detection, we first estimate a density function of the learned trajectory feature representation and then detect anomalies in low-density regions. Due to the lack of multi-agent trajectory datasets for anomaly detection in automated driving, we introduce our dataset using a driving simulator for normal and abnormal manoeuvres. Our evaluations show that our approach learns the relation between different agents and delivers promising results compared to the related works. The code, simulation and the dataset are publicly available on https://github.com/againerju/maad_highway. △ Less

Submitted 28 October, 2021; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: 15 pages incl. supplementary material, 8 figures, 4 tables (accepted by CoRL 2021)

arXiv:2110.07578 [pdf, other]

Learning Temporal 3D Human Pose Estimation with Pseudo-Labels

Authors: Arij Bouazizi, Ulrich Kressel, Vasileios Belagiannis

Abstract: We present a simple, yet effective, approach for self-supervised 3D human pose estimation. Unlike the prior work, we explore the temporal information next to the multi-view self-supervision. During training, we rely on triangulating 2D body pose estimates of a multiple-view camera system. A temporal convolutional neural network is trained with the generated 3D ground-truth and the geometric multi-… ▽ More We present a simple, yet effective, approach for self-supervised 3D human pose estimation. Unlike the prior work, we explore the temporal information next to the multi-view self-supervision. During training, we rely on triangulating 2D body pose estimates of a multiple-view camera system. A temporal convolutional neural network is trained with the generated 3D ground-truth and the geometric multi-view consistency loss, imposing geometrical constraints on the predicted 3D body skeleton. During inference, our model receives a sequence of 2D body pose estimates from a single-view to predict the 3D body pose for each of them. An extensive evaluation shows that our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks. Our code and models are publicly available at \url{https://github.com/vru2020/TM_HPE/}. △ Less

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted for publication at AVSS 2021. Project page:https://github.com/vru2020/TM_HPE/

arXiv:2109.09734 [pdf, other]

MetaMedSeg: Volumetric Meta-learning for Few-Shot Organ Segmentation

Authors: Anastasia Makarevich, Azade Farshad, Vasileios Belagiannis, Nassir Navab

Abstract: The lack of sufficient annotated image data is a common issue in medical image segmentation. For some organs and densities, the annotation may be scarce, leading to poor model training convergence, while other organs have plenty of annotated data. In this work, we present MetaMedSeg, a gradient-based meta-learning algorithm that redefines the meta-learning task for the volumetric medical data with… ▽ More The lack of sufficient annotated image data is a common issue in medical image segmentation. For some organs and densities, the annotation may be scarce, leading to poor model training convergence, while other organs have plenty of annotated data. In this work, we present MetaMedSeg, a gradient-based meta-learning algorithm that redefines the meta-learning task for the volumetric medical data with the goal to capture the variety between the slices. We also explore different weighting schemes for gradients aggregation, arguing that different tasks might have different complexity, and hence, contribute differently to the initialization. We propose an importance-aware weighting scheme to train our model. In the experiments, we present an evaluation of the medical decathlon dataset by extracting 2D slices from CT and MRI volumes of different organs and performing semantic segmentation. The results show that our proposed volumetric task definition leads to up to 30% improvement in terms of IoU compared to related baselines. The proposed update rule is also shown to improve the performance for complex scenarios where the data distribution of the target organ is very different from the source organs. △ Less

Submitted 18 September, 2021; originally announced September 2021.

arXiv:2108.07777 [pdf, other]

Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry

Authors: Arij Bouazizi, Julian Wiederer, Ulrich Kressel, Vasileios Belagiannis

Abstract: We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multiple-view camera system and 2D body pose estimates for each view. To train our model, represented by a deep neural network, we propose a four-loss function learning algorithm, which does not require any 2D or 3D body pose ground-truth. The proposed loss functions make use of the multiple-… ▽ More We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multiple-view camera system and 2D body pose estimates for each view. To train our model, represented by a deep neural network, we propose a four-loss function learning algorithm, which does not require any 2D or 3D body pose ground-truth. The proposed loss functions make use of the multiple-view geometry to reconstruct 3D body pose estimates and impose body pose constraints across the camera views. Our approach utilizes all available camera views during training, while the inference is single-view. In our evaluations, we show promising performance on Human3.6M and HumanEva benchmarks, while we also present a generalization study on MPI-INF-3DHP dataset, as well as several ablation results. Overall, we outperform all self-supervised learning methods and reach comparable results to supervised and weakly-supervised learning approaches. Our code and models are publicly available △ Less

Submitted 17 August, 2021; originally announced August 2021.

Comments: Accepted for publication at FG 2021

arXiv:2108.02671 [pdf, other]

doi 10.1109/ICCVW54120.2021.00111

Visual Domain Adaptation for Monocular Depth Estimation on Resource-Constrained Hardware

Authors: Julia Hornauer, Lazaros Nalpantidis, Vasileios Belagiannis

Abstract: Real-world perception systems in many cases build on hardware with limited resources to adhere to cost and power limitations of their carrying system. Deploying deep neural networks on resource-constrained hardware became possible with model compression techniques, as well as efficient and hardware-aware architecture design. However, model adaptation is additionally required due to the diverse ope… ▽ More Real-world perception systems in many cases build on hardware with limited resources to adhere to cost and power limitations of their carrying system. Deploying deep neural networks on resource-constrained hardware became possible with model compression techniques, as well as efficient and hardware-aware architecture design. However, model adaptation is additionally required due to the diverse operation environments. In this work, we address the problem of training deep neural networks on resource-constrained hardware in the context of visual domain adaptation. We select the task of monocular depth estimation where our goal is to transform a pre-trained model to the target's domain data. While the source domain includes labels, we assume an unlabelled target domain, as it happens in real-world applications. Then, we present an adversarial learning approach that is adapted for training on the device with limited resources. Since visual domain adaptation, i.e. neural network training, has not been previously explored for resource-constrained hardware, we present the first feasibility study for image-based depth estimation. Our experiments show that visual domain adaptation is relevant only for efficient network architectures and training sets at the order of a few hundred samples. Models and code are publicly available. △ Less

Submitted 5 May, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV 2021 Workshop on Embedded and Real-World Computer Vision in Autonomous Driving

Journal ref: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021, pp. 954-962

arXiv:2107.07787 [pdf, other]

Attention-based Vehicle Self-Localization with HD Feature Maps

Authors: Nico Engel, Vasileios Belagiannis, Klaus Dietmayer

Abstract: We present a vehicle self-localization method using point-based deep neural networks. Our approach processes measurements and point features, i.e. landmarks, from a high-definition digital map to infer the vehicle's pose. To learn the best association and incorporate local information between the point sets, we propose an attention mechanism that matches the measurements to the corresponding landm… ▽ More We present a vehicle self-localization method using point-based deep neural networks. Our approach processes measurements and point features, i.e. landmarks, from a high-definition digital map to infer the vehicle's pose. To learn the best association and incorporate local information between the point sets, we propose an attention mechanism that matches the measurements to the corresponding landmarks. Finally, we use this representation for the point-cloud registration and the subsequent pose regression task. Furthermore, we introduce a training simulation framework that artificially generates measurements and landmarks to facilitate the deployment process and reduce the cost of creating extensive datasets from real-world data. We evaluate our method on our dataset, as well as an adapted version of the Kitti odometry dataset, where we achieve superior performance compared to related approaches; and additionally show dominant generalization capabilities. △ Less

Submitted 16 July, 2021; originally announced July 2021.

Comments: Accepted for publication at 24th IEEE International Conference on Intelligent Transportation Systems (ITSC 2021)

arXiv:2106.08693 [pdf, ps, other]

ParticleAugment: Sampling-Based Data Augmentation

Authors: Alexander Tsaregorodtsev, Vasileios Belagiannis

Abstract: We present an automated data augmentation approach for image classification. We formulate the problem as Monte Carlo sampling where our goal is to approximate the optimal augmentation policies. We propose a particle filtering scheme for the policy search where the probability of applying a set of augmentation operations forms the state of the filter. We measure the policy performance based on the… ▽ More We present an automated data augmentation approach for image classification. We formulate the problem as Monte Carlo sampling where our goal is to approximate the optimal augmentation policies. We propose a particle filtering scheme for the policy search where the probability of applying a set of augmentation operations forms the state of the filter. We measure the policy performance based on the loss function difference between a reference and the actual model, which we afterwards use to re-weight the particles and finally update the policy. In our experiments, we show that our formulation for automated augmentation reaches promising results on CIFAR-10, CIFAR-100, and ImageNet datasets using the standard network architectures for this problem. By comparing with the related work, our method reaches a balance between the computational cost of policy search and the model performance. Our code will be made publicly available. △ Less

Submitted 15 October, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: 8 pages

arXiv:2103.11395 [pdf, other]

ScanMix: Learning from Severe Label Noise via Semantic Clustering and Semi-Supervised Learning

Authors: Ragav Sachdeva, Filipe R Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: We propose a new training algorithm, ScanMix, that explores semantic clustering and semi-supervised learning (SSL) to allow superior robustness to severe label noise and competitive robustness to non-severe label noise problems, in comparison to the state of the art (SOTA) methods. ScanMix is based on the expectation maximisation framework, where the E-step estimates the latent variable to cluster… ▽ More We propose a new training algorithm, ScanMix, that explores semantic clustering and semi-supervised learning (SSL) to allow superior robustness to severe label noise and competitive robustness to non-severe label noise problems, in comparison to the state of the art (SOTA) methods. ScanMix is based on the expectation maximisation framework, where the E-step estimates the latent variable to cluster the training images based on their appearance and classification results, and the M-step optimises the SSL classification and learns effective feature representations via semantic clustering. We present a theoretical result that shows the correctness and convergence of ScanMix, and an empirical result that shows that ScanMix has SOTA results on CIFAR-10/-100 (with symmetric, asymmetric and semantic label noise), Red Mini-ImageNet (from the Controlled Noisy Web Labels), Clothing1M and WebVision. In all benchmarks with severe label noise, our results are competitive to the current SOTA. △ Less

Submitted 16 October, 2022; v1 submitted 21 March, 2021; originally announced March 2021.

Comments: Paper accepted at Pattern Recognition

arXiv:2103.04173 [pdf, other]

doi 10.1016/j.patcog.2022.109013

LongReMix: Robust Learning with High Confidence Samples in a Noisy Label Environment

Authors: Filipe R. Cordeiro, Ragav Sachdeva, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: Deep neural network models are robust to a limited amount of label noise, but their ability to memorise noisy labels in high noise rate problems is still an open issue. The most competitive noisy-label learning algorithms rely on a 2-stage process comprising an unsupervised learning to classify training samples as clean or noisy, followed by a semi-supervised learning that minimises the empirical… ▽ More Deep neural network models are robust to a limited amount of label noise, but their ability to memorise noisy labels in high noise rate problems is still an open issue. The most competitive noisy-label learning algorithms rely on a 2-stage process comprising an unsupervised learning to classify training samples as clean or noisy, followed by a semi-supervised learning that minimises the empirical vicinal risk (EVR) using a labelled set formed by samples classified as clean, and an unlabelled set with samples classified as noisy. In this paper, we hypothesise that the generalisation of such 2-stage noisy-label learning methods depends on the precision of the unsupervised classifier and the size of the training set to minimise the EVR. We empirically validate these two hypotheses and propose the new 2-stage noisy-label training algorithm LongReMix. We test LongReMix on the noisy-label benchmarks CIFAR-10, CIFAR-100, WebVision, Clothing1M, and Food101-N. The results show that our LongReMix generalises better than competing approaches, particularly in high label noise problems. Furthermore, our approach achieves state-of-the-art performance in most datasets. The code is available at https://github.com/filipe-research/LongReMix. △ Less

Submitted 4 September, 2022; v1 submitted 6 March, 2021; originally announced March 2021.

Comments: Published at Pattern Recognition 2022

arXiv:2103.04053 [pdf, other]

NVUM: Non-Volatile Unbiased Memory for Robust Medical Image Classification

Authors: Fengbei Liu, Yuanhong Chen, Yu Tian, Yuyuan Liu, Chong Wang, Vasileios Belagiannis, Gustavo Carneiro

Abstract: Real-world large-scale medical image analysis (MIA) datasets have three challenges: 1) they contain noisy-labelled samples that affect training convergence and generalisation, 2) they usually have an imbalanced distribution of samples per class, and 3) they normally comprise a multi-label problem, where samples can have multiple diagnoses. Current approaches are commonly trained to solve a subset… ▽ More Real-world large-scale medical image analysis (MIA) datasets have three challenges: 1) they contain noisy-labelled samples that affect training convergence and generalisation, 2) they usually have an imbalanced distribution of samples per class, and 3) they normally comprise a multi-label problem, where samples can have multiple diagnoses. Current approaches are commonly trained to solve a subset of those problems, but we are unaware of methods that address the three problems simultaneously. In this paper, we propose a new training module called Non-Volatile Unbiased Memory (NVUM), which non-volatility stores running average of model logits for a new regularization loss on noisy multi-label problem. We further unbias the classification prediction in NVUM update for imbalanced learning problem. We run extensive experiments to evaluate NVUM on new benchmarks proposed by this paper, where training is performed on noisy multi-label imbalanced chest X-ray (CXR) training sets, formed by Chest-Xray14 and CheXpert, and the testing is performed on the clean multi-label CXR datasets OpenI and PadChest. Our method outperforms previous state-of-the-art CXR classifiers and previous methods that can deal with noisy labels on all evaluations. Our code is available at https://github.com/FBLADL/NVUM. △ Less

Submitted 21 August, 2022; v1 submitted 6 March, 2021; originally announced March 2021.

Comments: MICCAI 2022 Early Accept

arXiv:2103.03629 [pdf, other]

Self-supervised Mean Teacher for Semi-supervised Chest X-ray Classification

Authors: Fengbei Liu, Yu Tian, Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The training of deep learning models generally requires a large amount of annotated data for effective convergence and generalisation. However, obtaining high-quality annotations is a laboursome and expensive process due to the need of expert radiologists for the labelling task. The study of semi-supervised learning in medical image analysis is then of crucial importance given that it is much less… ▽ More The training of deep learning models generally requires a large amount of annotated data for effective convergence and generalisation. However, obtaining high-quality annotations is a laboursome and expensive process due to the need of expert radiologists for the labelling task. The study of semi-supervised learning in medical image analysis is then of crucial importance given that it is much less expensive to obtain unlabelled images than to acquire images labelled by expert radiologists. Essentially, semi-supervised methods leverage large sets of unlabelled data to enable better training convergence and generalisation than using only the small set of labelled images. In this paper, we propose Self-supervised Mean Teacher for Semi-supervised (S$^2$MTS$^2$) learning that combines self-supervised mean-teacher pre-training with semi-supervised fine-tuning. The main innovation of S$^2$MTS$^2$ is the self-supervised mean-teacher pre-training based on the joint contrastive learning, which uses an infinite number of pairs of positive query and key features to improve the mean-teacher representation. The model is then fine-tuned using the exponential moving average teacher framework trained with semi-supervised learning. We validate S$^2$MTS$^2$ on the multi-label classification problems from Chest X-ray14 and CheXpert, and the multi-class classification from ISIC2018, where we show that it outperforms the previous SOTA semi-supervised learning methods by a large margin. △ Less

Submitted 4 November, 2021; v1 submitted 5 March, 2021; originally announced March 2021.

Comments: MLMI-MICCAI 2021

arXiv:2011.08659 [pdf, other]

doi 10.1109/ICRA48506.2021.9561375

Dynamic Occupancy Grid Map** with Recurrent Neural Networks

Authors: Marcel Schreiber, Vasileios Belagiannis, Claudius Gläser, Klaus Dietmayer

Abstract: Modeling and understanding the environment is an essential task for autonomous driving. In addition to the detection of objects, in complex traffic scenarios the motion of other road participants is of special interest. Therefore, we propose to use a recurrent neural network to predict a dynamic occupancy grid map, which divides the vehicle surrounding in cells, each containing the occupancy proba… ▽ More Modeling and understanding the environment is an essential task for autonomous driving. In addition to the detection of objects, in complex traffic scenarios the motion of other road participants is of special interest. Therefore, we propose to use a recurrent neural network to predict a dynamic occupancy grid map, which divides the vehicle surrounding in cells, each containing the occupancy probability and a velocity estimate. During training, our network is fed with sequences of measurement grid maps, which encode the lidar measurements of a single time step. Due to the combination of convolutional and recurrent layers, our approach is capable to use spatial and temporal information for the robust detection of static and dynamic environment. In order to apply our approach with measurements from a moving ego-vehicle, we propose a method for ego-motion compensation that is applicable in neural network architectures with recurrent layers working on different resolutions. In our evaluations, we compare our approach with a state-of-the-art particle-based algorithm on a large publicly available dataset to demonstrate the improved accuracy of velocity estimates and the more robust separation of the environment in static and dynamic area. Additionally, we show that our proposed method for ego-motion compensation leads to comparable results in scenarios with stationary and with moving ego-vehicle. △ Less

Submitted 5 May, 2022; v1 submitted 17 November, 2020; originally announced November 2020.

Journal ref: 2021 IEEE International Conference on Robotics and Automation (ICRA), May 30 - June 5, 2021, Xi'an, China, pp. 6717-6724

arXiv:2011.05704 [pdf, other]

EvidentialMix: Learning with Combined Open-set and Closed-set Noisy Labels

Authors: Ragav Sachdeva, Filipe R. Cordeiro, Vasileios Belagiannis, Ian Reid, Gustavo Carneiro

Abstract: The efficacy of deep learning depends on large-scale data sets that have been carefully curated with reliable data acquisition and annotation processes. However, acquiring such large-scale data sets with precise annotations is very expensive and time-consuming, and the cheap alternatives often yield data sets that have noisy labels. The field has addressed this problem by focusing on training mode… ▽ More The efficacy of deep learning depends on large-scale data sets that have been carefully curated with reliable data acquisition and annotation processes. However, acquiring such large-scale data sets with precise annotations is very expensive and time-consuming, and the cheap alternatives often yield data sets that have noisy labels. The field has addressed this problem by focusing on training models under two types of label noise: 1) closed-set noise, where some training samples are incorrectly annotated to a training label other than their known true class; and 2) open-set noise, where the training set includes samples that possess a true class that is (strictly) not contained in the set of known training labels. In this work, we study a new variant of the noisy label problem that combines the open-set and closed-set noisy labels, and introduce a benchmark evaluation to assess the performance of training algorithms under this setup. We argue that such problem is more general and better reflects the noisy label scenarios in practice. Furthermore, we propose a novel algorithm, called EvidentialMix, that addresses this problem and compare its performance with the state-of-the-art methods for both closed-set and open-set noise on the proposed benchmark. Our results show that our method produces superior classification results and better feature representations than previous state-of-the-art methods. The code is available at https://github.com/ragavsachdeva/EvidentialMix. △ Less

Submitted 11 November, 2020; originally announced November 2020.

Comments: Paper accepted at WACV'21: Winter Conference on Applications of Computer Vision

arXiv:2011.00931 [pdf, other]

doi 10.1109/ACCESS.2021.3116304

Point Transformer

Authors: Nico Engel, Vasileios Belagiannis, Klaus Dietmayer

Abstract: In this work, we present Point Transformer, a deep neural network that operates directly on unordered and unstructured point sets. We design Point Transformer to extract local and global features and relate both representations by introducing the local-global attention mechanism, which aims to capture spatial point relations and shape information. For that purpose, we propose SortNet, as part of t… ▽ More In this work, we present Point Transformer, a deep neural network that operates directly on unordered and unstructured point sets. We design Point Transformer to extract local and global features and relate both representations by introducing the local-global attention mechanism, which aims to capture spatial point relations and shape information. For that purpose, we propose SortNet, as part of the Point Transformer, which induces input permutation invariance by selecting points based on a learned score. The output of Point Transformer is a sorted and permutation invariant feature list that can directly be incorporated into common computer vision applications. We evaluate our approach on standard classification and part segmentation benchmarks to demonstrate competitive results compared to the prior work. Code is publicly available at: https://github.com/engelnico/point-transformer △ Less

Submitted 14 October, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

arXiv:2007.16072 [pdf, other]

Traffic Control Gesture Recognition for Autonomous Vehicles

Authors: Julian Wiederer, Arij Bouazizi, Ulrich Kressel, Vasileios Belagiannis

Abstract: A car driver knows how to react on the gestures of the traffic officers. Clearly, this is not the case for the autonomous vehicle, unless it has road traffic control gesture recognition functionalities. In this work, we address the limitation of the existing autonomous driving datasets to provide learning data for traffic control gesture recognition. We introduce a dataset that is based on 3D body… ▽ More A car driver knows how to react on the gestures of the traffic officers. Clearly, this is not the case for the autonomous vehicle, unless it has road traffic control gesture recognition functionalities. In this work, we address the limitation of the existing autonomous driving datasets to provide learning data for traffic control gesture recognition. We introduce a dataset that is based on 3D body skeleton input to perform traffic control gesture classification on every time step. Our dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence. To evaluate our dataset, we propose eight sequential processing models based on deep neural networks such as recurrent networks, attention mechanism, temporal convolutional networks and graph convolutional networks. We present an extensive evaluation and analysis of all approaches for our dataset, as well as real-world quantitative evaluation. The code and dataset is publicly available. △ Less

Submitted 31 July, 2020; originally announced July 2020.

Comments: 8 pages, 8 figures, 3 tables, accepted by IROS 2020

arXiv:2007.11255 [pdf, other]

doi 10.1109/ITSC45102.2020.9294279

DeepCLR: Correspondence-Less Architecture for Deep End-to-End Point Cloud Registration

Authors: Markus Horn, Nico Engel, Vasileios Belagiannis, Michael Buchholz, Klaus Dietmayer

Abstract: This work addresses the problem of point cloud registration using deep neural networks. We propose an approach to predict the alignment between two point clouds with overlap** data content, but displaced origins. Such point clouds originate, for example, from consecutive measurements of a LiDAR mounted on a moving platform. The main difficulty in deep registration of raw point clouds is the fusi… ▽ More This work addresses the problem of point cloud registration using deep neural networks. We propose an approach to predict the alignment between two point clouds with overlap** data content, but displaced origins. Such point clouds originate, for example, from consecutive measurements of a LiDAR mounted on a moving platform. The main difficulty in deep registration of raw point clouds is the fusion of template and source point cloud. Our proposed architecture applies flow embedding to tackle this problem, which generates features that describe the motion of each template point. These features are then used to predict the alignment in an end-to-end fashion without extracting explicit point correspondences between both input clouds. We rely on the KITTI odometry and ModelNet40 datasets for evaluating our method on various point distributions. Our approach achieves state-of-the-art accuracy and the lowest run-time of the compared methods. △ Less

Submitted 13 January, 2021; v1 submitted 22 July, 2020; originally announced July 2020.

Comments: 7 pages, 5 figures, 4 tables

Journal ref: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC)

arXiv:2007.01671 [pdf, other]

Few-Shot Microscopy Image Cell Segmentation

Authors: Youssef Dawoud, Julia Hornauer, Gustavo Carneiro, Vasileios Belagiannis

Abstract: Automatic cell segmentation in microscopy images works well with the support of deep neural networks trained with full supervision. Collecting and annotating images, though, is not a sustainable solution for every new microscopy database and cell type. Instead, we assume that we can access a plethora of annotated image data sets from different domains (sources) and a limited number of annotated im… ▽ More Automatic cell segmentation in microscopy images works well with the support of deep neural networks trained with full supervision. Collecting and annotating images, though, is not a sustainable solution for every new microscopy database and cell type. Instead, we assume that we can access a plethora of annotated image data sets from different domains (sources) and a limited number of annotated image data sets from the domain of interest (target), where each domain denotes not only different image appearance but also a different type of cell segmentation problem. We pose this problem as meta-learning where the goal is to learn a generic and adaptable few-shot learning model from the available source domain data sets and cell segmentation tasks. The model can be afterwards fine-tuned on the few annotated images of the target domain that contains different image appearance and different cell type. In our meta-learning training, we propose the combination of three objective functions to segment the cells, move the segmentation results away from the classification boundary using cross-domain tasks, and learn an invariant representation between tasks of the source domains. Our experiments on five public databases show promising results from 1- to 10-shot meta-learning using standard segmentation neural network architectures. △ Less

Submitted 29 June, 2020; originally announced July 2020.

Comments: 16 pages, 4 figures, Accepted by ECML-PKDD 2020 conference

arXiv:1911.01711 [pdf, other]

LACI: Low-effort Automatic Calibration of Infrastructure Sensors

Authors: Johannes Müller, Martin Herrmann, Jan Strohbeck, Vasileios Belagiannis, Michael Buchholz

Abstract: Sensor calibration usually is a time consuming yet important task. While classical approaches are sensor-specific and often need calibration targets as well as a widely overlap** field of view (FOV), within this work, a cooperative intelligent vehicle is used as callibration target. The vehicleis detected in the sensor frame and then matched with the information received from the cooperative awa… ▽ More Sensor calibration usually is a time consuming yet important task. While classical approaches are sensor-specific and often need calibration targets as well as a widely overlap** field of view (FOV), within this work, a cooperative intelligent vehicle is used as callibration target. The vehicleis detected in the sensor frame and then matched with the information received from the cooperative awareness messagessend by the coperative intelligent vehicle. The presented algorithm is fully automated as well as sensor-independent, relying only on a very common set of assumptions. Due to the direct registration on the world frame, no overlap** FOV is necessary. The algorithm is evaluated through experiment for four laserscanners as well as one pair of stereo cameras showing a repetition error within the measurement uncertainty of the sensors. A plausibility check rules out systematic errors that might not have been covered by evaluating the repetition error. △ Less

Submitted 5 November, 2019; originally announced November 2019.

Comments: 6 pages, published at ITSC 2019

arXiv:1909.11387 [pdf, other]

doi 10.1109/ICRA40945.2020.9196702

Motion Estimation in Occupancy Grid Maps in Stationary Settings Using Recurrent Neural Networks

Authors: Marcel Schreiber, Vasileios Belagiannis, Claudius Glaeser, Klaus Dietmayer

Abstract: In this work, we tackle the problem of modeling the vehicle environment as dynamic occupancy grid map in complex urban scenarios using recurrent neural networks. Dynamic occupancy grid maps represent the scene in a bird's eye view, where each grid cell contains the occupancy probability and the two dimensional velocity. As input data, our approach relies on measurement grid maps, which contain occ… ▽ More In this work, we tackle the problem of modeling the vehicle environment as dynamic occupancy grid map in complex urban scenarios using recurrent neural networks. Dynamic occupancy grid maps represent the scene in a bird's eye view, where each grid cell contains the occupancy probability and the two dimensional velocity. As input data, our approach relies on measurement grid maps, which contain occupancy probabilities, generated with lidar measurements. Given this configuration, we propose a recurrent neural network architecture to predict a dynamic occupancy grid map, i.e. filtered occupancy and velocity of each cell, by using a sequence of measurement grid maps. Our network architecture contains convolutional long-short term memories in order to sequentially process the input, makes use of spatial context, and captures motion. In the evaluation, we quantify improvements in estimating the velocity of braking and turning vehicles compared to the state-of-the-art. Additionally, we demonstrate that our approach provides more consistent velocity estimates for dynamic objects, as well as, less erroneous velocity estimates in static area. △ Less

Submitted 5 May, 2022; v1 submitted 25 September, 2019; originally announced September 2019.

Journal ref: 2020 IEEE International Conference on Robotics and Automation (ICRA), May 31 - June 4, 2020, Paris, France, pp. 8587-8593

arXiv:1908.00111 [pdf, other]

Few-Shot Meta-Denoising

Authors: Leslie Casas, Attila Klimmek, Gustavo Carneiro, Nassir Navab, Vasileios Belagiannis

Abstract: We study the problem of few-shot learning-based denoising where the training set contains just a handful of clean and noisy samples. A solution to mitigate the small training set issue is to pre-train a denoising model with small training sets containing pairs of clean and synthesized noisy signals, produced from empirical noise priors, and fine-tune on the available small training set. While such… ▽ More We study the problem of few-shot learning-based denoising where the training set contains just a handful of clean and noisy samples. A solution to mitigate the small training set issue is to pre-train a denoising model with small training sets containing pairs of clean and synthesized noisy signals, produced from empirical noise priors, and fine-tune on the available small training set. While such transfer learning seems effective, it may not generalize well because of the limited amount of training data. In this work, we propose a new meta-learning training approach for few-shot learning-based denoising problems. Our model is meta-trained using known synthetic noise models, and then fine-tuned with the small training set, with the real noise, as a few-shot learning task. Meta-learning from small training sets of synthetically generated data during meta-training enables us to not only generate an infinite number of training tasks, but also train a model to learn with small training sets -- both advantages have the potential to improve the generalisation of the denoising model. Our approach is empirically shown to produce more accurate denoising results than supervised learning and transfer learning in three denoising evaluations for images and 1-D signals. Interestingly, our study provides strong indications that meta-learning has the potential to become the main learning algorithm for denoising. △ Less

Submitted 25 November, 2019; v1 submitted 31 July, 2019; originally announced August 2019.

arXiv:1904.09007 [pdf, other]

DeepLocalization: Landmark-based Self-Localization with Deep Neural Networks

Authors: Nico Engel, Stefan Hoermann, Markus Horn, Vasileios Belagiannis, Klaus Dietmayer

Abstract: We address the problem of vehicle self-localization from multi-modal sensor information and a reference map. The map is generated off-line by extracting landmarks from the vehicle's field of view, while the measurements are collected similarly on the fly. Our goal is to determine the autonomous vehicle's pose from the landmark measurements and map landmarks. To learn this map**, we propose DeepL… ▽ More We address the problem of vehicle self-localization from multi-modal sensor information and a reference map. The map is generated off-line by extracting landmarks from the vehicle's field of view, while the measurements are collected similarly on the fly. Our goal is to determine the autonomous vehicle's pose from the landmark measurements and map landmarks. To learn this map**, we propose DeepLocalization, a deep neural network that regresses the vehicle's translation and rotation parameters from unordered and dynamic input landmarks. The proposed network architecture is robust to changes of the dynamic environment and can cope with a small number of extracted landmarks. During the training process we rely on synthetically generated ground-truth. In our experiments, we evaluate two inference approaches in real-world scenarios. We show that DeepLocalization can be combined with regular GPS signals and filtering algorithms such as the extended Kalman filter. Our approach achieves state-of-the-art accuracy and is about ten times faster than the related work. △ Less

Submitted 19 July, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

Comments: Accepted for publication by the IEEE Intelligent Transportation Systems Conference (ITSC 2019), Auckland, New Zealand

arXiv:1901.02000 [pdf, other]

Forecasting People Trajectories and Head Poses by Jointly Reasoning on Tracklets and Vislets

Authors: Irtiza Hasan, Francesco Setti, Theodore Tsesmelis, Vasileios Belagiannis, Sikandar Amin, Alessio Del Bue, Marco Cristani, Fabio Galasso

Abstract: In this work, we explore the correlation between people trajectories and their head orientations. We argue that people trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches on trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedes… ▽ More In this work, we explore the correlation between people trajectories and their head orientations. We argue that people trajectory and head pose forecasting can be modelled as a joint problem. Recent approaches on trajectory forecasting leverage short-term trajectories (aka tracklets) of pedestrians to predict their future paths. In addition, sociological cues, such as expected destination or pedestrian interaction, are often combined with tracklets. In this paper, we propose MiXing-LSTM (MX-LSTM) to capture the interplay between positions and head orientations (vislets) thanks to a joint unconstrained optimization of full covariance matrices during the LSTM backpropagation. We additionally exploit the head orientations as a proxy for the visual attention, when modeling social interactions. MX-LSTM predicts future pedestrians location and head pose, increasing the standard capabilities of the current approaches on long-term trajectory forecasting. Compared to the state-of-the-art, our approach shows better performances on an extensive set of public benchmarks. MX-LSTM is particularly effective when people move slowly, i.e. the most challenging scenario for all other models. The proposed approach also allows for accurate predictions on a longer time horizon. △ Less

Submitted 15 October, 2019; v1 submitted 7 January, 2019; originally announced January 2019.

Comments: Accepted at IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019. arXiv admin note: text overlap with arXiv:1805.00652

Showing 1–50 of 55 results for author: Belagiannis, V