-
DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning
Authors:
Senthil Yogamani,
David Unger,
Venkatraman Narayanan,
Varun Ravi Kumar
Abstract:
Semantic segmentation is an effective way to perform scene understanding. Recently, segmentation in 3D Bird's Eye View (BEV) space has become popular as it is directly used by drive policy. However, there is limited work on BEV segmentation for surround-view fisheye cameras, commonly used in commercial vehicles. As this task has no real-world public dataset and existing synthetic datasets do not handle amodal regions due to occlusion, we create a synthetic dataset using the Cognata simulator comprising diverse road types, weather, and lighting conditions. We generalize the BEV segmentation to work with any camera model; this is useful for mixing diverse cameras. We implement a baseline by applying cylindrical rectification on the fisheye images and using a standard LSS-based BEV segmentation model. We demonstrate that we can achieve better performance without undistortion, which has the adverse effects of increased runtime due to pre-processing, reduced field-of-view, and resampling artifacts. Further, we introduce a distortion-aware learnable BEV pooling strategy that is more effective for the fisheye cameras. We extend the model with an occlusion reasoning module, which is critical for estimating in BEV space. Qualitative performance of DaF-BEVSeg is showcased in the video at https://streamable.com/ge4v51.
Submitted 9 April, 2024;
originally announced April 2024.
-
Impact of Video Compression Artifacts on Fisheye Camera Visual Perception Tasks
Authors:
Madhumitha Sakthi,
Louis Kerofsky,
Varun Ravi Kumar,
Senthil Yogamani
Abstract:
Autonomous driving systems require extensive data collection schemes to cover the diverse scenarios needed for building a robust and safe system. The data volumes are in the order of Exabytes and have to be stored for a long period of time (i.e., more than 10 years of the vehicle's life cycle). Lossless compression doesn't provide sufficient compression ratios; hence, lossy video compression has been explored. It is essential to prove that lossy video compression artifacts do not impact the performance of the perception algorithms. However, there is limited work in this area to provide a solid conclusion. In particular, there is no such work for fisheye cameras, which have high radial distortion and where compression may have higher artifacts. Fisheye cameras are commonly used in automotive systems for 3D object detection tasks. In this work, we provide the first analysis of the impact of standard video compression codecs on wide FOV fisheye camera images. We demonstrate that the achievable compression with negligible impact depends on the dataset and temporal prediction of the video codec. We propose a radial distortion-aware zonal metric to evaluate the performance of artifacts in fisheye images. In addition, we present a novel method for estimating affine mode parameters of the latest VVC codec, and suggest some areas for improvement in video codecs for the application to fisheye imagery.
Submitted 24 March, 2024;
originally announced March 2024.
-
Neural Rendering based Urban Scene Reconstruction for Autonomous Driving
Authors:
Shihao Shen,
Louis Kerofsky,
Varun Ravi Kumar,
Senthil Yogamani
Abstract:
Dense 3D reconstruction has many applications in automated driving including automated annotation validation, multimodal data augmentation, providing ground truth annotations for systems lacking LiDAR, as well as enhancing auto-labeling accuracy. LiDAR provides highly accurate but sparse depth, whereas camera images enable estimation of dense but noisy depth, particularly at long ranges. In this paper, we harness the strengths of both sensors and propose a multimodal 3D scene reconstruction using a framework combining neural implicit surfaces and radiance fields. In particular, our method estimates dense and accurate 3D structures and creates an implicit map representation based on signed distance fields, which can be further rendered into RGB images and depth maps. A mesh can be extracted from the learned signed distance field and culled based on occlusion. Dynamic objects are efficiently filtered on the fly during sampling using 3D object detection models. We demonstrate qualitative and quantitative results on challenging automotive scenes.
Submitted 9 February, 2024;
originally announced February 2024.
-
Multi-camera Bird's Eye View Perception for Autonomous Driving
Authors:
David Unger,
Nikhil Gosala,
Varun Ravi Kumar,
Shubhankar Borse,
Abhinav Valada,
Senthil Yogamani
Abstract:
Most automated driving systems comprise a diverse sensor set, including several cameras, Radars, and LiDARs, ensuring a complete 360° coverage in near and far regions. Unlike Radar and LiDAR, which measure directly in 3D, cameras capture a 2D perspective projection with inherent depth ambiguity. However, it is essential to produce perception outputs in 3D to enable the spatial reasoning of other agents and structures for optimal path planning. The 3D space is typically simplified to the BEV space by omitting the less relevant Z-coordinate, which corresponds to the height dimension. The most basic approach to achieving the desired BEV representation from a camera image is IPM, assuming a flat ground surface. Surround vision systems that are pretty common in new vehicles use the IPM principle to generate a BEV image and to show it on display to the driver. However, this approach is not suited for autonomous driving since there are severe distortions introduced by this too-simplistic transformation method. More recent approaches use deep neural networks to output directly in BEV space. These methods transform camera images into BEV space using geometric constraints implicitly or explicitly in the network. As CNN has more context information and a learnable transformation can be more flexible and adapt to image content, the deep learning-based methods set the new benchmark for BEV transformation and achieve state-of-the-art performance. First, this chapter discusses the contemporary trends of multi-camera-based DNN (deep neural network) models outputting object representations directly in the BEV space. Then, we discuss how this approach can extend to effective sensor fusion and coupling downstream tasks like situation analysis and prediction. Finally, we show challenges and open problems in BEV perception.
Submitted 19 September, 2023; v1 submitted 16 September, 2023;
originally announced September 2023.
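The flat-ground IPM idea mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes an ideal pinhole camera (no fisheye distortion) whose optical axis is parallel to the ground, and the function name and parameters are hypothetical.

```python
def ipm_ground_point(u, v, f, cx, cy, cam_height):
    """Back-project pixel (u, v) onto a flat ground plane.

    Illustrative assumptions (not from the source): ideal pinhole camera,
    focal length f in pixels, principal point (cx, cy), optical axis
    parallel to the ground, camera mounted cam_height metres above it.
    Returns (lateral, forward) in metres, or None when the pixel lies on
    or above the horizon and its ray never reaches the ground.
    """
    dv = v - cy  # pixel offset below the horizon row
    if dv <= 0:
        return None
    # The ray direction is ((u - cx) / f, dv / f, 1) with y pointing down;
    # it meets the ground plane y = cam_height at scale cam_height * f / dv.
    forward = cam_height * f / dv
    lateral = cam_height * (u - cx) / dv
    return lateral, forward


# A pixel 100 rows below the principal point, camera 1.5 m above the ground.
print(ipm_ground_point(640, 460, 1000.0, 640.0, 360.0, 1.5))
```

Under these assumptions, every pixel is treated as if it lay on the road, so anything with height (a vehicle, a pedestrian) is projected farther away than it really is. This is one way to see the distortions the abstract attributes to the transformation.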
-
LiDAR-BEVMTN: Real-Time LiDAR Bird's-Eye View Multi-Task Perception Network for Autonomous Driving
Authors:
Sambit Mohapatra,
Senthil Yogamani,
Varun Ravi Kumar,
Stefan Milz,
Heinrich Gotzig,
Patrick Mäder
Abstract:
LiDAR is crucial for robust 3D scene perception in autonomous driving. LiDAR perception has the largest body of literature after camera perception. However, multi-task learning across tasks like detection, segmentation, and motion estimation using LiDAR remains relatively unexplored, especially on automotive-grade embedded platforms. We present a real-time multi-task convolutional neural network for LiDAR-based object detection, semantics, and motion segmentation. The unified architecture comprises a shared encoder and task-specific decoders, enabling joint representation learning. We propose a novel Semantic Weighting and Guidance (SWAG) module to selectively transfer semantic features for improved object detection. Our heterogeneous training scheme combines diverse datasets and exploits complementary cues between tasks. The work provides the first embedded implementation unifying these key perception tasks from LiDAR point clouds, achieving 3ms latency on the embedded NVIDIA Xavier platform. We achieve state-of-the-art results for two tasks, semantic and motion segmentation, and close to state-of-the-art performance for 3D object detection. By maximizing hardware efficiency and leveraging multi-task synergies, our method delivers an accurate and efficient solution tailored for real-world automated driving deployment. Qualitative results can be seen at https://youtu.be/H-hWRzv2lIY.
Submitted 17 July, 2023;
originally announced July 2023.
-
Class adaptive threshold and negative class guided noisy annotation robust Facial Expression Recognition
Authors:
Darshan Gera,
Badveeti Naveen Siva Kumar,
Bobbili Veerendra Raj Kumar,
S Balasubramanian
Abstract:
The hindering problem in facial expression recognition (FER) is the presence of inaccurate annotations, referred to as noisy annotations, in the datasets. These noisy annotations are inherently present in the datasets because the labeling depends on the annotator's subjectivity, the clarity of the image, etc. Recent works use sample selection methods to solve this noisy annotation problem in FER. In our work, we use a dynamic adaptive threshold to separate confident samples from non-confident ones so that our learning won't be hampered by non-confident samples. Instead of discarding the non-confident samples, we impose consistency in the negative classes of those non-confident samples to guide the model to learn better in the positive class. Since FER datasets usually come with 7 or 8 classes, we can correctly guess a negative class with 85% probability even by choosing randomly. By learning "which class a sample doesn't belong to", the model can learn "which class it belongs to" in a better manner. We demonstrate the proposed framework's effectiveness using quantitative as well as qualitative results. Our method performs better than the baseline by a margin of 4% to 28% on RAFDB and 3.3% to 31.4% on FERPlus for various levels of synthetic noisy labels in the aforementioned datasets.
Submitted 3 May, 2023;
originally announced May 2023.
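The 85% figure quoted in this abstract can be checked with a short calculation. The sketch below assumes each sample has exactly one true class and the negative class is guessed uniformly at random; the function name is illustrative and does not come from the paper.

```python
from fractions import Fraction


def random_negative_hit_rate(num_classes):
    """Chance that a uniformly random class is NOT the true class.

    Assumes a single true label per sample, so num_classes - 1 of the
    num_classes options are valid negative classes.
    """
    return Fraction(num_classes - 1, num_classes)


for k in (7, 8):
    rate = random_negative_hit_rate(k)
    print(f"{k} classes: {rate} = {float(rate):.3f}")
```

With 7 classes the rate is 6/7 (about 0.857) and with 8 classes it is 7/8 (0.875), consistent with the roughly 85% stated in the abstract.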
-
ABAW : Facial Expression Recognition in the wild
Authors:
Darshan Gera,
Badveeti Naveen Siva Kumar,
Bobbili Veerendra Raj Kumar,
S Balasubramanian
Abstract:
The fifth Affective Behavior Analysis in-the-wild (ABAW) competition comprises multiple challenges: the Valence-Arousal Estimation Challenge, the Expression Classification Challenge, the Action Unit Detection Challenge, and the Emotional Reaction Intensity Estimation Challenge. In this paper, we address only the expression classification challenge, using fully supervised, semi-supervised, and noisy-label approaches. The noise-aware model outperforms the baseline model by 10.46%, the semi-supervised model outperforms it by 9.38%, and the fully supervised model outperforms it by 9.34%.
Submitted 17 March, 2023;
originally announced March 2023.
-
X$^3$KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
Authors:
Marvin Klingner,
Shubhankar Borse,
Varun Ravi Kumar,
Behnaz Rezaei,
Venkatraman Narayanan,
Senthil Yogamani,
Fatih Porikli
Abstract:
Recent advances in 3D object detection (3DOD) have obtained remarkably strong results for LiDAR-based models. In contrast, surround-view 3DOD models based on multiple camera images underperform due to the necessary view transformation of features from perspective view (PV) to a 3D world representation which is ambiguous due to missing depth information. This paper introduces X$^3$KD, a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3DOD. Specifically, we propose cross-task distillation from an instance segmentation teacher (X-IS) in the PV feature extraction stage providing supervision without ambiguous error backpropagation through the view transformation. After the transformation, we apply cross-modal feature distillation (X-FD) and adversarial training (X-AT) to improve the 3D world representation of multi-camera features through the information contained in a LiDAR-based 3DOD teacher. Finally, we also employ this teacher for cross-modal output distillation (X-OD), providing dense supervision at the prediction stage. We perform extensive ablations of knowledge distillation at different stages of multi-camera 3DOD. Our final X$^3$KD model outperforms previous state-of-the-art approaches on the nuScenes and Waymo datasets and generalizes to RADAR-based 3DOD. Qualitative results video at https://youtu.be/1do9DPFmr38.
Submitted 3 March, 2023;
originally announced March 2023.
-
X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation
Authors:
Shubhankar Borse,
Marvin Klingner,
Varun Ravi Kumar,
Hong Cai,
Abdulaziz Almuzairee,
Senthil Yogamani,
Fatih Porikli
Abstract:
Bird's-eye-view (BEV) grid is a common representation for the perception of road components, e.g., drivable area, in autonomous driving. Most existing approaches rely on cameras only to perform segmentation in BEV space, which is fundamentally constrained by the absence of reliable depth information. Latest works leverage both camera and LiDAR modalities, but sub-optimally fuse their features using simple, concatenation-based mechanisms.
In this paper, we address these problems by enhancing the alignment of the unimodal features in order to aid feature fusion, as well as enhancing the alignment between the cameras' perspective view (PV) and BEV representations. We propose X-Align, a novel end-to-end cross-modal and cross-view learning framework for BEV segmentation consisting of the following components: (i) a novel Cross-Modal Feature Alignment (X-FA) loss, (ii) an attention-based Cross-Modal Feature Fusion (X-FF) module to align multi-modal BEV features implicitly, and (iii) an auxiliary PV segmentation branch with Cross-View Segmentation Alignment (X-SA) losses to improve the PV-to-BEV transformation. We evaluate our proposed method across two commonly used benchmark datasets, i.e., nuScenes and KITTI-360. Notably, X-Align significantly outperforms the state-of-the-art by 3 absolute mIoU points on nuScenes. We also provide extensive ablation studies to demonstrate the effectiveness of the individual components.
Submitted 31 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition
Authors:
Darshan Gera,
Naveen Siva Kumar Badveeti,
Bobbili Veerendra Raj Kumar,
S Balasubramanian
Abstract:
Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity in expressions, the subjectivity of annotators, and inter-class similarity. However, recent deep networks have a strong capacity to memorize the noisy annotations, leading to corrupted feature embedding and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on a dynamic class-specific threshold during training. Specifically, DNFER is based on supervised training using selected clean samples and unsupervised consistent training using all the samples. During training, the mean posterior class probabilities of each mini-batch are used as dynamic class-specific thresholds to select the clean samples for supervised training. This threshold is independent of the noise rate and, unlike other methods, does not need any clean data. In addition, to learn from all samples, the posterior distributions of the weakly-augmented and strongly-augmented images are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic and real noisy annotated FER datasets like RAFDB, FERPlus, SFEW, and AffectNet.
Submitted 22 August, 2022;
originally announced August 2022.
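The threshold-based selection described in this abstract can be sketched roughly as follows. This is not the authors' code: the abstract only states that the per-class mean posterior of a mini-batch serves as the threshold, so the selection rule used here (keep a sample when the posterior of its annotated class reaches that class's threshold) and all names are assumptions made for illustration.

```python
def dynamic_class_thresholds(posteriors):
    """Per-class mean of the posterior probabilities over one mini-batch."""
    batch_size = len(posteriors)
    num_classes = len(posteriors[0])
    return [
        sum(row[c] for row in posteriors) / batch_size
        for c in range(num_classes)
    ]


def select_clean_indices(posteriors, labels):
    """Indices of samples treated as clean under the assumed rule.

    Assumed rule (hypothetical, not confirmed by the source): a sample is
    kept when the posterior of its annotated class is at least the
    mini-batch threshold of that class.
    """
    thresholds = dynamic_class_thresholds(posteriors)
    return [
        i
        for i, (row, label) in enumerate(zip(posteriors, labels))
        if row[label] >= thresholds[label]
    ]


# Toy mini-batch: three samples, two classes, one doubtful annotation.
batch_posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
batch_labels = [0, 1, 1]
print(select_clean_indices(batch_posteriors, batch_labels))
```

In the toy batch, the second sample is annotated as class 1 but the model assigns that class less than the batch mean, so it is left out of the supervised subset. Because the thresholds are recomputed from each mini-batch, no noise rate or clean subset has to be supplied, which matches the property the abstract highlights.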
-
SS-MFAR : Semi-supervised Multi-task Facial Affect Recognition
Authors:
Darshan Gera,
Badveeti Naveen Siva Kumar,
Bobbili Veerendra Raj Kumar,
S Balasubramanian
Abstract:
Automatic affect recognition has applications in many areas such as education, gaming, software development, automotives, medical care, etc., but achieving appreciable performance on in-the-wild data sets is a non-trivial task. Although in-the-wild data sets represent real-world scenarios better than synthetic data sets, they suffer from the problem of incomplete labels. Inspired by semi-supervised learning, in this paper, we introduce our submission to the Multi-Task-Learning Challenge at the 4th Affective Behavior Analysis in-the-wild (ABAW) 2022 Competition. The three tasks considered in this challenge are valence-arousal (VA) estimation; classification of expressions into the 6 basic categories (anger, disgust, fear, happiness, sadness, surprise), neutral, and the 'other' category; and 12 action units (AU) numbered AU-{1,2,4,6,7,10,12,15,23,24,25,26}. Our method, Semi-supervised Multi-task Facial Affect Recognition (SS-MFAR), uses a deep residual network with task-specific classifiers for each of the tasks, along with adaptive thresholds for each expression class and semi-supervised learning for the incomplete labels. Source code is available at https://github.com/1980x/ABAW2022DMACS.
Submitted 5 August, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Woodscape Fisheye Object Detection for Autonomous Driving -- CVPR 2022 OmniCV Workshop Challenge
Authors:
Saravanabalagi Ramachandran,
Ganesh Sistu,
Varun Ravi Kumar,
John McDonald,
Senthil Yogamani
Abstract:
Object detection is a comprehensively studied problem in autonomous driving. However, it has been relatively less explored in the case of fisheye cameras. The strong radial distortion breaks the translation invariance inductive bias of Convolutional Neural Networks. Thus, we present the WoodScape fisheye object detection challenge for autonomous driving which was held as part of the CVPR 2022 Workshop on Omnidirectional Computer Vision (OmniCV). This is one of the first competitions focused on fisheye camera object detection. We encouraged the participants to design models which work natively on fisheye images without rectification. We used CodaLab to host the competition based on the publicly available WoodScape fisheye dataset. In this paper, we provide a detailed analysis on the competition which attracted the participation of 120 global teams and a total of 1492 submissions. We briefly discuss the details of the winning methods and analyze their qualitative and quantitative results.
Submitted 26 June, 2022;
originally announced June 2022.
-
Surround-View Cameras based Holistic Visual Perception for Automated Driving
Authors:
Varun Ravi Kumar
Abstract:
The formation of eyes led to the big bang of evolution. The dynamics changed from a primitive organism waiting for the food to come into contact for eating to food being sought after by visual sensors. The human eye is one of the most sophisticated developments of evolution, but it still has defects. Humans have evolved a biological perception algorithm capable of driving cars, operating machinery, piloting aircraft, and navigating ships over millions of years. Automating these capabilities for computers is critical for various applications, including self-driving cars, augmented reality, and architectural surveying. Near-field visual perception in the context of self-driving cars can perceive the environment in a range of $0-10$ meters and 360° coverage around the vehicle. It is a critical decision-making component in the development of safer automated driving. Recent advances in computer vision and deep learning, in conjunction with high-quality sensors such as cameras and LiDARs, have fueled mature visual perception solutions. Until now, far-field perception has been the primary focus. Another significant issue is the limited processing power available for developing real-time applications. Because of this bottleneck, there is frequently a trade-off between performance and run-time efficiency. We concentrate on the following issues in order to address them: 1) Developing near-field perception algorithms with high performance and low computational complexity for various visual perception tasks such as geometric and semantic tasks using convolutional neural networks. 2) Using Multi-Task Learning to overcome computational bottlenecks by sharing initial convolutional layers between tasks and developing optimization strategies that balance tasks.
Submitted 11 June, 2022;
originally announced June 2022.
-
Surround-view Fisheye Camera Perception for Automated Driving: Overview, Survey and Challenges
Authors:
Varun Ravi Kumar,
Ciaran Eising,
Christian Witt,
Senthil Yogamani
Abstract:
Surround-view fisheye cameras are commonly used for near-field sensing in automated driving. Four fisheye cameras on four sides of the vehicle are sufficient to cover 360° around the vehicle capturing the entire near-field region. Some primary use cases are automated parking, traffic jam assist, and urban driving. There are limited datasets and very little work on near-field perception tasks as the focus in automotive perception is on far-field perception. In contrast to far-field, surround-view perception poses additional challenges due to high precision object detection requirements of 10cm and partial visibility of objects. Due to the large radial distortion of fisheye cameras, standard algorithms cannot be extended easily to the surround-view use case. Thus, we are motivated to provide a self-contained reference for automotive fisheye camera perception for researchers and practitioners. Firstly, we provide a unified and taxonomic treatment of commonly used fisheye camera models. Secondly, we discuss various perception tasks and existing literature. Finally, we discuss the challenges and future direction.
Submitted 5 January, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
SynWoodScape: Synthetic Surround-view Fisheye Camera Dataset for Autonomous Driving
Authors:
Ahmed Rida Sekkat,
Yohan Dupuis,
Varun Ravi Kumar,
Hazem Rashed,
Senthil Yogamani,
Pascal Vasseur,
Paul Honeine
Abstract:
Surround-view cameras are a primary sensor for automated driving, used for near-field perception. It is one of the most commonly used sensors in commercial vehicles primarily used for parking visualization and automated parking. Four fisheye cameras with a 190° field of view cover the 360° around the vehicle. Due to its high radial distortion, the standard algorithms do not extend easily. Previously, we released the first public fisheye surround-view dataset named WoodScape. In this work, we release a synthetic version of the surround-view dataset, covering many of its weaknesses and extending it. Firstly, it is not possible to obtain ground truth for pixel-wise optical flow and depth. Secondly, WoodScape did not have all four cameras annotated simultaneously in order to sample diverse frames. However, this means that multi-camera algorithms cannot be designed to obtain a unified output in birds-eye space, which is enabled in the new dataset. We implemented surround-view fisheye geometric projections in CARLA Simulator matching WoodScape's configuration and created SynWoodScape. We release 80k images from the synthetic dataset with annotations for 10+ tasks. We also release the baseline code and supporting scripts.
Submitted 2 January, 2023; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Detecting Adversarial Perturbations in Multi-Task Perception
Authors:
Marvin Klingner,
Varun Ravi Kumar,
Senthil Yogamani,
Andreas Bär,
Tim Fingscheidt
Abstract:
While deep neural networks (DNNs) achieve impressive performance on environment perception tasks, their sensitivity to adversarial perturbations limits their use in practical applications. In this paper, we (i) propose a novel adversarial perturbation detection scheme based on multi-task perception of complex vision tasks (i.e., depth estimation and semantic segmentation). Specifically, adversarial perturbations are detected by inconsistencies between extracted edges of the input image, the depth output, and the segmentation output. To further improve this technique, we (ii) develop a novel edge consistency loss between all three modalities, thereby improving their initial consistency which in turn supports our detection scheme. We verify our detection scheme's effectiveness by employing various known attacks and image noises. In addition, we (iii) develop a multi-task adversarial attack, aiming at fooling both tasks as well as our detection scheme. Experimental evaluation on the Cityscapes and KITTI datasets shows that under an assumption of a 5% false positive rate up to 100% of images are correctly detected as adversarially perturbed, depending on the strength of the perturbation. Code is available at https://github.com/ifnspaml/AdvAttackDet. A short video at https://youtu.be/KKa6gOyWmH4 provides qualitative results.
Submitted 11 September, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
A Hybrid Sparse-Dense Monocular SLAM System for Autonomous Driving
Authors:
Louis Gallagher,
Varun Ravi Kumar,
Senthil Yogamani,
John B. McDonald
Abstract:
In this paper, we present a system for incrementally reconstructing a dense 3D model of the geometry of an outdoor environment using a single monocular camera attached to a moving vehicle. Dense models provide a rich representation of the environment facilitating higher-level scene understanding, perception, and planning. Our system employs dense depth prediction with a hybrid mapping architecture combining state-of-the-art sparse features and dense fusion-based visual SLAM algorithms within an integrated framework. Our novel contributions include design of hybrid sparse-dense camera tracking and loop closure, and scale estimation improvements in dense depth prediction. We use the motion estimates from the sparse method to overcome the large and variable inter-frame displacement typical of outdoor vehicle scenarios. Our system then registers the live image with the dense model using whole-image alignment. This enables the fusion of the live frame and dense depth prediction into the model. Global consistency and alignment between the sparse and dense models are achieved by applying pose constraints from the sparse method directly within the deformation of the dense model. We provide qualitative and quantitative results for both trajectory estimation and surface reconstruction accuracy, demonstrating competitive performance on the KITTI dataset. Qualitative results of the proposed approach are illustrated in https://youtu.be/Pn2uaVqjskY. Source code for the project is publicly available at the following repository https://github.com/robotvisionmu/DenseMonoSLAM.
Submitted 17 August, 2021;
originally announced August 2021.
-
Adversarial Attacks on Multi-task Visual Perception for Autonomous Driving
Authors:
Ibrahim Sobh,
Ahmed Hamed,
Varun Ravi Kumar,
Senthil Yogamani
Abstract:
Deep neural networks (DNNs) have accomplished impressive success in various applications, including autonomous driving perception tasks, in recent years. On the other hand, current deep neural networks are easily fooled by adversarial attacks. This vulnerability raises significant concerns, particularly in safety-critical applications. As a result, research into attacking and defending DNNs has gained much coverage. In this work, detailed adversarial attacks are applied on a diverse multi-task visual perception deep network across distance estimation, semantic segmentation, motion detection, and object detection. The experiments consider both white and black box attacks for targeted and un-targeted cases, while attacking a task and inspecting the effect on all the others, in addition to inspecting the effect of applying a simple defense method. We conclude this paper by comparing and discussing the experimental results, proposing insights and future work. The visualizations of the attacks are available at https://youtu.be/6AixN90budY.
Submitted 7 November, 2021; v1 submitted 15 July, 2021;
originally announced July 2021.
-
An Online Learning System for Wireless Charging Alignment using Surround-view Fisheye Cameras
Authors:
Ashok Dahal,
Varun Ravi Kumar,
Senthil Yogamani,
Ciaran Eising
Abstract:
Electric vehicles are increasingly common, with inductive chargepads being considered a convenient and efficient means of charging them. However, drivers are typically poor at aligning the vehicle to the necessary accuracy for efficient inductive charging, making the automated alignment of the two charging plates desirable. In parallel to the electrification of the vehicular fleet, automated parking systems that make use of surround-view camera systems are becoming increasingly popular. In this work, we propose a system based on the surround-view camera architecture to detect, localize, and automatically align the vehicle with the inductive chargepad. The visual design of the chargepads is not standardized and not necessarily known beforehand. Therefore, a system that relies on offline training will fail in some situations. Thus, we propose a self-supervised online learning method that leverages the driver's actions when manually aligning the vehicle with the chargepad and combines it with weak supervision from semantic segmentation and depth to learn a classifier to auto-annotate the chargepad in the video for further training. In this way, when faced with a previously unseen chargepad, the driver needs to manually align the vehicle only once. As the chargepad is flat on the ground, it is not easy to detect it from a distance. Thus, we propose using a Visual SLAM pipeline to learn landmarks relative to the chargepad to enable alignment from a greater range. We demonstrate the working system on an automated vehicle as illustrated in the video at https://youtu.be/_cLCmkW4UYo. To encourage further research, we will share a chargepad dataset used in this work.
Submitted 21 December, 2022; v1 submitted 26 May, 2021;
originally announced May 2021.
-
Weather and Light Level Classification for Autonomous Driving: Dataset, Baseline and Active Learning
Authors:
Mahesh M Dhananjaya,
Varun Ravi Kumar,
Senthil Yogamani
Abstract:
Autonomous driving is rapidly advancing, and Level 2 functions are becoming a standard feature. One of the foremost outstanding hurdles is to obtain robust visual perception in harsh weather and low light conditions where accuracy degradation is severe. It is critical to have a weather classification model to decrease visual perception confidence during these scenarios. Thus, we have built a new dataset for weather (fog, rain, and snow) classification and light level (bright, moderate, and low) classification. Furthermore, we provide street type (asphalt, grass, and cobblestone) classification, leading to 9 labels. Each image has three labels corresponding to weather, light level, and street type. We recorded the data utilizing an industrial front camera of RCCC (red/clear) format with a resolution of $1024\times1084$. We collected 15k video sequences and sampled 60k images. We implement an active learning framework to reduce the dataset's redundancy and find the optimal set of frames for training a model. We distilled the 60k images further to 1.1k images, which will be shared publicly after privacy anonymization. There is no public dataset for weather and light level classification focused on autonomous driving to the best of our knowledge. The baseline ResNet18 network used for weather classification achieves state-of-the-art results in two non-automotive weather classification public datasets but significantly lower accuracy on our proposed dataset, demonstrating it is not saturated and needs further research.
Submitted 29 November, 2021; v1 submitted 28 April, 2021;
originally announced April 2021.
-
SVDistNet: Self-Supervised Near-Field Distance Estimation on Surround View Fisheye Cameras
Authors:
Varun Ravi Kumar,
Marvin Klingner,
Senthil Yogamani,
Markus Bach,
Stefan Milz,
Tim Fingscheidt,
Patrick Mäder
Abstract:
A 360° perception of scene geometry is essential for automated driving, notably for parking and urban driving scenarios. Typically, it is achieved using surround-view fisheye cameras, focusing on the near-field area around the vehicle. The majority of current depth estimation approaches focus on employing just a single camera, which cannot be straightforwardly generalized to multiple cameras. The depth estimation model must be tested on a variety of cameras equipped to millions of cars with varying camera geometries. Even within a single car, intrinsics vary due to manufacturing tolerances. Deep learning models are sensitive to these changes, and it is practically infeasible to train and test on each camera variant. As a result, we present novel camera-geometry adaptive multi-scale convolutions which utilize the camera parameters as a conditional input, enabling the model to generalize to previously unseen fisheye cameras. Additionally, we improve the distance estimation by pairwise and patchwise vector-based self-attention encoder networks. We evaluate our approach on the Fisheye WoodScape surround-view dataset, significantly improving over previous approaches. We also show a generalization of our approach across different camera viewing angles and perform extensive experiments to support our contributions. To enable comparison with other approaches, we evaluate the front camera data on the KITTI dataset (pinhole camera images) and achieve state-of-the-art performance among self-supervised monocular methods. An overview video with qualitative results is provided at https://youtu.be/bmX0UcU9wtA. Baseline code and dataset will be made public.
Submitted 9 April, 2021;
originally announced April 2021.
-
OmniDet: Surround View Cameras based Multi-task Visual Perception Network for Autonomous Driving
Authors:
Varun Ravi Kumar,
Senthil Yogamani,
Hazem Rashed,
Ganesh Sistu,
Christian Witt,
Isabelle Leang,
Stefan Milz,
Patrick Mäder
Abstract:
Surround View fisheye cameras are commonly deployed in automated driving for 360° near-field sensing around the vehicle. This work presents a multi-task visual perception network on unrectified fisheye images to enable the vehicle to sense its surrounding environment. It consists of six primary tasks necessary for an autonomous driving system: depth estimation, visual odometry, semantic segmentation, motion segmentation, object detection, and lens soiling detection. We demonstrate that the jointly trained model performs better than the respective single-task versions. Our multi-task model has a shared encoder providing a significant computational advantage and has synergized decoders where tasks support each other. We propose a novel camera geometry based adaptation mechanism to encode the fisheye distortion model both at training and inference. This was crucial to enable training on the WoodScape dataset, which comprises data from different parts of the world collected by 12 different cameras mounted on three different cars with different intrinsics and viewpoints. Given that a bounding box is not a good representation for distorted fisheye images, we also extend object detection to use a polygon with non-uniformly sampled vertices. We additionally evaluate our model on standard automotive datasets, namely KITTI and Cityscapes. We obtain state-of-the-art results on KITTI for depth estimation and pose estimation tasks and competitive performance on the other tasks. We perform extensive ablation studies on various architecture choices and task weighting methodologies. A short video at https://youtu.be/xbSjZ5OfPes provides qualitative results.
Submitted 6 June, 2023; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Generalized Object Detection on Fisheye Cameras for Autonomous Driving: Dataset, Representations and Baseline
Authors:
Hazem Rashed,
Eslam Mohamed,
Ganesh Sistu,
Varun Ravi Kumar,
Ciaran Eising,
Ahmad El-Sallab,
Senthil Yogamani
Abstract:
Object detection is a comprehensively studied problem in autonomous driving. However, it has been relatively less explored in the case of fisheye cameras. The standard bounding box fails in fisheye cameras due to the strong radial distortion, particularly in the image's periphery. In this work, we explore better representations like oriented bounding box, ellipse, and generic polygon for object detection in fisheye images. We use the IoU metric to compare these representations using accurate instance segmentation ground truth. We design a novel curved bounding box model that has optimal properties for fisheye distortion models. We also design a curvature adaptive perimeter sampling method for obtaining polygon vertices, improving relative mAP score by 4.9% compared to uniform sampling. Overall, the proposed polygon model improves mIoU relative accuracy by 40.3%. To the best of our knowledge, it is the first detailed study on object detection on fisheye cameras for autonomous driving scenarios. The dataset, comprising 10,000 images along with ground truth for all the object representations, will be made public to encourage further research. We summarize our work in a short video with qualitative results at https://youtu.be/iLkOzvJpL-A.
Submitted 21 December, 2022; v1 submitted 3 December, 2020;
originally announced December 2020.
-
SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
Authors:
Varun Ravi Kumar,
Marvin Klingner,
Senthil Yogamani,
Stefan Milz,
Tim Fingscheidt,
Patrick Maeder
Abstract:
State-of-the-art self-supervised learning approaches for monocular depth estimation usually suffer from scale ambiguity. They do not generalize well when applied to distance estimation for complex projection models such as in fisheye and omnidirectional cameras. This paper introduces a novel multi-task learning strategy to improve self-supervised monocular distance estimation on fisheye and pinhole camera images. Our contribution in this work is threefold: Firstly, we introduce a novel distance estimation network architecture using a self-attention based encoder coupled with robust semantic feature guidance to the decoder that can be trained in a one-stage fashion. Secondly, we integrate a generalized robust loss function, which improves performance significantly while removing the need for hyperparameter tuning with the reprojection loss. Finally, we reduce the artifacts caused by dynamic objects violating static world assumptions using a semantic masking strategy. We reduce the RMSE of previous work on fisheye by 25%. As there is little work on fisheye cameras, we evaluated the proposed method on KITTI using a pinhole model. We achieved state-of-the-art performance among self-supervised methods without requiring an external scale estimation.
Submitted 14 November, 2020; v1 submitted 10 August, 2020;
originally announced August 2020.
-
UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models
Authors:
Varun Ravi Kumar,
Senthil Yogamani,
Markus Bach,
Christian Witt,
Stefan Milz,
Patrick Mader
Abstract:
In classical computer vision, rectification is an integral part of multi-view depth estimation. It typically includes epipolar rectification and lens distortion correction. This process simplifies the depth estimation significantly, and thus it has been adopted in CNN approaches. However, rectification has several side effects, including a reduced field of view (FOV), resampling distortion, and sensitivity to calibration errors. The effects are particularly pronounced in case of significant distortion (e.g., wide-angle fisheye cameras). In this paper, we propose a generic scale-aware self-supervised pipeline for estimating depth, Euclidean distance, and visual odometry from unrectified monocular videos. We demonstrate a level of precision on the unrectified KITTI dataset with barrel distortion comparable to that on the rectified KITTI dataset. The intuition is that the rectification step can be implicitly absorbed within the CNN model, which learns the distortion model without increasing complexity. Our approach does not suffer from a reduced field of view and avoids computational costs for rectification at inference time. To further illustrate the general applicability of the proposed framework, we apply it to wide-angle fisheye cameras with 190$^\circ$ horizontal field of view. The training framework UnRectDepthNet takes in the camera distortion model as an argument and adapts projection and unprojection functions accordingly. The proposed algorithm is evaluated further on the KITTI rectified dataset, and we achieve state-of-the-art results that improve upon our previous work FisheyeDistanceNet. Qualitative results on a distorted test scene video sequence indicate excellent performance https://youtu.be/K6pbx3bU4Ss.
Submitted 6 June, 2023; v1 submitted 13 July, 2020;
originally announced July 2020.
-
TiledSoilingNet: Tile-level Soiling Detection on Automotive Surround-view Cameras Using Coverage Metric
Authors:
Arindam Das,
Pavel Krizek,
Ganesh Sistu,
Fabian Burger,
Sankaralingam Madasamy,
Michal Uricar,
Varun Ravi Kumar,
Senthil Yogamani
Abstract:
Automotive cameras, particularly surround-view cameras, tend to get soiled by mud, water, snow, etc. For higher levels of autonomous driving, it is necessary to have a soiling detection algorithm which will trigger an automatic cleaning system. Localized detection of soiling in an image is necessary to control the cleaning system. It is also necessary to enable partial functionality in unsoiled areas while reducing confidence in soiled areas. Although this can be solved using a semantic segmentation task, we explore a more efficient solution targeting deployment in a low-power embedded system. We propose a novel method to regress the area of each soiling type within a tile directly. We refer to this as coverage. The proposed approach is better than learning the dominant class in a tile, as multiple soiling types commonly occur within a tile. It also has the advantage of dealing with coarse polygon annotation, which will cause the segmentation task. The proposed soiling coverage decoder is an order of magnitude faster than an equivalent segmentation decoder. We also integrated it into an object detection and semantic segmentation multi-task model using an asynchronous back-propagation algorithm. A portion of the dataset used will be released publicly as part of our WoodScape dataset to encourage further research.
Submitted 1 July, 2020;
originally announced July 2020.
-
Predictive Analysis for Detection of Human Neck Postures using a robust integration of kinetics and kinematics
Authors:
Korupalli V Rajesh Kumar,
Susan Elias
Abstract:
Human neck postures and movements need to be monitored, measured, quantified and analyzed, as a preventive measure in healthcare applications. Improper neck postures are an increasing source of neck musculoskeletal disorders, requiring therapy and rehabilitation. The motivation for the research presented in this paper was the need to develop a notification mechanism for improper neck usage. Kinematic data captured by sensors have limitations in accurately classifying the neck postures. Hence, we propose an integrated use of kinematic and kinetic data to efficiently classify neck postures. Using machine learning algorithms we obtained 100% accuracy in the predictive analysis of this data. The research analysis and discussions show that the kinetic data of the Hyoid muscles can accurately detect the neck posture given the corresponding kinematic data captured by the neck-band. The proposed robust platform for the integration of kinematic and kinetic data has enabled the design of a smart neck-band for the prevention of neck musculoskeletal disorders.
Submitted 12 March, 2020;
originally announced March 2020.
-
Let's Get Dirty: GAN Based Data Augmentation for Camera Lens Soiling Detection in Autonomous Driving
Authors:
Michal Uricar,
Ganesh Sistu,
Hazem Rashed,
Antonin Vobecky,
Varun Ravi Kumar,
Pavel Krizek,
Fabian Burger,
Senthil Yogamani
Abstract:
Wide-angle fisheye cameras are commonly used in automated driving for parking and low-speed navigation tasks. Four such cameras form a surround-view system that provides a complete and detailed view of the vehicle. These cameras are directly exposed to harsh environmental settings and can get soiled very easily by mud, dust, water, and frost. Soiling on the camera lens can severely degrade the visual perception algorithms, and a camera cleaning system triggered by a soiling detection algorithm is increasingly being deployed. While adverse weather conditions, such as rain, have received attention recently, there is only limited work on general soiling. The main reason is the difficulty in collecting a diverse dataset as it is a relatively rare event. We propose a novel GAN based algorithm for generating unseen patterns of soiled images. Additionally, the proposed method automatically provides the corresponding soiling masks, eliminating the manual annotation cost. Augmenting the training data with the generated soiled images improves the accuracy of soiling detection tasks significantly, by 18%, demonstrating its usefulness. The manually annotated soiling dataset and the generated augmentation dataset will be made public. We demonstrate the generalization of our fisheye trained GAN model on the Cityscapes dataset. We provide an empirical evaluation of the degradation of the semantic segmentation algorithm with the soiled data.
Submitted 14 November, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving
Authors:
Varun Ravi Kumar,
Sandesh Athni Hiremath,
Stefan Milz,
Christian Witt,
Clement Pinnard,
Senthil Yogamani,
Patrick Mader
Abstract:
Fisheye cameras are commonly used in applications like autonomous driving and surveillance to provide a large field of view ($>180^{\circ}$). However, they come at the cost of strong non-linear distortions which require more complex algorithms. In this paper, we explore Euclidean distance estimation on fisheye cameras for automotive scenes. Obtaining accurate and dense depth supervision is difficult in practice, but self-supervised learning approaches show promising results and could potentially overcome the problem. We present a novel self-supervised scale-aware framework for learning Euclidean distance and ego-motion from raw monocular fisheye videos without applying rectification. While it is possible to perform piece-wise linear approximation of fisheye projection surface and apply standard rectilinear models, it has its own set of issues like re-sampling distortion and discontinuities in transition regions. To encourage further research in this area, we will release our dataset as part of the WoodScape project \cite{yogamani2019woodscape}. We further evaluated the proposed algorithm on the KITTI dataset and obtained state-of-the-art results comparable to other self-supervised monocular methods. Qualitative results on an unseen fisheye video demonstrate impressive performance https://youtu.be/Sgq1WzoOmXg.
Submitted 6 October, 2020; v1 submitted 7 October, 2019;
originally announced October 2019.
-
FisheyeMODNet: Moving Object detection on Surround-view Cameras for Autonomous Driving
Authors:
Marie Yahiaoui,
Hazem Rashed,
Letizia Mariotti,
Ganesh Sistu,
Ian Clancy,
Lucie Yahiaoui,
Varun Ravi Kumar,
Senthil Yogamani
Abstract:
Moving Object Detection (MOD) is an important task for achieving robust autonomous driving. An autonomous vehicle has to estimate collision risk with other interacting objects in the environment and calculate an optimal trajectory. Collision risk is typically higher for moving objects than static ones due to the need to estimate the future states and poses of the objects for decision making. This is particularly important for near-range objects around the vehicle which are typically detected by a fisheye surround-view system that captures a 360° view of the scene. In this work, we propose a CNN architecture for moving object detection using fisheye images that were captured in an autonomous driving environment. As motion geometry is highly non-linear and unique for fisheye cameras, we will make an improved version of the current dataset public to encourage further research. To target embedded deployment, we design a lightweight encoder sharing weights across sequential images. The proposed network runs at 15 fps on a 1 teraflops automotive embedded system at an accuracy of 40% IoU and 69.5% mIoU.
Submitted 30 August, 2019;
originally announced August 2019.
-
Monocular Fisheye Camera Depth Estimation Using Sparse LiDAR Supervision
Authors:
Varun Ravi Kumar,
Stefan Milz,
Martin Simon,
Christian Witt,
Karl Amende,
Johannes Petzold,
Senthil Yogamani,
Timo Pech
Abstract:
Near-field depth estimation around a self-driving car is an important function that can be achieved by four wide-angle fisheye cameras having a field of view of over 180°. Depth estimation based on convolutional neural networks (CNNs) produces state-of-the-art results, but progress is hindered because depth annotation cannot be obtained manually. Synthetic datasets are commonly used but they have limitations. For instance, they do not capture the extensive variability in the appearance of objects like vehicles present in real datasets. There is also a domain shift while performing inference on natural images, illustrated by many attempts to handle the domain adaptation explicitly. In this work, we explore an alternate approach of training using sparse LiDAR data as ground truth for depth estimation for fisheye cameras. We built our own dataset using our self-driving car setup which has a 64 beam Velodyne LiDAR and four wide-angle fisheye cameras. To handle the difference in viewpoints of the LiDAR and fisheye cameras, an occlusion resolution mechanism was implemented. We started with Eigen's multiscale convolutional network architecture and improved it by modifying the activation function and optimizer. We obtained promising results on our dataset with RMSE errors comparable to the state-of-the-art results obtained on KITTI.
Submitted 24 September, 2018; v1 submitted 16 March, 2018;
originally announced March 2018.
-
Identifying the Importance of Software Reuse in COCOMO81, COCOMOII
Authors:
CH. V. M. K. Hari,
Prof. Prasad Reddy P. V. G. D,
J. N. V. R Swarup Kumar,
G. SriRamGanesh
Abstract:
Software project management is an interpolation of project planning, project monitoring and project termination. The substratal goals of planning are to scout for the future, to diagnose the attributes that are essentially done for the consummation of the project successfully, animate the scheduling and allocate resources for the attributes. Software cost estimation is a vital role in preeminent software project decisions such as resource allocation and bidding. This paper articulates the conventional overview of software cost estimation modus operandi available. The cost, effort estimates of software projects done by the various companies are congregated, the results are segregated with the present cost models and the MRE (Mean Relative Error) is enumerated. We have administered the historical data to COCOMO 81, COCOMOII model and identified that the stellar predicament is that no cost model gives the exact estimate of a software project.
Submitted 11 December, 2009;
originally announced December 2009.