-
Let's Get Dirty: GAN Based Data Augmentation for Camera Lens Soiling Detection in Autonomous Driving
Authors:
Michal Uricar,
Ganesh Sistu,
Hazem Rashed,
Antonin Vobecky,
Varun Ravi Kumar,
Pavel Krizek,
Fabian Burger,
Senthil Yogamani
Abstract:
Wide-angle fisheye cameras are commonly used in automated driving for parking and low-speed navigation tasks. Four such cameras form a surround-view system that provides a complete and detailed view of the vehicle. These cameras are directly exposed to harsh environmental settings and can get soiled very easily by mud, dust, water, and frost. Soiling on the camera lens can severely degrade the visual perception algorithms, and a camera cleaning system triggered by a soiling detection algorithm is increasingly being deployed. While adverse weather conditions, such as rain, have received attention recently, there is only limited work on general soiling. The main reason is the difficulty of collecting a diverse dataset, as soiling is a relatively rare event. We propose a novel GAN-based algorithm for generating unseen patterns of soiled images. Additionally, the proposed method automatically provides the corresponding soiling masks, eliminating the manual annotation cost. Augmenting the training data with the generated soiled images improves the accuracy of soiling detection by 18%, demonstrating its usefulness. The manually annotated soiling dataset and the generated augmentation dataset will be made public. We demonstrate the generalization of our fisheye-trained GAN model on the Cityscapes dataset. We provide an empirical evaluation of the degradation of the semantic segmentation algorithm with the soiled data.
Submitted 14 November, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
RST-MODNet: Real-time Spatio-temporal Moving Object Detection for Autonomous Driving
Authors:
Mohamed Ramzy,
Hazem Rashed,
Ahmad El Sallab,
Senthil Yogamani
Abstract:
Moving Object Detection (MOD) is a critical task for autonomous vehicles as moving objects represent higher collision risk than static ones. The trajectory of the ego-vehicle is planned based on the future states of detected moving objects. It is quite challenging as the ego-motion has to be modelled and compensated to be able to understand the motion of the surrounding objects. In this work, we propose a real-time end-to-end CNN architecture for MOD utilizing spatio-temporal context to improve robustness. We construct a novel time-aware architecture exploiting temporal motion information embedded within sequential images in addition to explicit motion maps using optical flow images. We demonstrate the impact of our algorithm on the KITTI dataset, where we obtain an improvement of 8% relative to the baselines. We compare our algorithm with state-of-the-art methods and achieve competitive results on the KITTI-Motion dataset in terms of accuracy at three times better run-time. The proposed algorithm runs at 23 fps on a standard desktop GPU targeting deployment on embedded platforms.
Submitted 1 December, 2019;
originally announced December 2019.
-
FuseMODNet: Real-Time Camera and LiDAR based Moving Object Detection for robust low-light Autonomous Driving
Authors:
Hazem Rashed,
Mohamed Ramzy,
Victor Vaquero,
Ahmad El Sallab,
Ganesh Sistu,
Senthil Yogamani
Abstract:
Moving object detection is a critical task for autonomous vehicles. As dynamic objects represent higher collision risk than static ones, our own ego-trajectories have to be planned attending to the future states of the moving elements of the scene. Motion can be perceived using temporal information such as optical flow. Conventional optical flow computation is based on camera sensors only, which makes it prone to failure in conditions with low illumination. On the other hand, LiDAR sensors are independent of illumination, as they measure the time-of-flight of their own emitted lasers. In this work, we propose a robust and real-time CNN architecture for Moving Object Detection (MOD) under low-light conditions by capturing motion information from both camera and LiDAR sensors. We demonstrate the impact of our algorithm on KITTI dataset where we simulate a low-light environment creating a novel dataset "Dark KITTI". We obtain a 10.1% relative improvement on Dark-KITTI, and a 4.25% improvement on standard KITTI relative to our baselines. The proposed algorithm runs at 18 fps on a standard desktop GPU using $256\times1224$ resolution images.
Submitted 20 November, 2019; v1 submitted 11 October, 2019;
originally announced October 2019.
-
FisheyeDistanceNet: Self-Supervised Scale-Aware Distance Estimation using Monocular Fisheye Camera for Autonomous Driving
Authors:
Varun Ravi Kumar,
Sandesh Athni Hiremath,
Stefan Milz,
Christian Witt,
Clement Pinnard,
Senthil Yogamani,
Patrick Mader
Abstract:
Fisheye cameras are commonly used in applications like autonomous driving and surveillance to provide a large field of view ($>180^{\circ}$). However, they come at the cost of strong non-linear distortions which require more complex algorithms. In this paper, we explore Euclidean distance estimation on fisheye cameras for automotive scenes. Obtaining accurate and dense depth supervision is difficult in practice, but self-supervised learning approaches show promising results and could potentially overcome the problem. We present a novel self-supervised scale-aware framework for learning Euclidean distance and ego-motion from raw monocular fisheye videos without applying rectification. While it is possible to perform piece-wise linear approximation of fisheye projection surface and apply standard rectilinear models, it has its own set of issues like re-sampling distortion and discontinuities in transition regions. To encourage further research in this area, we will release our dataset as part of the WoodScape project \cite{yogamani2019woodscape}. We further evaluated the proposed algorithm on the KITTI dataset and obtained state-of-the-art results comparable to other self-supervised monocular methods. Qualitative results on an unseen fisheye video demonstrate impressive performance https://youtu.be/Sgq1WzoOmXg.
Submitted 6 October, 2020; v1 submitted 7 October, 2019;
originally announced October 2019.
-
FisheyeMODNet: Moving Object detection on Surround-view Cameras for Autonomous Driving
Authors:
Marie Yahiaoui,
Hazem Rashed,
Letizia Mariotti,
Ganesh Sistu,
Ian Clancy,
Lucie Yahiaoui,
Varun Ravi Kumar,
Senthil Yogamani
Abstract:
Moving Object Detection (MOD) is an important task for achieving robust autonomous driving. An autonomous vehicle has to estimate collision risk with other interacting objects in the environment and calculate an optimal trajectory. Collision risk is typically higher for moving objects than static ones due to the need to estimate the future states and poses of the objects for decision making. This is particularly important for near-range objects around the vehicle which are typically detected by a fisheye surround-view system that captures a 360° view of the scene. In this work, we propose a CNN architecture for moving object detection using fisheye images that were captured in an autonomous driving environment. As motion geometry is highly non-linear and unique for fisheye cameras, we will make an improved version of the current dataset public to encourage further research. To target embedded deployment, we design a lightweight encoder sharing weights across sequential images. The proposed network runs at 15 fps on a 1 teraflops automotive embedded system at an accuracy of 40% IoU and 69.5% mIoU.
Submitted 30 August, 2019;
originally announced August 2019.
-
RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving
Authors:
Khaled El Madawy,
Hazem Rashed,
Ahmad El Sallab,
Omar Nasr,
Hanan Kamel,
Senthil Yogamani
Abstract:
LiDAR has become a standard sensor for autonomous driving applications as it provides highly precise 3D point clouds. LiDAR is also robust in low-light scenarios at night-time or due to shadows where the performance of cameras is degraded. LiDAR perception is gradually becoming mature for algorithms including object detection and SLAM. However, semantic segmentation remains relatively less explored. Motivated by the fact that semantic segmentation is a mature algorithm on image data, we explore sensor fusion based 3D segmentation. Our main contribution is to convert the RGB image to a polar-grid mapping representation used for LiDAR and design early and mid-level fusion architectures. Additionally, we design a hybrid fusion architecture that combines both fusion algorithms. We evaluate our algorithm on the KITTI dataset, which provides segmentation annotation for cars, pedestrians and cyclists. We evaluate two state-of-the-art architectures, namely SqueezeSeg and PointSeg, and improve the mIoU score by 10% in both cases relative to the LiDAR-only baseline.
Submitted 17 July, 2019; v1 submitted 1 June, 2019;
originally announced June 2019.
-
SoilingNet: Soiling Detection on Automotive Surround-View Cameras
Authors:
Michal Uricar,
Pavel Krizek,
Ganesh Sistu,
Senthil Yogamani
Abstract:
Cameras are an essential part of the sensor suite in autonomous driving. Surround-view cameras are directly exposed to the external environment and are vulnerable to soiling. Cameras suffer a much higher degradation in performance due to soiling compared to other sensors. Thus it is critical to accurately detect soiling on the cameras, particularly for higher levels of autonomous driving. We created a new dataset having multiple types of soiling, namely opaque and transparent. It will be released publicly as part of our WoodScape dataset \cite{yogamani2019woodscape} to encourage further research. We demonstrate high accuracy using a Convolutional Neural Network (CNN) based architecture. We also show that it can be combined with the existing object detection task in a multi-task learning framework. Finally, we make use of Generative Adversarial Networks (GANs) to generate more images for data augmentation and show that it works successfully, similar to style transfer.
Submitted 17 July, 2019; v1 submitted 4 May, 2019;
originally announced May 2019.
-
WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving
Authors:
Senthil Yogamani,
Ciaran Hughes,
Jonathan Horgan,
Ganesh Sistu,
Padraig Varley,
Derek O'Dea,
Michal Uricar,
Stefan Milz,
Martin Simon,
Karl Amende,
Christian Witt,
Hazem Rashed,
Sumanth Chennupati,
Sanjaya Nayak,
Saquib Mansoor,
Xavier Perroton,
Patrick Perez
Abstract:
Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality and in particular automotive applications. In spite of their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. We release the first extensive fisheye automotive dataset, WoodScape, named after Robert Wood who invented the fisheye camera in 1906. WoodScape comprises four surround-view cameras and nine tasks including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images, and annotations for other tasks are provided for over 100,000 images. With WoodScape, we would like to encourage the community to adapt computer vision models for fisheye cameras instead of using naive rectification.
Submitted 2 July, 2021; v1 submitted 4 May, 2019;
originally announced May 2019.
-
MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning
Authors:
Sumanth Chennupati,
Ganesh Sistu,
Senthil Yogamani,
Samir A Rawashdeh
Abstract:
Multi-task learning is commonly used in autonomous driving for solving various visual perception tasks. It offers significant benefits in terms of both performance and computational complexity. Current work on multi-task learning networks focuses on processing a single input image, and there is no known implementation of multi-task learning handling a sequence of images. In this work, we propose a multi-stream multi-task network to take advantage of using feature representations from preceding frames in a video sequence for joint learning of segmentation, depth, and motion. The weights of the current and previous encoder are shared so that features computed in the previous frame can be leveraged without additional computation. In addition, we propose to use the geometric mean of task losses as a better alternative to the weighted average of task losses. The proposed loss function facilitates better handling of the difference in convergence rates of different tasks. Experimental results on KITTI, Cityscapes and SYNTHIA datasets demonstrate that the proposed strategies outperform various existing multi-task learning solutions.
Submitted 22 April, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
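The geometric loss strategy mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `geometric_mean_loss` and `weighted_average_loss` are hypothetical, the losses are plain Python floats rather than framework tensors, and the log-space computation is an assumption made here for numerical stability.

```python
import math


def geometric_mean_loss(task_losses):
    """Combine positive per-task losses by their geometric mean.

    Illustrative only: computed in log space, which assumes every
    task loss is strictly positive.
    """
    if not task_losses:
        raise ValueError("need at least one task loss")
    if any(loss <= 0 for loss in task_losses):
        raise ValueError("task losses must be strictly positive")
    log_sum = sum(math.log(loss) for loss in task_losses)
    return math.exp(log_sum / len(task_losses))


def weighted_average_loss(task_losses, weights):
    """Baseline for comparison: weighted average of the task losses."""
    total_weight = sum(weights)
    return sum(w * l for w, l in zip(weights, task_losses)) / total_weight


# Hypothetical loss values for three tasks (segmentation, depth, motion).
losses = [4.0, 1.0, 0.25]
combined = geometric_mean_loss(losses)
baseline = weighted_average_loss(losses, [1.0, 1.0, 1.0])
```

One property of this formulation: multiplying any single task loss by a factor k multiplies the geometric mean by k^(1/n) for n tasks, whatever the magnitudes of the other losses, whereas in a weighted average the largest loss dominates the total.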
-
Exploring Deep Spiking Neural Networks for Automated Driving Applications
Authors:
Sambit Mohapatra,
Heinrich Gotzig,
Senthil Yogamani,
Stefan Milz,
Raoul Zollner
Abstract:
Neural networks have become the standard model for various computer vision tasks in automated driving including semantic segmentation, moving object detection, depth estimation, visual odometry, etc. The main flavors of neural networks which are used commonly are convolutional (CNN) and recurrent (RNN). In spite of rapid progress in embedded processors, power consumption and cost is still a bottleneck. Spiking Neural Networks (SNNs) are gradually progressing to achieve low-power event-driven hardware architecture which has a potential for high efficiency. In this paper, we explore the role of deep spiking neural networks (SNN) for automated driving applications. We provide an overview of progress on SNN and argue how it can be a good fit for automated driving applications.
Submitted 11 January, 2019;
originally announced March 2019.
-
Realistic Ultrasonic Environment Simulation Using Conditional Generative Adversarial Networks
Authors:
Maximilian Pöpperl,
Raghavendra Gulagundi,
Senthil Yogamani,
Stefan Milz
Abstract:
Recently, realistic data augmentation using neural networks, especially generative adversarial networks (GANs), has achieved outstanding results. The community's main research focus is visual image processing. However, automotive cars and robots are equipped with a large suite of sensors to achieve high redundancy. Among others, ultrasonic sensors are often used due to their low cost and reliable near-field distance measuring capabilities. Hence, pattern recognition needs to be applied to ultrasonic signals as well. Machine learning requires extensive data sets, and those measurements are time-consuming, expensive and not flexible to hardware and environmental changes. On the other hand, there exists no method to simulate those signals deterministically. We present a novel approach for synthetic ultrasonic signal simulation using conditional GANs (cGANs). To the best of our knowledge, we present the first realistic data augmentation for automotive ultrasonics. The performance of cGANs allows us to bring realistic environment simulation to a new level. By using setup and environmental parameters as conditions, the proposed approach is flexible to external influences. Due to the low complexity and time effort for data generation, we outperform other simulation algorithms, such as the finite element method. We verify the outstanding accuracy and realism of our method by applying a detailed statistical analysis and comparing the generated data to an extensive amount of measured signals.
Submitted 26 February, 2019;
originally announced February 2019.
-
Capsule Neural Network based Height Classification using Low-Cost Automotive Ultrasonic Sensors
Authors:
Maximilian Pöpperl,
Raghavendra Gulagundi,
Senthil Yogamani,
Stefan Milz
Abstract:
High-performance ultrasonic sensor hardware is mainly used in medical applications. Although the development in automotive scenarios is towards autonomous driving, the ultrasonic sensor hardware remains low-cost and low-performance. To overcome the strict hardware limitations, we propose to use capsule neural networks. Thanks to the high classification capability of this network architecture, we can achieve outstanding results in performing a detailed height analysis of detected objects. We apply a novel resorting and reshaping method to feed the neural network with ultrasonic data. This increases classification performance and computation speed. We tested the approach under different environmental conditions to verify that the proposed method works independently of external parameters, which is needed for autonomous driving.
Submitted 26 February, 2019;
originally announced February 2019.
-
NeurAll: Towards a Unified Visual Perception Model for Automated Driving
Authors:
Ganesh Sistu,
Isabelle Leang,
Sumanth Chennupati,
Senthil Yogamani,
Ciaran Hughes,
Stefan Milz,
Samir Rawashdeh
Abstract:
Convolutional Neural Networks (CNNs) are successfully used for the important automotive visual perception tasks including object recognition, motion and depth estimation, visual SLAM, etc. However, these tasks are typically independently explored and modeled. In this paper, we propose a joint multi-task network design for learning several tasks simultaneously. Our main motivation is the computational efficiency achieved by sharing the expensive initial convolutional layers between all tasks. Indeed, the main bottleneck in automated driving systems is the limited processing power available on deployment hardware. There is also some evidence for other benefits in improving accuracy for some tasks and easing development effort. It also offers scalability to add more tasks leveraging existing features and achieving better generalization. We survey various CNN based solutions for visual perception tasks in automated driving. Then we propose a unified CNN model for the important tasks and discuss several advanced optimization and architecture design techniques to improve the baseline model. The paper is partly review and partly positional with demonstration of several preliminary results promising for future research. We first demonstrate results of multi-stream learning and auxiliary learning which are important ingredients to scale to a large multi-task model. Finally, we implement a two-stream three-task network which performs better in many cases compared to their corresponding single-task models, while maintaining network size.
Submitted 9 March, 2024; v1 submitted 10 February, 2019;
originally announced February 2019.
-
Yes, we GAN: Applying Adversarial Techniques for Autonomous Driving
Authors:
Michal Uricar,
Pavel Krizek,
David Hurych,
Ibrahim Sobh,
Senthil Yogamani,
Patrick Denny
Abstract:
Generative Adversarial Networks (GAN) have gained a lot of popularity from their introduction in 2014 till present. Research on GAN is rapidly growing and there are many variants of the original GAN focusing on various aspects of deep learning. GAN are perceived as the most impactful direction of machine learning in the last decade. This paper focuses on the application of GAN in autonomous driving including topics such as advanced data augmentation, loss function learning, semi-supervised learning, etc. We formalize and review key applications of adversarial techniques and discuss challenges and open problems to be addressed.
Submitted 2 February, 2020; v1 submitted 9 February, 2019;
originally announced February 2019.
-
Challenges in Designing Datasets and Validation for Autonomous Driving
Authors:
Michal Uricar,
David Hurych,
Pavel Krizek,
Senthil Yogamani
Abstract:
Autonomous driving is getting a lot of attention in the last decade and will be the hot topic at least until the first successful certification of a car with Level 5 autonomy. There are many public datasets in the academic community. However, they are far away from what a robust industrial production system needs. There is a large gap between academic and industrial setting and a substantial way from a research prototype, built on public datasets, to a deployable solution which is a challenging task. In this paper, we focus on bad practices that often happen in the autonomous driving from an industrial deployment perspective. Data design deserves at least the same amount of attention as the model design. There is very little attention paid to these issues in the scientific community, and we hope this paper encourages better formalization of dataset design. More specifically, we focus on the datasets design and validation scheme for autonomous driving, where we would like to highlight the common problems, wrong assumptions, and steps towards avoiding them, as well as some open problems.
Submitted 26 January, 2019;
originally announced January 2019.
-
Optical Flow augmented Semantic Segmentation networks for Automated Driving
Authors:
Hazem Rashed,
Senthil Yogamani,
Ahmad El-Sallab,
Pavel Krizek,
Mohamed El-Helw
Abstract:
Motion is a dominant cue in automated driving systems. Optical flow is typically computed to detect moving objects and to estimate depth using triangulation. In this paper, our motivation is to leverage the existing dense optical flow to improve the performance of semantic segmentation. To provide a systematic study, we construct four different architectures which use RGB only, flow only, RGBF concatenated and two-stream RGB + flow. We evaluate these networks on two automotive datasets namely Virtual KITTI and Cityscapes using the state-of-the-art flow estimator FlowNet v2. We also make use of the ground truth optical flow in Virtual KITTI to serve as an ideal estimator and a standard Farneback optical flow algorithm to study the effect of noise. Using the flow ground truth in Virtual KITTI, two-stream architecture achieves the best results with an improvement of 4% IoU. As expected, there is a large improvement for moving objects like trucks, vans and cars with 38%, 28% and 6% increase in IoU. FlowNet produces an improvement of 2.4% in average IoU with larger improvement in the moving objects corresponding to 26%, 11% and 5% in trucks, vans and cars. In Cityscapes, flow augmentation provided an improvement for moving objects like motorcycle and train with an increase of 17% and 7% in IoU.
Submitted 11 January, 2019;
originally announced January 2019.
-
Design of Real-time Semantic Segmentation Decoder for Automated Driving
Authors:
Arindam Das,
Saranya Kandan,
Senthil Yogamani,
Pavel Krizek
Abstract:
Semantic segmentation remains a computationally intensive algorithm for embedded deployment even with the rapid growth of computation power. Thus efficient network design is a critical aspect especially for applications like automated driving which requires real-time performance. Recently, there has been a lot of research on designing efficient encoders that are mostly task agnostic. Unlike image classification and bounding box object detection tasks, decoders are computationally expensive as well for semantic segmentation task. In this work, we focus on efficient design of the segmentation decoder and assume that an efficient encoder is already designed to provide shared features for a multi-task learning system. We design a novel efficient non-bottleneck layer and a family of decoders which fit into a small run-time budget using VGG10 as efficient encoder. We demonstrate in our dataset that experimentation with various design choices led to an improvement of 10% from a baseline performance.
Submitted 19 January, 2019;
originally announced January 2019.
-
AuxNet: Auxiliary tasks enhanced Semantic Segmentation for Automated Driving
Authors:
Sumanth Chennupati,
Ganesh Sistu,
Senthil Yogamani,
Samir Rawashdeh
Abstract:
Decision making in automated driving is highly specific to the environment, and thus semantic segmentation plays a key role in recognizing the objects in the environment around the car. Pixel-level classification, once considered a challenging task, is now becoming mature enough to be productized in a car. However, semantic annotation is time-consuming and quite expensive. Synthetic datasets with domain adaptation techniques have been used to alleviate the lack of large annotated datasets. In this work, we explore an alternate approach of leveraging the annotations of other tasks to improve semantic segmentation. Recently, multi-task learning has become a popular paradigm in automated driving, demonstrating that joint learning of multiple tasks improves the overall performance of each task. Motivated by this, we use auxiliary tasks like depth estimation to improve the performance of the semantic segmentation task. We propose adaptive task loss weighting techniques to address scale issues in multi-task loss functions, which become more crucial in auxiliary tasks. We experimented on automotive datasets including SYNTHIA and KITTI and obtained 3% and 5% improvement in accuracy respectively.
Submitted 17 January, 2019;
originally announced January 2019.
-
Real-time Joint Object Detection and Semantic Segmentation Network for Automated Driving
Authors:
Ganesh Sistu,
Isabelle Leang,
Senthil Yogamani
Abstract:
Convolutional Neural Networks (CNN) are successfully used for various visual perception tasks including bounding box object detection, semantic segmentation, optical flow, depth estimation and visual SLAM. Generally these tasks are independently explored and modeled. In this paper, we present a joint multi-task network design for learning object detection and semantic segmentation simultaneously. The main motivation is to achieve real-time performance on a low power embedded SOC by sharing of encoder for both the tasks. We construct an efficient architecture using a small ResNet10 like encoder which is shared for both decoders. Object detection uses YOLO v2 like decoder and semantic segmentation uses FCN8 like decoder. We evaluate the proposed network in two public datasets (KITTI, Cityscapes) and in our private fisheye camera dataset, and demonstrate that joint network provides the same accuracy as that of separate networks. We further optimize the network to achieve 30 fps for 1280x384 resolution image.
Submitted 12 January, 2019;
originally announced January 2019.
-
Multi-stream CNN based Video Semantic Segmentation for Automated Driving
Authors:
Ganesh Sistu,
Sumanth Chennupati,
Senthil Yogamani
Abstract:
The majority of semantic segmentation algorithms operate on a single frame even in the case of videos. In this work, the goal is to exploit temporal information within the algorithm model for leveraging motion cues and temporal consistency. We propose two simple high-level architectures based on Recurrent FCN (RFCN) and Multi-Stream FCN (MSFCN) networks. In the case of RFCN, a recurrent network, namely an LSTM, is inserted between the encoder and decoder. MSFCN combines the encoders of different frames into a fused encoder via 1x1 channel-wise convolution. We use a ResNet50 network as the baseline encoder and construct three networks, namely MSFCN of order 2 and 3 and RFCN of order 2. MSFCN-3 produces the best results with an accuracy improvement of 9% and 15% for Highway and New York-like city scenarios in the SYNTHIA-CVPR'16 dataset using the mean IoU metric. MSFCN-3 also produced improvements of 11% and 6% for the SegTrack V2 and DAVIS datasets over the baseline FCN network. We also designed an efficient version of MSFCN-2 and RFCN-2 using weight sharing among the two encoders. The efficient MSFCN-2 provided an improvement of 11% and 5% for KITTI and SYNTHIA with a negligible increase in computational complexity compared to the baseline version.
Submitted 8 January, 2019;
originally announced January 2019.
-
Exploring applications of deep reinforcement learning for real-world autonomous driving systems
Authors:
Victor Talpaert,
Ibrahim Sobh,
B Ravi Kiran,
Patrick Mannion,
Senthil Yogamani,
Ahmad El-Sallab,
Patrick Perez
Abstract:
Deep Reinforcement Learning (DRL) has become increasingly powerful in recent years, with notable achievements such as Deepmind's AlphaGo. It has been successfully deployed in commercial vehicles like Mobileye's path planning system. However, a vast majority of work on DRL is focused on toy examples in controlled synthetic car simulator environments such as TORCS and CARLA. In general, DRL is still at its infancy in terms of usability in real-world applications. Our goal in this paper is to encourage real-world deployment of DRL in various autonomous driving (AD) applications. We first provide an overview of the tasks in autonomous driving systems, reinforcement learning algorithms and applications of DRL to AD systems. We then discuss the challenges which must be addressed to enable further progress towards real-world deployment.
Submitted 16 January, 2019; v1 submitted 6 January, 2019;
originally announced January 2019.
-
Real-time Dynamic Object Detection for Autonomous Driving using Prior 3D-Maps
Authors:
B Ravi Kiran,
Luis Roldão,
Benat Irastorza,
Renzo Verastegui,
Sebastian Suss,
Senthil Yogamani,
Victor Talpaert,
Alexandre Lepoutre,
Guillaume Trehard
Abstract:
Lidar has become an essential sensor for autonomous driving as it provides reliable depth estimation. Lidar is also the primary sensor used in building 3D maps which can be used even in the case of low-cost systems which do not use Lidar. Computation on Lidar point clouds is intensive as it requires processing of millions of points per second. Additionally, there are many subsequent tasks such as clustering, detection, tracking and classification which make real-time execution challenging. In this paper, we discuss real-time dynamic object detection algorithms which leverage previously mapped Lidar point clouds to reduce processing. The prior 3D maps provide a static background model and we formulate dynamic object detection as a background subtraction problem. Computation and modeling challenges in the mapping and online execution pipeline are described. We propose a rejection cascade architecture to subtract road regions and other 3D regions separately. We implemented an initial version of our proposed algorithm and evaluated the accuracy on the CARLA simulator.
Submitted 5 July, 2019; v1 submitted 28 September, 2018;
originally announced September 2018.
-
Monocular Fisheye Camera Depth Estimation Using Sparse LiDAR Supervision
Authors:
Varun Ravi Kumar,
Stefan Milz,
Martin Simon,
Christian Witt,
Karl Amende,
Johannes Petzold,
Senthil Yogamani,
Timo Pech
Abstract:
Near field depth estimation around a self driving car is an important function that can be achieved by four wide angle fisheye cameras having a field of view of over 180 degrees. Depth estimation based on convolutional neural networks (CNNs) produces state of the art results, but progress is hindered because depth annotation cannot be obtained manually. Synthetic datasets are commonly used but they have limitations. For instance, they do not capture the extensive variability in the appearance of objects like vehicles present in real datasets. There is also a domain shift while performing inference on natural images, illustrated by many attempts to handle the domain adaptation explicitly. In this work, we explore an alternate approach of training using sparse LiDAR data as ground truth for depth estimation for fisheye cameras. We built our own dataset using our self driving car setup which has a 64 beam Velodyne LiDAR and four wide angle fisheye cameras. To handle the difference in view points of LiDAR and fisheye camera, an occlusion resolution mechanism was implemented. We started with Eigen's multiscale convolutional network architecture and improved it by modifying the activation function and optimizer. We obtained promising results on our dataset with RMSE errors comparable to the state of the art results obtained on KITTI.
Submitted 24 September, 2018; v1 submitted 16 March, 2018;
originally announced March 2018.
-
RTSeg: Real-time Semantic Segmentation Comparative Study
Authors:
Mennatullah Siam,
Mostafa Gamal,
Moemen Abdel-Razek,
Senthil Yogamani,
Martin Jagersand
Abstract:
Semantic segmentation benefits robotics related applications, especially autonomous driving. Most of the research on semantic segmentation focuses only on increasing the accuracy of segmentation models, with little attention to computationally efficient solutions. The few works conducted in this direction do not provide principled methods to evaluate the different design choices for segmentation. In this paper, we address this gap by presenting a real-time semantic segmentation benchmarking framework with a decoupled design for feature extraction and decoding methods. The framework is comprised of different network architectures for feature extraction such as VGG16, Resnet18, MobileNet, and ShuffleNet. It is also comprised of multiple meta-architectures for segmentation that define the decoding methodology. These include SkipNet, UNet, and Dilation Frontend. Experimental results are presented on the Cityscapes dataset for urban scenes. The modular design allows novel architectures to emerge that lead to a 143x reduction in GFLOPs in comparison to SegNet. This benchmarking framework is publicly available at "https://github.com/MSiam/TFSegmentation".
Submitted 16 May, 2020; v1 submitted 7 March, 2018;
originally announced March 2018.
-
MODNet: Moving Object Detection Network with Motion and Appearance for Autonomous Driving
Authors:
Mennatullah Siam,
Heba Mahgoub,
Mohamed Zahran,
Senthil Yogamani,
Martin Jagersand,
Ahmad El-Sallab
Abstract:
We propose a novel multi-task learning system that combines appearance and motion cues for a better semantic reasoning of the environment. A unified architecture for joint vehicle detection and motion segmentation is introduced. In this architecture, a two-stream encoder is shared among both tasks. In order to evaluate our method in autonomous driving setting, KITTI annotated sequences with detection and odometry ground truth are used to automatically generate static/dynamic annotations on the vehicles. This dataset is called KITTI Moving Object Detection dataset (KITTI MOD). The dataset will be made publicly available to act as a benchmark for the motion detection task. Our experiments show that the proposed method outperforms state of the art methods that utilize motion cue only with 21.5% in mAP on KITTI MOD. Our method performs on par with the state of the art unsupervised methods on DAVIS benchmark for generic object segmentation. One of our interesting conclusions is that joint training of motion segmentation and vehicle detection benefits motion segmentation. Motion segmentation has relatively fewer data, unlike the detection task. However, the shared fusion encoder benefits from joint training to learn a generalized representation. The proposed method runs in 120 ms per frame, which beats the state of the art motion detection/segmentation in computational efficiency.
Submitted 12 November, 2017; v1 submitted 14 September, 2017;
originally announced September 2017.
-
Deep Semantic Segmentation for Automated Driving: Taxonomy, Roadmap and Challenges
Authors:
Mennatullah Siam,
Sara Elkerdawy,
Martin Jagersand,
Senthil Yogamani
Abstract:
Semantic segmentation was seen as a challenging computer vision problem a few years ago. Due to recent advancements in deep learning, relatively accurate solutions are now possible for its use in automated driving. In this paper, the semantic segmentation problem is explored from the perspective of automated driving. Most of the current semantic segmentation algorithms are designed for generic images and do not incorporate prior structure and the end goal for automated driving. First, the paper begins with a generic taxonomic survey of semantic segmentation algorithms and then discusses how it fits in the context of automated driving. Second, the particular challenges of deploying it into a safety system which needs a high level of accuracy and robustness are listed. Third, different alternatives to using an independent semantic segmentation module are explored. Finally, an empirical evaluation of various semantic segmentation architectures was performed on the CamVid dataset in terms of accuracy and speed. This paper is a preliminary shorter version of a more detailed survey which is work in progress.
Submitted 3 August, 2017; v1 submitted 8 July, 2017;
originally announced July 2017.
-
Rejection-Cascade of Gaussians: Real-time adaptive background subtraction framework
Authors:
B Ravi Kiran,
Arindam Das,
Senthil Yogamani
Abstract:
Background-Foreground classification is a well-studied problem in computer vision. Due to the pixel-wise nature of modeling and processing in the algorithm, it is usually difficult to satisfy real-time constraints. There is a trade-off between the speed (because of model complexity) and accuracy. Inspired by the rejection cascade of the Viola-Jones classifier, we decompose the Gaussian Mixture Model (GMM) into an adaptive cascade of Gaussians (CoG). We achieve a good improvement in speed without compromising the accuracy with respect to the baseline GMM model. We demonstrate a speed-up factor of 4-5x and 17 percent average improvement in accuracy over Wallflowers surveillance datasets. The CoG is then demonstrated over the latent space representation of images of a convolutional variational autoencoder (VAE). We provide initial results over the CDW-2014 dataset, which could speed up background subtraction for deep architectures.
Submitted 16 November, 2019; v1 submitted 25 May, 2017;
originally announced May 2017.
-
Deep Reinforcement Learning framework for Autonomous Driving
Authors:
Ahmad El Sallab,
Mohammed Abdou,
Etienne Perot,
Senthil Yogamani
Abstract:
Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes. Despite its perceived utility, it has not yet been successfully applied in automotive applications. Motivated by the successful demonstrations of learning of Atari games and Go by Google DeepMind, we propose a framework for autonomous driving using deep reinforcement learning. This is of particular relevance as it is difficult to pose autonomous driving as a supervised learning problem due to strong interactions with the environment including other vehicles, pedestrians and roadworks. As it is a relatively new area of research for autonomous driving, we provide a short overview of deep reinforcement learning and then describe our proposed framework. It incorporates Recurrent Neural Networks for information integration, enabling the car to handle partially observable scenarios. It also integrates the recent work on attention models to focus on relevant information, thereby reducing the computational complexity for deployment on embedded hardware. The framework was tested in an open source 3D car racing simulator called TORCS. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction of other vehicles.
Submitted 8 April, 2017;
originally announced April 2017.
-
End-to-End Deep Reinforcement Learning for Lane Keeping Assist
Authors:
Ahmad El Sallab,
Mohammed Abdou,
Etienne Perot,
Senthil Yogamani
Abstract:
Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes, but it has not yet been successfully used for automotive applications. There has recently been a revival of interest in the topic, however, driven by the ability of deep learning algorithms to learn good representations of the environment. Motivated by Google DeepMind's successful demonstrations of learning for games from Breakout to Go, we will propose different methods for autonomous driving using deep reinforcement learning. This is of particular interest as it is difficult to pose autonomous driving as a supervised learning problem, since it has a strong interaction with the environment including other vehicles, pedestrians and roadworks. As this is a relatively new area of research for autonomous driving, we will formulate two main categories of algorithms: 1) Discrete actions category, and 2) Continuous actions category. For the discrete actions category, we will deal with the Deep Q-Network Algorithm (DQN), while for the continuous actions category, we will deal with the Deep Deterministic Actor Critic Algorithm (DDAC). In addition, we will examine the performance of these two categories on an open source car racing simulator called TORCS, which stands for The Open Racing Car Simulator. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction with other vehicles. Finally, we explain the effect of some restricted conditions, put on the car during the learning phase, on the convergence time for finishing its learning phase.
Submitted 13 December, 2016;
originally announced December 2016.