-
Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns
Authors:
Kaavya Rekanar,
Martin Hayes,
Ganesh Sistu,
Ciaran Eising
Abstract:
Visual Question Answering (VQA) models play a critical role in enhancing the perception capabilities of autonomous driving systems by allowing vehicles to analyze visual inputs alongside textual queries, fostering natural interaction and trust between the vehicle and its occupants or other road users. This study investigates the attention patterns of humans compared to a VQA model when answering driving-related questions, revealing disparities in the objects observed. We propose an approach integrating filters to optimize the model's attention mechanisms, prioritizing relevant objects and improving accuracy. Utilizing the LXMERT model for a case study, we compare attention patterns of the pre-trained and Filter Integrated models, alongside human answers using images from the NuImages dataset, gaining insights into feature prioritization. We evaluated the models using a Subjective scoring framework which shows that the integration of the feature encoder filter has enhanced the performance of the VQA model by refining its attention mechanisms.
Submitted 13 June, 2024;
originally announced June 2024.
-
Surround-View Fisheye Optics in Computer Vision and Simulation: Survey and Challenges
Authors:
Daniel Jakab,
Brian Michael Deegan,
Sushil Sharma,
Eoin Martino Grua,
Jonathan Horgan,
Enda Ward,
Pepijn Van De Ven,
Anthony Scanlan,
Ciarán Eising
Abstract:
In this paper, we provide a survey on automotive surround-view fisheye optics, with an emphasis on the impact of optical artifacts on computer vision tasks in autonomous driving and ADAS. The automotive industry has advanced in applying state-of-the-art computer vision to enhance road safety and provide automated driving functionality. When using camera systems on vehicles, there is a particular need for a wide field of view to capture the entire vehicle's surroundings, in areas such as low-speed maneuvering, automated parking, and cocoon sensing. However, one crucial challenge in surround-view cameras is the strong optical aberrations of the fisheye camera, which is an area that has received little attention in the literature. Additionally, a comprehensive dataset is needed for testing safety-critical scenarios in vehicle automation. The industry has turned to simulation as a cost-effective strategy for creating synthetic datasets with surround-view camera imagery. We examine different simulation methods (such as model-driven and data-driven simulations) and discuss the simulators' ability (or lack thereof) to model real-world optical performance. Overall, this paper highlights the optical aberrations in automotive fisheye datasets, and the limitations of optical reality in simulated fisheye datasets, with a focus on computer vision in surround-view optical systems.
Submitted 21 February, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird's-Eye-View
Authors:
Arindam Das,
Sudarshan Paul,
Niko Scholz,
Akhilesh Kumar Malviya,
Ganesh Sistu,
Ujjwal Bhattacharya,
Ciarán Eising
Abstract:
Accurate obstacle identification represents a fundamental challenge within the scope of near-field perception for autonomous driving. Conventionally, fisheye cameras are frequently employed for comprehensive surround-view perception, including rear-view obstacle localization. However, the performance of such cameras can significantly deteriorate in low-light conditions, during nighttime, or when subjected to intense sun glare. Conversely, cost-effective sensors like ultrasonic sensors remain largely unaffected under these conditions. Therefore, we present, to our knowledge, the first end-to-end multimodal fusion model tailored for efficient obstacle perception in a bird's-eye-view (BEV) perspective, utilizing fisheye cameras and ultrasonic sensors. Initially, ResNeXt-50 is employed as a set of unimodal encoders to extract features specific to each modality. Subsequently, the feature space associated with the visible spectrum undergoes transformation into BEV. The fusion of these two modalities is facilitated via concatenation. At the same time, the ultrasonic spectrum-based unimodal feature maps pass through content-aware dilated convolution, applied to mitigate the sensor misalignment between two sensors in the fused feature space. Finally, the fused features are utilized by a two-stage semantic occupancy decoder to generate grid-wise predictions for precise obstacle perception. We conduct a systematic investigation to determine the optimal strategy for multimodal fusion of both sensors. We provide insights into our dataset creation procedures, annotation guidelines, and perform a thorough data analysis to ensure adequate coverage of all scenarios. When applied to our dataset, the experimental results underscore the robustness and effectiveness of our proposed multimodal fusion approach.
Submitted 1 February, 2024;
originally announced February 2024.
-
Machine Learning for Healthcare-IoT Security: A Review and Risk Mitigation
Authors:
Mirza Akhi Khatun,
Sanober Farheen Memon,
Ciarán Eising,
Lubna Luxmi Dhirani
Abstract:
The Healthcare Internet-of-Things (H-IoT), commonly known as Digital Healthcare, is a data-driven infrastructure that relies heavily on smart sensing devices (e.g., blood pressure monitors and temperature sensors) for faster response times, treatment, and diagnosis. However, with the evolving cyber threat landscape, IoT devices have become more vulnerable to a broader risk surface (e.g., risks associated with generative AI and 5G-IoT), which, if exploited, may lead to data breaches, unauthorized access, lack of command and control, and potential harm. This paper reviews the fundamentals of healthcare IoT and the privacy and data security challenges associated with machine learning and H-IoT devices. The paper further emphasizes the importance of monitoring healthcare IoT layers such as perception, network, cloud, and application. Detecting and responding to anomalies involves various cyber-attacks and protocols such as Wi-Fi 6, Narrowband Internet of Things (NB-IoT), Bluetooth, ZigBee, LoRa, and 5G New Radio (5G NR). A robust authentication mechanism based on machine learning and deep learning techniques is required to protect H-IoT devices from increasing cybersecurity vulnerabilities. Hence, in this review paper, security and privacy challenges and risk mitigation strategies for building resilience in H-IoT are explored and reported.
Submitted 17 January, 2024;
originally announced January 2024.
-
Measuring Natural Scenes SFR of Automotive Fisheye Cameras
Authors:
Daniel Jakab,
Eoin Martino Grua,
Brian Micheal Deegan,
Anthony Scanlan,
Pepijn Van De Ven,
Ciarán Eising
Abstract:
The Modulation Transfer Function (MTF) is an important image quality metric typically used in the automotive domain. However, despite the fact that optical quality has an impact on the performance of computer vision in vehicle automation, for many public datasets, this metric is unknown. Additionally, wide field-of-view (FOV) cameras have become increasingly popular, particularly for low-speed vehicle automation applications. To investigate image quality in datasets, this paper proposes an adaptation of the Natural Scenes Spatial Frequency Response (NS-SFR) algorithm to suit cameras with a wide field-of-view.
Submitted 10 January, 2024;
originally announced January 2024.
-
Optimizing Ego Vehicle Trajectory Prediction: The Graph Enhancement Approach
Authors:
Sushil Sharma,
Aryan Singh,
Ganesh Sistu,
Mark Halton,
Ciarán Eising
Abstract:
Predicting the trajectory of an ego vehicle is a critical component of autonomous driving systems. Current state-of-the-art methods typically rely on Deep Neural Networks (DNNs) and sequential models to process front-view images for future trajectory prediction. However, these approaches often struggle with perspective issues affecting object features in the scene. To address this, we advocate for the use of Bird's Eye View (BEV) perspectives, which offer unique advantages in capturing spatial relationships and object homogeneity. In our work, we leverage Graph Neural Networks (GNNs) and positional encoding to represent objects in a BEV, achieving competitive performance compared to traditional DNN-based methods. While the BEV-based approach loses some detailed information inherent to front-view images, we balance this by enriching the BEV data by representing it as a graph where relationships between the objects in a scene are captured effectively.
Submitted 10 January, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
BEVSeg2TP: Surround View Camera Bird's-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction
Authors:
Sushil Sharma,
Arindam Das,
Ganesh Sistu,
Mark Halton,
Ciarán Eising
Abstract:
Trajectory prediction is, naturally, a key task for vehicle autonomy. While the number of traffic rules is limited, the combinations and uncertainties associated with each agent's behaviour in real-world scenarios are nearly impossible to encode. Consequently, there is a growing interest in learning-based trajectory prediction. The proposed method in this paper predicts trajectories by considering perception and trajectory prediction as a unified system. In considering them as unified tasks, we show that there is the potential to improve the performance of perception. To achieve these goals, we present BEVSeg2TP - a surround-view camera bird's-eye-view-based joint vehicle segmentation and ego vehicle trajectory prediction system for autonomous vehicles. The proposed system uses a network trained on multiple camera views. The images are transformed using several deep learning techniques to perform semantic segmentation of objects, including other vehicles, in the scene. The segmentation outputs are fused across the camera views to obtain a comprehensive representation of the surrounding vehicles from the bird's-eye-view perspective. The system further predicts the future trajectory of the ego vehicle using a spatiotemporal probabilistic network (STPN) to optimize trajectory prediction. This network leverages information from encoder-decoder transformers and joint vehicle segmentation.
Submitted 20 December, 2023;
originally announced December 2023.
-
Connecting the Dots: Graph Neural Network Powered Ensemble and Classification of Medical Images
Authors:
Aryan Singh,
Pepijn Van de Ven,
Ciarán Eising,
Patrick Denny
Abstract:
Deep learning models have demonstrated remarkable results for various computer vision tasks, including the realm of medical imaging. However, their application in the medical domain is limited due to the requirement for large amounts of training data, which can be both challenging and expensive to obtain. To mitigate this, pre-trained models have been fine-tuned on domain-specific data, but such an approach can suffer from inductive biases. Furthermore, deep learning models struggle to learn the relationship between spatially distant features and their importance, as convolution operations treat all pixels equally. Pioneering a novel solution to this challenge, we employ the Image Foresting Transform to optimally segment images into superpixels. These superpixels are subsequently transformed into graph-structured data, enabling the proficient extraction of features and modeling of relationships using Graph Neural Networks (GNNs). Our method harnesses an ensemble of three distinct GNN architectures to boost its robustness. In our evaluations targeting pneumonia classification, our methodology surpassed prevailing Deep Neural Networks (DNNs) in performance, all while drastically cutting down on the parameter count. This not only trims down the expenses tied to data but also accelerates training and minimizes bias. Consequently, our proposition offers a sturdy, economically viable, and scalable strategy for medical image classification, significantly diminishing dependency on extensive training data sets.
Submitted 13 November, 2023;
originally announced November 2023.
-
Self-Supervised Online Camera Calibration for Automated Driving and Parking Applications
Authors:
Ciarán Hogan,
Ganesh Sistu,
Ciarán Eising
Abstract:
Camera-based perception systems play a central role in modern autonomous vehicles. These camera-based perception algorithms require an accurate calibration to map real-world distances to image pixels. In practice, calibration is a laborious procedure requiring specialised data collection and careful tuning. This process must be repeated whenever the parameters of the camera change, which can be a frequent occurrence in autonomous vehicles. Hence, there is a need to calibrate at regular intervals to ensure the camera is accurate. Proposed is a deep learning framework to learn intrinsic and extrinsic calibration of the camera in real time. The framework is self-supervised and doesn't require any labelling or supervision to learn the calibration parameters. The framework learns calibration without the need for any physical targets or to drive the car on special planar surfaces.
Submitted 16 August, 2023;
originally announced August 2023.
-
Hardware Accelerators in Autonomous Driving
Authors:
Ken Power,
Shailendra Deva,
Ting Wang,
Julius Li,
Ciarán Eising
Abstract:
Computing platforms in autonomous vehicles record large amounts of data from many sensors, process the data through machine learning models, and make decisions to ensure the vehicle's safe operation. Fast, accurate, and reliable decision-making is critical. Traditional computer processors lack the power and flexibility needed for the perception and machine vision demands of advanced autonomous driving tasks. Hardware accelerators are special-purpose coprocessors that help autonomous vehicles meet performance requirements for higher levels of autonomy. This paper provides an overview of ML accelerators with examples of their use for machine vision in autonomous vehicles. We offer recommendations for researchers and practitioners and highlight a trajectory for ongoing and future research in this emerging field.
Submitted 11 August, 2023;
originally announced August 2023.
-
Compact & Capable: Harnessing Graph Neural Networks and Edge Convolution for Medical Image Classification
Authors:
Aryan Singh,
Pepijn Van de Ven,
Ciarán Eising,
Patrick Denny
Abstract:
Graph-based neural network models are gaining traction in the field of representation learning due to their ability to uncover latent topological relationships between entities that are otherwise challenging to identify. These models have been employed across a diverse range of domains, encompassing drug discovery, protein interactions, semantic segmentation, and fluid dynamics research. In this study, we investigate the potential of Graph Neural Networks (GNNs) for medical image classification. We introduce a novel model that combines GNNs and edge convolution, leveraging the interconnectedness of RGB channel feature values to strongly represent connections between crucial graph nodes. Our proposed model not only performs on par with state-of-the-art Deep Neural Networks (DNNs) but does so with 1000 times fewer parameters, resulting in reduced training time and data requirements. We compare our Graph Convolutional Neural Network (GCNN) to pre-trained DNNs for classifying MedMNIST dataset classes, revealing promising prospects for GNNs in medical image analysis. Our results also encourage further exploration of advanced graph-based models such as Graph Attention Networks (GAT) and Graph Auto-Encoders in the medical imaging domain. The proposed model yields more reliable, interpretable, and accurate outcomes for tasks like semantic segmentation and image classification compared to simpler GCNNs.
Submitted 24 July, 2023;
originally announced July 2023.
-
Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving
Authors:
Kaavya Rekanar,
Ciarán Eising,
Ganesh Sistu,
Martin Hayes
Abstract:
This short paper presents a preliminary analysis of three popular Visual Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT, in the context of answering questions relating to driving scenarios. The performance of these models is evaluated by comparing the similarity of responses to reference answers provided by computer vision experts. Model selection is predicated on the analysis of transformer utilization in multimodal architectures. The results indicate that models incorporating cross-modal attention and late fusion techniques exhibit promising potential for generating improved answers within a driving perspective. This initial analysis serves as a launchpad for a forthcoming comprehensive comparative study involving nine VQA models and sets the scene for further investigations into the effectiveness of VQA model queries in self-driving scenarios. Supplementary material is available at https://github.com/KaavyaRekanar/Towards-a-performance-analysis-on-pre-trained-VQA-models-for-autonomous-driving.
Submitted 28 July, 2023; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Navigating Uncertainty: The Role of Short-Term Trajectory Prediction in Autonomous Vehicle Safety
Authors:
Sushil Sharma,
Ganesh Sistu,
Lucie Yahiaoui,
Arindam Das,
Mark Halton,
Ciarán Eising
Abstract:
Autonomous vehicles require accurate and reliable short-term trajectory predictions for safe and efficient driving. While most commercial automated vehicles currently use state machine-based algorithms for trajectory forecasting, recent efforts have focused on end-to-end data-driven systems. Often, the design of these models is limited by the availability of datasets, which are typically restricted to generic scenarios. To address this limitation, we have developed a synthetic dataset for short-term trajectory prediction tasks using the CARLA simulator. This dataset is extensive and incorporates what are considered complex scenarios - pedestrians crossing the road, vehicles overtaking - and comprises 6000 perspective view images with corresponding IMU and odometry information for each frame. Furthermore, an end-to-end short-term trajectory prediction model using convolutional neural networks (CNN) and long short-term memory (LSTM) networks has also been developed. This model can handle corner cases, such as slowing down near zebra crossings and stopping when pedestrians cross the road, without the need for explicit encoding of the surrounding environment. In an effort to accelerate this research and assist others, we are releasing our dataset and model to the research community. Our datasets are publicly available on https://github.com/sharmasushil/Navigating-Uncertainty-Trajectory-Prediction .
Submitted 12 July, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
Near Field iToF LIDAR Depth Improvement from Limited Number of Shots
Authors:
Mena Nagiub,
Thorsten Beuth,
Ganesh Sistu,
Heinrich Gotzig,
Ciarán Eising
Abstract:
Indirect Time of Flight LiDARs can indirectly calculate the scene's depth from the phase shift angle between transmitted and received laser signals with amplitudes modulated at a predefined frequency. Unfortunately, this method generates ambiguity in calculated depth when the phase shift angle value exceeds $2\pi$. Current state-of-the-art methods use raw samples generated using two distinct modulation frequencies to overcome this ambiguity problem. However, this comes at the cost of increasing laser components' stress and raising their temperature, which reduces their lifetime and increases power consumption. In our work, we study two different methods to recover the entire depth range of the LiDAR using fewer raw data sample shots from a single modulation frequency with the support of the sensor's grayscale output to reduce the laser components' stress and power consumption.
Submitted 28 July, 2023; v1 submitted 14 April, 2023;
originally announced April 2023.
-
Revisiting Modality Imbalance In Multimodal Pedestrian Detection
Authors:
Arindam Das,
Sudip Das,
Ganesh Sistu,
Jonathan Horgan,
Ujjwal Bhattacharya,
Edward Jones,
Martin Glavin,
Ciarán Eising
Abstract:
Multimodal learning, particularly for pedestrian detection, has recently received emphasis due to its capability to function equally well in several critical autonomous driving scenarios such as low-light, night-time, and adverse weather conditions. However, in most cases, the training distribution largely emphasizes the contribution of one specific input that makes the network biased towards one modality. Hence, the generalization of such models becomes a significant problem where the non-dominant input modality during training could be contributing more to the course of inference. Here, we introduce a novel training setup with a regularizer in the multimodal architecture to resolve the problem of this disparity between the modalities. Specifically, our regularizer term helps to make the feature fusion method more robust by considering both the feature extractors equivalently important during the training to extract the multimodal distribution, which is referred to as removing the imbalance problem. Furthermore, our decoupling concept of output stream helps the detection task by sharing the spatially sensitive information mutually. Extensive experiments of the proposed method on the KAIST and UTokyo datasets show improvement over the respective state-of-the-art performance.
Submitted 7 July, 2023; v1 submitted 24 February, 2023;
originally announced February 2023.
-
Classification of electromagnetic interference induced image noise in an analog video link
Authors:
Anthony Purcell,
Ciarán Eising
Abstract:
With the ever-increasing electrification of the vehicle showing no sign of retreating, electronic systems deployed in automotive applications are subject to more stringent Electromagnetic Immunity compliance constraints than ever before, to ensure the proximity of nearby electronic systems will not affect their operation. The EMI compliance testing of an analog camera link requires video quality to be monitored and assessed to validate such compliance, which, up to now, has been a manual task. Due to the nature of human interpretation, this is open to inconsistency. Here, we propose a solution using deep learning models that analyse and grade video content derived from an EMI compliance test. These models are trained using a dataset built entirely from real test image data to ensure the accuracy of the resultant model(s) is maximised. Starting with the standard AlexNet, we propose four models to classify the EMI noise level.
Submitted 18 August, 2022; v1 submitted 9 August, 2022;
originally announced August 2022.
-
Deep Multi-Task Networks For Occluded Pedestrian Pose Estimation
Authors:
Arindam Das,
Sudip Das,
Ganesh Sistu,
Jonathan Horgan,
Ujjwal Bhattacharya,
Edward Jones,
Martin Glavin,
Ciarán Eising
Abstract:
Most of the existing works on pedestrian pose estimation do not consider estimating the pose of an occluded pedestrian, as the annotations of the occluded parts are not available in relevant automotive datasets. For example, CityPersons, a well-known dataset for pedestrian detection in automotive scenes, does not provide pose annotations, whereas MS-COCO, a non-automotive dataset, contains human pose estimation. In this work, we propose a multi-task framework to extract pedestrian features through detection and instance segmentation tasks performed separately on these two distributions. Thereafter, an encoder learns pose-specific features using an unsupervised instance-level domain adaptation method for the pedestrian instances from both distributions. The proposed framework has improved state-of-the-art performances of pose estimation, pedestrian detection, and instance segmentation.
Submitted 8 August, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Direct Triangulation with Spherical Projection for Omnidirectional Cameras
Authors:
Ciarán Eising
Abstract:
In this paper, it is proposed to solve the problem of triangulation for calibrated omnidirectional cameras through the optimisation of ray-pairs on the projective sphere. The proposed solution boils down to finding the roots of a quadratic function, and as such is closed form, completely non-iterative and computationally inexpensive when compared to previous methods. In addition, even thought the…
▽ More
In this paper, it is proposed to solve the problem of triangulation for calibrated omnidirectional cameras through the optimisation of ray-pairs on the projective sphere. The proposed solution boils down to finding the roots of a quadratic function, and as such is closed form, completely non-iterative and computationally inexpensive when compared to previous methods. In addition, even thought the motivation is clearly to solve the triangulation problem for omnidirectional cameras, it is demonstrated that the proposed methods can be applied to non-omnidirectional, narrow field-of-view cameras.
Submitted 8 June, 2022;
originally announced June 2022.
-
Surround-view Fisheye Camera Perception for Automated Driving: Overview, Survey and Challenges
Authors:
Varun Ravi Kumar,
Ciaran Eising,
Christian Witt,
Senthil Yogamani
Abstract:
Surround-view fisheye cameras are commonly used for near-field sensing in automated driving. Four fisheye cameras on four sides of the vehicle are sufficient to cover 360° around the vehicle, capturing the entire near-field region. Some primary use cases are automated parking, traffic jam assist, and urban driving. There are limited datasets and very little work on near-field perception tasks, as the focus in automotive perception is on far-field perception. In contrast to far-field, surround-view perception poses additional challenges due to high-precision object detection requirements of 10 cm and partial visibility of objects. Due to the large radial distortion of fisheye cameras, standard algorithms cannot be extended easily to the surround-view use case. Thus, we are motivated to provide a self-contained reference for automotive fisheye camera perception for researchers and practitioners. Firstly, we provide a unified and taxonomic treatment of commonly used fisheye camera models. Secondly, we discuss various perception tasks and existing literature. Finally, we discuss the challenges and future direction.
Submitted 5 January, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
UnShadowNet: Illumination Critic Guided Contrastive Learning For Shadow Removal
Authors:
Subhrajyoti Dasgupta,
Arindam Das,
Senthil Yogamani,
Sudip Das,
Ciaran Eising,
Andrei Bursuc,
Ujjwal Bhattacharya
Abstract:
Shadows are frequently encountered natural phenomena that significantly hinder the performance of computer vision perception systems in practical settings, e.g., autonomous driving. A solution to this would be to eliminate shadow regions from the images before the processing of the perception system. Yet, training such a solution requires pairs of aligned shadowed and non-shadowed images which are difficult to obtain. We introduce a novel weakly supervised shadow removal framework UnShadowNet trained using contrastive learning. It is composed of a DeShadower network responsible for the removal of the extracted shadow under the guidance of an Illumination network which is trained adversarially by the illumination critic and a Refinement network to further remove artefacts. We show that UnShadowNet can be easily extended to a fully-supervised set-up to exploit the ground-truth when available. UnShadowNet outperforms existing state-of-the-art approaches on three publicly available shadow datasets (ISTD, adjusted ISTD, SRD) in both the weakly and fully supervised setups.
Submitted 24 August, 2023; v1 submitted 29 March, 2022;
originally announced March 2022.
-
2.5D Vehicle Odometry Estimation
Authors:
Ciaran Eising,
Leroy-Francisco Pereira,
Jonathan Horgan,
Anbuchezhiyan Selvaraju,
John McDonald,
Paul Moran
Abstract:
It is well understood that in ADAS applications, a good estimate of the pose of the vehicle is required. This paper proposes a metaphorically named 2.5D odometry, whereby the planar odometry derived from the yaw rate sensor and four wheel speed sensors is augmented by a linear model of suspension. While the core of the planar odometry is a yaw rate model that is already understood in the literature, we augment this by fitting a quadratic to the incoming signals, enabling interpolation, extrapolation, and a finer integration of the vehicle position. We show, by experimental results with a DGPS/IMU reference, that this model provides highly accurate odometry estimates, compared with existing methods. Utilising sensors that return the change in height of vehicle reference points with changing suspension configurations, we define a planar model of the vehicle suspension, thus augmenting the odometry model. We present an experimental framework and evaluation criteria by which the goodness of the odometry is evaluated and compared with existing methods. This odometry model has been designed to support low-speed surround-view camera systems that are well-known. Thus, we present some application results that show a performance boost for viewing and computer vision applications using the proposed odometry.
Submitted 16 November, 2021;
originally announced November 2021.
-
An Online Learning System for Wireless Charging Alignment using Surround-view Fisheye Cameras
Authors:
Ashok Dahal,
Varun Ravi Kumar,
Senthil Yogamani,
Ciaran Eising
Abstract:
Electric vehicles are increasingly common, with inductive chargepads being considered a convenient and efficient means of charging them. However, drivers are typically poor at aligning the vehicle to the necessary accuracy for efficient inductive charging, making the automated alignment of the two charging plates desirable. In parallel to the electrification of the vehicular fleet, automated parking systems that make use of surround-view camera systems are becoming increasingly popular. In this work, we propose a system based on the surround-view camera architecture to detect, localize, and automatically align the vehicle with the inductive chargepad. The visual design of the chargepads is not standardized and not necessarily known beforehand. Therefore, a system that relies on offline training will fail in some situations. Thus, we propose a self-supervised online learning method that leverages the driver's actions when manually aligning the vehicle with the chargepad and combines them with weak supervision from semantic segmentation and depth to learn a classifier to auto-annotate the chargepad in the video for further training. In this way, when faced with a previously unseen chargepad, the driver need only manually align the vehicle a single time. As the chargepad is flat on the ground, it is not easy to detect it from a distance. Thus, we propose using a Visual SLAM pipeline to learn landmarks relative to the chargepad to enable alignment from a greater range. We demonstrate the working system on an automated vehicle as illustrated in the video at https://youtu.be/_cLCmkW4UYo. To encourage further research, we will share a chargepad dataset used in this work.
Submitted 21 December, 2022; v1 submitted 26 May, 2021;
originally announced May 2021.
-
A 2.5D Vehicle Odometry Estimation for Vision Applications
Authors:
Paul Moran,
Leroy-Francisco Periera,
Anbuchezhiyan Selvaraju,
Tejash Prakash,
Pantelis Ermilios,
John McDonald,
Jonathan Horgan,
Ciarán Eising
Abstract:
This paper proposes a method to estimate the pose of a sensor mounted on a vehicle as the vehicle moves through the world, an important topic for autonomous driving systems. Based on a set of commonly deployed vehicular odometric sensors, with outputs available on automotive communication buses (e.g. CAN or FlexRay), we describe a set of steps to combine a planar odometry based on wheel sensors with a suspension model based on linear suspension sensors. The aim is to determine a more accurate estimate of the camera pose. We outline its usage for applications in both visualisation and computer vision.
Submitted 6 May, 2021;
originally announced May 2021.
-
Spherical formulation of geometric motion segmentation constraints in fisheye cameras
Authors:
Letizia Mariotti,
Ciaran Eising
Abstract:
We introduce a visual motion segmentation method employing spherical geometry for fisheye cameras and automated driving. Three commonly used geometric constraints in pin-hole imagery (the positive height, positive depth and epipolar constraints) are reformulated to spherical coordinates, making them invariant to specific camera configurations as long as the camera calibration is known. A fourth constraint, known as the anti-parallel constraint, is added to resolve motion-parallax ambiguity, to support the detection of moving objects undergoing parallel or near-parallel motion with respect to the host vehicle. A final constraint, known as the spherical three-view constraint, is described though not employed in our proposed algorithm. Results are presented and analyzed that demonstrate that the proposal is an effective motion segmentation approach for direct employment on fisheye imagery.
Submitted 26 April, 2021;
originally announced April 2021.
-
Near-field Perception for Low-Speed Vehicle Automation using Surround-view Fisheye Cameras
Authors:
Ciaran Eising,
Jonathan Horgan,
Senthil Yogamani
Abstract:
Cameras are the primary sensor in automated driving systems. They provide high information density and are optimal for detecting road infrastructure cues laid out for human vision. Surround-view camera systems typically comprise four fisheye cameras with 190°+ field of view covering the entire 360° around the vehicle, focused on near-field sensing. They are the principal sensors for low-speed, high-accuracy, and close-range sensing applications, such as automated parking, traffic jam assistance, and low-speed emergency braking. In this work, we provide a detailed survey of such vision systems, setting up the survey in the context of an architecture that can be decomposed into four modular components, namely Recognition, Reconstruction, Relocalization, and Reorganization. We jointly call this the 4R Architecture. We discuss how each component accomplishes a specific aspect and provide a positional argument that they can be synergized to form a complete perception system for low-speed automation. We support this argument by presenting results from previous works and by presenting architecture proposals for such a system. Qualitative results are presented in the video at https://youtu.be/ae8bCOF77uY.
Submitted 6 June, 2023; v1 submitted 31 March, 2021;
originally announced March 2021.
-
FisheyeSuperPoint: Keypoint Detection and Description Network for Fisheye Images
Authors:
Anna Konrad,
Ciarán Eising,
Ganesh Sistu,
John McDonald,
Rudi Villing,
Senthil Yogamani
Abstract:
Keypoint detection and description is a commonly used building block in computer vision systems, particularly for robotics and autonomous driving. However, the majority of techniques to date have focused on standard cameras, with little consideration given to fisheye cameras, which are commonly used in urban driving and automated parking. In this paper, we propose a novel training and evaluation pipeline for fisheye images. We make use of SuperPoint as our baseline, which is a self-supervised keypoint detector and descriptor that has achieved state-of-the-art results on homography estimation. We introduce a fisheye adaptation pipeline to enable training on undistorted fisheye images. We evaluate the performance on the HPatches benchmark, and, by introducing a fisheye-based evaluation method for detection repeatability and descriptor matching correctness, on the Oxford RobotCar dataset.
Submitted 29 November, 2021; v1 submitted 27 February, 2021;
originally announced March 2021.
-
Generalized Object Detection on Fisheye Cameras for Autonomous Driving: Dataset, Representations and Baseline
Authors:
Hazem Rashed,
Eslam Mohamed,
Ganesh Sistu,
Varun Ravi Kumar,
Ciaran Eising,
Ahmad El-Sallab,
Senthil Yogamani
Abstract:
Object detection is a comprehensively studied problem in autonomous driving. However, it has been relatively less explored in the case of fisheye cameras. The standard bounding box fails in fisheye cameras due to the strong radial distortion, particularly in the image's periphery. In this work, we explore better representations like oriented bounding box, ellipse, and generic polygon for object detection in fisheye images. We use the IoU metric to compare these representations using accurate instance segmentation ground truth. We design a novel curved bounding box model that has optimal properties for fisheye distortion models. We also design a curvature adaptive perimeter sampling method for obtaining polygon vertices, improving relative mAP score by 4.9% compared to uniform sampling. Overall, the proposed polygon model improves mIoU relative accuracy by 40.3%. To the best of our knowledge, it is the first detailed study on object detection on fisheye cameras for autonomous driving scenarios. The dataset, comprising 10,000 images along with ground truth for all the object representations, will be made public to encourage further research. We summarize our work in a short video with qualitative results at https://youtu.be/iLkOzvJpL-A.
Submitted 21 December, 2022; v1 submitted 3 December, 2020;
originally announced December 2020.