Search | arXiv e-print repository

Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics

Authors: Jad Abou-Chakra, Krishan Rana, Feras Dayoub, Niko Sünderhauf

Abstract: For robots to robustly understand and interact with the physical world, it is highly beneficial to have a comprehensive representation - modelling geometry, physics, and visual observations - that informs perception, planning, and control algorithms. We propose a novel dual Gaussian-Particle representation that models the physical world while (i) enabling predictive simulation of future states and… ▽ More For robots to robustly understand and interact with the physical world, it is highly beneficial to have a comprehensive representation - modelling geometry, physics, and visual observations - that informs perception, planning, and control algorithms. We propose a novel dual Gaussian-Particle representation that models the physical world while (i) enabling predictive simulation of future states and (ii) allowing online correction from visual observations in a dynamic world. Our representation comprises particles that capture the geometrical aspect of objects in the world and can be used alongside a particle-based physics system to anticipate physically plausible future states. Attached to these particles are 3D Gaussians that render images from any viewpoint through a splatting process thus capturing the visual state. By comparing the predicted and observed images, our approach generates visual forces that correct the particle positions while respecting known physical constraints. By integrating predictive physical modelling with continuous visually-derived corrections, our unified representation reasons about the present and future while synchronizing with reality. Our system runs in realtime at 30Hz using only 3 cameras. We validate our approach on 2D and 3D tracking tasks as well as photometric reconstruction quality. Videos are found at https://embodied-gaussians.github.io/. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2405.05792 [pdf, other]

RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation

Authors: Sourav Garg, Krishan Rana, Mehdi Hosseinzadeh, Lachlan Mares, Niko Sünderhauf, Feras Dayoub, Ian Reid

Abstract: Map** is crucial for spatial reasoning, planning and robot navigation. Existing approaches range from metric, which require precise geometry-based optimization, to purely topological, where image-as-node based graphs lack explicit object-level reasoning and interconnectivity. In this paper, we propose a novel topological representation of an environment based on "image segments", which are seman… ▽ More Map** is crucial for spatial reasoning, planning and robot navigation. Existing approaches range from metric, which require precise geometry-based optimization, to purely topological, where image-as-node based graphs lack explicit object-level reasoning and interconnectivity. In this paper, we propose a novel topological representation of an environment based on "image segments", which are semantically meaningful and open-vocabulary queryable, conferring several advantages over previous works based on pixel-level features. Unlike 3D scene graphs, we create a purely topological graph with segments as nodes, where edges are formed by a) associating segment-level descriptors between pairs of consecutive images and b) connecting neighboring segments within an image using their pixel centroids. This unveils a "continuous sense of a place", defined by inter-image persistence of segments along with their intra-image neighbours. It further enables us to represent and update segment-level descriptors through neighborhood aggregation using graph convolution layers, which improves robot localization based on segment-level retrieval. Using real-world data, we show how our proposed map representation can be used to i) generate navigation plans in the form of "hops over segments" and ii) search for target objects using natural language queries describing spatial relations of objects. Furthermore, we quantitatively analyze data association at the segment level, which underpins inter-image connectivity during map** and segment-level localization when revisiting the same place. Finally, we show preliminary trials on segment-level `hop**' based zero-shot real-world navigation. Project page with supplementary details: oravus.github.io/RoboHop/ △ Less

Submitted 9 May, 2024; originally announced May 2024.

Comments: Published at ICRA 2024; 9 pages, 8 figures

arXiv:2404.11882 [pdf, other]

doi 10.1609/aaaiss.v2i1.27643

Hybrid Navigation Acceptability and Safety

Authors: Benoit Clement, Marie Dubromel, Paulo E. Santos, Karl Sammut, Michelle Oppert, Feras Dayoub

Abstract: Autonomous vessels have emerged as a prominent and accepted solution, particularly in the naval defence sector. However, achieving full autonomy for marine vessels demands the development of robust and reliable control and guidance systems that can handle various encounters with manned and unmanned vessels while operating effectively under diverse weather and sea conditions. A significant challeng… ▽ More Autonomous vessels have emerged as a prominent and accepted solution, particularly in the naval defence sector. However, achieving full autonomy for marine vessels demands the development of robust and reliable control and guidance systems that can handle various encounters with manned and unmanned vessels while operating effectively under diverse weather and sea conditions. A significant challenge in this pursuit is ensuring the autonomous vessels' compliance with the International Regulations for Preventing Collisions at Sea (COLREGs). These regulations present a formidable hurdle for the human-level understanding by autonomous systems as they were originally designed from common navigation practices created since the mid-19th century. Their ambiguous language assumes experienced sailors' interpretation and execution, and therefore demands a high-level (cognitive) understanding of language and agent intentions. These capabilities surpass the current state-of-the-art in intelligent systems. This position paper highlights the critical requirements for a trustworthy control and guidance system, exploring the complexity of adapting COLREGs for safe vessel-on-vessel encounters considering autonomous maritime technology competing and/or cooperating with manned vessels. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.09212 [pdf, other]

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest

Authors: Jiajun Deng, Sha Zhang, Feras Dayoub, Wanli Ouyang, Yanyong Zhang, Ian Reid

Abstract: In this work, we present PoIFusion, a simple yet effective multi-modal 3D object detection framework to fuse the information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI). Technically, our PoIFusion follows the paradigm of query-based object detection, formulating object queries as dynamic 3D boxes. The PoIs are adaptively generated from each query box on the f… ▽ More In this work, we present PoIFusion, a simple yet effective multi-modal 3D object detection framework to fuse the information of RGB images and LiDAR point clouds at the point of interest (abbreviated as PoI). Technically, our PoIFusion follows the paradigm of query-based object detection, formulating object queries as dynamic 3D boxes. The PoIs are adaptively generated from each query box on the fly, serving as the keypoints to represent a 3D object and play the role of basic units in multi-modal fusion. Specifically, we project PoIs into the view of each modality to sample the corresponding feature and integrate the multi-modal features at each PoI through a dynamic fusion block. Furthermore, the features of PoIs derived from the same query box are aggregated together to update the query feature. Our approach prevents information loss caused by view transformation and eliminates the computation-intensive global attention, making the multi-modal 3D object detector more applicable. We conducted extensive experiments on the nuScenes dataset to evaluate our approach. Remarkably, our PoIFusion achieves 74.9\% NDS and 73.4\% mAP, setting a state-of-the-art record on the multi-modal 3D object detection benchmark. Codes will be made available via \url{https://djiajunustc.github.io/projects/poifusion}. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: NIL

arXiv:2402.03721 [pdf, other]

Enhancing Embodied Object Detection through Language-Image Pre-training and Implicit Object Memory

Authors: Nicolas Harvey Chapman, Feras Dayoub, Will Browne, Chris Lehnert

Abstract: Deep-learning and large scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, single-image object detectors trained on internet data are not optimally tailored for the embodied conditions inherent in robotics. Instead, robots must detect objects from complex multi-modal data streams involving depth, localisati… ▽ More Deep-learning and large scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, single-image object detectors trained on internet data are not optimally tailored for the embodied conditions inherent in robotics. Instead, robots must detect objects from complex multi-modal data streams involving depth, localisation and temporal correlation, a task termed embodied object detection. Paradigms such as Video Object Detection (VOD) and Semantic Map** have been proposed to leverage such embodied data streams, but existing work fails to enhance performance using language-image training. In response, we investigate how an image object detector pre-trained using language-image data can be extended to perform embodied object detection. We propose a novel implicit object memory that uses projective geometry to aggregate the features of detected objects across long temporal horizons. The spatial and temporal information accumulated in memory is then used to enhance the image features of the base detector. When tested on embodied data streams sampled from diverse indoor scenes, our approach improves the base object detector by 3.09 mAP, outperforming alternative external memories designed for VOD and Semantic Map**. Our method also shows a significant improvement of 16.90 mAP relative to baselines that perform embodied object detection without first training on language-image data, and is robust to sensor noise and domain shift experienced in real-world deployment. △ Less

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2401.05594 [pdf, other]

Wasserstein Distance-based Expansion of Low-Density Latent Regions for Unknown Class Detection

Authors: Prakash Mallick, Feras Dayoub, Jamie Sherrah

Abstract: This paper addresses the significant challenge in open-set object detection (OSOD): the tendency of state-of-the-art detectors to erroneously classify unknown objects as known categories with high confidence. We present a novel approach that effectively identifies unknown objects by distinguishing between high and low-density regions in latent space. Our method builds upon the Open-Det (OD) framew… ▽ More This paper addresses the significant challenge in open-set object detection (OSOD): the tendency of state-of-the-art detectors to erroneously classify unknown objects as known categories with high confidence. We present a novel approach that effectively identifies unknown objects by distinguishing between high and low-density regions in latent space. Our method builds upon the Open-Det (OD) framework, introducing two new elements to the loss function. These elements enhance the known embedding space's clustering and expand the unknown space's low-density regions. The first addition is the Class Wasserstein Anchor (CWA), a new function that refines the classification boundaries. The second is a spectral normalisation step, improving the robustness of the model. Together, these augmentations to the existing Contrastive Feature Learner (CFL) and Unknown Probability Learner (UPL) loss functions significantly improve OSOD performance. Our proposed OpenDet-CWA (OD-CWA) method demonstrates: a) a reduction in open-set errors by approximately 17%-22%, b) an enhancement in novelty detection capability by 1.5%-16%, and c) a decrease in the wilderness index by 2%-20% across various open-set scenarios. These results represent a substantial advancement in the field, showcasing the potential of our approach in managing the complexities of open-set object detection. △ Less

Submitted 19 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 8 Full length pages, followed by 2 supplementary pages, total of 9 Figures

arXiv:2312.08673 [pdf, other]

Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation

Authors: Renjie Wu, Hu Wang, Feras Dayoub, Hsiang-Ting Chen

Abstract: Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face challenges in user safety, particularly concerning oncoming vehicles. While some solutions leverage onboard camera arrays, these cameras often have limited field-of-view (FoV) with front or downward perspectives. Addressing this, we propose a new out-of-view semantic segmentation task and Segment Beyond View (… ▽ More Augmented Reality (AR) devices, emerging as prominent mobile interaction platforms, face challenges in user safety, particularly concerning oncoming vehicles. While some solutions leverage onboard camera arrays, these cameras often have limited field-of-view (FoV) with front or downward perspectives. Addressing this, we propose a new out-of-view semantic segmentation task and Segment Beyond View (SBV), a novel audio-visual semantic segmentation method. SBV supplements the visual modality, which miss the information beyond FoV, with the auditory information using a teacher-student distillation model (Omni2Ego). The model consists of a vision teacher utilising panoramic information, an auditory teacher with 8-channel audio, and an audio-visual student that takes views with limited FoV and binaural audio as input and produce semantic segmentation for objects outside FoV. SBV outperforms existing models in comparative evaluations and shows a consistent performance across varying FoV ranges and in monaural audio settings. △ Less

Submitted 23 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI-24

arXiv:2310.19258 [pdf, other]

Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition

Authors: Xiangyu Shi, Yanyuan Qiao, Qi Wu, Lingqiao Liu, Feras Dayoub

Abstract: Effective object detection in mobile robots is challenged by deployment in diverse and unfamiliar environments. Online Source-Free Domain Adaptation (O-SFDA) offers model adaptation using a stream of unlabeled data from a target domain in online manner. However, not all captured frames contain information that is beneficial for adaptation, particularly when there is a strong class imbalance. This… ▽ More Effective object detection in mobile robots is challenged by deployment in diverse and unfamiliar environments. Online Source-Free Domain Adaptation (O-SFDA) offers model adaptation using a stream of unlabeled data from a target domain in online manner. However, not all captured frames contain information that is beneficial for adaptation, particularly when there is a strong class imbalance. This paper introduces a novel approach to enhance O-SFDA for adaptive object detection in mobile robots via unsupervised data acquisition. Our methodology prioritizes the most informative unlabeled frames for inclusion in the online training process. Empirical evaluation on a real-world dataset reveals that our method outperforms existing state-of-the-art O-SFDA techniques, demonstrating the viability of unsupervised data acquisition for improving adaptive object detection in mobile robots. △ Less

Submitted 24 March, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

arXiv:2305.01163 [pdf, other]

Federated Neural Radiance Fields

Authors: Lachlan Holden, Feras Dayoub, David Harvey, Tat-Jun Chin

Abstract: The ability of neural radiance fields or NeRFs to conduct accurate 3D modelling has motivated application of the technique to scene representation. Previous approaches have mainly followed a centralised learning paradigm, which assumes that all training images are available on one compute node for training. In this paper, we consider training NeRFs in a federated manner, whereby multiple compute n… ▽ More The ability of neural radiance fields or NeRFs to conduct accurate 3D modelling has motivated application of the technique to scene representation. Previous approaches have mainly followed a centralised learning paradigm, which assumes that all training images are available on one compute node for training. In this paper, we consider training NeRFs in a federated manner, whereby multiple compute nodes, each having acquired a distinct set of observations of the overall scene, learn a common NeRF in parallel. This supports the scenario of cooperatively modelling a scene using multiple agents. Our contribution is the first federated learning algorithm for NeRF, which splits the training effort across multiple compute nodes and obviates the need to pool the images at a central node. A technique based on low-rank decomposition of NeRF layers is introduced to reduce bandwidth consumption to transmit the model parameters for aggregation. Transferring compressed models instead of the raw data also contributes to the privacy of the data collecting agents. △ Less

Submitted 1 May, 2023; originally announced May 2023.

Comments: 10 pages, 7 figures

arXiv:2303.14930 [pdf, other]

Addressing the Challenges of Open-World Object Detection

Authors: David Pershouse, Feras Dayoub, Dimity Miller, Niko Sünderhauf

Abstract: We address the challenging problem of open world object detection (OWOD), where object detectors must identify objects from known classes while also identifying and continually learning to detect novel objects. Prior work has resulted in detectors that have a relatively low ability to detect novel objects, and a high likelihood of classifying a novel object as one of the known classes. We approach… ▽ More We address the challenging problem of open world object detection (OWOD), where object detectors must identify objects from known classes while also identifying and continually learning to detect novel objects. Prior work has resulted in detectors that have a relatively low ability to detect novel objects, and a high likelihood of classifying a novel object as one of the known classes. We approach the problem by identifying the three main challenges that OWOD presents and introduce OW-RCNN, an open world object detector that addresses each of these three challenges. OW-RCNN establishes a new state of the art using the open-world evaluation protocol on MS-COCO, showing a drastically increased ability to detect novel objects (16-21% absolute increase in U-Recall), to avoid their misclassification as one of the known classes (up to 52% reduction in A-OSE), and to incrementally learn to detect them while maintaining performance on previously known classes (1-6% absolute increase in mAP). △ Less

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2302.06039 [pdf, other]

doi 10.1109/LRA.2023.3290420

Predicting Class Distribution Shift for Reliable Domain Adaptive Object Detection

Authors: Nicolas Harvey Chapman, Feras Dayoub, Will Browne, Christopher Lehnert

Abstract: Unsupervised Domain Adaptive Object Detection (UDA-OD) uses unlabelled data to improve the reliability of robotic vision systems in open-world environments. Previous approaches to UDA-OD based on self-training have been effective in overcoming changes in the general appearance of images. However, shifts in a robot's deployment environment can also impact the likelihood that different objects will… ▽ More Unsupervised Domain Adaptive Object Detection (UDA-OD) uses unlabelled data to improve the reliability of robotic vision systems in open-world environments. Previous approaches to UDA-OD based on self-training have been effective in overcoming changes in the general appearance of images. However, shifts in a robot's deployment environment can also impact the likelihood that different objects will occur, termed class distribution shift. Motivated by this, we propose a framework for explicitly addressing class distribution shift to improve pseudo-label reliability in self-training. Our approach uses the domain invariance and contextual understanding of a pre-trained joint vision and language model to predict the class distribution of unlabelled data. By aligning the class distribution of pseudo-labels with this prediction, we provide weak supervision of pseudo-label accuracy. To further account for low quality pseudo-labels early in self-training, we propose an approach to dynamically adjust the number of pseudo-labels per image based on model confidence. Our method outperforms state-of-the-art approaches on several benchmarks, including a 4.7 mAP improvement when facing challenging class distribution shift. △ Less

Submitted 28 August, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 5084-5091, Aug. 2023

arXiv:2211.04041 [pdf, other]

ParticleNeRF: A Particle-Based Encoding for Online Neural Radiance Fields

Authors: Jad Abou-Chakra, Feras Dayoub, Niko Sünderhauf

Abstract: While existing Neural Radiance Fields (NeRFs) for dynamic scenes are offline methods with an emphasis on visual fidelity, our paper addresses the online use case that prioritises real-time adaptability. We present ParticleNeRF, a new approach that dynamically adapts to changes in the scene geometry by learning an up-to-date representation online, every 200ms. ParticleNeRF achieves this using a nov… ▽ More While existing Neural Radiance Fields (NeRFs) for dynamic scenes are offline methods with an emphasis on visual fidelity, our paper addresses the online use case that prioritises real-time adaptability. We present ParticleNeRF, a new approach that dynamically adapts to changes in the scene geometry by learning an up-to-date representation online, every 200ms. ParticleNeRF achieves this using a novel particle-based parametric encoding. We couple features to particles in space and backpropagate the photometric reconstruction loss into the particles' position gradients, which are then interpreted as velocity vectors. Governed by a lightweight physics system to handle collisions, this lets the features move freely with the changing scene geometry. We demonstrate ParticleNeRF on various dynamic scenes containing translating, rotating, articulated, and deformable objects. ParticleNeRF is the first online dynamic NeRF and achieves fast adaptability with better visual fidelity than brute-force online InstantNGP and other baseline approaches on dynamic scenes with online constraints. Videos of our system can be found at our project website https://sites.google.com/view/particlenerf. △ Less

Submitted 24 March, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

arXiv:2208.13930 [pdf, other]

SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection

Authors: Samuel Wilson, Tobias Fischer, Feras Dayoub, Dimity Miller, Niko Sünderhauf

Abstract: We address the problem of out-of-distribution (OOD) detection for the task of object detection. We show that residual convolutional layers with batch normalisation produce Sensitivity-Aware FEatures (SAFE) that are consistently powerful for distinguishing in-distribution from out-of-distribution detections. We extract SAFE vectors for every detected object, and train a multilayer perceptron on the… ▽ More We address the problem of out-of-distribution (OOD) detection for the task of object detection. We show that residual convolutional layers with batch normalisation produce Sensitivity-Aware FEatures (SAFE) that are consistently powerful for distinguishing in-distribution from out-of-distribution detections. We extract SAFE vectors for every detected object, and train a multilayer perceptron on the surrogate task of distinguishing adversarially perturbed from clean in-distribution examples. This circumvents the need for realistic OOD training data, computationally expensive generative models, or retraining of the base object detector. SAFE outperforms the state-of-the-art OOD object detectors on multiple benchmarks by large margins, e.g. reducing the FPR95 by an absolute 30.6% from 48.3% to 17.7% on the OpenImages dataset. △ Less

Submitted 22 August, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

Journal ref: IEEE International Conference on Computer Vision 2023

arXiv:2204.10516 [pdf, other]

Implicit Object Map** With Noisy Data

Authors: Jad Abou-Chakra, Feras Dayoub, Niko Sünderhauf

Abstract: Modelling individual objects in a scene as Neural Radiance Fields (NeRFs) provides an alternative geometric scene representation that may benefit downstream robotics tasks such as scene understanding and object manipulation. However, we identify three challenges to using real-world training data collected by a robot to train a NeRF: (i) The camera trajectories are constrained, and full visual cove… ▽ More Modelling individual objects in a scene as Neural Radiance Fields (NeRFs) provides an alternative geometric scene representation that may benefit downstream robotics tasks such as scene understanding and object manipulation. However, we identify three challenges to using real-world training data collected by a robot to train a NeRF: (i) The camera trajectories are constrained, and full visual coverage is not guaranteed - especially when obstructions to the objects of interest are present; (ii) the poses associated with the images are noisy due to odometry or localization noise; (iii) the objects are not easily isolated from the background. This paper evaluates the extent to which above factors degrade the quality of the learnt implicit object representation. We introduce a pipeline that decomposes a scene into multiple individual object-NeRFs, using noisy object instance masks and bounding boxes, and evaluate the sensitivity of this pipeline with respect to noisy poses, instance masks, and the number of training images. We uncover that the sensitivity to noisy instance masks can be partially alleviated with depth supervision and quantify the importance of including the camera extrinsics in the NeRF optimisation process. △ Less

Submitted 7 October, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

arXiv:2112.05341 [pdf, other]

Hyperdimensional Feature Fusion for Out-Of-Distribution Detection

Authors: Samuel Wilson, Tobias Fischer, Niko Sünderhauf, Feras Dayoub

Abstract: We introduce powerful ideas from Hyperdimensional Computing into the challenging field of Out-of-Distribution (OOD) detection. In contrast to most existing work that performs OOD detection based on only a single layer of a neural network, we use similarity-preserving semi-orthogonal projection matrices to project the feature maps from multiple layers into a common vector space. By repeatedly apply… ▽ More We introduce powerful ideas from Hyperdimensional Computing into the challenging field of Out-of-Distribution (OOD) detection. In contrast to most existing work that performs OOD detection based on only a single layer of a neural network, we use similarity-preserving semi-orthogonal projection matrices to project the feature maps from multiple layers into a common vector space. By repeatedly applying the bundling operation $\oplus$, we create expressive class-specific descriptor vectors for all in-distribution classes. At test time, a simple and efficient cosine similarity calculation between descriptor vectors consistently identifies OOD samples with better performance than the current state-of-the-art. We show that the hyperdimensional fusion of multiple network layers is critical to achieve best general performance. △ Less

Submitted 29 August, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

Comments: Accepted to WACV2023

arXiv:2109.07748 [pdf, other]

doi 10.1109/IROS51168.2021.9636271

Evaluating the Impact of Semantic Segmentation and Pose Estimation on Dense Semantic SLAM

Authors: Suman Raj Bista, David Hall, Ben Talbot, Haoyang Zhang, Feras Dayoub, Niko Sünderhauf

Abstract: Recent Semantic SLAM methods combine classical geometry-based estimation with deep learning-based object detection or semantic segmentation. In this paper we evaluate the quality of semantic maps generated by state-of-the-art class- and instance-aware dense semantic SLAM algorithms whose codes are publicly available and explore the impacts both semantic segmentation and pose estimation have on the… ▽ More Recent Semantic SLAM methods combine classical geometry-based estimation with deep learning-based object detection or semantic segmentation. In this paper we evaluate the quality of semantic maps generated by state-of-the-art class- and instance-aware dense semantic SLAM algorithms whose codes are publicly available and explore the impacts both semantic segmentation and pose estimation have on the quality of semantic maps. We obtain these results by providing algorithms with ground-truth pose and/or semantic segmentation data available from simulated environments. We establish that semantic segmentation is the largest source of error through our experiments, drop** mAP and OMQ performance by up to 74.3% and 71.3% respectively. △ Less

Submitted 16 September, 2021; originally announced September 2021.

Comments: Paper accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021

arXiv:2108.08748 [pdf, other]

FSNet: A Failure Detection Framework for Semantic Segmentation

Authors: Quazi Marufur Rahman, Niko Sünderhauf, Peter Corke, Feras Dayoub

Abstract: Semantic segmentation is an important task that helps autonomous vehicles understand their surroundings and navigate safely. During deployment, even the most mature segmentation models are vulnerable to various external factors that can degrade the segmentation performance with potentially catastrophic consequences for the vehicle and its surroundings. To address this issue, we propose a failure d… ▽ More Semantic segmentation is an important task that helps autonomous vehicles understand their surroundings and navigate safely. During deployment, even the most mature segmentation models are vulnerable to various external factors that can degrade the segmentation performance with potentially catastrophic consequences for the vehicle and its surroundings. To address this issue, we propose a failure detection framework to identify pixel-level misclassification. We do so by exploiting internal features of the segmentation model and training it simultaneously with a failure detection network. During deployment, the failure detector can flag areas in the image where the segmentation model have failed to segment correctly. We evaluate the proposed approach against state-of-the-art methods and achieve 12.30%, 9.46%, and 9.65% performance improvement in the AUPR-Error metric for Cityscapes, BDD100K, and Mapillary semantic segmentation datasets. △ Less

Submitted 27 September, 2021; v1 submitted 19 August, 2021; originally announced August 2021.

arXiv:2107.11566 [pdf, other]

Going Deeper into Semi-supervised Person Re-identification

Authors: Olga Moskvyak, Frederic Maire, Feras Dayoub, Mahsa Baktashmotlagh

Abstract: Person re-identification is the challenging task of identifying a person across different camera views. Training a convolutional neural network (CNN) for this task requires annotating a large dataset, and hence, it involves the time-consuming manual matching of people across cameras. To reduce the need for labeled data, we focus on a semi-supervised approach that requires only a subset of the trai… ▽ More Person re-identification is the challenging task of identifying a person across different camera views. Training a convolutional neural network (CNN) for this task requires annotating a large dataset, and hence, it involves the time-consuming manual matching of people across cameras. To reduce the need for labeled data, we focus on a semi-supervised approach that requires only a subset of the training data to be labeled. We conduct a comprehensive survey in the area of person re-identification with limited labels. Existing works in this realm are limited in the sense that they utilize features from multiple CNNs and require the number of identities in the unlabeled data to be known. To overcome these limitations, we propose to employ part-based features from a single CNN without requiring the knowledge of the label space (i.e., the number of identities). This makes our approach more suitable for practical scenarios, and it significantly reduces the need for computational resources. We also propose a PartMixUp loss that improves the discriminative ability of learned part-based features for pseudo-labeling in semi-supervised settings. Our method outperforms the state-of-the-art results on three large-scale person re-id datasets and achieves the same level of performance as fully supervised methods with only one-third of labeled identities. △ Less

Submitted 24 July, 2021; originally announced July 2021.

arXiv:2104.01328 [pdf, other]

doi 10.1109/LRA.2021.3123374

Uncertainty for Identifying Open-Set Errors in Visual Object Detection

Authors: Dimity Miller, Niko Sünderhauf, Michael Milford, Feras Dayoub

Abstract: Deployed into an open world, object detectors are prone to open-set errors, false positive detections of object classes not present in the training dataset. We propose GMM-Det, a real-time method for extracting epistemic uncertainty from object detectors to identify and reject open-set errors. GMM-Det trains the detector to produce a structured logit space that is modelled with class-specific Gaus… ▽ More Deployed into an open world, object detectors are prone to open-set errors, false positive detections of object classes not present in the training dataset. We propose GMM-Det, a real-time method for extracting epistemic uncertainty from object detectors to identify and reject open-set errors. GMM-Det trains the detector to produce a structured logit space that is modelled with class-specific Gaussian Mixture Models. At test time, open-set errors are identified by their low log-probability under all Gaussian Mixture Models. We test two common detector architectures, Faster R-CNN and RetinaNet, across three varied datasets spanning robotics and computer vision. Our results show that GMM-Det consistently outperforms existing uncertainty techniques for identifying and rejecting open-set detections, especially at the low-error-rate operating point required for safety-critical applications. GMM-Det maintains object detection performance, and introduces only minimal computational overhead. We also introduce a methodology for converting existing object detection datasets into specific open-set datasets to evaluate open-set performance in object detection. △ Less

Submitted 11 November, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Journal ref: IEEE Robotics and Automation Letters (January 2022), Volume 7, Issue 1, pages 215-222, ISSN 2377-3766

arXiv:2101.07988 [pdf, other]

Semi-supervised Keypoint Localization

Authors: Olga Moskvyak, Frederic Maire, Feras Dayoub, Mahsa Baktashmotlagh

Abstract: Knowledge about the locations of keypoints of an object in an image can assist in fine-grained classification and identification tasks, particularly for the case of objects that exhibit large variations in poses that greatly influence their visual appearance, such as wild animals. However, supervised training of a keypoint detection network requires annotating a large image dataset for each animal… ▽ More Knowledge about the locations of keypoints of an object in an image can assist in fine-grained classification and identification tasks, particularly for the case of objects that exhibit large variations in poses that greatly influence their visual appearance, such as wild animals. However, supervised training of a keypoint detection network requires annotating a large image dataset for each animal species, which is a labor-intensive task. To reduce the need for labeled data, we propose to learn simultaneously keypoint heatmaps and pose invariant keypoint representations in a semi-supervised manner using a small set of labeled images along with a larger set of unlabeled images. Keypoint representations are learnt with a semantic keypoint consistency constraint that forces the keypoint detection network to learn similar features for the same keypoint across the dataset. Pose invariance is achieved by making keypoint representations for the image and its augmented copies closer together in feature space. Our semi-supervised approach significantly outperforms previous methods on several benchmarks for human and animal body landmark localization. △ Less

Submitted 20 January, 2021; originally announced January 2021.

Comments: accepted to ICLR 2021

arXiv:2101.01364 [pdf]

doi 10.1109/ACCESS.2021.3055015

Run-Time Monitoring of Machine Learning for Robotic Perception: A Survey of Emerging Trends

Authors: Quazi Marufur Rahman, Peter Corke, Feras Dayoub

Abstract: As deep learning continues to dominate all state-of-the-art computer vision tasks, it is increasingly becoming an essential building block for robotic perception. This raises important questions concerning the safety and reliability of learning-based perception systems. There is an established field that studies safety certification and convergence guarantees of complex software systems at design-… ▽ More As deep learning continues to dominate all state-of-the-art computer vision tasks, it is increasingly becoming an essential building block for robotic perception. This raises important questions concerning the safety and reliability of learning-based perception systems. There is an established field that studies safety certification and convergence guarantees of complex software systems at design-time. However, the unknown future deployment environments of an autonomous system and the complexity of learning-based perception make the generalization of design-time verification to run-time problematic. In the face of this challenge, more attention is starting to focus on run-time monitoring of performance and reliability of perception systems with several trends emerging in the literature. This paper attempts to identify these trends and summarise the various approaches to the topic. △ Less

Submitted 11 July, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

Comments: Updated version of 10.1109/ACCESS.2021.3055015. Published at IEEE Access. 27 January 2021

arXiv:2101.00443 [pdf, ps, other]

doi 10.1561/2300000059

Semantics for Robotic Map**, Perception and Interaction: A Survey

Authors: Sourav Garg, Niko Sünderhauf, Feras Dayoub, Douglas Morrison, Akansel Cosgun, Gustavo Carneiro, Qi Wu, Tat-Jun Chin, Ian Reid, Stephen Gould, Peter Corke, Michael Milford

Abstract: For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what does the world "mean" to a robot, and is strongly tied to the question of how to represent that meaning. With humans… ▽ More For robots to navigate and interact more richly with the world around them, they will likely require a deeper understanding of the world in which they operate. In robotics and related research fields, the study of understanding is often referred to as semantics, which dictates what does the world "mean" to a robot, and is strongly tied to the question of how to represent that meaning. With humans and robots increasingly operating in the same world, the prospects of human-robot interaction also bring semantics and ontology of natural language into the picture. Driven by need, as well as by enablers like increasing availability of training data and computational resources, semantics is a rapidly growing research area in robotics. The field has received significant attention in the research literature to date, but most reviews and surveys have focused on particular aspects of the topic: the technical research issues regarding its use in specific robotic topics like map** or segmentation, or its relevance to one particular application domain like autonomous driving. A new treatment is therefore required, and is also timely because so much relevant research has occurred since many of the key surveys were published. This survey therefore provides an overarching snapshot of where semantics in robotics stands today. We establish a taxonomy for semantics research in or relevant to robotics, split into four broad categories of activity, in which semantics are extracted, used, or both. Within these broad categories we survey dozens of major topics including fundamentals from the computer vision field and key robotics research areas utilizing semantics, including map**, navigation and interaction with the world. The survey also covers key practical considerations, including enablers like increased data availability and improved computational hardware, and major application areas where... △ Less

Submitted 2 January, 2021; originally announced January 2021.

Comments: 81 pages, 1 figure, published in Foundations and Trends in Robotics, 2020

Journal ref: Foundations and Trends in Robotics: Vol. 8: No. 1-2, pp 1-224 (2020)

arXiv:2012.12645 [pdf, other]

SWA Object Detection

Authors: Haoyang Zhang, Ying Wang, Feras Dayoub, Niko Sünderhauf

Abstract: Do you want to improve 1.0 AP for your object detector without any inference cost and any change to your detector? Let us tell you such a recipe. It is surprisingly simple: train your detector for an extra 12 epochs using cyclical learning rates and then average these 12 checkpoints as your final detection model}. This potent recipe is inspired by Stochastic Weights Averaging (SWA), which is propo… ▽ More Do you want to improve 1.0 AP for your object detector without any inference cost and any change to your detector? Let us tell you such a recipe. It is surprisingly simple: train your detector for an extra 12 epochs using cyclical learning rates and then average these 12 checkpoints as your final detection model}. This potent recipe is inspired by Stochastic Weights Averaging (SWA), which is proposed in arXiv:1803.05407 for improving generalization in deep neural networks. We found it also very effective in object detection. In this technique report, we systematically investigate the effects of applying SWA to object detection as well as instance segmentation. Through extensive experiments, we discover the aforementioned workable policy of performing SWA in object detection, and we consistently achieve $\sim$1.0 AP improvement over various popular detectors on the challenging COCO benchmark, including Mask RCNN, Faster RCNN, RetinaNet, FCOS, YOLOv3 and VFNet. We hope this work will make more researchers in object detection know this technique and help them train better object detectors. Code is available at: https://github.com/hyz-xmaster/swa_object_detection . △ Less

Submitted 11 March, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: 9 pages; polished

arXiv:2011.07750 [pdf, other]

Online Monitoring of Object Detection Performance During Deployment

Authors: Quazi Marufur Rahman, Niko Sünderhauf, Feras Dayoub

Abstract: During deployment, an object detector is expected to operate at a similar performance level reported on its testing dataset. However, when deployed onboard mobile robots that operate under varying and complex environmental conditions, the detector's performance can fluctuate and occasionally degrade severely without warning. Undetected, this can lead the robot to take unsafe and risky actions base… ▽ More During deployment, an object detector is expected to operate at a similar performance level reported on its testing dataset. However, when deployed onboard mobile robots that operate under varying and complex environmental conditions, the detector's performance can fluctuate and occasionally degrade severely without warning. Undetected, this can lead the robot to take unsafe and risky actions based on low-quality and unreliable object detections. We address this problem and introduce a cascaded neural network that monitors the performance of the object detector by predicting the quality of its mean average precision (mAP) on a sliding window of the input frames. The proposed cascaded network exploits the internal features from the deep neural network of the object detector. We evaluate our proposed approach using different combinations of autonomous driving datasets and object detectors. △ Less

Submitted 9 March, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: V2 with more experimental results and improved clarity of presentation

arXiv:2009.08650 [pdf, other]

Per-frame mAP Prediction for Continuous Performance Monitoring of Object Detection During Deployment

Authors: Quazi Marufur Rahman, Niko Sünderhauf, Feras Dayoub

Abstract: Performance monitoring of object detection is crucial for safety-critical applications such as autonomous vehicles that operate under varying and complex environmental conditions. Currently, object detectors are evaluated using summary metrics based on a single dataset that is assumed to be representative of all future deployment conditions. In practice, this assumption does not hold, and the perf… ▽ More Performance monitoring of object detection is crucial for safety-critical applications such as autonomous vehicles that operate under varying and complex environmental conditions. Currently, object detectors are evaluated using summary metrics based on a single dataset that is assumed to be representative of all future deployment conditions. In practice, this assumption does not hold, and the performance fluctuates as a function of the deployment conditions. To address this issue, we propose an introspection approach to performance monitoring during deployment without the need for ground truth data. We do so by predicting when the per-frame mean average precision drops below a critical threshold using the detector's internal features. We quantitatively evaluate and demonstrate our method's ability to reduce risk by trading off making an incorrect decision by raising the alarm and absenting from detection. △ Less

Submitted 16 November, 2020; v1 submitted 18 September, 2020; originally announced September 2020.

arXiv:2009.05246 [pdf, other]

The Robotic Vision Scene Understanding Challenge

Authors: David Hall, Ben Talbot, Suman Raj Bista, Haoyang Zhang, Rohan Smith, Feras Dayoub, Niko Sünderhauf

Abstract: Being able to explore an environment and understand the location and type of all objects therein is important for indoor robotic platforms that must interact closely with humans. However, it is difficult to evaluate progress in this area due to a lack of standardized testing which is limited due to the need for active robot agency and perfect object ground-truth. To help provide a standard for tes… ▽ More Being able to explore an environment and understand the location and type of all objects therein is important for indoor robotic platforms that must interact closely with humans. However, it is difficult to evaluate progress in this area due to a lack of standardized testing which is limited due to the need for active robot agency and perfect object ground-truth. To help provide a standard for testing scene understanding systems, we present a new robot vision scene understanding challenge using simulation to enable repeatable experiments with active robot agency. We provide two challenging task types, three difficulty levels, five simulated environments and a new evaluation measure for evaluating 3D cuboid object maps. Our aim is to drive state-of-the-art research in scene understanding through enabling evaluation and comparison of active robotic vision systems. △ Less

Submitted 11 September, 2020; originally announced September 2020.

arXiv:2008.13367 [pdf, other]

VarifocalNet: An IoU-aware Dense Object Detector

Authors: Haoyang Zhang, Ying Wang, Feras Dayoub, Niko Sünderhauf

Abstract: Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an Iou-aware Cla… ▽ More Accurately ranking the vast number of candidate detections is crucial for dense object detectors to achieve high performance. Prior work uses the classification score or a combination of classification and predicted localization scores to rank candidates. However, neither option results in a reliable ranking, thus degrading detection performance. In this paper, we propose to learn an Iou-aware Classification Score (IACS) as a joint representation of object presence confidence and localization accuracy. We show that dense object detectors can achieve a more accurate ranking of candidate detections based on the IACS. We design a new loss function, named Varifocal Loss, to train a dense object detector to predict the IACS, and propose a new star-shaped bounding box feature representation for IACS prediction and bounding box refinement. Combining these two new components and a bounding box refinement branch, we build an IoU-aware dense object detector based on the FCOS+ATSS architecture, that we call VarifocalNet or VFNet for short. Extensive experiments on MS COCO show that our VFNet consistently surpasses the strong baseline by $\sim$2.0 AP with different backbones. Our best model VFNet-X-1200 with Res2Net-101-DCN achieves a single-model single-scale AP of 55.1 on COCO test-dev, which is state-of-the-art among various object detectors.Code is available at https://github.com/hyz-xmaster/VarifocalNet . △ Less

Submitted 4 March, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

Comments: Accepted to CVPR 2021 as an oral

arXiv:2008.11368 [pdf, other]

Keypoint-Aligned Embeddings for Image Retrieval and Re-identification

Authors: Olga Moskvyak, Frederic Maire, Feras Dayoub, Mahsa Baktashmotlagh

Abstract: Learning embeddings that are invariant to the pose of the object is crucial in visual image retrieval and re-identification. The existing approaches for person, vehicle, or animal re-identification tasks suffer from high intra-class variance due to deformable shapes and different camera viewpoints. To overcome this limitation, we propose to align the image embedding with a predefined order of the… ▽ More Learning embeddings that are invariant to the pose of the object is crucial in visual image retrieval and re-identification. The existing approaches for person, vehicle, or animal re-identification tasks suffer from high intra-class variance due to deformable shapes and different camera viewpoints. To overcome this limitation, we propose to align the image embedding with a predefined order of the keypoints. The proposed keypoint aligned embeddings model (KAE-Net) learns part-level features via multi-task learning which is guided by keypoint locations. More specifically, KAE-Net extracts channels from a feature map activated by a specific keypoint through learning the auxiliary task of heatmap reconstruction for this keypoint. The KAE-Net is compact, generic and conceptually simple. It achieves state of the art performance on the benchmark datasets of CUB-200-2011, Cars196 and VeRi-776 for retrieval and re-identification tasks. △ Less

Submitted 25 August, 2020; originally announced August 2020.

Comments: 8 pages, 7 figures, accepted to WACV 2021

arXiv:2008.00635 [pdf, other]

BenchBot: Evaluating Robotics Research in Photorealistic 3D Simulation and on Real Robots

Authors: Ben Talbot, David Hall, Haoyang Zhang, Suman Raj Bista, Rohan Smith, Feras Dayoub, Niko Sünderhauf

Abstract: We introduce BenchBot, a novel software suite for benchmarking the performance of robotics research across both photorealistic 3D simulations and real robot platforms. BenchBot provides a simple interface to the sensorimotor capabilities of a robot when solving robotics research problems; an interface that is consistent regardless of whether the target platform is simulated or a real robot. In thi… ▽ More We introduce BenchBot, a novel software suite for benchmarking the performance of robotics research across both photorealistic 3D simulations and real robot platforms. BenchBot provides a simple interface to the sensorimotor capabilities of a robot when solving robotics research problems; an interface that is consistent regardless of whether the target platform is simulated or a real robot. In this paper we outline the BenchBot system architecture, and explore the parallels between its user-centric design and an ideal research development process devoid of tangential robot engineering challenges. The paper describes the research benefits of using the BenchBot system, including: enhanced capacity to focus solely on research problems, direct quantitative feedback to inform research development, tools for deriving comprehensive performance characteristics, and submission formats which promote sharability and repeatability of research outcomes. BenchBot is publicly available (http://benchbot.org), and we encourage its use in the research community for comprehensively evaluating the simulated and real world performance of novel robotic algorithms. △ Less

Submitted 2 August, 2020; originally announced August 2020.

Comments: Future submission to RAL; software available at http://benchbot.org

arXiv:2004.02434 [pdf, other]

Class Anchor Clustering: a Loss for Distance-based Open Set Recognition

Authors: Dimity Miller, Niko Sünderhauf, Michael Milford, Feras Dayoub

Abstract: In open set recognition, deep neural networks encounter object classes that were unknown during training. Existing open set classifiers distinguish between known and unknown classes by measuring distance in a network's logit space, assuming that known classes cluster closer to the training data than unknown classes. However, this approach is applied post-hoc to networks trained with cross-entropy… ▽ More In open set recognition, deep neural networks encounter object classes that were unknown during training. Existing open set classifiers distinguish between known and unknown classes by measuring distance in a network's logit space, assuming that known classes cluster closer to the training data than unknown classes. However, this approach is applied post-hoc to networks trained with cross-entropy loss, which does not guarantee this clustering behaviour. To overcome this limitation, we introduce the Class Anchor Clustering (CAC) loss. CAC is a distance-based loss that explicitly trains known classes to form tight clusters around anchored class-dependent centres in the logit space. We show that training with CAC achieves state-of-the-art performance for distance-based open set classifiers on all six standard benchmark datasets, with a 15.2% AUROC increase on the challenging TinyImageNet, without sacrificing classification accuracy. We also show that our anchored class centres achieve higher open set performance than learnt class centres, particularly on object-based datasets and large numbers of training classes. △ Less

Submitted 2 March, 2021; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: Published at 2021 IEEE Winter Conference on Applications of Computer Vision (WACV)

arXiv:2001.11684 [pdf, other]

doi 10.1109/TCDS.2020.2993855

Robot Navigation in Unseen Spaces using an Abstract Map

Authors: Ben Talbot, Feras Dayoub, Peter Corke, Gordon Wyeth

Abstract: Human navigation in built environments depends on symbolic spatial information which has unrealised potential to enhance robot navigation capabilities. Information sources such as labels, signs, maps, planners, spoken directions, and navigational gestures communicate a wealth of spatial information to the navigators of built environments; a wealth of information that robots typically ignore. We pr… ▽ More Human navigation in built environments depends on symbolic spatial information which has unrealised potential to enhance robot navigation capabilities. Information sources such as labels, signs, maps, planners, spoken directions, and navigational gestures communicate a wealth of spatial information to the navigators of built environments; a wealth of information that robots typically ignore. We present a robot navigation system that uses the same symbolic spatial information employed by humans to purposefully navigate in unseen built environments with a level of performance comparable to humans. The navigation system uses a novel data structure called the abstract map to imagine malleable spatial models for unseen spaces from spatial symbols. Sensorimotor perceptions from a robot are then employed to provide purposeful navigation to symbolic goal locations in the unseen environment. We show how a dynamic system can be used to create malleable spatial models for the abstract map, and provide an open source implementation to encourage future work in the area of symbolic navigation. Symbolic navigation performance of humans and a robot is evaluated in a real-world built environment. The paper concludes with a qualitative analysis of human navigation strategies, providing further insights into how the symbolic navigation capabilities of robots in unseen built environments can be improved in the future. △ Less

Submitted 15 May, 2020; v1 submitted 31 January, 2020; originally announced January 2020.

Comments: 15 pages, published in IEEE Transactions on Cognitive and Developmental Systems (http://doi.org/10.1109/TCDS.2020.2993855), see https://btalb.github.io/abstract_map/ for access to software

arXiv:2001.05650 [pdf, ps, other]

Control of the Final-Phase of Closed-Loop Visual Gras** using Image-Based Visual Servoing

Authors: Jesse Haviland, Feras Dayoub, Peter Corke

Abstract: This paper considers the final approach phase of visual-closed-loop gras** where the RGB-D camera is no longer able to provide valid depth information. Many current robotic gras** controllers are not closed-loop and therefore fail for moving objects. Closed-loop grasp controllers based on RGB-D imagery can track a moving object, but fail when the sensor's minimum object distance is violated ju… ▽ More This paper considers the final approach phase of visual-closed-loop gras** where the RGB-D camera is no longer able to provide valid depth information. Many current robotic gras** controllers are not closed-loop and therefore fail for moving objects. Closed-loop grasp controllers based on RGB-D imagery can track a moving object, but fail when the sensor's minimum object distance is violated just before gras**. To overcome this we propose the use of image-based visual servoing (IBVS) to guide the robot to the object-relative grasp pose using camera RGB information. IBVS robustly moves the camera to a goal pose defined implicitly in terms of an image-plane feature configuration. In this work, the goal image feature coordinates are predicted from RGB-D data to enable RGB-only tracking once depth data becomes unavailable -- this enables more reliable gras** of previously unseen moving objects. Experimental results are provided. △ Less

Submitted 27 February, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

Comments: Under review for RA-L and IROS 2020

arXiv:2001.02801 [pdf, other]

Learning landmark guided embeddings for animal re-identification

Authors: Olga Moskvyak, Frederic Maire, Feras Dayoub, Mahsa Baktashmotlagh

Abstract: Re-identification of individual animals in images can be ambiguous due to subtle variations in body markings between different individuals and no constraints on the poses of animals in the wild. Person re-identification is a similar task and it has been approached with a deep convolutional neural network (CNN) that learns discriminative embeddings for images of people. However, learning discrimina… ▽ More Re-identification of individual animals in images can be ambiguous due to subtle variations in body markings between different individuals and no constraints on the poses of animals in the wild. Person re-identification is a similar task and it has been approached with a deep convolutional neural network (CNN) that learns discriminative embeddings for images of people. However, learning discriminative features for an individual animal is more challenging than for a person's appearance due to the relatively small size of ecological datasets compared to labelled datasets of person's identities. We propose to improve embedding learning by exploiting body landmarks information explicitly. Body landmarks are provided to the input of a CNN as confidence heatmaps that can be obtained from a separate body landmark predictor. The model is encouraged to use heatmaps by learning an auxiliary task of reconstructing input heatmaps. Body landmarks guide a feature extraction network to learn the representation of a distinctive pattern and its position on the body. We evaluate the proposed method on a large synthetic dataset and a small real dataset. Our method outperforms the same model without body landmarks input by 26% and 18% on the synthetic and the real datasets respectively. The method is robust to noise in input coordinates and can tolerate an error in coordinates up to 10% of the image size. △ Less

Submitted 8 January, 2020; originally announced January 2020.

Comments: 7 pages, 7 figures

arXiv:2001.02366 [pdf, other]

What can robotics research learn from computer vision research?

Authors: Peter Corke, Feras Dayoub, David Hall, John Skinner, Niko Sünderhauf

Abstract: The computer vision and robotics research communities are each strong. However progress in computer vision has become turbo-charged in recent years due to big data, GPU computing, novel learning algorithms and a very effective research methodology. By comparison, progress in robotics seems slower. It is true that robotics came later to exploring the potential of learning -- the advantages over the… ▽ More The computer vision and robotics research communities are each strong. However progress in computer vision has become turbo-charged in recent years due to big data, GPU computing, novel learning algorithms and a very effective research methodology. By comparison, progress in robotics seems slower. It is true that robotics came later to exploring the potential of learning -- the advantages over the well-established body of knowledge in dynamics, kinematics, planning and control is still being debated, although reinforcement learning seems to offer real potential. However, the rapid development of computer vision compared to robotics cannot be only attributed to the former's adoption of deep learning. In this paper, we argue that the gains in computer vision are due to research methodology -- evaluation under strict constraints versus experiments; bold numbers versus videos. △ Less

Submitted 11 June, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

Comments: 15 pages, to appear in the proceeding of the International Symposium on Robotics Research (ISRR) 2019

arXiv:2001.00330 [pdf, other]

Close-Proximity Underwater Terrain Map** Using Learning-based Coarse Range Estimation

Authors: Bilal Arain, Feras Dayoub, Paul Rigby, Matthew Dunbabin

Abstract: This paper presents a novel approach to underwater terrain map** for Autonomous Underwater Vehicles (AUVs) operating in close proximity to complex 3D environments. The proposed methodology creates a probabilistic elevation map of the terrain using a monocular image learning-based scene range estimator as a sensor. This scene range estimator can filter transient objects such as fish and lighting… ▽ More This paper presents a novel approach to underwater terrain map** for Autonomous Underwater Vehicles (AUVs) operating in close proximity to complex 3D environments. The proposed methodology creates a probabilistic elevation map of the terrain using a monocular image learning-based scene range estimator as a sensor. This scene range estimator can filter transient objects such as fish and lighting variations. The map** approach considers uncertainty in both the estimated scene range and robot pose as the AUV moves through the environment. The resulting elevation map can be used for reactive path planning and obstacle avoidance to allow robotic systems to approach the underwater terrain as closely as possible. The performance of our approach is evaluated in a simulated underwater environment by comparing the reconstructed terrain to ground truth reference maps, as well as demonstrated using AUV field data collected within in a coral reef environment. The simulations and field results show that the proposed approach is feasible for obstacle detection and range estimation using a monocular camera in reef environments. △ Less

Submitted 23 March, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

Comments: 8 pages, 13 figures, submitted to IEEE RA-L

arXiv:1903.07840 [pdf, other]

The Probabilistic Object Detection Challenge

Authors: John Skinner, David Hall, Haoyang Zhang, Feras Dayoub, Niko Sünderhauf

Abstract: We introduce a new challenge for computer and robotic vision, the first ACRV Robotic Vision Challenge, Probabilistic Object Detection. Probabilistic object detection is a new variation on traditional object detection tasks, requiring estimates of spatial and semantic uncertainty. We extend the traditional bounding box format of object detection to express spatial uncertainty using gaussian distrib… ▽ More We introduce a new challenge for computer and robotic vision, the first ACRV Robotic Vision Challenge, Probabilistic Object Detection. Probabilistic object detection is a new variation on traditional object detection tasks, requiring estimates of spatial and semantic uncertainty. We extend the traditional bounding box format of object detection to express spatial uncertainty using gaussian distributions for the box corners. The challenge introduces a new test dataset of video sequences, which are designed to more closely resemble the kind of data available to a robotic system. We evaluate probabilistic detections using a new probability-based detection quality (PDQ) measure. The goal in creating this challenge is to draw the computer and robotic vision communities together, toward applying object detection solutions for practical robotics applications. △ Less

Submitted 7 April, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

Comments: 4 pages, workshop paper

arXiv:1903.06391 [pdf, other]

doi 10.1109/IROS40897.2019.8968525

Did You Miss the Sign? A False Negative Alarm System for Traffic Sign Detectors

Authors: Quazi Marufur Rahman, Niko Sünderhauf, Feras Dayoub

Abstract: Object detection is an integral part of an autonomous vehicle for its safety-critical and navigational purposes. Traffic signs as objects play a vital role in guiding such systems. However, if the vehicle fails to locate any critical sign, it might make a catastrophic failure. In this paper, we propose an approach to identify traffic signs that have been mistakenly discarded by the object detector… ▽ More Object detection is an integral part of an autonomous vehicle for its safety-critical and navigational purposes. Traffic signs as objects play a vital role in guiding such systems. However, if the vehicle fails to locate any critical sign, it might make a catastrophic failure. In this paper, we propose an approach to identify traffic signs that have been mistakenly discarded by the object detector. The proposed method raises an alarm when it discovers a failure by the object detector to detect a traffic sign. This approach can be useful to evaluate the performance of the detector during the deployment phase. We trained a single shot multi-box object detector to detect traffic signs and used its internal features to train a separate false negative detector (FND). During deployment, FND decides whether the traffic sign detector (TSD) has missed a sign or not. We are using precision and recall to measure the accuracy of FND in two different datasets. For 80% recall, FND has achieved 89.9% precision in Belgium Traffic Sign Detection dataset and 90.8% precision in German Traffic Sign Recognition Benchmark dataset respectively. To the best of our knowledge, our method is the first to tackle this critical aspect of false negative detection in robotic vision. Such a fail-safe mechanism for object detection can improve the engagement of robotic vision systems in our daily life. △ Less

Submitted 15 March, 2019; originally announced March 2019.

Comments: Submitted to the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)

arXiv:1902.10847 [pdf, other]

Robust Re-identification of Manta Rays from Natural Markings by Learning Pose Invariant Embeddings

Authors: Olga Moskvyak, Frederic Maire, Asia O. Armstrong, Feras Dayoub, Mahsa Baktashmotlagh

Abstract: Visual identification of individual animals that bear unique natural body markings is an important task in wildlife conservation. The photo databases of animal markings grow larger and each new observation has to be matched against thousands of images. Existing photo-identification solutions have constraints on image quality and appearance of the pattern of interest in the image. These constraints… ▽ More Visual identification of individual animals that bear unique natural body markings is an important task in wildlife conservation. The photo databases of animal markings grow larger and each new observation has to be matched against thousands of images. Existing photo-identification solutions have constraints on image quality and appearance of the pattern of interest in the image. These constraints limit the use of photos from citizen scientists. We present a novel system for visual re-identification based on unique natural markings that is robust to occlusions, viewpoint and illumination changes. We adapt methods developed for face re-identification and implement a deep convolutional neural network (CNN) to learn embeddings for images of natural markings. The distance between the learned embedding points provides a dissimilarity measure between the corresponding input images. The network is optimized using the triplet loss function and the online semi-hard triplet mining strategy. The proposed re-identification method is generic and not species specific. We evaluate the proposed system on image databases of manta ray belly patterns and humpback whale flukes. To be of practical value and adopted by marine biologists, a re-identification system needs to have a top-10 accuracy of at least 95%. The proposed system achieves this performance standard. △ Less

Submitted 27 February, 2019; originally announced February 2019.

Comments: 12 pages, 15 figures

arXiv:1811.10800 [pdf, other]

Probabilistic Object Detection: Definition and Evaluation

Authors: David Hall, Feras Dayoub, John Skinner, Haoyang Zhang, Dimity Miller, Peter Corke, Gustavo Carneiro, Anelia Angelova, Niko Sünderhauf

Abstract: We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality measure (PDQ).Unlike AP-based measures, PDQ has no arbitrary thresholds and rewards spatia… ▽ More We introduce Probabilistic Object Detection, the task of detecting objects in images and accurately quantifying the spatial and semantic uncertainties of the detections. Given the lack of methods capable of assessing such probabilistic object detections, we present the new Probability-based Detection Quality measure (PDQ).Unlike AP-based measures, PDQ has no arbitrary thresholds and rewards spatial and label quality, and foreground/background separation quality while explicitly penalising false positive and false negative detections. We contrast PDQ with existing mAP and moLRP measures by evaluating state-of-the-art detectors and a Bayesian object detector based on Monte Carlo Dropout. Our experiments indicate that conventional object detectors tend to be spatially overconfident and thus perform poorly on the task of probabilistic object detection. Our paper aims to encourage the development of new object detection approaches that provide detections with accurately estimated spatial and label uncertainties and are of critical importance for deployment on robots and embodied AI systems in the real world. △ Less

Submitted 30 January, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: 21 pages, 25 figures, to appear in the proceedings of the winter conference on applications of computer vision WACV 2020

arXiv:1809.06006 [pdf, other]

Evaluating Merging Strategies for Sampling-based Uncertainty Techniques in Object Detection

Authors: Dimity Miller, Feras Dayoub, Michael Milford, Niko Sünderhauf

Abstract: There has been a recent emergence of sampling-based techniques for estimating epistemic uncertainty in deep neural networks. While these methods can be applied to classification or semantic segmentation tasks by simply averaging samples, this is not the case for object detection, where detection sample bounding boxes must be accurately associated and merged. A weak merging strategy can significant… ▽ More There has been a recent emergence of sampling-based techniques for estimating epistemic uncertainty in deep neural networks. While these methods can be applied to classification or semantic segmentation tasks by simply averaging samples, this is not the case for object detection, where detection sample bounding boxes must be accurately associated and merged. A weak merging strategy can significantly degrade the performance of the detector and yield an unreliable uncertainty measure. This paper provides the first in-depth investigation of the effect of different association and merging strategies. We compare different combinations of three spatial and two semantic affinity measures with four clustering methods for MC Dropout with a Single Shot Multi-Box Detector. Our results show that the correct choice of affinity-clustering combination can greatly improve the effectiveness of the classification and spatial uncertainty estimation and the resulting object detection performance. We base our evaluation on a new mix of datasets that emulate near open-set conditions (semantically similar unknown classes), distant open-set conditions (semantically dissimilar unknown classes) and the common closed-set conditions (only known classes). △ Less

Submitted 6 March, 2019; v1 submitted 16 September, 2018; originally announced September 2018.

Comments: to appear in IEEE International Conference on Robotics and Automation 2019 (ICRA 2019)

arXiv:1807.03124 [pdf, other]

An Overview of Perception Methods for Horticultural Robots: From Pollination to Harvest

Authors: Ho Seok Ahn, Feras Dayoub, Marija Popovic, Bruce MacDonald, Roland Siegwart, Inkyu Sa

Abstract: Horticultural enterprises are becoming more sophisticated as the range of the crops they target expands. Requirements for enhanced efficiency and productivity have driven the demand for automating on-field operations. However, various problems remain yet to be solved for their reliable, safe deployment in real-world scenarios. This paper examines major research trends and current challenges in hor… ▽ More Horticultural enterprises are becoming more sophisticated as the range of the crops they target expands. Requirements for enhanced efficiency and productivity have driven the demand for automating on-field operations. However, various problems remain yet to be solved for their reliable, safe deployment in real-world scenarios. This paper examines major research trends and current challenges in horticultural robotics. Specifically, our work focuses on sensing and perception in the three main horticultural procedures: pollination, yield estimation, and harvesting. For each task, we expose major issues arising from the unstructured, cluttered, and rugged nature of field environments, including variable lighting conditions and difficulties in fruit-specific detection, and highlight promising contemporary studies. △ Less

Submitted 26 June, 2018; originally announced July 2018.

Comments: 6 pages, 5 figures, 2 tables

arXiv:1804.02154 [pdf, ps, other]

Assisted Control for Semi-Autonomous Power Infrastructure Inspection using Aerial Vehicles

Authors: Aaron McFadyen, Feras Dayoub, Steve Martin, Jason Ford, Peter Corke

Abstract: This paper presents the design and implementation of an assisted control technology for a small multirotor platform for aerial inspection of fixed energy infrastructure. Sensor placement is supported by a theoretical analysis of expected sensor performance and constrained platform behaviour to speed up implementation. The optical sensors provide relative position information between the platform a… ▽ More This paper presents the design and implementation of an assisted control technology for a small multirotor platform for aerial inspection of fixed energy infrastructure. Sensor placement is supported by a theoretical analysis of expected sensor performance and constrained platform behaviour to speed up implementation. The optical sensors provide relative position information between the platform and the asset, which enables human operator inputs to be autonomously adjusted to ensure safe separation. The assisted control approach is designed to reduced operator workload during close proximity inspection tasks, with collision avoidance and safe separation managed autonomously. The energy infrastructure includes single vertical wooden poles and crossarm with attached overhead wires. Simulated and real experimental results are provided. △ Less

Submitted 1 August, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

Comments: to appear in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

arXiv:1801.08613 [pdf, other]

doi 10.1016/j.compag.2018.02.023

A Rapidly Deployable Classification System using Visual Data for the Application of Precision Weed Management

Authors: David Hall, Feras Dayoub, Tristan Perez, Chris McCool

Abstract: In this work we demonstrate a rapidly deployable weed classification system that uses visual data to enable autonomous precision weeding without making prior assumptions about which weed species are present in a given field. Previous work in this area relies on having prior knowledge of the weed species present in the field. This assumption cannot always hold true for every field, and thus limits… ▽ More In this work we demonstrate a rapidly deployable weed classification system that uses visual data to enable autonomous precision weeding without making prior assumptions about which weed species are present in a given field. Previous work in this area relies on having prior knowledge of the weed species present in the field. This assumption cannot always hold true for every field, and thus limits the use of weed classification systems based on this assumption. In this work, we obviate this assumption and introduce a rapidly deployable approach able to operate on any field without any weed species assumptions prior to deployment. We present a three stage pipeline for the implementation of our weed classification system consisting of initial field surveillance, offline processing and selective labelling, and automated precision weeding. The key characteristic of our approach is the combination of plant clustering and selective labelling which is what enables our system to operate without prior weed species knowledge. Testing using field data we are able to label 12.3 times fewer images than traditional full labelling whilst reducing classification accuracy by only 14%. △ Less

Submitted 26 April, 2018; v1 submitted 25 January, 2018; originally announced January 2018.

Comments: 36 pages, 14 figures, published Computers and Electronics in Agriculture Vol. 148

Journal ref: D. Hall, F. Dayoub, T. Perez, and C. McCool, "A Rapidly Deployable Classification System using Visual Data for the Application of Precision Weed Management," Computers and Electronics in Agriculture, Vol. 148, pp. 107-120, May 2018

arXiv:1710.06677 [pdf, other]

Dropout Sampling for Robust Object Detection in Open-Set Conditions

Authors: Dimity Miller, Lachlan Nicholson, Feras Dayoub, Niko Sünderhauf

Abstract: Dropout Variational Inference, or Dropout Sampling, has been recently proposed as an approximation technique for Bayesian Deep Learning and evaluated for image classification and regression tasks. This paper investigates the utility of Dropout Sampling for object detection for the first time. We demonstrate how label uncertainty can be extracted from a state-of-the-art object detection system via… ▽ More Dropout Variational Inference, or Dropout Sampling, has been recently proposed as an approximation technique for Bayesian Deep Learning and evaluated for image classification and regression tasks. This paper investigates the utility of Dropout Sampling for object detection for the first time. We demonstrate how label uncertainty can be extracted from a state-of-the-art object detection system via Dropout Sampling. We evaluate this approach on a large synthetic dataset of 30,000 images, and a real-world dataset captured by a mobile robot in a versatile campus environment. We show that this uncertainty can be utilized to increase object detection performance under the open-set conditions that are typically encountered in robotic vision. A Dropout Sampling network is shown to achieve a 12.3% increase in recall (for the same precision score as a standard network) and a 15.1% increase in precision (for the same recall score as the standard network). △ Less

Submitted 18 April, 2018; v1 submitted 18 October, 2017; originally announced October 2017.

Comments: to appear in IEEE International Conference on Robotics and Automation 2018 (ICRA 2018)

arXiv:1703.07473 [pdf, other]

Episode-Based Active Learning with Bayesian Neural Networks

Authors: Feras Dayoub, Niko Sünderhauf, Peter Corke

Abstract: We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, such as commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final t… ▽ More We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, such as commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final training on the accumulated acquisition set are essential for best performance, while limiting the amount of required human labeling labor. △ Less

Submitted 21 March, 2017; originally announced March 2017.

arXiv:1702.01247 [pdf, other]

Towards Unsupervised Weed Scouting for Agricultural Robotics

Authors: David Hall, Feras Dayoub, Jason Kulk, Chris McCool

Abstract: Weed scouting is an important part of modern integrated weed management but can be time consuming and sparse when performed manually. Automated weed scouting and weed destruction has typically been performed using classification systems able to classify a set group of species known a priori. This greatly limits deployability as classification systems must be retrained for any field with a differen… ▽ More Weed scouting is an important part of modern integrated weed management but can be time consuming and sparse when performed manually. Automated weed scouting and weed destruction has typically been performed using classification systems able to classify a set group of species known a priori. This greatly limits deployability as classification systems must be retrained for any field with a different set of weed species present within them. In order to overcome this limitation, this paper works towards develo** a clustering approach to weed scouting which can be utilized in any field without the need for prior species knowledge. We demonstrate our system using challenging data collected in the field from an agricultural robotics platform. We show that considerable improvements can be made by (i) learning low-dimensional (bottleneck) features using a deep convolutional neural network to represent plants in general and (ii) tying views of the same area (plant) together. Deploying this algorithm on in-field data collected by AgBotII, we are able to successfully cluster cotton plants from grasses without prior knowledge or training for the specific plants in the field. △ Less

Submitted 25 February, 2017; v1 submitted 4 February, 2017; originally announced February 2017.

Comments: to appear in the proceedings of the IEEE International Conference on Robotics and Automation ICRA2017

arXiv:1701.08608 [pdf, other]

doi 10.1109/LRA.2017.2651952

Peduncle Detection of Sweet Pepper for Autonomous Crop Harvesting - Combined Colour and 3D Information

Authors: Inkyu Sa, Chris Lehnert, Andrew English, Chris McCool, Feras Dayoub, Ben Upcroft, Tristan Perez

Abstract: This paper presents a 3D visual detection method for the challenging task of detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting the peduncle cleanly is one of the most difficult stages of the harvesting process, where the peduncle is the part of the crop that attaches it to the main stem of the plant. Accurate peduncle detection in 3D space is therefore a vital step in re… ▽ More This paper presents a 3D visual detection method for the challenging task of detecting peduncles of sweet peppers (Capsicum annuum) in the field. Cutting the peduncle cleanly is one of the most difficult stages of the harvesting process, where the peduncle is the part of the crop that attaches it to the main stem of the plant. Accurate peduncle detection in 3D space is therefore a vital step in reliable autonomous harvesting of sweet peppers, as this can lead to precise cutting while avoiding damage to the surrounding plant. This paper makes use of both colour and geometry information acquired from an RGB-D sensor and utilises a supervised-learning approach for the peduncle detection task. The performance of the proposed method is demonstrated and evaluated using qualitative and quantitative results (the Area-Under-the-Curve (AUC) of the detection precision-recall curve). We are able to achieve an AUC of 0.71 for peduncle detection on field-grown sweet peppers. We release a set of manually annotated 3D sweet pepper and peduncle images to assist the research community in performing further research on this topic. △ Less

Submitted 30 January, 2017; originally announced January 2017.

Comments: 8 pages, 14 figures, Robotics and Automation Letters

arXiv:1507.02428 [pdf, other]

Place Categorization and Semantic Map** on a Mobile Robot

Authors: Niko Sünderhauf, Feras Dayoub, Sean McMahon, Ben Talbot, Ruth Schulz, Peter Corke, Gordon Wyeth, Ben Upcroft, Michael Milford

Abstract: In this paper we focus on the challenging problem of place categorization and semantic map** on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can… ▽ More In this paper we focus on the challenging problem of place categorization and semantic map** on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot's behaviour during navigation tasks. The system is made available to the community as a ROS module. △ Less

Submitted 9 July, 2015; originally announced July 2015.

arXiv:1501.04158 [pdf, other]

On the Performance of ConvNet Features for Place Recognition

Authors: Niko Sünderhauf, Feras Dayoub, Sareh Shirazi, Ben Upcroft, Michael Milford

Abstract: After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essenti… ▽ More After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essential, and performance priorities can be different. This paper comprehensively evaluates and compares the utility of three state-of-the-art ConvNets on the problems of particular relevance to navigation for robots; viewpoint-invariance and condition-invariance, and for the first time enables real-time place recognition performance using ConvNets with large maps by integrating a variety of existing (locality-sensitive hashing) and novel (semantic search space partitioning) optimization techniques. We present extensive experiments on four real world datasets cultivated to evaluate each of the specific challenges in place recognition. The results demonstrate that speed-ups of two orders of magnitude can be achieved with minimal accuracy degradation, enabling real-time performance. We confirm that networks trained for semantic place categorization also perform better at (specific) place recognition when faced with severe appearance changes and provide a reference for which networks and layers are optimal for different aspects of the place recognition problem. △ Less

Submitted 28 July, 2015; v1 submitted 17 January, 2015; originally announced January 2015.

Showing 1–49 of 49 results for author: Dayoub, F