
Watching Grass Grow: Long-term Visual Navigation and Mission Planning for Autonomous Biodiversity Monitoring

Authors: Matthew Gadd, Daniele De Martini, Luke Pitt, Wayne Tubby, Matthew Towlson, Chris Prahacs, Oliver Bartlett, John Jackson, Man Qi, Paul Newman, Andrew Hector, Roberto Salguero-Gómez, Nick Hawes

Abstract: We describe a challenging robotics deployment in a complex ecosystem to monitor a rich plant community. The study site is dominated by dynamic grassland vegetation and is thus visually ambiguous and liable to drastic appearance change over the course of a day and especially through the growing season. This dynamism and complexity in appearance seriously impact the stability of the robotics platform, as localisation is a foundational part of its control loop, and so routes must be carefully taught and retaught until autonomy is robust and repeatable. Our system is demonstrated over a 6-week period monitoring the response of grass species to experimental climate change manipulations. We also discuss the applicability of our pipeline to monitor biodiversity in other complex natural settings.

Submitted 1 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: to be presented at the Workshop on Field Robotics - ICRA 2024

arXiv:2403.09025

VDNA-PR: Using General Dataset Representations for Robust Sequential Visual Place Recognition

Authors: Benjamin Ramtoula, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This paper adapts a general dataset representation technique to produce robust Visual Place Recognition (VPR) descriptors, which are crucial for real-world mobile robot localisation. Two parallel lines of work on VPR have shown, on one side, that general-purpose off-the-shelf feature representations can provide robustness to domain shifts, and, on the other, that fusing information from sequences of images improves performance. In our recent work on measuring domain gaps between image datasets, we proposed a Visual Distribution of Neuron Activations (VDNA) representation to represent datasets of images. This representation can naturally handle image sequences and provides a general and granular feature representation derived from a general-purpose model. Moreover, our representation is based on tracking neuron activation values over the set of images to be represented and is not limited to a particular neural network layer, therefore having access to both high- and low-level concepts. This work shows how VDNAs can be used for VPR by learning a very lightweight and simple encoder to generate task-specific descriptors. Our experiments show that our representation can provide better robustness than current solutions to serious domain shifts away from the training data distribution, such as to indoor environments and aerial imagery.

Submitted 13 March, 2024; originally announced March 2024.

Comments: Published at ICRA 2024

arXiv:2403.04755

That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation

Authors: Georgi Pramatarov, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: This paper is about 3D pose estimation on LiDAR scans with extremely minimal storage requirements to enable scalable mapping and localisation. We achieve this by clustering all points of segmented scans into semantic objects and representing them only with their respective centroid and semantic class. In this way, each LiDAR scan is reduced to a compact collection of four-number vectors. This abstracts away important structural information from the scenes, which is crucial for traditional registration approaches. To mitigate this, we introduce an object-matching network based on self- and cross-correlation that captures geometric and semantic relationships between entities. The respective matches allow us to recover the relative transformation between scans through weighted Singular Value Decomposition (SVD) and RANdom SAmple Consensus (RANSAC). We demonstrate that such a representation is sufficient for metric localisation by registering point clouds taken under different viewpoints on the KITTI dataset, and at different periods of time by localising between KITTI and KITTI-360. We achieve accurate metric estimates comparable with state-of-the-art methods with almost half the representation size, specifically 1.33 kB on average.
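To make the weighted-SVD step concrete, here is a minimal sketch of weighted rigid alignment between matched object centroids. It assumes a planar (2D) simplification, in which the SVD solution of the weighted Procrustes problem reduces to a closed-form rotation angle; the function name and the per-match weights are hypothetical stand-ins for the matching network's confidences, and the RANSAC outlier loop is omitted. It is not the authors' implementation.

```python
import math


def weighted_rigid_align_2d(src, dst, weights):
    """Find theta, (tx, ty) minimising sum_i w_i * ||R(theta) src_i + t - dst_i||^2.

    Planar simplification of weighted Procrustes/SVD registration
    (hypothetical helper, not the paper's code).
    """
    total = sum(weights)
    # Weighted centroids of the source and destination centroid sets.
    sx = sum(w * p[0] for w, p in zip(weights, src)) / total
    sy = sum(w * p[1] for w, p in zip(weights, src)) / total
    dx = sum(w * q[0] for w, q in zip(weights, dst)) / total
    dy = sum(w * q[1] for w, q in zip(weights, dst)) / total
    # Entries of the weighted cross-covariance H = sum_i w_i (src_i - cs)(dst_i - cd)^T.
    hxx = hxy = hyx = hyy = 0.0
    for w, p, q in zip(weights, src, dst):
        ax, ay = p[0] - sx, p[1] - sy
        bx, by = q[0] - dx, q[1] - dy
        hxx += w * ax * bx
        hxy += w * ax * by
        hyx += w * ay * bx
        hyy += w * ay * by
    # In 2D, the rotation that maximises the weighted correlation is closed-form.
    theta = math.atan2(hxy - hyx, hxx + hyy)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the destination centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)
```

A zero (or low) weight effectively removes a bad match from the estimate, which is why confidence-weighted correspondences pair naturally with a RANSAC-style consensus step.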

Submitted 7 March, 2024; originally announced March 2024.

Comments: Accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA) 2024

arXiv:2403.02845

OORD: The Oxford Offroad Radar Dataset

Authors: Matthew Gadd, Daniele De Martini, Oliver Bartlett, Paul Murcutt, Matt Towlson, Matthew Widojo, Valentina Muşat, Luke Robinson, Efimia Panagiotaki, Georgi Pramatarov, Marc Alexander Kühn, Letizia Marchegiani, Paul Newman, Lars Kunze

Abstract: There is growing academic interest in, as well as commercial exploitation of, millimetre-wave scanning radar for autonomous vehicle localisation and scene understanding. Although several datasets to support this research area have been released, they are primarily focused on urban or semi-urban environments. Nevertheless, rugged offroad deployments are important application areas which also present unique challenges and opportunities for this sensor technology. Therefore, the Oxford Offroad Radar Dataset (OORD) presents data collected in the rugged Scottish highlands in extreme weather. The radar data we offer to the community are accompanied by GPS/INS reference - to further stimulate research in radar place recognition. In total we release over 90GiB of radar scans as well as GPS and IMU readings by driving a diverse set of four routes over 11 forays, totalling approximately 154km of rugged driving. This is an area increasingly explored in literature, and we therefore present and release examples of recent open-sourced radar place recognition systems and their performance on our dataset. This includes a learned neural network, the weights of which we also release. The data and tools are made freely available to the community at https://oxford-robotics-institute.github.io/oord-dataset.

Submitted 25 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.17653

Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data

Authors: David S. W. Williams, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: Knowing when a trained segmentation model is encountering data that is different to its training data is important. Understanding and mitigating the effects of this plays an important part in the application of such models from a performance and assurance perspective - this being a safety concern in applications such as autonomous vehicles (AVs). This work presents a segmentation network that can detect errors caused by challenging test domains without any additional annotation in a single forward pass. As annotation costs limit the diversity of labelled datasets, we use easy-to-obtain, uncurated and unlabelled data to learn to perform uncertainty estimation by selectively enforcing consistency over data augmentation. To this end, a novel segmentation benchmark based on the SAX Dataset is used, which includes labelled test data spanning three autonomous-driving domains, ranging in appearance from dense urban to off-road. The proposed method, named Gamma-SSL, consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark - by up to 10.7% in area under the receiver operating characteristic (ROC) curve and 19.2% in area under the precision-recall (PR) curve in the most challenging of the three scenarios.
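The ROC-based comparison above treats per-pixel uncertainty as a detector of segmentation errors. As a rough sketch, assuming flat lists of uncertainty scores and boolean error flags (hypothetical inputs, not the paper's evaluation code), the area under the ROC curve equals the probability that a randomly chosen erroneous pixel scores higher than a randomly chosen correct one:

```python
def auroc(scores, is_error):
    """Area under the ROC curve for 'uncertainty score detects an error'.

    Uses the rank-sum (Mann-Whitney U) form: the fraction of
    (error, correct) pixel pairs where the error pixel has the higher
    score, with ties counted as one half. O(n^2), for illustration only.
    """
    pos = [s for s, e in zip(scores, is_error) if e]
    neg = [s for s, e in zip(scores, is_error) if not e]
    if not pos or not neg:
        raise ValueError("need both erroneous and correct pixels")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auroc([0.4, 0.6, 0.5, 0.2], [True, True, False, False])` is 0.75: three of the four (error, correct) pairs are ranked correctly. A production evaluation would sort the scores once rather than compare every pair.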

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted for publication in IEEE Transactions on Robotics (T-RO)

arXiv:2402.17622

Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling

Authors: David S. W. Williams, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: This work proposes a semantic segmentation network that produces high-quality uncertainty estimates in a single forward pass. We exploit general representations from foundation models and unlabelled datasets through a Masked Image Modeling (MIM) approach, which is robust to augmentation hyper-parameters and simpler than previous techniques. For neural networks used in safety-critical applications, bias in the training data can lead to errors; therefore it is crucial to understand a network's limitations at run time and act accordingly. To this end, we test our proposed method on a number of test domains including the SAX Segmentation benchmark, which includes labelled test data from dense urban, rural and off-road driving domains. The proposed method consistently outperforms uncertainty estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark.

Submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted for publication at 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2402.10828

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

Authors: Jianhao Yuan, Shuyang Sun, Daniel Omeiza, Bo Zhao, Paul Newman, Lars Kunze, Matthew Gadd

Abstract: We need to trust robots that use often opaque AI methods. They need to explain themselves to us, and we need to trust their explanation. In this regard, explainability plays a critical role in trustworthy autonomous decision-making to foster transparency and acceptance among end users, especially in complex autonomous driving. Recent advancements in Multi-Modal Large Language Models (MLLMs) have shown promising potential in enhancing explainability when used as a driving agent, by producing control predictions along with natural language explanations. However, severe data scarcity due to expensive annotation costs and significant domain gaps between different datasets make the development of a robust and generalisable system an extremely challenging task. Moreover, the prohibitively expensive training requirements of MLLMs and the unsolved problem of catastrophic forgetting further limit their generalisability post-deployment. To address these challenges, we present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving. By grounding in retrieved expert demonstrations, we empirically validate that RAG-Driver achieves state-of-the-art performance in producing driving action explanations, justifications, and control signal prediction. More importantly, it exhibits exceptional zero-shot generalisation capabilities to unseen environments without further training.
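The retrieval-augmented, in-context idea can be illustrated with a toy sketch: embed the current scene, retrieve the most similar expert demonstrations by cosine similarity, and place them as worked examples ahead of the query. Everything here is hypothetical (the `Demo` record, the two-dimensional embeddings, the prompt wording); it shows the general pattern, not RAG-Driver's actual retrieval or prompt format.

```python
import math
from dataclasses import dataclass


@dataclass
class Demo:
    """Hypothetical expert demonstration stored in the retrieval memory."""
    embedding: list   # stand-in for an embedding of the demo's sensor context
    explanation: str  # natural-language explanation/justification
    control: str      # control signal, kept as text for simplicity


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, memory, k=2):
    """Return the k demonstrations most similar to the query embedding."""
    return sorted(memory, key=lambda d: cosine(query, d.embedding), reverse=True)[:k]


def build_prompt(query_desc, demos):
    """Assemble retrieved demos as in-context examples ahead of the query."""
    lines = [
        f"Example {i}: explanation: {d.explanation}; control: {d.control}"
        for i, d in enumerate(demos, 1)
    ]
    lines.append(f"Current scene: {query_desc}. Explain the action and predict the control signal.")
    return "\n".join(lines)
```

Because the demonstrations are supplied at inference time, the memory can be extended for a new environment without retraining the model, which is the property the abstract emphasises.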

Submitted 29 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: 14 pages, 6 figures

Journal ref: Robotics: Science and Systems (RSS) 2024

arXiv:2401.15380

Open-RadVLAD: Fast and Robust Radar Place Recognition

Authors: Matthew Gadd, Paul Newman

Abstract: Radar place recognition often involves encoding a live scan as a vector and matching this vector to a database in order to recognise that the vehicle is in a location that it has visited before. Radar is inherently robust to lighting or weather conditions, but place recognition with this sensor is still affected by: (1) viewpoint variation, i.e. translation and rotation, and (2) sensor artefacts or "noises". For 360-degree scanning radar, rotation is readily dealt with by aggregating across azimuths in some way. Also, we argue in this work that it is more critical to deal with the richness of representation and sensor noises than it is to deal with translational invariance - particularly in urban driving where vehicles predominantly follow the same lane when repeating a route. In our method, for computational efficiency, we use only the polar representation. For partial translation invariance and robustness to signal noise, we use only a one-dimensional Fourier Transform along radial returns. We also achieve rotational invariance and a very discriminative descriptor space by building a vector of locally aggregated descriptors. Our method is more comprehensively tested than all prior radar place recognition work - over an exhaustive combination of all 870 pairs of trajectories from 30 Oxford Radar RobotCar Dataset sequences (each approximately 10 km). Code and detailed results are provided at github.com/mttgdd/open-radvlad, as an open implementation and benchmark for future work in this area.
We achieve a median of 91.52% in Recall@1, outstripping the 69.55% for the only other open implementation, RaPlace, and at a fraction of its computational cost (relying on fewer integral transforms, e.g. Radon, Fourier, and inverse Fourier).
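The two ingredients described above can be sketched in a few lines: a per-azimuth 1D Fourier transform along the range axis (whose magnitude is unchanged by a circular shift of the returns, hence the partial translation invariance), and VLAD aggregation over azimuths (an orderless sum, hence invariance to rotating the scan). This is an illustrative sketch in plain Python, not the code at github.com/mttgdd/open-radvlad; the `centres` codebook is a hypothetical input that would normally be learned, e.g. by k-means, and the usual VLAD normalisation variants are reduced to a single L2 step.

```python
import cmath
import math


def range_fft_magnitude(returns, n_bins):
    """Magnitudes of the first n_bins DFT coefficients of one azimuth's radial returns."""
    N = len(returns)
    return [
        abs(sum(returns[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
        for k in range(n_bins)
    ]


def vlad(features, centres):
    """Sum residuals of each feature to its nearest centre, then L2-normalise."""
    dim = len(centres[0])
    acc = [[0.0] * dim for _ in centres]
    for f in features:
        j = min(range(len(centres)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centres[c])))
        for d in range(dim):
            acc[j][d] += f[d] - centres[j][d]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]


def describe_scan(polar_scan, centres, n_bins):
    """polar_scan: one list of range returns per azimuth (hypothetical layout)."""
    features = [range_fft_magnitude(row, n_bins) for row in polar_scan]
    return vlad(features, centres)
```

Rotating the vehicle only reorders the azimuth rows, which leaves the VLAD sum unchanged; a real translation is not a perfect circular shift along range, which is why the invariance is only partial.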

Submitted 2 March, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: accepted at 2024 IEEE Radar Conference

arXiv:2310.15677

Robot-Relay : Building-Wide, Calibration-Less Visual Servoing with Learned Sensor Handover Network

Authors: Luke Robinson, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: We present a system which grows and manages a network of remote viewpoints during the natural installation cycle for a newly installed camera network or a newly deployed robot fleet. No explicit notion of camera position or orientation is required, neither global - i.e. relative to a building plan - nor local - i.e. relative to an interesting point in a room. Furthermore, no metric relationship between viewpoints is required. Instead, we leverage our prior work in effective remote control without extrinsic or intrinsic calibration and extend it to the multi-camera setting. In this, we memorise, from simultaneous robot detections in the tracker thread, soft pixel-wise topological connections between viewpoints. We demonstrate our system with repeated autonomous traversals of workspaces connected by a network of six cameras across a productive office environment.
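One way to picture "soft pixel-wise topological connections" memorised from simultaneous detections is a co-occurrence table: whenever two cameras see the robot at the same time, the image cells containing the detections are linked, and normalised link counts give a soft handover distribution. The sketch below is a hypothetical illustration of that idea (the `HandoverMap` class, the 64-pixel cell size and the detection tuple format are all assumptions), not the system described in the paper.

```python
from collections import defaultdict


def cell_of(cam, u, v, cell=64):
    """Quantise a pixel detection (u, v) in camera `cam` to a coarse image cell."""
    return (cam, u // cell, v // cell)


class HandoverMap:
    """Hypothetical co-detection memory linking image cells across cameras."""

    def __init__(self, cell=64):
        self.cell = cell
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, detections):
        """detections: list of (camera_id, u, v) seen at the same timestamp."""
        cells = [cell_of(c, u, v, self.cell) for c, u, v in detections]
        for a in cells:
            for b in cells:
                if a[0] != b[0]:  # only link cells in different cameras
                    self.counts[a][b] += 1

    def handover(self, cam, u, v):
        """Soft connections from the cell at (u, v) to cells in other cameras."""
        row = self.counts.get(cell_of(cam, u, v, self.cell), {})
        total = sum(row.values())
        return {k: n / total for k, n in row.items()} if total else {}
```

No camera pose or metric relationship is stored: the only learned structure is which image regions tend to see the robot at the same time.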

Submitted 24 October, 2023; originally announced October 2023.

Comments: Paper accepted to the 18th International Symposium on Experimental Robotics (ISER 2023)

arXiv:2310.13622

What you see is what you get: Experience ranking with deep neural dataset-to-dataset similarity for topological localisation

Authors: Matthew Gadd, Benjamin Ramtoula, Daniele De Martini, Paul Newman

Abstract: Recalling the most relevant visual memories for localisation or understanding a priori the likely outcome of localisation effort against a particular visual memory is useful for efficient and robust visual navigation. Solutions to this problem should be divorced from performance appraisal against ground truth - as this is not available at run-time - and should ideally be based on generalisable environmental observations. For this, we propose applying the recently developed Visual DNA as a highly scalable tool for comparing datasets of images - in this work, sequences of map and live experiences. In the case of localisation, important dataset differences impacting performance are modes of appearance change, including weather, lighting, and season. Specifically, for any deep architecture which is used for place recognition by matching feature volumes at a particular layer, we use distribution measures to compare neuron-wise activation statistics between live images and multiple previously recorded past experiences, with a potentially large seasonal (winter/summer) or time of day (day/night) shift. We find that differences in these statistics correlate to performance when localising using a past experience with the same appearance gap. We validate our approach over the Nordland cross-season dataset as well as data from Oxford's University Parks with lighting and mild seasonal change, showing excellent ability of our system to rank actual localisation performance across candidate experiences.
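A minimal sketch of the ranking idea, assuming each neuron's activations are summarised by a Gaussian (mean, standard deviation) and compared with the closed-form 2-Wasserstein distance between 1D Gaussians. The choice of distance, the averaging over neurons, and the function names are illustrative assumptions, not the measures used in the paper:

```python
import math
import statistics


def neuron_stats(activations):
    """activations: list of per-image activation vectors -> per-neuron (mean, std)."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*activations)]


def gaussian_w2(a, b):
    """2-Wasserstein distance between 1D Gaussians given as (mean, std)."""
    (m1, s1), (m2, s2) = a, b
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def dna_distance(live, experience):
    """Average per-neuron distance between two sets of activation statistics."""
    return statistics.fmean(gaussian_w2(a, b) for a, b in zip(live, experience))


def rank_experiences(live_acts, experiences):
    """Return experience names ordered from most to least similar to the live data."""
    live = neuron_stats(live_acts)
    scores = {name: dna_distance(live, neuron_stats(acts)) for name, acts in experiences.items()}
    return sorted(scores, key=scores.get)
```

No ground truth is needed at run time: the ranking uses only activation statistics of the live and stored images, and the hypothesis tested in the paper is that smaller statistical gaps predict better localisation.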

Submitted 20 October, 2023; originally announced October 2023.

Comments: 18th International Symposium on Experimental Robotics (ISER 2023)

arXiv:2310.02781

LROC-PANGU-GAN: Closing the Simulation Gap in Learning Crater Segmentation with Planetary Simulators

Authors: Jaewon La, Jaime Phadke, Matt Hutton, Marius Schwinning, Gabriele De Canio, Florian Renk, Lars Kunze, Matthew Gadd

Abstract: It is critical for probes landing on foreign planetary bodies to be able to robustly identify and avoid hazards - as, for example, steep cliffs or deep craters can pose significant risks to a probe's landing and operational success. Recent applications of deep learning to this problem show promising results. These models are, however, often learned with explicit supervision over annotated datasets. These human-labelled crater databases, such as from the Lunar Reconnaissance Orbiter Camera (LROC), may lack consistency and quality, undermining model performance - as incomplete and/or inaccurate labels introduce noise into the supervisory signal, which encourages the model to learn incorrect associations and results in the model making unreliable predictions. Physics-based simulators, such as the Planet and Asteroid Natural Scene Generation Utility, have, in contrast, perfect ground truth, as the internal state that they use to render scenes is known with exactness. However, they introduce a serious simulation-to-real domain gap - because of fundamental differences between the simulated environment and the real world arising from modelling assumptions, unaccounted-for physical interactions, environmental variability, etc. Therefore, models trained on their outputs suffer when deployed in the face of realism they have not encountered in their training data distributions. In this paper, we therefore introduce a system to close this "realism" gap while retaining label fidelity.
We train a CycleGAN model to synthesise LROC from Planet and Asteroid Natural Scene Generation Utility (PANGU) images. We show that these improve the training of a downstream crater segmentation network, with segmentation performance on a test set of real LROC images improved as compared to using only simulated PANGU images.

Submitted 4 October, 2023; originally announced October 2023.

Comments: 17th Symposium on Advanced Space Technologies in Robotics and Automation

arXiv:2308.03718

SEM-GAT: Explainable Semantic Pose Estimation using Learned Graph Attention

Authors: Efimia Panagiotaki, Daniele De Martini, Georgi Pramatarov, Matthew Gadd, Lars Kunze

Abstract: This paper proposes a Graph Neural Network (GNN)-based method for exploiting semantics and local geometry to guide the identification of reliable pointcloud registration candidates. Semantic and morphological features of the environment serve as key reference points for registration, enabling accurate lidar-based pose estimation. Our novel lightweight static graph structure informs our attention-based node aggregation network by identifying semantic-instance relationships, acting as an inductive bias to significantly reduce the computational burden of pointcloud registration. By connecting candidate nodes and exploiting cross-graph attention, we identify confidence scores for all potential registration correspondences and estimate the displacement between pointcloud scans. Our pipeline enables introspective analysis of the model's performance by correlating it with the individual contributions of local structures in the environment, providing valuable insights into the system's behaviour. We test our method on the KITTI odometry dataset, achieving competitive accuracy compared to benchmark methods and a higher track smoothness while relying on significantly fewer network parameters.

Submitted 22 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

Comments: International Conference on Advanced Robotics (ICAR 2023)

ACM Class: I.2.9; I.2.10; I.2.4; I.4.8; I.5.1; I.5.2

arXiv:2306.14848

Visual Servoing on Wheels: Robust Robot Orientation Estimation in Remote Viewpoint Control

Authors: Luke Robinson, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This work proposes a fast deployment pipeline for visually-servoed robots which does not assume anything about either the robot - e.g. sizes, colour or the presence of markers - or the deployment environment. In this, accurate estimation of robot orientation is crucial for successful navigation in complex environments; manual labelling of angular values is, though, time-consuming and possibly hard to perform. For this reason, we propose a weakly supervised pipeline that can produce a vast amount of data in a small amount of time. We evaluate our approach on a dataset of remote camera images captured in various indoor environments, demonstrating high tracking performance when integrated into a fully autonomous pipeline with a simple controller. With this, we then analyse the data requirements of our approach, showing how it is possible to deploy a new robot in a new environment in less than 30 min.

Submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted at IROS 2023

arXiv:2306.12556

Off the Radar: Uncertainty-Aware Radar Place Recognition with Introspective Querying and Map Maintenance

Authors: Jianhao Yuan, Paul Newman, Matthew Gadd

Abstract: Localisation with Frequency-Modulated Continuous-Wave (FMCW) radar has gained increasing interest due to its inherent resistance to challenging environments. However, complex artefacts of the radar measurement process require appropriate uncertainty estimation to ensure the safe and reliable application of this promising sensor modality. In this work, we propose a multi-session map management system which constructs the best maps for further localisation based on learned variance properties in an embedding space. Using the same variance properties, we also propose a new way to introspectively reject localisation queries that are likely to be incorrect. For this, we apply robust noise-aware metric learning, which both leverages the short-timescale variability of radar data along a driven path (for data augmentation) and predicts the downstream uncertainty in metric-space-based place recognition. We demonstrate the effectiveness of our method over extensive cross-validated tests of the Oxford Radar RobotCar and MulRan datasets. In this, we outperform the current state-of-the-art in radar place recognition and other uncertainty-aware methods when using only single nearest-neighbour queries. We also show consistent performance increases when rejecting queries based on uncertainty over a difficult test environment, which we did not observe for a competing uncertainty-aware place recognition system.
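The two uses of predicted variance (building the map and rejecting queries) can be sketched as follows, assuming each scan is already embedded as a vector with a scalar predicted variance, and that sessions are aligned place-by-place. Both assumptions, the helper names, and the fixed `max_var` threshold are hypothetical simplifications, not the paper's system.

```python
import math


def build_map(sessions, n_places):
    """For each place, keep the embedding with the lowest predicted variance.

    sessions: list of per-session lists of (embedding, variance), assumed
    aligned so that index p refers to the same place in every session.
    """
    return [min((s[p] for s in sessions), key=lambda ev: ev[1])[0] for p in range(n_places)]


def nearest(query, database):
    """Return (index, distance) of the closest database embedding."""
    best = min(range(len(database)), key=lambda i: math.dist(query, database[i]))
    return best, math.dist(query, database[best])


def localise(query, query_var, database, max_var):
    """Single nearest-neighbour query, rejected (None) if the query is too uncertain."""
    if query_var > max_var:
        return None
    idx, _ = nearest(query, database)
    return idx
```

Rejecting a query trades coverage for precision: fewer places are reported, but those that are reported are more likely to be correct.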

Submitted 21 June, 2023; originally announced June 2023.

Comments: 8 pages, 6 figures

Journal ref: International Conference on Intelligent Robots and Systems (IROS) 2023

arXiv:2304.10036

Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations

Authors: Benjamin Ramtoula, Matthew Gadd, Paul Newman, Daniele De Martini

Abstract: Selecting appropriate datasets is critical in modern computer vision. However, no general-purpose tools exist to evaluate the extent to which two datasets differ. For this, we propose representing images - and by extension datasets - using Distributions of Neuron Activations (DNAs). DNAs fit distributions, such as histograms or Gaussians, to activations of neurons in a pre-trained feature extractor through which we pass the image(s) to represent. This extractor is frozen for all datasets, and we rely on its general expressive power in feature space. By comparing two DNAs, we can evaluate the extent to which two datasets differ with granular control over the comparison attributes of interest, providing the ability to customise the way distances are measured to suit the requirements of the task at hand. Furthermore, DNAs are compact, representing datasets of any size with less than 15 megabytes. We demonstrate the value of DNAs by evaluating their applicability on several tasks, including conditional dataset comparison, synthetic image evaluation, and transfer learning, and across diverse datasets, ranging from synthetic cat images to celebrity faces and urban driving scenes.
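As a sketch of the histogram variant, each neuron's activations across a dataset are binned into a normalised histogram, and two DNAs are compared neuron by neuron with the 1D Earth Mover's Distance (the area between the two cumulative histograms). Restricting the comparison to a chosen subset of neurons stands in for the "granular control" mentioned above. The bin range, the EMD choice and the function names are illustrative assumptions; the code released at https://bramtoula.github.io/vdna/ may differ.

```python
def histogram(values, lo, hi, n_bins):
    """Normalised histogram of activation values over [lo, hi]; edge values are clamped."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - lo) / (hi - lo) * n_bins)
        counts[min(max(i, 0), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def emd_1d(p, q, bin_width):
    """1D Earth Mover's Distance between histograms on the same bins (area between CDFs)."""
    cdf_diff, dist = 0.0, 0.0
    for a, b in zip(p, q):
        cdf_diff += a - b
        dist += abs(cdf_diff)
    return dist * bin_width


def dna(activations, lo, hi, n_bins):
    """activations: list of per-image activation vectors -> one histogram per neuron."""
    return [histogram(col, lo, hi, n_bins) for col in zip(*activations)]


def compare_dnas(d1, d2, bin_width, neurons=None):
    """Mean per-neuron EMD, optionally restricted to a subset of neuron indices."""
    idx = list(range(len(d1))) if neurons is None else list(neurons)
    return sum(emd_1d(d1[i], d2[i], bin_width) for i in idx) / len(idx)
```

The representation size depends only on the number of neurons and bins, not on the number of images, which is what keeps DNAs compact regardless of dataset size.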

Submitted 19 April, 2023; originally announced April 2023.

Comments: Published at CVPR 2023. Project page with code: https://bramtoula.github.io/vdna/

arXiv:2206.15154

BoxGraph: Semantic Place Recognition and Pose Estimation from 3D LiDAR

Authors: Georgi Pramatarov, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This paper is about extremely robust and lightweight localisation using LiDAR point clouds based on instance segmentation and graph matching. We model 3D point clouds as fully-connected graphs of semantically identified components where each vertex corresponds to an object instance and encodes its shape. Optimal vertex association across graphs allows for full 6-Degree-of-Freedom (DoF) pose estimation and place recognition by measuring similarity. This representation is very concise, condensing the size of maps by a factor of 25 against the state-of-the-art, requiring only 3kB to represent a 1.4MB laser scan. We verify the efficacy of our system on the SemanticKITTI dataset, where we achieve a new state-of-the-art in place recognition, with an average of 88.4% recall at 100% precision where the next closest competitor follows with 64.9%. We also show accurate metric pose estimation performance - estimating 6-DoF pose with median errors of 10 cm and 0.33 deg.

Submitted 30 June, 2022; originally announced June 2022.

Comments: Accepted for publication at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022
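Once the vertices of two BoxGraphs have been associated, a 6-DoF pose can be recovered from the matched instances. As a hedged sketch, the code below assumes the association is already given and that each vertex is reduced to its instance centroid, then applies the standard Kabsch/SVD least-squares alignment; the paper's actual estimator may differ, and `kabsch_pose` is a hypothetical name.

```python
import numpy as np

def kabsch_pose(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform (R, t) such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of associated instance centroids; row i of src is
    assumed to be matched to row i of dst (association done beforehand).
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

With three or more non-collinear matched instances the rotation is fully determined, which is why a handful of well-associated objects can suffice for a metric pose.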

arXiv:2206.10517 [pdf, other]

What Goes Around: Leveraging a Constant-curvature Motion Constraint in Radar Odometry

Authors: Roberto Aldera, Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: This paper presents a method that leverages vehicle motion constraints to refine data associations in a point-based radar odometry system. By using the strong prior on how a non-holonomic robot is constrained to move smoothly through its environment, we develop the necessary framework to estimate ego-motion from a single landmark association rather than considering all of these correspondences at once. This allows for informed outlier detection of poor matches that are a dominant source of pose estimate error. By refining the subset of matched landmarks, we see an absolute decrease of 2.15 percentage points (from 4.68% to 2.53%) in translational error, approximately halving the odometry error (a 45.94% reduction) compared with using the full set of correspondences. This contribution is relevant to other point-based odometry implementations that rely on a range sensor and provides a lightweight and interpretable means of incorporating vehicle dynamics for ego-motion estimation.

Submitted 21 June, 2022; originally announced June 2022.

Comments: Accepted for RA-L
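To make the single-association idea concrete, here is a sketch under a planar constant-curvature model of our own choosing: the robot moves along a circular arc with heading change theta, so the translation has length c (the chord) and points along theta/2. Writing a landmark seen at p in the previous frame and at q in the current frame as p = R(theta) q + c (cos(theta/2), sin(theta/2)) and projecting onto the chord normal gives tan(theta/2) = (p_y - q_y) / (p_x + q_x), so one association fixes both theta and c. The median-based outlier flag is likewise an illustrative choice, not necessarily the paper's; all function names are hypothetical.

```python
import math
import statistics

def rot(theta, x, y):
    """Rotate the 2-D point (x, y) by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y

def motion_from_single_landmark(p, q):
    """Estimate (theta, chord) from one landmark association (hypothetical helper).

    p: landmark (x, y) in the previous frame; q: the same landmark in the current
    frame. Assumed model: p = R(theta) q + chord * (cos(theta/2), sin(theta/2)).
    """
    px, py = p
    qx, qy = q
    # Projecting onto the chord normal eliminates `chord`:
    # tan(theta / 2) = (py - qy) / (px + qx).
    half = math.atan2(py - qy, px + qx)
    # tan has period pi, so fold the half-angle into (-pi/2, pi/2].
    if half > math.pi / 2:
        half -= math.pi
    elif half <= -math.pi / 2:
        half += math.pi
    theta = 2.0 * half
    rx, ry = rot(theta, qx, qy)
    # Remaining displacement projected onto the chord direction is its length.
    chord = (px - rx) * math.cos(half) + (py - ry) * math.sin(half)
    return theta, chord

def flag_outliers(pairs, angle_tol=0.05, chord_tol=0.2):
    """Flag associations whose single-landmark motion disagrees with the median."""
    estimates = [motion_from_single_landmark(p, q) for p, q in pairs]
    theta_med = statistics.median([th for th, _ in estimates])
    chord_med = statistics.median([c for _, c in estimates])
    return [abs(th - theta_med) > angle_tol or abs(c - chord_med) > chord_tol
            for th, c in estimates]
```

Because every association yields its own full motion estimate, a bad match shows up as an estimate that disagrees with the consensus, which is what makes this kind of outlier rejection cheap and interpretable.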

arXiv:2203.03405 [pdf, other]

Depth-SIMS: Semi-Parametric Image and Depth Synthesis

Authors: Valentina Musat, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: In this paper we present a compositing image synthesis method that generates RGB canvases with well aligned segmentation maps and sparse depth maps, coupled with an in-painting network that transforms the RGB canvases into high quality RGB images and the sparse depth maps into pixel-wise dense depth maps. We benchmark our method in terms of structural alignment and image quality, showing an increase in mIoU over SOTA by 3.7 percentage points and a highly competitive FID. Furthermore, we analyse the quality of the generated data as training data for semantic segmentation and depth completion, and show that our approach is more suited for this purpose than other methods.

Submitted 2 June, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

arXiv:2203.00459 [pdf, other]

Fast-MbyM: Leveraging Translational Invariance of the Fourier Transform for Efficient and Accurate Radar Odometry

Authors: Robert Weston, Matthew Gadd, Daniele De Martini, Paul Newman, Ingmar Posner

Abstract: Masking By Moving (MByM) provides robust and accurate radar odometry measurements through an exhaustive correlative search across discretised pose candidates. However, this dense search creates a significant computational bottleneck which hinders real-time performance when high-end GPUs are not available. Utilising the translational invariance of the Fourier Transform, in our approach, f-MByM, we decouple the search for angle and translation. By maintaining end-to-end differentiability, a neural network is used to mask scans and is trained by supervising pose prediction directly. Utilising a decoupled search, f-MByM trains faster and with less memory, achieves significant run-time performance improvements on a CPU (168%), and runs in real-time on embedded devices, in stark contrast to MByM. Throughout, our approach remains accurate and competitive with the best radar odometry variants available in the literature -- achieving an end-point drift of 2.01% in translation and 6.3 deg/km on the Oxford Radar RobotCar Dataset.

Submitted 1 March, 2022; originally announced March 2022.

Comments: 7 pages
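The property f-MByM exploits can be checked numerically: the magnitude of the 2-D Fourier transform is unchanged by a (circular) translation, so orientation can be searched on magnitudes alone and the translation recovered afterwards from the phase. The sketch below shows only the translation half, using textbook phase correlation on synthetic arrays; it is not the learned, masked pipeline of the paper, it assumes integer circular shifts, and `phase_correlation` is a hypothetical helper.

```python
import numpy as np

def phase_correlation(a: np.ndarray, b: np.ndarray):
    """Estimate the integer circular shift (dy, dx) with a ~= np.roll(b, (dy, dx), axis=(0, 1))."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-12  # keep only the phase difference
    corr = np.real(np.fft.ifft2(cross))  # ideally a single peak at the shift
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices in [0, N) to signed shifts.
    h, w = a.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)
```

Since a shift only multiplies each Fourier coefficient by a unit-magnitude phase factor, `np.abs(np.fft.fft2(a))` and `np.abs(np.fft.fft2(b))` agree exactly for circular shifts, which is the invariance that lets angle and translation be searched separately.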

arXiv:2110.02744 [pdf, other]

Contrastive Learning for Unsupervised Radar Place Recognition

Authors: Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving the place recognition problem with complex radar data. Our method is based on invariant instance feature learning but is tailored for the task of re-localisation by exploiting, for data augmentation, the temporal successivity of data collected by a mobile platform moving smoothly through the scene. We experiment across two prominent urban radar datasets totalling over 400 km of driving and show that we achieve a new radar place recognition state-of-the-art. Specifically, the proposed system proves correct for 98.38% of the queries that it is presented with over a challenging re-localisation sequence, using only the single nearest neighbour in the learned metric space. We also find that our learned model shows better understanding of out-of-lane loop closures at arbitrary orientation than non-learned radar scan descriptors.

Submitted 6 October, 2021; originally announced October 2021.

Comments: accepted for publication at the IEEE International Conference on Advanced Robotics (ICAR) 2021. arXiv admin note: substantial text overlap with arXiv:2106.06703
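A hedged sketch of how temporal successivity can supply positives for contrastive learning: each frame's positive is assumed to be its next frame (or its previous frame at the end of the sequence), and all other frames in the batch act as negatives in an InfoNCE-style loss. The positive-selection rule, the temperature and the function names are assumptions for illustration; the paper's exact objective may differ.

```python
import numpy as np

def temporal_positives(n: int, offset: int = 1) -> np.ndarray:
    """Positive index per frame: `offset` steps later, or earlier at the sequence end."""
    idx = np.arange(n)
    return np.where(idx + offset < n, idx + offset, idx - offset)

def info_nce(z: np.ndarray, positives: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE loss for one sequence batch of embeddings z with shape (n, d)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # a frame is never its own positive or negative
    # Log-softmax over all other frames in the batch, read off at the positive.
    log_prob = sim - np.logaddexp.reduce(sim, axis=1, keepdims=True)
    return float(-log_prob[np.arange(len(z)), positives].mean())
```

Embeddings that vary smoothly along the trajectory score a lower loss than unstructured ones, so minimising this objective pushes neighbouring scans together without any place labels.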

arXiv:2106.08983 [pdf, other]

The Oxford Road Boundaries Dataset

Authors: Tarlan Suleymanov, Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: In this paper we present the Oxford Road Boundaries Dataset, designed for training and testing machine-learning-based road-boundary detection and inference approaches. We have hand-annotated two of the 10 km-long forays from the Oxford RobotCar Dataset and generated several thousand further examples with semi-annotated road-boundary masks from other forays. To generate these additional training samples, we used a vision-based localiser to project labels from the annotated datasets to other traversals at different times and in different weather conditions. As a result, we release 62605 labelled samples, of which 47639 samples are curated. Each of these samples contains both raw and classified masks for left and right lenses. Our data contains images from a diverse set of scenarios such as straight roads, parked cars, junctions, etc. Files for download and tools for manipulating the labelled data are available at: oxford-robotics-institute.github.io/road-boundaries-dataset

Submitted 16 June, 2021; originally announced June 2021.

Comments: Accepted for publication at the workshop "3D-DLAD: 3D-Deep Learning for Autonomous Driving" (WS15), Intelligent Vehicles Symposium (IV 2021)

arXiv:2106.06703 [pdf, other]

Unsupervised Place Recognition with Deep Embedding Learning over Radar Videos

Authors: Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: We learn, in an unsupervised way, an embedding from sequences of radar images that is suitable for solving the place recognition problem using complex radar data. We experiment on 280 km of data and show performance exceeding state-of-the-art supervised approaches, localising correctly 98.38% of the time when using just the nearest database candidate.

Submitted 12 June, 2021; originally announced June 2021.

Comments: to be presented at the Workshop on Radar Perception for All-Weather Autonomy at the IEEE International Conference on Robotics and Automation (ICRA) 2021

arXiv:2103.00869 [pdf, other]

Fool Me Once: Robust Selective Segmentation via Out-of-Distribution Detection with Contrastive Learning

Authors: David Williams, Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: In this work, we train a network to simultaneously perform segmentation and pixel-wise Out-of-Distribution (OoD) detection, such that the segmentation of unknown regions of scenes can be rejected. This is made possible by leveraging an OoD dataset with a novel contrastive objective and data augmentation scheme. By including data containing unknown classes in training, a more robust feature representation can be learned, with known classes represented distinctly from unknown ones. When presented with unknown classes or conditions, many current segmentation approaches frequently exhibit high confidence in their inaccurate segmentations and cannot be trusted in many operational environments. We validate our system on a real-world dataset of unusual driving scenes, and show that by selectively segmenting scenes based on what is predicted as OoD, we can increase segmentation accuracy by 0.2 IoU with respect to alternative techniques.

Submitted 1 March, 2021; originally announced March 2021.

Comments: Accepted for publication at the 2021 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2005.05175 [pdf, other]

Keep off the Grass: Permissible Driving Routes from Radar with Weak Audio Supervision

Authors: David Williams, Daniele De Martini, Matthew Gadd, Letizia Marchegiani, Paul Newman

Abstract: Reliable outdoor deployment of mobile robots requires the robust identification of permissible driving routes in a given environment. The performance of LiDAR and vision-based perception systems deteriorates significantly in the presence of certain environmental factors, e.g. rain, fog, or darkness. Perception systems based on FMCW scanning radar maintain full performance regardless of environmental conditions and with a longer range than alternative sensors. Learning to segment a radar scan based on driveability in a fully supervised manner is not feasible, as labelling each radar scan on a bin-by-bin basis is both difficult and time-consuming to do by hand. We therefore weakly supervise the training of the radar-based classifier through an audio-based classifier that is able to predict the terrain type underneath the robot. By combining odometry, GPS and the terrain labels from the audio classifier, we are able to construct a terrain-labelled trajectory of the robot in the environment, which is then used to label the radar scans. Using a curriculum learning procedure, we then train a radar segmentation network to generalise beyond the initial labelling and to detect all permissible driving routes in the environment.

Submitted 22 September, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

Comments: accepted for publication at the IEEE Intelligent Transportation Systems Conference (ITSC) 2020
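The step that turns the terrain-labelled trajectory into radar labels can be sketched as projecting world-frame trajectory points into the robot frame at scan time, then rasterising them into a robot-centred Cartesian grid. This is a minimal sketch under assumed conventions (planar poses, a square grid, integer terrain labels, a hypothetical function name); the paper's labelling of radar bins and its curriculum stage are not reproduced.

```python
import math
import numpy as np

def label_grid_from_trajectory(poses, terrain, current, size_m=20.0, res_m=0.5, unknown=-1):
    """Rasterise terrain labels along a trajectory into a robot-centred grid (hypothetical).

    poses: list of (x, y) world positions visited by the robot.
    terrain: integer terrain label per pose (e.g. from an audio classifier).
    current: (x, y, yaw) pose of the robot when the radar scan was taken.
    Returns a (cells, cells) int array; cells the trajectory never visits stay `unknown`.
    """
    cells = int(round(size_m / res_m))
    grid = np.full((cells, cells), unknown, dtype=int)
    cx, cy, yaw = current
    c, s = math.cos(yaw), math.sin(yaw)
    half = size_m / 2.0
    for (x, y), label in zip(poses, terrain):
        # World -> robot frame: translate to the scan pose, then rotate by -yaw.
        dx, dy = x - cx, y - cy
        rx, ry = c * dx + s * dy, -s * dx + c * dy
        col = int(math.floor((rx + half) / res_m))
        row = int(math.floor((ry + half) / res_m))
        if 0 <= row < cells and 0 <= col < cells:
            grid[row, col] = label
    return grid
```

Cells left as `unknown` are exactly what a weakly supervised network must learn to fill in, which is the gap the curriculum procedure described above targets.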

arXiv:2005.02031 [pdf, other]

Sense-Assess-eXplain (SAX): Building Trust in Autonomous Vehicles in Challenging Real-World Driving Scenarios

Authors: Matthew Gadd, Daniele De Martini, Letizia Marchegiani, Paul Newman, Lars Kunze

Abstract: This paper discusses ongoing work in demonstrating research in mobile autonomy in challenging driving scenarios. In our approach, we address fundamental technical issues to overcome critical barriers to assurance and regulation for large-scale deployments of autonomous systems. To this end, we present how we build robots that (1) can robustly sense and interpret their environment using traditional as well as unconventional sensors; (2) can assess their own capabilities; and (3), vitally for assurance and trust, can provide causal explanations of their interpretations and assessments. As it is essential that robots are safe and trusted, we design, develop, and demonstrate fundamental technologies in real-world applications to overcome critical barriers which impede the current deployment of robots in economically and socially important areas. Finally, we describe ongoing work in the collection of an unusual, rare, and highly valuable dataset.

Submitted 5 May, 2020; originally announced May 2020.

Comments: accepted for publication at the IEEE Intelligent Vehicles Symposium (IV), Workshop on Ensuring and Validating Safety for Automated Vehicles (EVSAV), 2020, project URL: https://ori.ox.ac.uk/projects/sense-assess-explain-sax

arXiv:2004.03451 [pdf, other]

RSS-Net: Weakly-Supervised Multi-Class Semantic Segmentation with FMCW Radar

Authors: Prannay Kaul, Daniele De Martini, Matthew Gadd, Paul Newman

Abstract: This paper presents an efficient annotation procedure and an application thereof to end-to-end, rich semantic segmentation of the sensed environment using FMCW scanning radar. We advocate radar over the traditional sensors used for this task as it operates at longer ranges and is substantially more robust to adverse weather and illumination conditions. We avoid laborious manual labelling by exploiting the largest radar-focused urban autonomy dataset collected to date, correlating radar scans with RGB cameras and LiDAR sensors, for which semantic segmentation is an already consolidated procedure. The training procedure leverages a state-of-the-art natural image segmentation system which is publicly available and as such, in contrast to previous approaches, allows for the production of copious labels for the radar stream by incorporating four camera and two LiDAR streams. Additionally, the losses are computed taking into account labels up to the radar sensor horizon by accumulating LiDAR returns along a pose-chain ahead of and behind the current vehicle position. Finally, we present the network with multi-channel radar scan inputs in order to deal with ephemeral and dynamic scene objects.

Submitted 2 April, 2020; originally announced April 2020.

Comments: submitted to IEEE Intelligent Vehicles Symposium (IV) 2020

arXiv:2003.04708 [pdf, other]

LiDAR Lateral Localisation Despite Challenging Occlusion from Traffic

Authors: Tarlan Suleymanov, Matthew Gadd, Lars Kunze, Paul Newman

Abstract: This paper presents a system for improving the robustness of LiDAR lateral localisation systems. This is made possible by including detections of road boundaries which are invisible to the sensor (due to occlusion, e.g. by traffic) but can be located by our Occluded Road Boundary Inference Deep Neural Network. We show an example application in which fusion of a camera stream is used to initialise the lateral localisation. We demonstrate, over four driven forays through central Oxford totalling 40 km of driving, the performance gain that inferring occluded road boundaries brings.

Submitted 10 March, 2020; originally announced March 2020.

Comments: accepted for publication at the IEEE/ION Position, Location and Navigation Symposium (PLANS) 2020

arXiv:2003.04699 [pdf, other]

Look Around You: Sequence-based Radar Place Recognition with Learned Rotational Invariance

Authors: Matthew Gadd, Daniele De Martini, Paul Newman

Abstract: This paper details an application which yields significant improvements to place recognition with Frequency-Modulated Continuous-Wave radar, a commercially promising sensor poised for exploitation in mobile autonomy. We show how a rotationally-invariant metric embedding for radar scans can be integrated into sequence-based trajectory matching systems typically applied to videos taken by visual sensors. Due to the complete horizontal field of view inherent to the radar scan formation process, we show how this off-the-shelf sequence-based trajectory matching system can be manipulated to detect place matches when the vehicle is travelling down a previously visited stretch of road in the opposite direction. We demonstrate the efficacy of the approach on 26 km of challenging urban driving taken from the largest radar-focused urban autonomy dataset released to date, showing a boost of 30% in recall at high levels of precision over a nearest neighbour approach.

Submitted 10 March, 2020; originally announced March 2020.

Comments: accepted for publication at the IEEE/ION Position, Location and Navigation Symposium (PLANS) 2020
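The reverse-direction idea can be illustrated with a simplified straight-line search through the query-versus-map distance matrix: a revisit in the same direction appears as a path of slope +1, and one in the opposite direction as a path of slope -1. The sketch below scores both, assuming rotation-invariant scan embeddings are already available; it is not the specific off-the-shelf matcher used in the paper, and all names are hypothetical.

```python
import numpy as np

def best_sequence_match(query: np.ndarray, database: np.ndarray):
    """Score straight-line paths of slope +1 and -1 through the distance matrix.

    query: (L, d) embeddings of the last L radar scans (oldest first).
    database: (N, d) embeddings of the reference trajectory (the map).
    Returns (db_index_of_newest_scan, direction, mean_distance).
    """
    L = len(query)
    # Pairwise Euclidean distances, shape (L, N).
    dist = np.linalg.norm(query[:, None, :] - database[None, :, :], axis=2)
    steps = np.arange(L)
    best = (None, None, np.inf)
    for direction in (+1, -1):  # +1: same direction as the map, -1: opposite
        for end in range(len(database)):
            # Map index matched to each query step; the newest scan maps to `end`.
            idx = end - direction * (L - 1 - steps)
            if idx.min() < 0 or idx.max() >= len(database):
                continue
            score = dist[steps, idx].mean()
            if score < best[2]:
                best = (end, direction, float(score))
    return best
```

This only works because the embeddings are rotation-invariant: a scan taken while driving the other way must still land close to the map scan of the same place for the slope -1 path to score well.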

arXiv:2002.10152 [pdf, other]

Real-time Kinematic Ground Truth for the Oxford RobotCar Dataset

Authors: Will Maddern, Geoffrey Pascoe, Matthew Gadd, Dan Barnes, Brian Yeomans, Paul Newman

Abstract: We describe the release of reference data towards a challenging long-term localisation and mapping benchmark based on the large-scale Oxford RobotCar Dataset. The release includes 72 traversals of a route through Oxford, UK, gathered in all illumination, weather and traffic conditions, and is representative of the conditions an autonomous vehicle would be expected to operate reliably in. Using post-processed raw GPS, IMU, and static GNSS base station recordings, we have produced a globally-consistent centimetre-accurate ground truth for the entire year-long duration of the dataset. Coupled with a planned online benchmarking service, we hope to enable quantitative evaluation and comparison of different localisation and mapping approaches focusing on long-term autonomy for road vehicles in urban environments challenged by changing weather.

Submitted 24 February, 2020; originally announced February 2020.

Comments: Dataset website: https://robotcar-dataset.robots.ox.ac.uk/

arXiv:2001.09438 [pdf, other]

Kidnapped Radar: Topological Radar Localisation using Rotationally-Invariant Metric Learning

Authors: Ştefan Săftescu, Matthew Gadd, Daniele De Martini, Dan Barnes, Paul Newman

Abstract: This paper presents a system for robust, large-scale topological localisation using Frequency-Modulated Continuous-Wave (FMCW) scanning radar. We learn a metric space for embedding polar radar scans using CNN and NetVLAD architectures traditionally applied to the visual domain. However, we tailor the feature extraction to better suit the polar nature of radar scan formation using cylindrical convolutions, anti-aliasing blurring, and azimuth-wise max-pooling, all in order to bolster the rotational invariance. The enforced metric space is then used to encode a reference trajectory, serving as a map, which is queried for nearest neighbours (NNs) for recognition of places at run-time. We demonstrate the performance of our topological localisation system over the course of many repeat forays using the largest radar-focused mobile autonomy dataset released to date, totalling 280 km of urban driving, a small portion of which we also use to learn the weights of the modified architecture. As this work represents a novel application for FMCW radar, we analyse the utility of the proposed method via a comprehensive set of metrics which provide insight into the efficacy when used in a realistic system, showing improved performance over the root architecture even in the face of random rotational perturbation.

Submitted 26 January, 2020; originally announced January 2020.

Comments: submitted to the 2020 International Conference on Robotics and Automation (ICRA)
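Why cylindrical convolutions and azimuth-wise max-pooling help with rotation can be seen in a few lines: rotating the sensor shifts a polar scan circularly along the azimuth axis, a convolution with wrap-around padding is equivariant to that shift, and a max over azimuth then removes it. The NumPy sketch below uses one hand-set filter instead of a learned CNN/NetVLAD, assumes rotations by whole azimuth bins, and uses hypothetical function names; the final nearest-neighbour lookup mirrors the map query described in the abstract.

```python
import numpy as np

def circular_conv_azimuth(scan: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Filter along azimuth with wrap-around (cylindrical) padding.

    scan: (azimuths, range_bins) polar scan; kernel: (k,) odd-length filter.
    Implemented as cross-correlation, as in most CNN libraries.
    """
    pad = len(kernel) // 2
    padded = np.concatenate([scan[-pad:], scan, scan[:pad]], axis=0) if pad else scan
    out = np.zeros_like(scan, dtype=float)
    for i, w in enumerate(kernel):
        out += w * padded[i:i + scan.shape[0]]
    return out

def rotation_invariant_descriptor(scan: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    features = np.maximum(circular_conv_azimuth(scan, kernel), 0.0)  # ReLU
    return features.max(axis=0)  # azimuth-wise max-pooling -> shape (range_bins,)

def nearest_place(query_desc: np.ndarray, map_descs: np.ndarray) -> int:
    # Index of the closest reference descriptor in the map.
    return int(np.argmin(np.linalg.norm(map_descs - query_desc, axis=1)))
```

Zero padding along azimuth would break the equivariance at the 0/360 degree seam, which is the motivation for wrapping the scan around instead.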

arXiv:1909.01300 [pdf, other]

The Oxford Radar RobotCar Dataset: A Radar Extension to the Oxford RobotCar Dataset

Authors: Dan Barnes, Matthew Gadd, Paul Murcutt, Paul Newman, Ingmar Posner

Abstract: In this paper we present The Oxford Radar RobotCar Dataset, a new dataset for researching scene understanding using Millimetre-Wave FMCW scanning radar data. The target application is autonomous vehicles where this modality is robust to environmental conditions such as fog, rain, snow, or lens flare, which typically challenge other sensor modalities such as vision and LIDAR. The data were gathered in January 2019 over thirty-two traversals of a central Oxford route spanning a total of 280 km of urban driving. It encompasses a variety of weather, traffic, and lighting conditions. This 4.7TB dataset consists of over 240,000 scans from a Navtech CTS350-X radar and 2.4 million scans from two Velodyne HDL-32E 3D LIDARs, along with six cameras, two 2D LIDARs, and a GPS/INS receiver. In addition we release ground truth optimised radar odometry to provide an additional impetus to research in this domain. The full dataset is available for download at: ori.ox.ac.uk/datasets/radar-robotcar-dataset

Submitted 26 February, 2020; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: The Oxford Radar RobotCar Dataset Website: http://ori.ox.ac.uk/datasets/radar-robotcar-dataset

arXiv:1801.05607 [pdf, other]

The Data Market: Policies for Decentralised Visual Localisation

Authors: Matthew Gadd, Paul Newman

Abstract: This paper presents a mercantile framework for the decentralised sharing of navigation expertise amongst a fleet of robots which perform regular missions into a common but variable environment. We build on our earlier work and allow individual agents to intermittently initiate trades based on a real-time assessment of the nature of their missions or demand for localisation capability, and to choose trading partners with discrimination based on an internally evolving set of beliefs in the expected value of trading with each other member of the team. To this end, we suggest some obligatory properties that a formalisation of the distributed versioning of experience maps should exhibit, to ensure the eventual convergence in the state of each agent's map under a sequence of pairwise exchanges, as well as the uninterrupted integrity of the representation under versioning operations. To mitigate limitations in hardware and network resources, the "data market" is catalogued by distinct sections of the world, which the agents treat as "products" for appraisal and purchase. We demonstrate and evaluate our system using the publicly available Oxford RobotCar Dataset, whose hand-labelled data market catalogue (approaching 446 km of fully indexed sections-of-interest) we plan to release alongside the existing raw stereo imagery. We show that, by refining market policies over time, agents achieve improved localisation in a directed and accelerated manner.

Submitted 17 January, 2018; originally announced January 2018.
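The evolving beliefs about trading partners in The Data Market can be sketched as a running-average value estimate per teammate combined with epsilon-greedy partner selection. This bandit-style rule is an assumption chosen for illustration, not the policy defined in the paper; the class name, the reward signal (for example a measured localisation improvement after a trade) and `epsilon` are hypothetical.

```python
import random

class TradingBeliefs:
    """Hypothetical running estimate of the value of trading with each teammate."""

    def __init__(self, partners, epsilon=0.1, seed=None):
        self.value = {p: 0.0 for p in partners}  # believed value of trading with p
        self.count = {p: 0 for p in partners}    # trades completed with p
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose_partner(self):
        # Mostly exploit the partner believed most valuable; occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.value))
        return max(self.value, key=self.value.get)

    def update(self, partner, reward):
        # Incremental mean of the benefit observed after trading with `partner`.
        self.count[partner] += 1
        self.value[partner] += (reward - self.value[partner]) / self.count[partner]
```

Keeping a small exploration rate lets an agent notice when a previously unhelpful teammate has since gathered useful experience of a section of the world.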

Showing 1–32 of 32 results for author: Gadd, M