Image-based Geolocalization by Ground-to-2.5D Map Matching

Authors: Mengjie Zhou, Liu Liu, Yiran Zhong, Andrew Calway

Abstract: We study the image-based geolocalization problem, aiming to localize ground-view query images on cartographic maps. Current methods often utilize cross-view localization techniques to match ground-view query images with 2D maps. However, the performance of these methods is unsatisfactory due to significant cross-view appearance differences. In this paper, we lift cross-view matching to a 2.5D space, where heights of structures (e.g., trees and buildings) provide geometric information to guide the cross-view matching. We propose a new approach to learning representative embeddings from multi-modal data. Specifically, we establish a projection relationship between 2.5D space and 2D aerial-view space. The projection is further used to combine multi-modal features from the 2.5D and 2D maps using an effective pixel-to-point fusion method. By encoding crucial geometric cues, our method learns discriminative location embeddings for matching panoramic images and maps. Additionally, we construct the first large-scale ground-to-2.5D map geolocalization dataset to validate our method and facilitate future research. Both single-image based and route based localization experiments are conducted to test our method. Extensive experiments demonstrate that the proposed method achieves significantly higher localization accuracy and faster convergence than previous 2D map-based approaches.

Submitted 3 November, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

arXiv:2209.07919

iDF-SLAM: End-to-End RGB-D SLAM with Neural Implicit Mapping and Deep Feature Tracking

Authors: Yuhang Ming, Weicai Ye, Andrew Calway

Abstract: We propose a novel end-to-end RGB-D SLAM, iDF-SLAM, which adopts a feature-based deep neural tracker as the front-end and a NeRF-style neural implicit mapper as the back-end. The neural implicit mapper is trained on-the-fly, while the neural tracker, though pretrained on the ScanNet dataset, is also finetuned along with the training of the neural implicit mapper. Under such a design, our iDF-SLAM is capable of learning to use scene-specific features for camera tracking, thus enabling lifelong learning of the SLAM system. The training of both the tracker and the mapper is self-supervised, without introducing ground truth poses. We test the performance of our iDF-SLAM on the Replica and ScanNet datasets and compare the results to two recent NeRF-based neural SLAM systems. The proposed iDF-SLAM demonstrates state-of-the-art results in terms of scene reconstruction and competitive performance in camera tracking.

Submitted 16 September, 2022; originally announced September 2022.

Comments: 7 pages, 6 figures, 3 tables

arXiv:2204.09015

Dual-Domain Image Synthesis using Segmentation-Guided GAN

Authors: Dena Bazazian, Andrew Calway, Dima Damen

Abstract: We introduce a segmentation-guided approach to synthesise images that integrate features from two distinct domains. Images synthesised by our dual-domain model belong to one domain within the semantic mask, and to another in the rest of the image - smoothly integrated. We build on the successes of few-shot StyleGAN and single-shot semantic segmentation to minimise the amount of training required in utilising two domains. The method combines a few-shot cross-domain StyleGAN with a latent optimiser to achieve images containing features of two distinct domains. We use a segmentation-guided perceptual loss, which compares both pixel-level and activations between domain-specific and dual-domain synthetic images. Results demonstrate qualitatively and quantitatively that our model is capable of synthesising dual-domain images on a variety of objects (faces, horses, cats, cars), domains (natural, caricature, sketches) and part-based masks (eyes, nose, mouth, hair, car bonnet). The code is publicly available at: https://github.com/denabazazian/Dual-Domain-Synthesis.

Submitted 19 April, 2022; originally announced April 2022.

Comments: CVPR2022 Workshops. 14 pages, 19 figures

arXiv:2203.13861

doi 10.1109/ICRA46639.2022.9812049

FD-SLAM: 3-D Reconstruction Using Features and Dense Matching

Authors: Xingrui Yang, Yuhang Ming, Zhaopeng Cui, Andrew Calway

Abstract: It is well known that visual SLAM systems based on dense matching are locally accurate but are also susceptible to long-term drift and map corruption. In contrast, feature matching methods can achieve greater long-term consistency but can suffer from inaccurate local pose estimation when feature information is sparse. Based on these observations, we propose an RGB-D SLAM system that leverages the advantages of both approaches: using dense frame-to-model odometry to build accurate sub-maps and on-the-fly feature-based matching across sub-maps for global map optimisation. In addition, we incorporate a learning-based loop closure component based on 3-D features which further stabilises map building. We have evaluated the approach on indoor sequences from public datasets, and the results show that it performs on par with or better than state-of-the-art systems in terms of map reconstruction quality and pose estimation. The approach can also scale to large scenes where other systems often fail.

Submitted 25 March, 2022; originally announced March 2022.

arXiv:2202.02070

CGiS-Net: Aggregating Colour, Geometry and Implicit Semantic Features for Indoor Place Recognition

Authors: Yuhang Ming, Xingrui Yang, Guofeng Zhang, Andrew Calway

Abstract: We describe a novel approach to indoor place recognition from RGB point clouds based on aggregating low-level colour and geometry features with high-level implicit semantic features. It uses a 2-stage deep learning framework, in which the first stage is trained for the auxiliary task of semantic segmentation and the second stage uses features from layers in the first stage to generate discriminative descriptors for place recognition. The auxiliary task encourages the features to be semantically meaningful, hence aggregating the geometry and colour in the RGB point cloud data with implicit semantic information. We use an indoor place recognition dataset derived from the ScanNet dataset for training and evaluation, with a test set comprising 3,608 point clouds generated from 100 different rooms. Comparison with a traditional feature-based method and four state-of-the-art deep learning methods demonstrates that our approach significantly outperforms all five methods, achieving, for example, a top-3 average recall rate of 75% compared with 41% for the closest rival method. Our code is available at: https://github.com/YuhangMing/Semantic-Indoor-Place-Recognition

Submitted 11 July, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

Comments: Accepted by 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

arXiv:2108.02522

Object-Augmented RGB-D SLAM for Wide-Disparity Relocalisation

Authors: Yuhang Ming, Xingrui Yang, Andrew Calway

Abstract: We propose a novel object-augmented RGB-D SLAM system that is capable of constructing a consistent object map and performing relocalisation based on centroids of objects in the map. The approach aims to overcome the view dependence of appearance-based relocalisation methods using point features or images. During the map construction, we use a pre-trained neural network to detect objects and estimate 6D poses from RGB-D data. An incremental probabilistic model is used to aggregate estimates over time to create the object map. Then in relocalisation, we use the same network to extract objects-of-interest in the 'lost' frames. Pairwise geometric matching finds correspondences between map and frame objects, and probabilistic absolute orientation followed by application of iterative closest point to dense depth maps and object centroids gives relocalisation. Results of experiments in desktop environments demonstrate very high success rates even for frames with widely different viewpoints from those used to construct the map, significantly outperforming two appearance-based methods.

Submitted 5 August, 2021; originally announced August 2021.

Comments: Accepted by 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

arXiv:1911.08797

You Are Here: Geolocation by Embedding Maps and Images

Authors: Noe Samano, Mengjie Zhou, Andrew Calway

Abstract: We present a novel approach to geolocalising panoramic images on a 2-D cartographic map based on learning a low dimensional embedded space, which allows a comparison between an image captured at a location and local neighbourhoods of the map. The representation is not sufficiently discriminatory to allow localisation from a single image, but when concatenated along a route, localisation converges quickly, with over 90% accuracy being achieved for routes of around 200m in length when using Google Street View and Open Street Map data. The method generalises a previous fixed semantic feature based approach and achieves significantly higher localisation accuracy and faster convergence.

Submitted 20 July, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: 18 pages, new version accepted for ECCV 2020 (poster), with new results on publicly available dataset and comparison with implementation of previously published alternative approach

arXiv:1904.04523

Simultaneous drone localisation and wind turbine model fitting during autonomous surface inspection

Authors: Oliver Moolan-Feroze, Konstantinos Karachalios, Dimitrios N. Nikolaidis, Andrew Calway

Abstract: We present a method for simultaneous localisation and wind turbine model fitting for a drone performing an automated surface inspection. We use a skeletal parameterisation of the turbine that can be easily integrated into a non-linear least squares optimiser, combined with a pose graph representation of the drone's 3-D trajectory, allowing us to optimise both sets of parameters simultaneously. Given images from an onboard camera, we use a CNN to infer projections of the skeletal model, enabling correspondence constraints to be established through a cost function. This is then coupled with GPS/IMU measurements taken at key frames in the graph to allow successive optimisation as the drone navigates around the turbine. We present two variants of the cost function, one based on traditional 2D point correspondences and the other on direct image interpolation within the inferred projections. Results from experiments on simulated and real-world data show that simultaneous optimisation provides improvements to localisation over only optimising the pose and that combined use of both cost functions proves most effective.

Submitted 9 April, 2019; originally announced April 2019.

Comments: Submitted to IROS2019

arXiv:1902.10474

Improving drone localisation around wind turbines using monocular model-based tracking

Authors: Oliver Moolan-Feroze, Konstantinos Karachalios, Dimitrios N. Nikolaidis, Andrew Calway

Abstract: We present a novel method of integrating image-based measurements into a drone navigation system for the automated inspection of wind turbines. We take a model-based tracking approach, where a 3D skeleton representation of the turbine is matched to the image data. Matching is based on comparing the projection of the representation to that inferred from images using a convolutional neural network. This enables us to find image correspondences using a generic turbine model that can be applied to a wide range of turbine shapes and sizes. To estimate the 3D pose of the drone, we fuse the network output with GPS and IMU measurements using a pose graph optimiser. Results illustrate that the use of the image measurements significantly improves the accuracy of the localisation over that obtained using GPS and IMU alone.

Submitted 27 February, 2019; originally announced February 2019.

Comments: Accepted at the International Conference on Robotics and Automation

arXiv:1803.01577

Predicting Out-of-View Feature Points for Model-Based Camera Pose Estimation

Authors: Oliver Moolan-Feroze, Andrew Calway

Abstract: In this work we present a novel framework that uses deep learning to predict object feature points that are out-of-view in the input image. This system was developed with the application of model-based tracking in mind, particularly in the case of autonomous inspection robots, where only partial views of the object are available. Out-of-view prediction is enabled by applying scaling to the feature point labels during network training. This is combined with a recurrent neural network architecture designed to provide the final prediction layers with rich feature information from across the spatial extent of the input image. To show the versatility of these out-of-view predictions, we describe how to integrate them in both a particle filter tracker and an optimisation based tracker. To evaluate our work we compared our framework with one that predicts only points inside the image. We show that as the amount of the object in view decreases, being able to predict outside the image bounds adds robustness to the final pose estimation.

Submitted 5 March, 2018; originally announced March 2018.

Comments: Submitted to IROS 2018

arXiv:1803.00788

Automated Map Reading: Image Based Localisation in 2-D Maps Using Binary Semantic Descriptors

Authors: Pilailuck Panphattarasap, Andrew Calway

Abstract: We describe a novel approach to image based localisation in urban environments using semantic matching between images and a 2-D map. It contrasts with the vast majority of existing approaches which use image to image database matching. We use highly compact binary descriptors to represent semantic features at locations, significantly increasing scalability compared with existing methods and having the potential for greater invariance to variable imaging conditions. The approach is also more akin to human map reading, making it more suited to human-system interaction. The binary descriptors indicate the presence or not of semantic features relating to buildings and road junctions in discrete viewing directions. We use CNN classifiers to detect the features in images and match descriptor estimates with a database of location tagged descriptors derived from the 2-D map. In isolation, the descriptors are not sufficiently discriminative, but when concatenated sequentially along a route, their combination becomes highly distinctive and allows localisation even when using non-perfect classifiers. Performance is further improved by taking into account left or right turns over a route. Experimental results obtained using Google StreetView and OpenStreetMap data show that the approach has considerable potential, achieving localisation accuracy of around 85% using routes corresponding to approximately 200 meters.

Submitted 2 March, 2018; originally announced March 2018.

Comments: 8 pages, submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems 2018

arXiv:1608.04274

Visual place recognition using landmark distribution descriptors

Authors: Pilailuck Panphattarasap, Andrew Calway

Abstract: Recent work by Suenderhauf et al. [1] demonstrated improved visual place recognition using proposal regions coupled with features from convolutional neural networks (CNN) to match landmarks between views. In this work we extend the approach by introducing descriptors built from landmark features which also encode the spatial distribution of the landmarks within a view. Matching descriptors then enforces consistency of the relative positions of landmarks between views. This has a significant impact on performance. For example, in experiments on 10 image-pair datasets, each consisting of 200 urban locations with significant differences in viewing positions and conditions, we recorded average precision of around 70% (at 100% recall), compared with 58% obtained using whole image CNN features and 50% for the method in [1].

Submitted 15 August, 2016; originally announced August 2016.

Comments: 13 pages

arXiv:1604.00895

HDRFusion: HDR SLAM using a low-cost auto-exposure RGB-D sensor

Authors: Shuda Li, Ankur Handa, Yang Zhang, Andrew Calway

Abstract: We describe a new method for comparing frame appearance in a frame-to-model 3-D mapping and tracking system using a low dynamic range (LDR) RGB-D camera which is robust to brightness changes caused by auto exposure. It is based on a normalised radiance measure which is invariant to exposure changes and not only robustifies the tracking under changing lighting conditions, but also enables the subsequent exposure compensation to perform accurately, allowing online building of high dynamic range (HDR) maps. The latter facilitates the frame-to-model tracking to minimise drift as well as better capturing light variation within the scene. Results from experiments with synthetic and real data demonstrate that the method provides both improved tracking and maps with far greater dynamic range of luminosity.

Submitted 4 April, 2016; originally announced April 2016.

Comments: 14 pages

Showing 1–13 of 13 results for author: Calway, A