-
Online Mutual Adaptation of Deep Depth Prediction and Visual SLAM
Authors:
Shing Yan Loo,
Moein Shakeri,
Sai Hong Tang,
Syamsiah Mashohor,
Hong Zhang
Abstract:
Accurate depth prediction by a convolutional neural network (CNN) remains a major challenge to its wide use in practical visual simultaneous localization and mapping (SLAM) applications, such as enhanced camera tracking and dense mapping. This paper sets out to answer the following question: Can we tune a depth prediction CNN with the help of a visual SLAM algorithm, even if the CNN is not trained for the current operating environment, in order to benefit the SLAM performance? To this end, we propose a novel online adaptation framework consisting of two complementary processes: a SLAM algorithm that is used to generate keyframes to fine-tune the depth prediction and another algorithm that uses the online adapted depth to improve map quality. Once the potentially noisy map points are removed, we perform global photometric bundle adjustment (BA) to improve the overall SLAM performance. Experimental results on both benchmark datasets and a real robot in our own experimental environments show that our proposed method improves the overall SLAM accuracy. While regularization has been shown to be effective in multi-task classification problems, we present experimental results and an ablation study to show the effectiveness of regularization in preventing catastrophic forgetting in the online adaptation of depth prediction, a single-task regression problem. In addition, we compare our online adaptation framework against the state-of-the-art pre-trained depth prediction CNNs to show that our online adapted depth prediction CNN outperforms the depth prediction CNNs that have been trained on a large collection of datasets.
Submitted 1 February, 2022; v1 submitted 7 November, 2021;
originally announced November 2021.
-
Polarimetric Monocular Dense Mapping Using Relative Deep Depth Prior
Authors:
Moein Shakeri,
Shing Yan Loo,
Hong Zhang
Abstract:
This paper is concerned with polarimetric dense map reconstruction based on a polarization camera with the help of relative depth information as a prior. In general, polarization imaging is able to reveal information about the surface normal, such as its azimuth and zenith angles, which can support the development of solutions to the problem of dense reconstruction, especially in texture-poor regions. However, polarimetric shape cues are ambiguous due to two types of polarized reflection (specular/diffuse). Although methods have been proposed to address this issue, they either are offline and therefore not practical in robotics applications, or use incomplete polarimetric cues, leading to sub-optimal performance. In this paper, we propose an online reconstruction method that uses the full polarimetric cues available from the polarization camera. With our online method, we can propagate sparse depth values both along and perpendicular to iso-depth contours. Through comprehensive experiments on challenging image sequences, we demonstrate that our method is able to significantly improve the accuracy of the depth map as well as increase its density, especially in regions of poor texture.
Submitted 9 February, 2021;
originally announced February 2021.
-
DeepRelativeFusion: Dense Monocular SLAM using Single-Image Relative Depth Prediction
Authors:
Shing Yan Loo,
Syamsiah Mashohor,
Sai Hong Tang,
Hong Zhang
Abstract:
In this paper, we propose a dense monocular SLAM system, named DeepRelativeFusion, that is capable of recovering a globally consistent 3D structure. To this end, we use a visual SLAM algorithm to reliably recover the camera poses and semi-dense depth maps of the keyframes, and then use relative depth prediction to densify the semi-dense depth maps and refine the keyframe pose-graph. To improve the semi-dense depth maps, we propose an adaptive filtering scheme, which is a structure-preserving weighted average smoothing filter that takes into account the pixel intensity and depth of the neighbouring pixels, yielding a substantial gain in reconstruction accuracy after densification. To perform densification, we introduce two incremental improvements upon the energy minimization framework proposed by DeepFusion: (1) an improved cost function, and (2) the use of single-image relative depth prediction. After densification, we update the keyframes with two-view consistent optimized semi-dense and dense depth maps to improve pose-graph optimization, providing a feedback loop to refine the keyframe poses for accurate scene reconstruction. Our system outperforms the state-of-the-art dense SLAM systems quantitatively in dense reconstruction accuracy by a large margin.
Submitted 9 July, 2021; v1 submitted 7 June, 2020;
originally announced June 2020.
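The adaptive filtering scheme described in the DeepRelativeFusion abstract above can be illustrated with a small sketch. The abstract only states that the filter is a structure-preserving weighted average that uses the intensity and depth of neighbouring pixels. The Gaussian weights, the window radius, the use of `None` for pixels without depth, and the names `adaptive_smooth`, `sigma_i`, and `sigma_d` are assumptions made for illustration, not the authors' implementation.

```python
import math


def adaptive_smooth(depth, intensity, radius=1, sigma_i=10.0, sigma_d=0.2):
    """Smooth a semi-dense depth map with a weighted average.

    depth:     2D list of floats, None where no depth is available.
    intensity: 2D list of pixel intensities, same shape as depth.
    Neighbours that differ strongly in intensity or depth get a weight
    close to zero, so depth edges are not blurred (assumed weighting).
    """
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            d0 = depth[y][x]
            if d0 is None:
                continue  # leave holes untouched; densification is a later step
            num = 0.0
            den = 0.0
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    d = depth[ny][nx]
                    if d is None:
                        continue
                    di = intensity[ny][nx] - intensity[y][x]
                    dd = d - d0
                    weight = math.exp(-di * di / (2 * sigma_i ** 2)) * \
                        math.exp(-dd * dd / (2 * sigma_d ** 2))
                    num += weight * d
                    den += weight
            out[y][x] = num / den  # den >= 1 because the centre pixel has weight 1
    return out


# One scanline with noisy depth near 1 m, a depth edge to 5 m, and one hole.
depth = [[1.0, 1.02, 0.98, 5.0, 5.0, None]]
intensity = [[10, 10, 10, 200, 200, 200]]
smoothed = adaptive_smooth(depth, intensity)
```

In this toy scanline the noisy values near 1 m are pulled toward each other, while the jump to 5 m is left in place because both the intensity and the depth weights suppress contributions across the edge.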
-
Semi-Supervised Monocular Depth Estimation with Left-Right Consistency Using Deep Neural Network
Authors:
Ali Jahani Amiri,
Shing Yan Loo,
Hong Zhang
Abstract:
There has been tremendous research progress in estimating the depth of a scene from a monocular camera image. Existing methods for single-image depth prediction are exclusively based on deep neural networks, and their training can be unsupervised using stereo image pairs, supervised using LiDAR point clouds, or semi-supervised using both stereo and LiDAR. In general, semi-supervised training is preferred as it does not suffer from the weaknesses of either supervised training, resulting from the difference between the camera's and the LiDAR's fields of view, or unsupervised training, resulting from the poor depth accuracy that can be recovered from a stereo pair. In this paper, we present our research in single-image depth prediction using semi-supervised training that outperforms the state of the art. We achieve this through a loss function that explicitly exploits left-right consistency in a stereo reconstruction, which has not been adopted in previous semi-supervised training. In addition, we describe the correct use of ground truth depth derived from LiDAR that can significantly reduce prediction error. The performance of our depth prediction model is evaluated on popular datasets, and the importance of each aspect of our semi-supervised training approach is demonstrated through experimental results. Our deep neural network model has been made publicly available.
Submitted 18 May, 2019;
originally announced May 2019.
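The left-right consistency idea mentioned in the abstract above can be sketched on a single rectified scanline. The abstract does not give the form of the loss, so the version below is an assumption: it uses the common formulation in which the left disparity at column x is compared with the right disparity sampled at column x minus that disparity. The function name `lr_consistency_loss` and the linear interpolation are illustrative choices.

```python
import math


def lr_consistency_loss(disp_left, disp_right):
    """Mean absolute left-right disparity difference on one scanline.

    disp_left[x]  : disparity (in pixels) predicted for the left image.
    disp_right[x] : disparity (in pixels) predicted for the right image.
    A left pixel at column x is assumed to match the right pixel at
    column x - disp_left[x] (rectified stereo).
    """
    w = len(disp_left)
    total = 0.0
    count = 0
    for x, dl in enumerate(disp_left):
        xr = x - dl  # matching column in the right image
        if xr < 0 or xr > w - 1:
            continue  # the match falls outside the image; skip it
        x0 = int(math.floor(xr))
        x1 = min(x0 + 1, w - 1)
        a = xr - x0
        # Linear interpolation of the right disparity at a sub-pixel column.
        dr = (1 - a) * disp_right[x0] + a * disp_right[x1]
        total += abs(dl - dr)
        count += 1
    return total / count if count else 0.0


consistent = lr_consistency_loss([2.0] * 6, [2.0] * 6)    # both views agree
inconsistent = lr_consistency_loss([2.0] * 6, [3.0] * 6)  # views disagree by 1 px
```

When the two disparity maps describe the same geometry the loss vanishes; any disagreement between the views is penalised in proportion to its size.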
-
CNN-SVO: Improving the Mapping in Semi-Direct Visual Odometry Using Single-Image Depth Prediction
Authors:
Shing Yan Loo,
Ali Jahani Amiri,
Syamsiah Mashohor,
Sai Hong Tang,
Hong Zhang
Abstract:
Reliable feature correspondence between frames is a critical step in visual odometry (VO) and visual simultaneous localization and mapping (V-SLAM) algorithms. In comparison with existing VO and V-SLAM algorithms, semi-direct visual odometry (SVO) has two main advantages that lead to state-of-the-art frame rate camera motion estimation: direct pixel correspondence and an efficient implementation of a probabilistic mapping method. This paper improves the SVO mapping by initializing the mean and the variance of the depth at a feature location according to the depth prediction from a single-image depth prediction network. By significantly reducing the depth uncertainty of the initialized map point (i.e., small variance centred about the depth prediction), the benefits are twofold: reliable feature correspondence between views and fast convergence to the true depth in order to create new map points. We evaluate our method with two outdoor datasets: the KITTI dataset and the Oxford RobotCar dataset. The experimental results indicate that the improved SVO mapping results in increased robustness and camera tracking accuracy.
Submitted 1 October, 2018;
originally announced October 2018.
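The depth initialization described in the CNN-SVO abstract above can be illustrated with a minimal sketch. It models a map-point seed as a plain Gaussian over depth, which is a simplification chosen for illustration and not the filter used in SVO itself. The names `DepthSeed`, `seed_from_prediction`, `seed_uninformed`, the 10% relative uncertainty, and the "range divided by six" rule for the broad prior are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DepthSeed:
    mean: float  # current depth estimate (metres)
    var: float   # variance of the depth estimate

    def search_range(self, n_sigma=2.0):
        """Depth interval in which a correspondence would be searched."""
        s = math.sqrt(self.var)
        return max(self.mean - n_sigma * s, 0.0), self.mean + n_sigma * s

    def fuse(self, z, z_var):
        """Fuse one depth measurement z with variance z_var (Gaussian update)."""
        k = self.var / (self.var + z_var)
        self.mean += k * (z - self.mean)
        self.var *= (1.0 - k)


def seed_from_prediction(pred_depth, rel_sigma=0.1):
    # Small variance centred on the network's depth prediction (assumed 10%).
    return DepthSeed(pred_depth, (rel_sigma * pred_depth) ** 2)


def seed_uninformed(d_min, d_max):
    # Broad prior covering the whole admissible depth range (assumed rule).
    return DepthSeed(0.5 * (d_min + d_max), ((d_max - d_min) / 6.0) ** 2)


informed = seed_from_prediction(10.0)
broad = seed_uninformed(1.0, 100.0)
lo_i, hi_i = informed.search_range()  # narrow interval around 10 m
lo_b, hi_b = broad.search_range()     # wide interval over most of the range
informed.fuse(11.0, 1.0)              # one measurement already halves the variance
```

The narrower interval of the informed seed corresponds to the two benefits named in the abstract: fewer candidate matches to consider between views, and fewer measurements needed before the seed's variance is small enough to become a map point.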