Search | arXiv e-print repository

Stereo-NEC: Enhancing Stereo Visual-Inertial SLAM Initialization with Normal Epipolar Constraints

Authors: Weihan Wang, Chieh Chou, Ganesh Sevagamoorthy, Kevin Chen, Zheng Chen, Ziyue Feng, Youjie Xia, Feiyang Cai, Yi Xu, Philippos Mordohai

Abstract: We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating camera poses, potentially compromising accuracy and robustness, our approach offers a different solution. We realize the crucial impact of precise gyroscope bias estimation on rotation accuracy. This, in turn, affects trajectory accuracy due to the accumulation of translation errors. To address this, we first independently estimate the gyroscope bias and use it to formulate a maximum a posteriori problem for further refinement. After this refinement, we proceed to update the rotation estimation by performing IMU integration with gyroscope bias removed from gyroscope measurements. We then leverage robust and accurate rotation estimates to enhance translation estimation via 3-DoF bundle adjustment. Moreover, we introduce a novel approach for determining the success of the initialization by evaluating the residual of the normal epipolar constraint. Extensive evaluations on the EuRoC dataset illustrate that our method excels in accuracy and robustness. It outperforms ORB-SLAM3, the current leading stereo visual-inertial initialization method, in terms of absolute trajectory error and relative rotation error, while maintaining competitive computational speed. Notably, even with 5 keyframes for initialization, our method consistently surpasses the state-of-the-art approach using 10 keyframes in rotation accuracy.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2308.08715

V-FUSE: Volumetric Depth Map Fusion with Long-Range Constraints

Authors: Nathaniel Burgdorfer, Philippos Mordohai

Abstract: We introduce a learning-based depth map fusion framework that accepts a set of depth and confidence maps generated by a Multi-View Stereo (MVS) algorithm as input and improves them. This is accomplished by integrating volumetric visibility constraints that encode long-range surface relationships across different views into an end-to-end trainable architecture. We also introduce a depth search window estimation sub-network trained jointly with the larger fusion sub-network to reduce the depth hypothesis search space along each ray. Our method learns to model depth consensus and violations of visibility constraints directly from the data; effectively removing the necessity of fine-tuning fusion parameters. Extensive experiments on MVS datasets show substantial improvements in the accuracy of the output fused depth and confidence maps.

Submitted 16 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2308.02670

EDI: ESKF-based Disjoint Initialization for Visual-Inertial SLAM Systems

Authors: Weihan Wang, Jiani Li, Yuhang Ming, Philippos Mordohai

Abstract: Visual-inertial initialization can be classified into joint and disjoint approaches. Joint approaches tackle both the visual and the inertial parameters together by aligning observations from feature-bearing points based on IMU integration then use a closed-form solution with visual and acceleration observations to find initial velocity and gravity. In contrast, disjoint approaches independently solve the Structure from Motion (SFM) problem and determine inertial parameters from up-to-scale camera poses obtained from pure monocular SLAM. However, previous disjoint methods have limitations, like assuming negligible acceleration bias impact or accurate rotation estimation by pure monocular SLAM. To address these issues, we propose EDI, a novel approach for fast, accurate, and robust visual-inertial initialization. Our method incorporates an Error-state Kalman Filter (ESKF) to estimate gyroscope bias and correct rotation estimates from monocular SLAM, overcoming dependence on pure monocular SLAM for rotation estimation. To estimate the scale factor without prior information, we offer a closed-form solution for initial velocity, scale, gravity, and acceleration bias estimation. To address gravity and acceleration bias coupling, we introduce weights in the linear least-squares equations, ensuring acceleration bias observability and handling outliers. Extensive evaluation on the EuRoC dataset shows that our method achieves an average scale error of 5.8% in less than 3 seconds, outperforming other state-of-the-art disjoint visual-inertial initialization approaches, even in challenging environments and with artificial noise corruption.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2304.02704

Real-Time Dense 3D Mapping of Underwater Environments

Authors: Weihan Wang, Bharat Joshi, Nathaniel Burgdorfer, Konstantinos Batsos, Alberto Quattrini Li, Philippos Mordohai, Ioannis Rekleitis

Abstract: This paper addresses real-time dense 3D reconstruction for a resource-constrained Autonomous Underwater Vehicle (AUV). Underwater vision-guided operations are among the most challenging as they combine 3D motion in the presence of external forces, limited visibility, and absence of global positioning. Obstacle avoidance and effective path planning require online dense reconstructions of the environment. Autonomous operation is central to environmental monitoring, marine archaeology, resource utilization, and underwater cave exploration. To address this problem, we propose to use SVIn2, a robust VIO method, together with a real-time 3D reconstruction pipeline. We provide extensive evaluation on four challenging underwater datasets. Our pipeline produces comparable reconstruction with that of COLMAP, the state-of-the-art offline 3D reconstruction method, at high frame rates on a single CPU.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2304.00152

Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation

Authors: Liyan Chen, Weihan Wang, Philippos Mordohai

Abstract: We present a new loss function for joint disparity and uncertainty estimation in deep stereo matching. Our work is motivated by the need for precise uncertainty estimates and the observation that multi-task learning often leads to improved performance in all tasks. We show that this can be achieved by requiring the distribution of uncertainty to match the distribution of disparity errors via a KL divergence term in the network's loss function. A differentiable soft-histogramming technique is used to approximate the distributions so that they can be used in the loss. We experimentally assess the effectiveness of our approach and observe significant improvements in both disparity and uncertainty prediction on large datasets.

Submitted 31 March, 2023; originally announced April 2023.

Comments: CVPR 2023

MSC Class: 65D19

arXiv:2109.02740

doi: 10.1016/j.cviu.2022.103384

Single-Camera 3D Head Fitting for Mixed Reality Clinical Applications

Authors: Tejas Mane, Aylar Bayramova, Kostas Daniilidis, Philippos Mordohai, Elena Bernardis

Abstract: We address the problem of estimating the shape of a person's head, defined as the geometry of the complete head surface, from a video taken with a single moving camera, and determining the alignment of the fitted 3D head for all video frames, irrespective of the person's pose. 3D head reconstructions commonly tend to focus on perfecting the face reconstruction, leaving the scalp to a statistical approximation. Our goal is to reconstruct the head model of each person to enable future mixed reality applications. To do this, we recover a dense 3D reconstruction and camera information via structure-from-motion and multi-view stereo. These are then used in a new two-stage fitting process to recover the 3D head shape by iteratively fitting a 3D morphable model of the head with the dense reconstruction in canonical space and fitting it to each person's head, using both traditional facial landmarks and scalp features extracted from the head's segmentation mask. Our approach recovers consistent geometry for varying head shapes, from videos taken by different people, with different smartphones, and in a variety of environments from living rooms to outdoor spaces.

Submitted 7 March, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

arXiv:2010.07350

Do End-to-end Stereo Algorithms Under-utilize Information?

Authors: Changjiang Cai, Philippos Mordohai

Abstract: Deep networks for stereo matching typically leverage 2D or 3D convolutional encoder-decoder architectures to aggregate cost and regularize the cost volume for accurate disparity estimation. Due to content-insensitive convolutions and down-sampling and up-sampling operations, these cost aggregation mechanisms do not take full advantage of the information available in the images. Disparity maps suffer from over-smoothing near occlusion boundaries, and erroneous predictions in thin structures. In this paper, we show how deep adaptive filtering and differentiable semi-global aggregation can be integrated in existing 2D and 3D convolutional networks for end-to-end stereo matching, leading to improved accuracy. The improvements are due to utilizing RGB information from the images as a signal to dynamically guide the matching process, in addition to being the signal we attempt to match across the images. We show extensive experimental results on the KITTI 2015 and Virtual KITTI 2 datasets comparing four stereo networks (DispNetC, GCNet, PSMNet and GANet) after integrating four adaptive filters (segmentation-aware bilateral filtering, dynamic filtering networks, pixel adaptive convolution and semi-global aggregation) into their architectures. Our code is available at https://github.com/ccj5351/DAFStereoNets.

Submitted 14 October, 2020; originally announced October 2020.

Comments: 13 pages, 10 figures, International Conference on 3D Vision (3DV'2020)

arXiv:2010.07347

Matching-space Stereo Networks for Cross-domain Generalization

Authors: Changjiang Cai, Matteo Poggi, Stefano Mattoccia, Philippos Mordohai

Abstract: End-to-end deep networks represent the state of the art for stereo matching. While excelling on images framing environments similar to the training set, major drops in accuracy occur in unseen domains (e.g., when moving from synthetic to real scenes). In this paper we introduce a novel family of architectures, namely Matching-Space Networks (MS-Nets), with improved generalization properties. By replacing learning-based feature extraction from image RGB values with matching functions and confidence measures from conventional wisdom, we move the learning process from the color space to the Matching Space, avoiding over-specialization to domain specific features. Extensive experimental results on four real datasets highlight that our proposal leads to superior generalization to unseen environments over conventional deep architectures, keeping accuracy on the source domain almost unaltered. Our code is available at https://github.com/ccj5351/MS-Nets.

Submitted 14 October, 2020; originally announced October 2020.

Comments: 14 pages, 8 figures, International Conference on 3D Vision (3DV'2020), Github code at https://github.com/ccj5351/MS-Nets

arXiv:2004.08566

On the Synergies between Machine Learning and Binocular Stereo for Depth Estimation from Images: a Survey

Authors: Matteo Poggi, Fabio Tosi, Konstantinos Batsos, Philippos Mordohai, Stefano Mattoccia

Abstract: Stereo matching is one of the longest-standing problems in computer vision with close to 40 years of studies and research. Throughout the years the paradigm has shifted from local, pixel-level decision to various forms of discrete and continuous optimization to data-driven, learning-based methods. Recently, the rise of machine learning and the rapid proliferation of deep learning enhanced stereo matching with new exciting trends and applications unthinkable until a few years ago. Interestingly, the relationship between these two worlds is two-way. While machine, and especially deep, learning advanced the state-of-the-art in stereo matching, stereo itself enabled new ground-breaking methodologies such as self-supervised monocular depth estimation based on deep networks. In this paper, we review recent research in the field of learning-based depth estimation from single and binocular images highlighting the synergies, the successes achieved so far and the open challenges the community is going to face in the immediate future.

Submitted 31 March, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

Comments: Accepted to TPAMI. Paper version of our CVPR 2019 tutorial: "Learning-based depth estimation from stereo and monocular images: successes, limitations and future challenges" (https://sites.google.com/view/cvpr-2019-depth-from-image/home)

arXiv:1905.02553

Oriented Point Sampling for Plane Detection in Unorganized Point Clouds

Authors: Bo Sun, Philippos Mordohai

Abstract: Plane detection in 3D point clouds is a crucial pre-processing step for applications such as point cloud segmentation, semantic mapping and SLAM. In contrast to many recent plane detection methods that are only applicable on organized point clouds, our work is targeted to unorganized point clouds that do not permit a 2D parametrization. We compare three methods for detecting planes in point clouds efficiently. One is a novel method proposed in this paper that generates plane hypotheses by sampling from a set of points with estimated normals. We named this method Oriented Point Sampling (OPS) to contrast with more conventional techniques that require the sampling of three unoriented points to generate plane hypotheses. We also implemented an efficient plane detection method based on local sampling of three unoriented points and compared it with OPS and the 3D-KHT algorithm, which is based on octrees, on the detection of planes on 10,000 point clouds from the SUN RGB-D dataset.

Submitted 3 May, 2019; originally announced May 2019.

Comments: 7 pages, 3 figures, 2019 IEEE International Conference on Robotics and Automation (Accepted)

arXiv:1804.01967

CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation

Authors: Konstantinos Batsos, Changjiang Cai, Philippos Mordohai

Abstract: Recently, there has been a paradigm shift in stereo matching with learning-based methods achieving the best results on all popular benchmarks. The success of these methods is due to the availability of training data with ground truth; training learning-based systems on these datasets has allowed them to surpass the accuracy of conventional approaches based on heuristics and assumptions. Many of these assumptions, however, had been validated extensively and hold for the majority of possible inputs. In this paper, we generate a matching volume leveraging both data with ground truth and conventional wisdom. We accomplish this by coalescing diverse evidence from a bidirectional matching process via random forest classifiers. We show that the resulting matching volume estimation method achieves similar accuracy to purely data-driven alternatives on benchmarks and that it generalizes to unseen data much better. In fact, the results we submitted to the KITTI and ETH3D benchmarks were generated using a classifier trained on the Middlebury 2014 dataset.

Submitted 5 April, 2018; originally announced April 2018.

Comments: Accepted to Computer Vision and Pattern Recognition (CVPR) 2018

arXiv:1706.01966

Controlling a Robotic Stereo Camera Under Image Quantization Noise

Authors: Charles Freundlich, Yan Zhang, Alex Zihao Zhu, Philippos Mordohai, Michael M. Zavlanos

Abstract: In this paper, we address the problem of controlling a mobile stereo camera under image quantization noise. Assuming that a pair of images of a set of targets is available, the camera moves through a sequence of Next-Best-Views (NBVs), i.e., a sequence of views that minimize the trace of the targets' cumulative state covariance, constructed using a realistic model of the stereo rig that captures image quantization noise and a Kalman Filter (KF) that fuses the observation history with new information. The proposed algorithm decomposes control into two stages: first the NBV is computed in the camera relative coordinates, and then the camera moves to realize this view in the fixed global coordinate frame. This decomposition allows the camera to drive to a new pose that effectively realizes the NBV in camera coordinates while satisfying Field-of-View constraints in global coordinates, a task that is particularly challenging using complex sensing models. We provide simulations and real experiments that illustrate the ability of the proposed mobile camera system to accurately localize sets of targets. We also propose a novel data-driven technique to characterize unmodeled uncertainty, such as calibration errors, at the pixel level and show that this method ensures stability of the KF.

Submitted 13 January, 2018; v1 submitted 6 June, 2017; originally announced June 2017.

Comments: International Journal of Robotics Research, October 2017

arXiv:1312.6826

3D Interest Point Detection via Discriminative Learning

Authors: Leizer Teran, Philippos Mordohai

Abstract: The task of detecting the interest points in 3D meshes has typically been handled by geometric methods. These methods, while greatly describing human preference, can be ill-equipped for handling the variety and subjectivity in human responses. Different tasks have different requirements for interest point detection; some tasks may necessitate high precision while other tasks may require high recall. Sometimes points with high curvature may be desirable, while in other cases high curvature may be an indication of noise. Geometric methods lack the required flexibility to adapt to such changes. As a consequence, interest point detection seems to be well suited for machine learning methods that can be trained to match the criteria applied on the annotated training data. In this paper, we formulate interest point detection as a supervised binary classification problem using a random forest as our classifier. Among other challenges, we are faced with an imbalanced learning problem due to the substantial difference in the priors between interest and non-interest points. We address this by re-sampling the training set. We validate the accuracy of our method and compare our results to those of five state of the art methods on a new, standard benchmark.

Submitted 24 December, 2013; originally announced December 2013.

Showing 1–13 of 13 results for author: Mordohai, P