Search | arXiv e-print repository

Breaking the Frame: Image Retrieval by Visual Overlap Prediction

Authors: Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath

Abstract: We propose a novel visual place recognition approach, VOP, that efficiently addresses occlusions and complex scenes by shifting from traditional reliance on global image similarities and local features to image overlap prediction. The proposed method enables the identification of visible image sections without requiring expensive feature detection and matching. By focusing on obtaining patch-level embeddings by a Vision Transformer backbone and establishing patch-to-patch correspondences, our approach uses a voting mechanism to assess overlap scores for potential database images, thereby providing a nuanced image retrieval metric in challenging scenarios. VOP leads to more accurate relative pose estimation and localization results on the retrieved image pairs than state-of-the-art baselines on a number of large-scale, real-world datasets. The code is available at https://github.com/weitong8591/vop.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.05849 [pdf, other]

MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

Authors: Jianhao Zheng, Daniel Barath, Marc Pollefeys, Iro Armeni

Abstract: Creating 3D semantic reconstructions of environments is fundamental to many applications, especially when related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene in the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained and high-resolution map, particularly if the objects to interact are of small size or intricate geometry. In recent practice, this leads to the entire map being in the same high-quality resolution, which results in increased computational and storage costs. To address this challenge, we propose MAP-ADAPT, a real-time method for quality-adaptive semantic 3D reconstruction using RGBD frames. MAP-ADAPT is the first adaptive semantic 3D mapping algorithm that, unlike prior work, generates directly a single map with regions of different quality based on both the semantic information and the geometric complexity of the scene. Leveraging a semantic SLAM pipeline for pose and semantic estimation, we achieve comparable or superior results to state-of-the-art methods on synthetic and real-world data, while significantly reducing storage and computation requirements.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2404.14565 [pdf, other]

"Where am I?" Scene Retrieval with Language

Authors: Jiaqi Chen, Daniel Barath, Iro Armeni, Marc Pollefeys, Hermann Blum

Abstract: Natural language interfaces to embodied AI are becoming more ubiquitous in our daily lives. This opens further opportunities for language-based interaction with embodied agents, such as a user instructing an agent to execute some task in a specific location. For example, "put the bowls back in the cupboard next to the fridge" or "meet me at the intersection under the red sign." As such, we need methods that interface between natural language and map representations of the environment. To this end, we explore the question of whether we can use an open-set natural language query to identify a scene represented by a 3D scene graph. We define this task as "language-based scene-retrieval" and it is closely related to "coarse-localization," but we are instead searching for a match from a collection of disjoint scenes and not necessarily a large-scale continuous map. Therefore, we present Text2SceneGraphMatcher, a "scene-retrieval" pipeline that learns joint embeddings between text descriptions and scene graphs to determine if they are matched. The code, trained models, and datasets will be made public.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.00469 [pdf, other]

SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs

Authors: Yang Miao, Francis Engelmann, Olga Vysotska, Federico Tombari, Marc Pollefeys, Dániel Béla Baráth

Abstract: We introduce a novel problem, i.e., the localization of an input image within a multi-modal reference map represented by a database of 3D scene graphs. These graphs comprise multiple modalities, including object-level point clouds, images, attributes, and relationships between objects, offering a lightweight and efficient alternative to conventional methods that rely on extensive image databases. Given the available modalities, the proposed method SceneGraphLoc learns a fixed-sized embedding for each node (i.e., representing an object instance) in the scene graph, enabling effective matching with the objects visible in the input query image. This strategy significantly outperforms other cross-modal methods, even without incorporating images into the map embeddings. When images are leveraged, SceneGraphLoc achieves performance close to that of state-of-the-art techniques depending on large image databases, while requiring three orders-of-magnitude less storage and operating orders-of-magnitude faster. The code will be made public.

Submitted 8 July, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00429 [pdf, other]

Multiway Point Cloud Mosaicking with Diffusion and Global Optimization

Authors: Shengze **, Iro Armeni, Marc Pollefeys, Daniel Barath

Abstract: We introduce a novel framework for multiway point cloud mosaicking (named Wednesday), designed to co-align sets of partially overlapping point clouds -- typically obtained from 3D scanners or moving RGB-D cameras -- into a unified coordinate system. At the core of our approach is ODIN, a learned pairwise registration algorithm that iteratively identifies overlaps and refines attention scores, employing a diffusion-based process for denoising pairwise correlation matrices to enhance matching accuracy. Further steps include constructing a pose graph from all point clouds, performing rotation averaging, a novel robust algorithm for re-estimating translations optimally in terms of consensus maximization and translation optimization. Finally, the point cloud rotations and positions are optimized jointly by a diffusion-based approach. Tested on four diverse, large-scale datasets, our method achieves state-of-the-art pairwise and multiway registration results by a large margin on all benchmarks. Our code and models are available at https://github.com/**sz/Multiway-Point-Cloud-Mosaicking-with-Diffusion-and-Global-Optimization.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2309.16040 [pdf, other]

Handbook on Leveraging Lines for Two-View Relative Pose Estimation

Authors: Petr Hruby, Shaohui Liu, Rémi Pautrat, Marc Pollefeys, Daniel Barath

Abstract: We propose an approach for estimating the relative pose between calibrated image pairs by jointly exploiting points, lines, and their coincidences in a hybrid manner. We investigate all possible configurations where these data modalities can be used together and review the minimal solvers available in the literature. Our hybrid framework combines the advantages of all configurations, enabling robust and accurate estimation in challenging environments. In addition, we design a method for jointly estimating multiple vanishing point correspondences in two images, and a bundle adjustment that considers all relevant data modalities. Experiments on various indoor and outdoor datasets show that our approach outperforms point-based methods, improving AUC@10$^\circ$ by 1-7 points while running at comparable speeds. The source code of the solvers and hybrid framework will be made public.

Submitted 27 September, 2023; originally announced September 2023.

Comments: 2 view relative pose from special configurations of line

MSC Class: 68T45 ACM Class: I.4.5; I.4.8

arXiv:2309.16023 [pdf, other]

Q-REG: End-to-End Trainable Point Cloud Registration with Surface Curvature

Authors: Shengze **, Daniel Barath, Marc Pollefeys, Iro Armeni

Abstract: Point cloud registration has seen recent success with several learning-based methods that focus on correspondence matching and, as such, optimize only for this objective. Following the learning step of correspondence matching, they evaluate the estimated rigid transformation with a RANSAC-like framework. While it is an indispensable component of these methods, it prevents a fully end-to-end training, leaving the objective to minimize the pose error unserved. We present a novel solution, Q-REG, which utilizes rich geometric information to estimate the rigid pose from a single correspondence. Q-REG allows us to formalize the robust estimation as an exhaustive search, hence enabling end-to-end training that optimizes over both objectives of correspondence matching and rigid pose estimation. We demonstrate in the experiments that Q-REG is agnostic to the correspondence matching method and provides consistent improvement both when used only in inference and in end-to-end training. It sets a new state-of-the-art on the 3DMatch, KITTI, and ModelNet benchmarks.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.14737 [pdf, other]

Volumetric Semantically Consistent 3D Panoptic Mapping

Authors: Yang Miao, Iro Armeni, Marc Pollefeys, Daniel Barath

Abstract: We introduce an online 2D-to-3D semantic instance mapping algorithm aimed at generating comprehensive, accurate, and efficient semantic 3D maps suitable for autonomous agents in unstructured environments. The proposed approach is based on a Voxel-TSDF representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantic and instance-consistent 3D regions. Further improvements are achieved by graph optimization-based semantic labeling and instance refinement. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a downfall in the evaluation of recent studies: using the ground truth trajectory as input instead of a SLAM-estimated one substantially affects the accuracy, creating a large gap between the reported results and the actual performance on real-world data.

Submitted 8 July, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: 8 pages, 2 figures

arXiv:2308.10694 [pdf, other]

Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction

Authors: Rémi Pautrat, Shaohui Liu, Petr Hruby, Marc Pollefeys, Daniel Barath

Abstract: We tackle the problem of estimating a Manhattan frame, i.e. three orthogonal vanishing points, and the unknown focal length of the camera, leveraging a prior vertical direction. The direction can come from an Inertial Measurement Unit that is a standard component of recent consumer devices, e.g., smartphones. We provide an exhaustive analysis of minimal line configurations and derive two new 2-line solvers, one of which does not suffer from singularities affecting existing solvers. Additionally, we design a new non-minimal method, running on an arbitrary number of lines, to boost the performance in local optimization. Combining all solvers in a hybrid robust estimator, our method achieves increased accuracy even with a rough prior. Experiments on synthetic and real-world datasets demonstrate the superior accuracy of our method compared to the state of the art, while having comparable runtimes. We further demonstrate the applicability of our solvers for relative rotation estimation. The code is available at https://github.com/cvg/VP-Estimation-with-Prior-Gravity.

Submitted 21 August, 2023; originally announced August 2023.

Comments: Accepted at ICCV 2023

arXiv:2307.15381 [pdf, other]

AffineGlue: Joint Matching and Robust Estimation

Authors: Daniel Barath, Dmytro Mishkin, Luca Cavalli, Paul-Edouard Sarlin, Petr Hruby, Marc Pollefeys

Abstract: We propose AffineGlue, a method for joint two-view feature matching and robust estimation that reduces the combinatorial complexity of the problem by employing single-point minimal solvers. AffineGlue selects potential matches from one-to-many correspondences to estimate minimal models. Guided matching is then used to find matches consistent with the model, suffering less from the ambiguities of one-to-one matches. Moreover, we derive a new minimal solver for homography estimation, requiring only a single affine correspondence (AC) and a gravity prior. Furthermore, we train a neural network to reject ACs that are unlikely to lead to a good model. AffineGlue is superior to the SOTA on real-world datasets, even when assuming that the gravity direction points downwards. On PhotoTourism, the AUC@10° score is improved by 6.6 points compared to the SOTA. On ScanNet, AffineGlue makes SuperPoint and SuperGlue achieve similar accuracy as the detector-free LoFTR.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.14030 [pdf, other]

Consensus-Adaptive RANSAC

Authors: Luca Cavalli, Daniel Barath, Marc Pollefeys, Viktor Larsson

Abstract: RANSAC and its variants are widely used for robust estimation; however, they commonly follow a greedy approach to finding the highest scoring model while ignoring other model hypotheses. In contrast, Iteratively Reweighted Least Squares (IRLS) techniques gradually approach the model by iteratively updating the weight of each correspondence based on the residuals from previous iterations. Inspired by these methods, we propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer. The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer. This rich state then guides the minimal sampling between iterations as well as the model refinement. We evaluate the proposed approach on essential and fundamental matrix estimation on a number of indoor and outdoor datasets. It outperforms state-of-the-art estimators by a significant margin adding only a small runtime overhead. Moreover, we demonstrate good generalization properties of our trained model, indicating its effectiveness across different datasets and tasks. The proposed attention mechanism and one-step transformer provide an adaptive behavior that enhances the performance of RANSAC, making it a more effective tool for robust estimation. Code is available at https://github.com/cavalli1234/CA-RANSAC.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2306.12547 [pdf, other]

DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching

Authors: Shuzhe Wang, Juho Kannala, Daniel Barath

Abstract: Matching 2D keypoints in an image to a sparse 3D point cloud of the scene without requiring visual descriptors has garnered increased interest due to its low memory requirements, inherent privacy preservation, and reduced need for expensive 3D model maintenance compared to visual descriptor-based methods. However, existing algorithms often compromise on performance, resulting in a significant deterioration compared to their descriptor-based counterparts. In this paper, we introduce DGC-GNN, a novel algorithm that employs a global-to-local Graph Neural Network (GNN) that progressively exploits geometric and color cues to represent keypoints, thereby improving matching accuracy. Our procedure encodes both Euclidean and angular relations at a coarse level, forming the geometric embedding to guide the point matching. We evaluate DGC-GNN on both indoor and outdoor datasets, demonstrating that it not only doubles the accuracy of the state-of-the-art visual descriptor-free algorithm but also substantially narrows the performance gap between descriptor-based and descriptor-free methods.

Submitted 24 March, 2024; v1 submitted 21 June, 2023; originally announced June 2023.

Comments: CVPR 2024

arXiv:2304.14880 [pdf, other]

SGAligner: 3D Scene Alignment with Scene Graphs

Authors: Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys, Daniel Barath, Iro Armeni

Abstract: Building 3D scene graphs has recently emerged as a topic in scene representation for several embodied AI applications to represent the world in a structured and rich manner. With their increased use in solving downstream tasks (e.g., navigation and room rearrangement), can we leverage and recycle them for creating 3D maps of environments, a pivotal step in agent operation? We focus on the fundamental problem of aligning pairs of 3D scene graphs whose overlap can range from zero to partial and can contain arbitrary changes. We propose SGAligner, the first method for aligning pairs of 3D scene graphs that is robust to in-the-wild scenarios (i.e., unknown overlap -- if any -- and changes in the environment). We get inspired by multi-modality knowledge graphs and use contrastive learning to learn a joint, multi-modal embedding space. We evaluate on the 3RScan dataset and further showcase that our method can be used for estimating the transformation between pairs of 3D scenes. Since benchmarks for these tasks are missing, we create them on this dataset. The code, benchmark, and trained models are available on the project website.

Submitted 26 September, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

Comments: Accepted at ICCV 2023

arXiv:2303.16078 [pdf, other]

Relative pose of three calibrated and partially calibrated cameras from four points using virtual correspondences

Authors: Charalambos Tzamos, Daniel Barath, Torsten Sattler, Zuzana Kukelova

Abstract: We study challenging problems of estimating the relative pose of three cameras and propose novel efficient solutions to (1) the notoriously difficult configuration of four points in three calibrated views, known as the 4p3v problem, and (2) to the previously unsolved configuration of four points in three cameras with unknown shared focal length, i.e., the 4p3vf problem. Our solutions are based on the simple idea of generating one or two additional virtual point correspondences in two views by using the information from the locations of the four input correspondences in the three views. We generate such correspondences using either a very simple and efficient strategy where the new points are the mean points of three corresponding input points or using a simple neural network. The new solvers are efficient and easy to implement since they are based on existing efficient minimal solvers, i.e., the well-known 5-point and 6-point relative pose solvers and the P3P solver. Our solvers achieve state-of-the-art results on real data.

Submitted 11 December, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.05195 [pdf, other]

Revisiting Rotation Averaging: Uncertainties and Robust Losses

Authors: Ganlin Zhang, Viktor Larsson, Daniel Barath

Abstract: In this paper, we revisit the rotation averaging problem applied in global Structure-from-Motion pipelines. We argue that the main problem of current methods is the minimized cost function that is only weakly connected with the input data via the estimated epipolar geometries. We propose to better model the underlying noise distributions by directly propagating the uncertainty from the point correspondences into the rotation averaging. Such uncertainties are obtained for free by considering the Jacobians of two-view refinements. Moreover, we explore integrating a variant of the MAGSAC loss into the rotation averaging problem, instead of using classical robust losses employed in current frameworks. The proposed method leads to results superior to baselines, in terms of accuracy, on large-scale public benchmarks. The code is public: https://github.com/zhangganlin/GlobalSfMpy

Submitted 9 March, 2023; originally announced March 2023.

Comments: Submitted to CVPR 2023

arXiv:2302.09997 [pdf, other]

A Large Scale Homography Benchmark

Authors: Daniel Barath, Dmytro Mishkin, Michal Polic, Wolfgang Förstner, Jiri Matas

Abstract: We present a large-scale dataset of Planes in 3D, Pi3D, of roughly 1000 planes observed in 10 000 images from the 1DSfM dataset, and HEB, a large-scale homography estimation benchmark leveraging Pi3D. The applications of the Pi3D dataset are diverse, e.g. training or evaluating monocular depth, surface normal estimation and image matching algorithms. The HEB dataset consists of 226 260 homographies and includes roughly 4M correspondences. The homographies link images that often undergo significant viewpoint and illumination changes. As applications of HEB, we perform a rigorous evaluation of a wide range of robust estimators and deep learning-based correspondence filtering methods, establishing the current state-of-the-art in robust homography estimation. We also evaluate the uncertainty of the SIFT orientations and scales w.r.t. the ground truth coming from the underlying homographies and provide codes for comparing uncertainty of custom detectors. The dataset is available at https://github.com/danini/homography-benchmark.

Submitted 20 February, 2023; originally announced February 2023.

arXiv:2212.13185 [pdf, other]

Generalized Differentiable RANSAC

Authors: Tong Wei, Yash Patel, Alexander Shekhovtsov, Jiri Matas, Daniel Barath

Abstract: We propose $\nabla$-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. The proposed approach enables the use of relaxation techniques for estimating the gradients in the sampling distribution, which are then propagated through a differentiable solver. The trainable quality function marginalizes over the scores from all the models estimated within $\nabla$-RANSAC to guide the network learning accurate and useful inlier probabilities or to train feature detection and matching networks. Our method directly maximizes the probability of drawing a good hypothesis, allowing us to learn better sampling distributions. We test $\nabla$-RANSAC on various real-world scenarios on fundamental and essential matrix estimation, and 3D point cloud registration, outdoors and indoors, with handcrafted and learning-based features. It is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives. The code and trained models are available at https://github.com/weitong8591/differentiable_ransac.

Submitted 8 September, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

arXiv:2212.07766 [pdf, other]

DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients

Authors: Rémi Pautrat, Daniel Barath, Viktor Larsson, Martin R. Oswald, Marc Pollefeys

Abstract: Line segments are ubiquitous in our human-made world and are increasingly used in vision tasks. They are complementary to feature points thanks to their spatial extent and the structural information they provide. Traditional line detectors based on the image gradient are extremely fast and accurate, but lack robustness in noisy images and challenging conditions. Their learned counterparts are more repeatable and can handle challenging images, but at the cost of a lower accuracy and a bias towards wireframe lines. We propose to combine traditional and learned approaches to get the best of both worlds: an accurate and robust line detector that can be trained in the wild without ground truth lines. Our new line segment detector, DeepLSD, processes images with a deep network to generate a line attraction field, before converting it to a surrogate image gradient magnitude and angle, which is then fed to any existing handcrafted line detector. Additionally, we propose a new optimization tool to refine line segments based on the attraction field and vanishing points. This refinement improves the accuracy of current deep detectors by a large margin. We demonstrate the performance of our method on low-level line detection metrics, as well as on several downstream tasks using multiple challenging datasets. The source code and models are available at https://github.com/cvg/DeepLSD.

Submitted 28 March, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: Accepted at CVPR 2023

arXiv:2207.07872 [pdf, other]

NeFSAC: Neurally Filtered Minimal Samples

Authors: Luca Cavalli, Marc Pollefeys, Daniel Barath

Abstract: Since RANSAC, a great deal of research has been devoted to improving both its accuracy and run-time. Still, only a few methods aim at recognizing invalid minimal samples early, before the often expensive model estimation and quality calculation are done. To this end, we propose NeFSAC, an efficient algorithm for neural filtering of motion-inconsistent and poorly-conditioned minimal samples. We train NeFSAC to predict the probability of a minimal sample leading to an accurate relative pose, only based on the pixel coordinates of the image correspondences. Our neural filtering model learns typical motion patterns of samples which lead to unstable poses, and regularities in the possible motions to favour well-conditioned and likely-correct samples. The novel lightweight architecture implements the main invariants of minimal samples for pose estimation, and a novel training scheme addresses the problem of extreme class imbalance. NeFSAC can be plugged into any existing RANSAC-based pipeline. We integrate it into USAC and show that it consistently provides strong speed-ups even under extreme train-test domain gaps - for example, the model trained for the autonomous driving scenario works on PhotoTourism too. We tested NeFSAC on more than 100k image pairs from three publicly available real-world datasets and found that it leads to one order of magnitude speed-up, while often finding more accurate results than USAC alone. The source code is available at https://github.com/cavalli1234/NeFSAC.

Submitted 16 July, 2022; originally announced July 2022.

Comments: Published in the 17th European Conference on Computer Vision (ECCV 2022)

arXiv:2203.07930 [pdf, other]

Relative Pose from SIFT Features

Authors: Daniel Barath, Zuzana Kukelova

Abstract: This paper proposes the geometric relationship of epipolar geometry and orientation- and scale-covariant, e.g., SIFT, features. We derive a new linear constraint relating the unknown elements of the fundamental matrix and the orientation and scale. This equation can be used together with the well-known epipolar constraint to, e.g., estimate the fundamental matrix from four SIFT correspondences, essential matrix from three, and to solve the semi-calibrated case from three correspondences. Requiring fewer correspondences than the well-known point-based approaches (e.g., 5PT, 6PT and 7PT solvers) for epipolar geometry estimation makes RANSAC-like randomized robust estimation significantly faster. The proposed constraint is tested on a number of problems in a synthetic environment and on publicly available real-world datasets on more than 80000 image pairs. It is superior to the state-of-the-art in terms of processing time while often leading to more accurate results.

Submitted 15 March, 2022; originally announced March 2022.
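
The speed-up from smaller minimal samples claimed in the abstract above can be illustrated with the standard RANSAC iteration bound, k = log(1 - p) / log(1 - w^m), where w is the inlier ratio, m the sample size and p the required confidence. The sketch below is not taken from the paper: the function name `ransac_iterations`, the 50% inlier ratio and the 0.99 confidence are assumptions chosen for illustration only.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int,
                      confidence: float = 0.99) -> int:
    """Number of RANSAC iterations needed to draw at least one
    all-inlier minimal sample with the given confidence.

    Uses the textbook bound k = log(1 - p) / log(1 - w^m).
    """
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Probability that one random minimal sample contains only inliers.
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))


if __name__ == "__main__":
    # Sample sizes named in the abstract: 7PT/5PT point solvers versus
    # four (fundamental) and three (essential) SIFT correspondences.
    # The 0.5 inlier ratio is an illustrative assumption.
    for label, m in [("7PT", 7), ("5PT", 5), ("4 SIFT", 4), ("3 SIFT", 3)]:
        print(f"{label:>7}: {ransac_iterations(0.5, m)} iterations")
```

Because w^m shrinks exponentially in m, dropping from seven to four correspondences reduces the required iteration count substantially at moderate inlier ratios, and the gap widens as the inlier ratio falls.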

arXiv:2111.14093 [pdf, other]

Adaptive Reordering Sampler with Neurally Guided MAGSAC

Authors: Tong Wei, Jiri Matas, Daniel Barath

Abstract: We propose a new sampler for robust estimators that always selects the sample with the highest probability of consisting only of inliers. After every unsuccessful iteration, the inlier probabilities are updated in a principled way via a Bayesian approach. The probabilities obtained by the deep network are used as prior (so-called neural guidance) inside the sampler. Moreover, we introduce a new loss that exploits, in a geometrically justifiable manner, the orientation and scale that can be estimated for any type of feature, e.g., SIFT or SuperPoint, to estimate two-view geometry. The new loss helps to learn higher-order information about the underlying scene geometry. Benefiting from the new sampler and the proposed loss, we combine the neural guidance with the state-of-the-art MAGSAC++. Adaptive Reordering Sampler with Neurally Guided MAGSAC (ARS-MAGSAC) is superior to the state-of-the-art in terms of accuracy and run-time on the PhotoTourism and KITTI datasets for essential and fundamental matrix estimation. The code and trained models are available at https://github.com/weitong8591/ars_magsac.

Submitted 8 September, 2023; v1 submitted 28 November, 2021; originally announced November 2021.

arXiv:2111.12385 [pdf, other]

Space-Partitioning RANSAC

Authors: Daniel Barath, Gabor Valasek

Abstract: A new algorithm is proposed to accelerate RANSAC model quality calculations. The method is based on partitioning the joint correspondence space, e.g., 2D-2D point correspondences, into a pair of regular grids. The grid cells are mapped by minimal sample models, estimated within RANSAC, to reject correspondences that are inconsistent with the model parameters early. The proposed technique is general. It works with arbitrary transformations even if a point is mapped to a point set, e.g., as a fundamental matrix maps to epipolar lines. The method is tested on thousands of image pairs from publicly available datasets on fundamental and essential matrix, homography and radially distorted homography estimation. On average, it reduces the RANSAC run-time by 41% with provably no deterioration in the accuracy. It can be straightforwardly plugged into state-of-the-art RANSAC frameworks, e.g. VSAC.

Submitted 20 July, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

arXiv:2106.10240 [pdf, other]

VSAC: Efficient and Accurate Estimator for H and F

Authors: Maksym Ivashechkin, Daniel Barath, Jiri Matas

Abstract: We present VSAC, a RANSAC-type robust estimator with a number of novelties. It benefits from the introduction of the concept of independent inliers that improves significantly the efficacy of the dominant plane handling and, also, allows near error-free rejection of incorrect models, without false positives. The local optimization process and its application is improved so that it is run on average only once. Further technical improvements include adaptive sequential hypothesis verification and efficient model estimation via Gaussian elimination. Experiments on four standard datasets show that VSAC is significantly faster than all its predecessors and runs on average in 1-2 ms, on a CPU. It is two orders of magnitude faster and yet as precise as MAGSAC++, the currently most accurate estimator of two-view geometry. In the repeated runs on EVD, HPatches, PhotoTourism, and Kusvod2 datasets, it never failed.

Submitted 13 September, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

arXiv:2104.05044 [pdf, other]

USACv20: robust essential, fundamental and homography matrix estimation

Authors: Maksym Ivashechkin, Daniel Barath, Jiri Matas

Abstract: We review the most recent RANSAC-like hypothesize-and-verify robust estimators. The best performing ones are combined to create a state-of-the-art version of the Universal Sample Consensus (USAC) algorithm. A recent objective is to implement a modular and optimized framework, making future RANSAC modules easy to be included. The proposed method, USACv20, is tested on eight publicly available real-world datasets, estimating homographies, fundamental and essential matrices. On average, USACv20 leads to the most geometrically accurate models and it is the fastest in comparison to the state-of-the-art robust estimators. All reported properties improved performance of original USAC algorithm significantly. The pipeline will be made available after publication.

Submitted 11 April, 2021; originally announced April 2021.

Comments: arXiv admin note: text overlap with arXiv:1912.05909

arXiv:2103.13875 [pdf, other]

Finding Geometric Models by Clustering in the Consensus Space

Authors: Daniel Barath, Denys Rozumny, Ivan Eichhardt, Levente Hajder, Jiri Matas

Abstract: We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. The problem is formalized as finding dominant model instances progressively without forming crisp point-to-model assignments. Dominant instances are found via a RANSAC-like sampling and a consolidation process driven by a model quality function considering previously proposed instances. New ones are found by clustering in the consensus space. This new formulation leads to a simple iterative algorithm with state-of-the-art accuracy while running in real-time on a number of vision problems - at least two orders of magnitude faster than the competitors on two-view motion estimation. Also, we propose a deterministic sampler reflecting the fact that real-world data tend to form spatially coherent structures. The sampler returns connected components in a progressively densified neighborhood-graph. We present a number of applications where the use of multiple geometric models improves accuracy. These include pose estimation from multiple generalized homographies; trajectory estimation of fast-moving objects; and we also propose a way of using multiple homographies in global SfM algorithms. Source code: https://github.com/danini/clustering-in-consensus-space.

Submitted 17 April, 2023; v1 submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.06535 [pdf, other]

Calibrated and Partially Calibrated Semi-Generalized Homographies

Authors: Snehal Bhayani, Torsten Sattler, Daniel Barath, Patrik Beliansky, Janne Heikkila, Zuzana Kukelova

Abstract: In this paper, we propose the first minimal solutions for estimating the semi-generalized homography given a perspective and a generalized camera. The proposed solvers use five 2D-2D image point correspondences induced by a scene plane. One of them assumes the perspective camera to be fully calibrated, while the other solver estimates the unknown focal length together with the absolute pose parameters. This setup is particularly important in structure-from-motion and image-based localization pipelines, where a new camera is localized in each step with respect to a set of known cameras and 2D-3D correspondences might not be available. As a consequence of a clever parametrization and the elimination ideal method, our approach only needs to solve a univariate polynomial of degree five or three. The proposed solvers are stable and efficient as demonstrated by a number of synthetic and real-world experiments.

Submitted 11 October, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

Comments: Accepted to ICCV 2021 and to appear in the conference proceedings

arXiv:2012.00465 [pdf, other]

Minimal Solutions for Panoramic Stitching Given Gravity Prior

Authors: Yaqing Ding, Daniel Barath, Zuzana Kukelova

Abstract: When capturing panoramas, people tend to align their cameras with the vertical axis, i.e., the direction of gravity. Moreover, modern devices, such as smartphones and tablets, are equipped with an IMU (Inertial Measurement Unit) that can measure the gravity vector accurately. Using this prior, the y-axes of the cameras can be aligned or assumed to be already aligned, reducing their relative orientation to 1-DOF (degree of freedom). Exploiting this assumption, we propose new minimal solutions to panoramic image stitching of images taken by cameras with coinciding optical centers, i.e., undergoing pure rotation. We consider four practical camera configurations, assuming unknown fixed or varying focal length with or without radial distortion. The solvers are tested both on synthetic scenes and on more than 500k real image pairs from the Sun360 dataset and from scenes captured by us using two smartphones equipped with IMUs. It is shown that they outperform the state-of-the-art both in terms of accuracy and processing time.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2012.00458 [pdf, other]

Globally Optimal Relative Pose Estimation with Gravity Prior

Authors: Yaqing Ding, Daniel Barath, Jian Yang, Hui Kong, Zuzana Kukelova

Abstract: Smartphones, tablets and camera systems used, e.g., in cars and UAVs, are typically equipped with IMUs (inertial measurement units) that can measure the gravity vector accurately. Using this additional information, the $y$-axes of the cameras can be aligned, reducing their relative orientation to a single degree-of-freedom. With this assumption, we propose a novel globally optimal solver, minimizing the algebraic error in the least-squares sense, to estimate the relative pose in the over-determined case. Based on the epipolar constraint, we convert the optimization problem into solving two polynomials with only two unknowns. Also, a fast solver is proposed using the first-order approximation of the rotation. The proposed solvers are compared with the state-of-the-art ones on four real-world datasets with approx. 50000 image pairs in total. Moreover, we collected a dataset, by a smartphone, consisting of 10933 image pairs, gravity directions, and ground truth 3D reconstructions.

Submitted 4 February, 2021; v1 submitted 1 December, 2020; originally announced December 2020.
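
As a worked illustration of the single degree-of-freedom rotation mentioned in the abstract above: once the $y$-axes of both cameras are aligned with gravity, the remaining relative rotation is a rotation about the $y$-axis by one unknown angle. The symbols $\theta$, $\mathbf{t}$, $\mathbf{E}$, $\mathbf{x}_1$ and $\mathbf{x}_2$ below are notation assumed for this sketch and are not taken from the paper.

```latex
\mathbf{R}_y(\theta) =
\begin{bmatrix}
\cos\theta & 0 & \sin\theta \\
0 & 1 & 0 \\
-\sin\theta & 0 & \cos\theta
\end{bmatrix},
\qquad
\mathbf{E} = [\mathbf{t}]_\times \, \mathbf{R}_y(\theta),
\qquad
\mathbf{x}_2^\top \mathbf{E} \, \mathbf{x}_1 = 0 .
```

Under this parametrization the epipolar constraint depends only on $\theta$ and the translation direction $\mathbf{t}$ (defined up to scale), i.e., three unknowns instead of the five of the general calibrated relative pose.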

arXiv:2011.11986 [pdf, other]

Efficient Initial Pose-graph Generation for Global SfM

Authors: Daniel Barath, Dmytro Mishkin, Ivan Eichhardt, Ilia Shipachev, Jiri Matas

Abstract: We propose ways to speed up the initial pose-graph generation for global Structure-from-Motion algorithms. To avoid forming tentative point correspondences by FLANN and geometric verification by RANSAC, which are the most time-consuming steps of the pose-graph creation, we propose two new methods - built on the fact that image pairs usually are matched consecutively. Thus, candidate relative poses can be recovered from paths in the partly-built pose-graph. We propose a heuristic for the A* traversal, considering global similarity of images and the quality of the pose-graph edges. Given a relative pose from a path, descriptor-based feature matching is made "light-weight" by exploiting the known epipolar geometry. To speed up PROSAC-based sampling when RANSAC is applied, we propose a third method to order the correspondences by their inlier probabilities from previous estimations. The algorithms are tested on 402130 image pairs from the 1DSfM dataset and they speed up the feature matching 17 times and pose estimation 5 times.

Submitted 26 November, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: Added supplementary material

arXiv:2011.08790 [pdf, other]

P1AC: Revisiting Absolute Pose From a Single Affine Correspondence

Authors: Jonathan Ventura, Zuzana Kukelova, Torsten Sattler, Dániel Baráth

Abstract: Affine correspondences have traditionally been used to improve feature matching over wide baselines. While recent work has successfully used affine correspondences to solve various relative camera pose estimation problems, less attention has been given to their use in absolute pose estimation. We introduce the first general solution to the problem of estimating the pose of a calibrated camera given a single observation of an oriented point and an affine correspondence. The advantage of our approach (P1AC) is that it requires only a single correspondence, in comparison to the traditional point-based approach (P3P), significantly reducing the combinatorics in robust estimation. P1AC provides a general solution that removes restrictive assumptions made in prior work and is applicable to large-scale image-based localization. We propose a minimal solution to the P1AC problem and evaluate our novel solver on synthetic data, showing its numerical stability and performance under various types of noise. On standard image-based localization benchmarks we show that P1AC achieves more accurate results than the widely used P3P algorithm. Code for our method is available at https://github.com/jonathanventura/P1AC/.

Submitted 29 June, 2024; v1 submitted 17 November, 2020; originally announced November 2020.

Comments: ICCV 2023 (with corrections in Eqs. 6 and 13 and Fig. 4)

arXiv:2008.05743 [pdf, other]

Pose Estimation for Vehicle-mounted Cameras via Horizontal and Vertical Planes

Authors: Istan Gergo Gal, Daniel Barath, Levente Hajder

Abstract: We propose two novel solvers for estimating the egomotion of a calibrated camera mounted to a moving vehicle from a single affine correspondence via recovering special homographies. For the first class of solvers, the sought plane is expected to be perpendicular to one of the camera axes. For the second class, the plane is orthogonal to the ground with unknown normal, e.g., it is a building facade. Both methods are solved via a linear system with a small coefficient matrix, thus, being extremely efficient. Both the minimal and over-determined cases can be solved by the proposed methods. They are tested on synthetic data and on publicly available real-world datasets. The novel methods are more accurate or comparable to the traditional algorithms and are faster when included in state of the art robust estimators.

Submitted 13 August, 2020; originally announced August 2020.

arXiv:2007.10700 [pdf, other]

Minimal Cases for Computing the Generalized Relative Pose using Affine Correspondences

Authors: Banglei Guan, Ji Zhao, Daniel Barath, Friedrich Fraundorfer

Abstract: We propose three novel solvers for estimating the relative pose of a multi-camera system from affine correspondences (ACs). A new constraint is derived interpreting the relationship of ACs and the generalized camera model. Using the constraint, we demonstrate efficient solvers for two types of motions assumed. Considering that the cameras undergo planar motion, we propose a minimal solution using a single AC and a solver with two ACs to overcome the degenerate case. Also, we propose a minimal solution using two ACs with known vertical direction, e.g., from an IMU. Since the proposed methods require significantly fewer correspondences than state-of-the-art algorithms, they can be efficiently used within RANSAC for outlier removal and initial motion estimation. The solvers are tested both on synthetic data and on real-world scenes from the KITTI odometry benchmark. It is shown that the accuracy of the estimated poses is superior to the state-of-the-art techniques.

Submitted 19 August, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: ICCV 2021

arXiv:2007.10082 [pdf, other]

Relative Pose from Deep Learned Depth and a Single Affine Correspondence

Authors: Ivan Eichhardt, Daniel Barath

Abstract: We propose a new approach for combining deep-learned non-metric monocular depth with affine correspondences (ACs) to estimate the relative pose of two calibrated cameras from a single correspondence. Considering the depth information and affine features, two new constraints on the camera pose are derived. The proposed solver is usable within 1-point RANSAC approaches. Thus, the processing time of the robust estimation is linear in the number of correspondences and, therefore, orders of magnitude faster than by using traditional approaches. The proposed 1AC+D solver is tested both on synthetic data and on 110395 publicly available real image pairs where we used an off-the-shelf monocular depth network to provide up-to-scale depth per pixel. The proposed 1AC+D leads to similar accuracy as traditional approaches while being significantly faster. When solving large-scale problems, e.g., pose-graph initialization for Structure-from-Motion (SfM) pipelines, the overhead of obtaining ACs and monocular depth is negligible compared to the speed-up gained in the pairwise geometric verification, i.e., relative pose estimation. This is demonstrated on scenes from the 1DSfM dataset using a state-of-the-art global SfM algorithm. Source code: https://github.com/eivan/one-ac-pose

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2007.10032 [pdf, other]

Making Affine Correspondences Work in Camera Geometry Computation

Authors: Daniel Barath, Michal Polic, Wolfgang Förstner, Torsten Sattler, Tomas Pajdla, Zuzana Kukelova

Abstract: Local features e.g. SIFT and its affine and learned variants provide region-to-region rather than point-to-point correspondences. This has recently been exploited to create new minimal solvers for classical problems such as homography, essential and fundamental matrix estimation. The main advantage of such solvers is that their sample size is smaller, e.g., only two instead of four matches are required to estimate a homography. Works proposing such solvers often claim a significant improvement in run-time thanks to fewer RANSAC iterations. We show that this argument is not valid in practice if the solvers are used naively. To overcome this, we propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline. We propose a method for refining the local feature geometries by symmetric intensity-based matching, combine uncertainty propagation inside RANSAC with preemptive model verification, show a general scheme for computing uncertainty of minimal solvers results, and adapt the sample cheirality check for homography estimation. Our experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times when following our guidelines. We make code available at https://github.com/danini/affine-correspondences-for-camera-geometry.

Submitted 20 July, 2020; originally announced July 2020.

arXiv:2004.00605 [pdf, other]

EPOS: Estimating 6D Pose of Objects with Symmetries

Authors: Tomas Hodan, Daniel Barath, Jiri Matas

Abstract: We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input image. The method is applicable to a broad range of objects, including challenging ones with global or partial symmetries. An object is represented by compact surface fragments which allow handling symmetries in a systematic manner. Correspondences between densely sampled pixels and the fragments are predicted using an encoder-decoder network. At each pixel, the network predicts: (i) the probability of each object's presence, (ii) the probability of the fragments given the object's presence, and (iii) the precise 3D location on each fragment. A data-dependent number of corresponding 3D locations is selected per pixel, and poses of possibly multiple object instances are estimated using a robust and efficient variant of the PnP-RANSAC algorithm. In the BOP Challenge 2019, the method outperforms all RGB and most RGB-D and D methods on the T-LESS and LM-O datasets. On the YCB-V dataset, it is superior to all competitors, with a large margin over the second-best RGB method. Source code is at: cmp.felk.cvut.cz/epos.

Submitted 1 April, 2020; originally announced April 2020.

Comments: Accepted to CVPR 2020

arXiv:1912.06465 [pdf, other]

Relative planar motion for vehicle-mounted cameras from a single affine correspondence

Authors: Levente Hajder, Daniel Barath

Abstract: Two solvers are proposed for estimating the extrinsic camera parameters from a single affine correspondence assuming general planar motion. In this case, the camera movement is constrained to a plane and the image plane is orthogonal to the ground. The algorithms do not assume other constraints, e.g. the non-holonomic one, to hold. A new minimal solver is proposed for the semi-calibrated case, i.e. the camera parameters are known except a common focal length. Another method is proposed for the fully calibrated case. Due to requiring a single correspondence, robust estimation, e.g. histogram voting, leads to a fast and accurate procedure. The proposed methods are tested in our synthetic environment and on publicly available real datasets consisting of videos through tens of kilometres. They are superior to the state-of-the-art both in terms of accuracy and processing time.

Submitted 13 December, 2019; originally announced December 2019.

arXiv:1912.06464 [pdf, other]

Least-squares Optimal Relative Planar Motion for Vehicle-mounted Cameras

Authors: Levente Hajder, Daniel Barath

Abstract: A new closed-form solver is proposed minimizing the algebraic error optimally, in the least-squares sense, to estimate the relative planar motion of two calibrated cameras. The main objective is to solve the over-determined case, i.e., when a larger-than-minimal sample of point correspondences is given - thus, estimating the motion from at least three correspondences. The algorithm requires the camera movement to be constrained to a plane, e.g. mounted to a vehicle, and the image plane to be orthogonal to the ground. The solver obtains the motion parameters as the roots of a 6-th degree polynomial. It is validated both in synthetic experiments and on publicly available real-world datasets that using the proposed solver leads to results superior to the state-of-the-art in terms of geometric accuracy with no noticeable deterioration in the processing time.

Submitted 13 December, 2019; originally announced December 2019.

arXiv:1912.05909 [pdf, other]

MAGSAC++, a fast, reliable and accurate robust estimator

Authors: Daniel Barath, Jana Noskova, Maksym Ivashechkin, Jiri Matas

Abstract: A new method for robust estimation, MAGSAC++, is proposed. It introduces a new model quality (scoring) function that does not require the inlier-outlier decision, and a novel marginalization procedure formulated as an iteratively re-weighted least-squares approach. We also propose a new sampler, Progressive NAPSAC, for RANSAC-like robust estimators. Exploiting the fact that nearby points often originate from the same model in real-world data, it finds local structures earlier than global samplers. The progressive transition from local to global sampling does not suffer from the weaknesses of purely localized samplers. On six publicly available real-world datasets for homography and fundamental matrix fitting, MAGSAC++ produces results superior to state-of-the-art robust methods. It is faster, more geometrically accurate and fails less often.

Submitted 11 December, 2019; originally announced December 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1906.02295

arXiv:1906.11927 [pdf, other]

Homography from two orientation- and scale-covariant features

Authors: Daniel Barath, Zuzana Kukelova

Abstract: This paper proposes a geometric interpretation of the angles and scales which the orientation- and scale-covariant feature detectors, e.g. SIFT, provide. Two new general constraints are derived on the scales and rotations which can be used in any geometric model estimation tasks. Using these formulas, two new constraints on homography estimation are introduced. Exploiting the derived equations, a solver for estimating the homography from the minimal number of two correspondences is proposed. Also, it is shown how the normalization of the point correspondences affects the rotation and scale parameters, thus achieving numerically stable results. Due to requiring merely two feature pairs, robust estimators, e.g. RANSAC, do significantly fewer iterations than by using the four-point algorithm. When using covariant features, e.g. SIFT, the information about the scale and orientation is given at no cost. The proposed homography estimation method is tested in a synthetic environment and on publicly available real-world datasets.

Submitted 27 June, 2019; originally announced June 2019.
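
The claim in the abstract above that a two-correspondence solver needs far fewer RANSAC iterations than the four-point algorithm follows from the standard RANSAC stopping rule, N = log(1 - p) / log(1 - w^m), where w is the inlier ratio, m the sample size and p the required confidence. The sketch below only evaluates that textbook formula; the function name and the inlier ratios are illustrative assumptions and are not taken from the paper.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int,
                      confidence: float = 0.99) -> int:
    """Samples needed so that, with probability `confidence`,
    at least one drawn sample contains only inliers."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Probability that one random minimal sample is all-inlier.
    p_good = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


# Hypothetical inlier ratios, chosen only to show the trend.
for w in (0.3, 0.5, 0.7):
    two_point = ransac_iterations(w, sample_size=2)
    four_point = ransac_iterations(w, sample_size=4)
    print(f"inlier ratio {w:.1f}: 2-point {two_point}, 4-point {four_point}")
```

The gap widens as the inlier ratio drops, because the all-inlier probability decays as w^m.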

arXiv:1906.02295 [pdf, other]

Progressive NAPSAC: sampling from gradually growing neighborhoods

Authors: Daniel Barath, Maksym Ivashechkin, Jiri Matas

Abstract: We propose Progressive NAPSAC, P-NAPSAC in short, which merges the advantages of local and global sampling by drawing samples from gradually growing neighborhoods. Exploiting the fact that nearby points are more likely to originate from the same geometric model, P-NAPSAC finds local structures earlier than global samplers. We show that the progressive spatial sampling in P-NAPSAC can be integrated with PROSAC sampling, which is applied to the first, location-defining, point. P-NAPSAC is embedded in USAC, a state-of-the-art robust estimation pipeline, which we further improve by implementing its local optimization as in Graph-Cut RANSAC. We call the resulting estimator USAC*. The method is tested on homography and fundamental matrix fitting on a total of 10,691 models from seven publicly available datasets. USAC* with P-NAPSAC outperforms reference methods in terms of speed on all problems.

Submitted 5 June, 2019; originally announced June 2019.
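
The idea of drawing samples from gradually growing neighborhoods, described in the abstract above, can be illustrated with a small sketch. This is not the authors' implementation: the linear growth schedule (`growth_every`), the uniform choice of the center point (the paper applies PROSAC to it) and the function name are all assumptions made for illustration.

```python
import math
import random


def growing_neighborhood_sample(points, sample_size, iteration,
                                growth_every=10, rng=random):
    """Draw one minimal sample: a random center plus points from its
    k nearest neighbors, where k grows with the iteration counter."""
    n = len(points)
    center = rng.randrange(n)  # assumption: uniform, not PROSAC-ordered
    # Assumed schedule: start with the smallest usable neighborhood and
    # add one neighbor every `growth_every` iterations, until the
    # neighborhood covers all other points (i.e. global sampling).
    k = min(n - 1, (sample_size - 1) + iteration // growth_every)
    others = sorted((i for i in range(n) if i != center),
                    key=lambda i: math.dist(points[center], points[i]))
    neighbors = others[:k]
    return [center] + rng.sample(neighbors, sample_size - 1)


# Early iterations give spatially tight samples, late ones global samples.
pts = [(float(i), 0.0) for i in range(20)]
print(growing_neighborhood_sample(pts, 3, iteration=0))
print(growing_neighborhood_sample(pts, 3, iteration=1000))
```

Sorting all points for every sample keeps the sketch short; a practical sampler would precompute a neighborhood structure instead.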

arXiv:1906.02290 [pdf, other]

Progressive-X: Efficient, Anytime, Multi-Model Fitting Algorithm

Authors: Daniel Barath, Jiri Matas

Abstract: The Progressive-X algorithm, Prog-X in short, is proposed for geometric multi-model fitting. The method interleaves sampling and consolidation of the current data interpretation via repetitive hypothesis proposal, fast rejection, and integration of the new hypothesis into the kept instance set by labeling energy minimization. Due to exploring the data progressively, the method has several beneficial properties compared with the state-of-the-art. First, a clear criterion, adopted from RANSAC, controls the termination and stops the algorithm when the probability of finding a new model with a reasonable number of inliers falls below a threshold. Second, Prog-X is an any-time algorithm. Thus, whenever it is interrupted, e.g. due to a time limit, the returned instances cover real and, likely, the most dominant ones. The method is superior to the state-of-the-art in terms of accuracy in both synthetic experiments and on publicly available real-world datasets for homography, two-view motion, and motion segmentation.

Submitted 5 June, 2019; originally announced June 2019.

arXiv:1905.00519 [pdf, other]

Optimal Multi-view Correction of Local Affine Frames

Authors: Ivan Eichhardt, Daniel Barath

Abstract: The technique requires the epipolar geometry to be pre-estimated between each image pair. It exploits the constraints which the camera movement implies, in order to apply a closed-form correction to the parameters of the input affinities. Also, it is shown that the rotations and scales obtained by partially affine-covariant detectors, e.g., AKAZE or SIFT, can be completed to be full affine frames by the proposed algorithm. It is validated both in synthetic experiments and on publicly available real-world datasets that the method always improves the output of the evaluated affine-covariant feature detectors. As a by-product, these detectors are compared and the ones obtaining the most accurate affine frames are reported. For demonstrating the applicability, we show that the proposed technique as a pre-processing step improves the accuracy of pose estimation for a camera rig, surface normal and homography estimation.

Submitted 1 May, 2019; originally announced May 2019.

arXiv:1807.03503 [pdf, other]

Recovering affine features from orientation- and scale-invariant ones

Authors: Daniel Barath

Abstract: An approach is proposed for recovering affine correspondences (ACs) from orientation- and scale-invariant, e.g. SIFT, features. The method calculates the affine parameters consistent with a pre-estimated epipolar geometry from the point coordinates and the scales and rotations which the feature detector obtains. The closed-form solution is given as the roots of a quadratic polynomial equation, thus having two possible real candidates and fast procedure, i.e. <1 millisecond. It is shown, as a possible application, that using the proposed algorithm allows us to estimate a homography for every single correspondence independently. It is validated both in our synthetic environment and on publicly available real world datasets, that the proposed technique leads to accurate ACs. Also, the estimated homographies have similar accuracy to what the state-of-the-art methods obtain, but due to requiring only a single correspondence, the robust estimation, e.g. by locally optimized RANSAC, is an order of magnitude faster.

Submitted 10 July, 2018; originally announced July 2018.

arXiv:1803.07469 [pdf, other]

MAGSAC: marginalizing sample consensus

Authors: Daniel Barath, Jana Noskova, Jiri Matas

Abstract: A method called sigma-consensus is proposed to eliminate the need for a user-defined inlier-outlier threshold in RANSAC. Instead of estimating the noise sigma, it is marginalized over a range of noise scales. The optimized model is obtained by weighted least-squares fitting where the weights come from the marginalization over sigma of the point likelihoods of being inliers. A new quality function is proposed not requiring sigma and, thus, a set of inliers to determine the model quality. Also, a new termination criterion for RANSAC is built on the proposed marginalization approach. Applying sigma-consensus, MAGSAC is proposed with no need for a user-defined sigma and improving the accuracy of robust estimation significantly. It is superior to the state-of-the-art in terms of geometric accuracy on publicly available real-world datasets for epipolar geometry (F and E) and homography estimation. In addition, applying sigma-consensus only once as a post-processing step to the RANSAC output always improved the model quality on a wide range of vision problems without noticeable deterioration in processing time, adding a few milliseconds. The source code is at https://github.com/danini/magsac.

Submitted 4 June, 2019; v1 submitted 20 March, 2018; originally announced March 2018.
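
To make the weighting idea in the abstract above concrete, the following sketch averages an inlier likelihood over a grid of noise scales and uses the result as weights in a least-squares line fit. It is a simplified illustration, not the MAGSAC formulation: the Gaussian likelihood, the uniform grid over (0, sigma_max], the line model and all function names are assumptions.

```python
import math


def marginalized_weights(residuals, sigma_max, steps=10):
    """Average a Gaussian inlier likelihood over a uniform grid of
    noise scales in (0, sigma_max] (assumed likelihood and grid)."""
    sigmas = [sigma_max * (k + 1) / steps for k in range(steps)]
    weights = []
    for r in residuals:
        total = sum(math.exp(-0.5 * (r / s) ** 2) / s for s in sigmas)
        weights.append(total / steps)
    return weights


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of the line y = a * x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Toy data near y = 2x + 1 with one gross outlier at x = 5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.1, 4.9, 7.0, 9.05, 60.0]
a0, b0 = 2.0, 1.0  # hypothetical initial model from a minimal sample
residuals = [y - (a0 * x + b0) for x, y in zip(xs, ys)]
weights = marginalized_weights(residuals, sigma_max=1.0)
print(weighted_line_fit(xs, ys, weights))
```

No single inlier-outlier threshold appears anywhere: the outlier is suppressed because its likelihood is negligible at every noise scale in the grid, while `sigma_max` only bounds the range that is averaged over.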

arXiv:1803.00260 [pdf, other]

Five-point Fundamental Matrix Estimation for Uncalibrated Cameras

Authors: Daniel Barath

Abstract: We aim at estimating the fundamental matrix in two views from five correspondences of rotation invariant features obtained by e.g. the SIFT detector. The proposed minimal solver first estimates a homography from three correspondences assuming that they are co-planar and exploiting their rotational components. Then the fundamental matrix is obtained from the homography and two additional point pairs in general position. The proposed approach, combined with robust estimators like Graph-Cut RANSAC, is superior to other state-of-the-art algorithms both in terms of accuracy and number of iterations required. This is validated on synthesized data and 561 real image pairs. Moreover, the tests show that requiring three points on a plane is not too restrictive in urban environment and locally optimized robust estimators lead to accurate estimates even if the points are not entirely co-planar. As a potential application, we show that using the proposed method makes two-view multi-motion estimation more accurate.

Submitted 1 March, 2018; originally announced March 2018.

arXiv:1706.01649 [pdf, other]

A Minimal Solution for Two-view Focal-length Estimation using Two Affine Correspondences

Authors: Daniel Barath, Tekla Toth, Levente Hajder

Abstract: A minimal solution using two affine correspondences is presented to estimate the common focal length and the fundamental matrix between two semi-calibrated cameras - known intrinsic parameters except a common focal length. To the best of our knowledge, this problem is unsolved. The proposed approach extends point correspondence-based techniques with linear constraints derived from local affine transformations. The obtained multivariate polynomial system is efficiently solved by the hidden-variable technique. Observing the geometry of local affinities, we introduce novel conditions eliminating invalid roots. To select the best one out of the remaining candidates, a root selection technique is proposed outperforming the recent ones especially in case of high-level noise. The proposed 2-point algorithm is validated on both synthetic data and 104 publicly available real image pairs. A Matlab implementation of the proposed solution is included in the paper.

Submitted 6 June, 2017; originally announced June 2017.

arXiv:1706.00984 [pdf, other]

Graph-Cut RANSAC

Authors: Daniel Barath, Jiri Matas

Abstract: A novel method for robust estimation, called Graph-Cut RANSAC, GC-RANSAC in short, is introduced. To separate inliers and outliers, it runs the graph-cut algorithm in the local optimization (LO) step which is applied when a so-far-the-best model is found. The proposed LO step is conceptually simple, easy to implement, globally optimal and efficient. GC-RANSAC is shown experimentally, both on synthesized tests and real image pairs, to be more geometrically accurate than state-of-the-art methods on a range of problems, e.g. line fitting, homography, affine transformation, fundamental and essential matrix estimation. It runs in real-time for many problems at a speed approximately equal to that of the less accurate alternatives (in milliseconds on standard CPU).

Submitted 16 November, 2017; v1 submitted 3 June, 2017; originally announced June 2017.

arXiv:1706.00827 [pdf, other]

Multi-Class Model Fitting by Energy Minimization and Mode-Seeking

Authors: Daniel Barath, Jiri Matas

Abstract: We propose a general formulation, called Multi-X, for multi-class multi-instance model fitting - the problem of interpreting the input data as a mixture of noisy observations originating from multiple instances of multiple classes. We extend the commonly used alpha-expansion-based technique with a new move in the label space. The move replaces a set of labels with the corresponding density mode in the model parameter domain, thus achieving fast and robust optimization. Key optimization parameters like the bandwidth of the mode seeking are set automatically within the algorithm. Considering that a group of outliers may form spatially coherent structures in the data, we propose a cross-validation-based technique removing statistically insignificant instances. Multi-X outperforms significantly the state-of-the-art on publicly available datasets for diverse problems: multiple plane and rigid motion detection; motion segmentation; simultaneous plane and cylinder fitting; circle and line fitting.

Submitted 16 November, 2017; v1 submitted 2 June, 2017; originally announced June 2017.
