Search | arXiv e-print repository

Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization

Authors: Dror Aiger, André Araujo, Simon Lynen

Abstract: Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become common belief that global embeddings are critical for said image-retrieval in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets.
Code: https://github.com/google-research/google-research/tree/master/cann

Submitted 29 December, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICCV23 camera-ready + appendix

arXiv:2211.14020 [pdf, other]

SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow

Authors: Itai Lang, Dror Aiger, Forrester Cole, Shai Avidan, Michael Rubinstein

Abstract: Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the 3D motion of a scene from its consecutive observations. Recently, there have been efforts to compute the scene flow from 3D point clouds. A common approach is to train a regression model that consumes source and target point clouds and outputs the per-point translation vector. An alternative is to learn point matches between the point clouds concurrently with regressing a refinement of the initial correspondence flow. In both cases, the learning task is very challenging since the flow regression is done in the free 3D space, and a typical solution is to resort to a large annotated synthetic dataset. We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without employing ground-truth flow supervision. In contrast to previous work, we train a pure correspondence model focused on learning point feature representation and initialize the flow as the difference between a source point and its softly corresponding target point. Then, in the run-time phase, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent and accurate flow field between the point clouds. Experiments on widespread datasets demonstrate the performance gains achieved by our method compared to existing leading techniques while using a fraction of the training data. Our code is publicly available at https://github.com/itailang/SCOOP.

Submitted 13 April, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

Comments: CVPR 2023. Project page: https://itailang.github.io/SCOOP/

arXiv:2107.11810 [pdf, other]

Efficient Large Scale Inlier Voting for Geometric Vision Problems

Authors: Dror Aiger, Simon Lynen, Jan Hosang, Bernhard Zeisl

Abstract: Outlier rejection and equivalently inlier set optimization is a key ingredient in numerous applications in computer vision such as filtering point-matches in camera pose estimation or plane and normal estimation in point clouds. Several approaches exist, yet at large scale we face a combinatorial explosion of possible solutions and state-of-the-art methods like RANSAC, Hough transform or Branch&Bound require a minimum inlier ratio or prior knowledge to remain practical. In fact, for problems such as camera posing in very large scenes these approaches become useless as they have exponential runtime growth if these conditions aren't met. To approach the problem we present an efficient and general algorithm for outlier rejection based on "intersecting" $k$-dimensional surfaces in $R^d$. We provide a recipe for casting a variety of geometric problems as finding a point in $R^d$ which maximizes the number of nearby surfaces (and thus inliers). The resulting algorithm has linear worst-case complexity with a better runtime dependency in the approximation factor than competing algorithms while not requiring domain-specific bounds. This is achieved by introducing a space decomposition scheme that bounds the number of computations by successively rounding and grouping samples. Our recipe (and open-source code) enables anybody to derive such fast approaches to new problems across a wide range of domains.
We demonstrate the versatility of the approach on several camera posing problems with a high number of matches at low inlier ratio, achieving state-of-the-art results at significantly lower processing times.

Submitted 27 July, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

arXiv:2006.12318 [pdf, other]

Duality-based approximation algorithms for depth queries and maximum depth

Authors: Dror Aiger, Haim Kaplan, Micha Sharir

Abstract: We design an efficient data structure for computing a suitably defined approximate depth of any query point in the arrangement $\mathcal{A}(S)$ of a collection $S$ of $n$ halfplanes or triangles in the plane or of halfspaces or simplices in higher dimensions. We then use this structure to find a point of an approximate maximum depth in $\mathcal{A}(S)$. Specifically, given an error parameter $ε>0$, we compute, for any query point $q$, an underestimate $d^-(q)$ of the depth of $q$, that counts only objects containing $q$, but is allowed to exclude objects when $q$ is $ε$-close to their boundary. Similarly, we compute an overestimate $d^+(q)$ that counts all objects containing $q$ but may also count objects that do not contain $q$ but $q$ is $ε$-close to their boundary. Our algorithms for halfplanes and halfspaces are linear in the number of input objects and in the number of queries, and the dependence of their running time on $ε$ is considerably better than that of earlier techniques. Our improvements are particularly substantial for triangles and in higher dimensions.

Submitted 22 June, 2020; originally announced June 2020.

arXiv:2005.08193 [pdf, other]

Output sensitive algorithms for approximate incidences and their applications

Authors: Dror Aiger, Haim Kaplan, Micha Sharir

Abstract: An $ε$-approximate incidence between a point and some geometric object (line, circle, plane, sphere) occurs when the point and the object lie at distance at most $ε$ from each other. Given a set of points and a set of objects, computing the approximate incidences between them is a major step in many database and web-based applications in computer vision and graphics, including robust model fitting, approximate point pattern matching, and estimating the fundamental matrix in epipolar (stereo) geometry. In a typical approximate incidence problem of this sort, we are given a set $P$ of $m$ points in two or three dimensions, a set $S$ of $n$ objects (lines, circles, planes, spheres), and an error parameter $ε>0$, and our goal is to report all pairs $(p,s)\in P\times S$ that lie at distance at most $ε$ from one another. We present efficient output-sensitive approximation algorithms for quite a few cases, including points and lines or circles in the plane, and points and planes, spheres, lines, or circles in three dimensions. Several of these cases arise in the applications mentioned above.

Submitted 17 May, 2020; originally announced May 2020.

Comments: A preliminary version of this work appeared in Proc. 25th European Sympos. Algorithms (ESA), 2017

arXiv:1907.00338 [pdf, other]

Large-scale, real-time visual-inertial localization revisited

Authors: Simon Lynen, Bernhard Zeisl, Dror Aiger, Michael Bosse, Joel Hesch, Marc Pollefeys, Roland Siegwart, Torsten Sattler

Abstract: The overarching goals in image-based localization are scale, robustness and speed. In recent years, approaches based on local features and sparse 3D point-cloud models have both dominated the benchmarks and seen successful real-world deployment. They enable applications ranging from robot navigation, autonomous driving, virtual and augmented reality to device geo-localization. Recently, end-to-end learned localization approaches have been proposed which show promising results on small-scale datasets. However, the positioning accuracy, scalability, latency and compute & storage requirements of these approaches remain open challenges. We aim to deploy localization at global scale, where one thus relies on methods using local features and sparse 3D models. Our approach spans from offline model building to real-time client-side pose fusion. The system compresses appearance and geometry of the scene for efficient model storage and lookup, leading to scalability beyond what has been previously demonstrated. It allows for low-latency localization queries and efficient fusion run in real-time on mobile platforms by combining server-side localization with real-time visual-inertial-based camera pose tracking. In order to further improve efficiency we leverage a combination of priors, nearest neighbor search, geometric match culling and a cascaded pose candidate refinement step. This combination outperforms previous approaches when working with large-scale models and allows deployment at unprecedented scale.
We demonstrate the effectiveness of our approach on a proof-of-concept system localizing 2.5 million images against models from four cities in different regions of the world, achieving query latencies in the 200ms range.

Submitted 30 June, 2019; originally announced July 2019.

arXiv:1903.07047 [pdf, other]

General techniques for approximate incidences and their application to the camera posing problem

Authors: Dror Aiger, Haim Kaplan, Efi Kokiopoulou, Micha Sharir, Bernhard Zeisl

Abstract: We consider the classical camera pose estimation problem that arises in many computer vision applications, in which we are given n 2D-3D correspondences between points in the scene and points in the camera image (some of which are incorrect associations), and where we aim to determine the camera pose (the position and orientation of the camera in the scene) from this data. We demonstrate that this posing problem can be reduced to the problem of computing ε-approximate incidences between two-dimensional surfaces (derived from the input correspondences) and points (on a grid) in a four-dimensional pose space. Similar reductions can be applied to other camera pose problems, as well as to similar problems in related application areas. We describe and analyze three techniques for solving the resulting ε-approximate incidences problem in the context of our camera posing application. The first is a straightforward assignment of surfaces to the cells of a grid (of side-length ε) that they intersect. The second is a variant of a primal-dual technique, recently introduced by a subset of the authors [2] for different (and simpler) applications. The third is a non-trivial generalization of a data structure of Fonseca and Mount [3], originally designed for the case of hyperplanes. We present and analyze this technique in full generality, and then apply it to the camera posing problem at hand. We compare our methods experimentally on real and synthetic data.
Our experiments show that for the typical values of n and ε, the primal-dual method is the fastest, also in practice.

Submitted 17 March, 2019; originally announced March 2019.

arXiv:1712.09216 [pdf, other]

Large-Scale 3D Scene Classification With Multi-View Volumetric CNN

Authors: Dror Aiger, Brett Allen, Aleksey Golovinskiy

Abstract: We introduce a method to classify imagery using a convolutional neural network (CNN) on multi-view image projections. The power of our method comes from using projections of multiple images at multiple depth planes near the reconstructed surface. This enables classification of categories whose salient aspect is appearance change under different viewpoints, such as water, trees, and other materials with complex reflection/light response properties. Our method does not require boundary labelling in images and works on pixel-level classification with a small (few pixels) context, which simplifies the creation of a training set. We demonstrate this application on large-scale aerial imagery collections, and extend the per-pixel classification to robustly create a consistent 2D classification which can be used to fill the gaps in non-reconstructible water regions. We also apply our method to classify tree regions. In both cases, the training data can quickly be generated using a small number of manually created polygons on a map. We show that even with a very simple and standard network our CNN outperforms the state-of-the-art image classification, the Inception-V3 model retrained from a large collection of aerial images.

Submitted 26 December, 2017; originally announced December 2017.

arXiv:1709.02933 [pdf, ps, other]

Homotheties and incidences

Authors: Dror Aiger, Micha Sharir

Abstract: We consider problems involving rich homotheties in a set S of n points in the plane (that is, homotheties that map many points of S to other points of S). By reducing these problems to incidence problems involving points and lines in R^3, we are able to obtain refined and new bounds for the number of rich homotheties, and for the number of distinct equivalence classes, under homotheties, of k-element subsets of S, for any k >= 3. We also discuss the extensions of these problems to three and higher dimensions.

Submitted 9 September, 2017; originally announced September 2017.

Showing 1–9 of 9 results for author: Aiger, D