Search | arXiv e-print repository

CudaSIFT-SLAM: multiple-map visual SLAM for full procedure map** in real human endoscopy

Authors: Richard Elvira, Juan D. Tardós, José M. M. Montiel

Abstract: Monocular visual simultaneous localization and map** (V-SLAM) is nowadays an irreplaceable tool in mobile robotics and augmented reality, where it performs robustly. However, human colonoscopies pose formidable challenges like occlusions, blur, light changes, lack of texture, deformation, water jets or tool interaction, which result in very frequent tracking losses. ORB-SLAM3, the top performing… ▽ More Monocular visual simultaneous localization and map** (V-SLAM) is nowadays an irreplaceable tool in mobile robotics and augmented reality, where it performs robustly. However, human colonoscopies pose formidable challenges like occlusions, blur, light changes, lack of texture, deformation, water jets or tool interaction, which result in very frequent tracking losses. ORB-SLAM3, the top performing multiple-map V-SLAM, is unable to recover from them by merging sub-maps or relocalizing the camera, due to the poor performance of its place recognition algorithm based on ORB features and DBoW2 bag-of-words. We present CudaSIFT-SLAM, the first V-SLAM system able to process complete human colonoscopies in real-time. To overcome the limitations of ORB-SLAM3, we use SIFT instead of ORB features and replace the DBoW2 direct index with the more computationally demanding brute-force matching, being able to successfully match images separated in time for relocation and map merging. Real-time performance is achieved thanks to CudaSIFT, a GPU implementation for SIFT extraction and brute-force matching. We benchmark our system in the C3VD phantom colon dataset, and in a full real colonoscopy from the Endomapper dataset, demonstrating the capabilities to merge sub-maps and relocate in them, obtaining significantly longer sub-maps. Our system successfully maps in real-time 88 % of the frames in the C3VD dataset. In a real screening colonoscopy, despite the much higher prevalence of occluded and blurred frames, the map** coverage is 53 % in carefully explored areas and 38 % in the full sequence, a 70 % improvement over ORB-SLAM3. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 10 pages, 10 figures, 6 tables, under revision

ACM Class: I.4.9

arXiv:2309.02777 [pdf, other]

LightNeuS: Neural Surface Reconstruction in Endoscopy using Illumination Decline

Authors: Víctor M. Batlle, José M. M. Montiel, Pascal Fua, Juan D. Tardós

Abstract: We propose a new approach to 3D reconstruction from sequences of images acquired by monocular endoscopes. It is based on two key insights. First, endoluminal cavities are watertight, a property naturally enforced by modeling them in terms of a signed distance function. Second, the scene illumination is variable. It comes from the endoscope's light sources and decays with the inverse of the squared… ▽ More We propose a new approach to 3D reconstruction from sequences of images acquired by monocular endoscopes. It is based on two key insights. First, endoluminal cavities are watertight, a property naturally enforced by modeling them in terms of a signed distance function. Second, the scene illumination is variable. It comes from the endoscope's light sources and decays with the inverse of the squared distance to the surface. To exploit these insights, we build on NeuS, a neural implicit surface reconstruction technique with an outstanding capability to learn appearance and a SDF surface model from multiple views, but currently limited to scenes with static illumination. To remove this limitation and exploit the relation between pixel brightness and depth, we modify the NeuS architecture to explicitly account for it and introduce a calibrated photometric model of the endoscope's camera and light source. Our method is the first one to produce watertight reconstructions of whole colon sections. We demonstrate excellent accuracy on phantom imagery. Remarkably, the watertight prior combined with illumination decline, allows to complete the reconstruction of unseen portions of the surface with acceptable accuracy, paving the way to automatic quality assessment of cancer screening explorations, measuring the global percentage of observed mucosa. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: 12 pages, 7 figures, 1 table, submitted to MICCAI 2023

arXiv:2308.10525 [pdf, other]

LightDepth: Single-View Depth Self-Supervision from Illumination Decline

Authors: Javier Rodríguez-Puigvert, Víctor M. Batlle, J. M. M. Montiel, Ruben Martinez-Cantin, Pascal Fua, Juan D. Tardós, Javier Civera

Abstract: Single-view depth estimation can be remarkably effective if there is enough ground-truth depth data for supervised training. However, there are scenarios, especially in medicine in the case of endoscopies, where such data cannot be obtained. In such cases, multi-view self-supervision and synthetic-to-real transfer serve as alternative approaches, however, with a considerable performance reduction… ▽ More Single-view depth estimation can be remarkably effective if there is enough ground-truth depth data for supervised training. However, there are scenarios, especially in medicine in the case of endoscopies, where such data cannot be obtained. In such cases, multi-view self-supervision and synthetic-to-real transfer serve as alternative approaches, however, with a considerable performance reduction in comparison to supervised case. Instead, we propose a single-view self-supervised method that achieves a performance similar to the supervised case. In some medical devices, such as endoscopes, the camera and light sources are co-located at a small distance from the target surfaces. Thus, we can exploit that, for any given albedo and surface orientation, pixel brightness is inversely proportional to the square of the distance to the surface, providing a strong single-view self-supervisory signal. In our experiments, our self-supervised models deliver accuracies comparable to those of fully supervised ones, while being applicable without depth ground-truth data. △ Less

Submitted 19 September, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.04036 [pdf, other]

NR-SLAM: Non-Rigid Monocular SLAM

Authors: Juan J. Gomez Rodriguez, J. M. M Montiel, Juan D. Tardos

Abstract: In this paper we present NR-SLAM, a novel non-rigid monocular SLAM system founded on the combination of a Dynamic Deformation Graph with a Visco-Elastic deformation model. The former enables our system to represent the dynamics of the deforming environment as the camera explores, while the later allows us to model general deformations in a simple way. The presented system is able to automatically… ▽ More In this paper we present NR-SLAM, a novel non-rigid monocular SLAM system founded on the combination of a Dynamic Deformation Graph with a Visco-Elastic deformation model. The former enables our system to represent the dynamics of the deforming environment as the camera explores, while the later allows us to model general deformations in a simple way. The presented system is able to automatically initialize and extend a map modeled by a sparse point cloud in deforming environments, that is refined with a sliding-window Deformable Bundle Adjustment. This map serves as base for the estimation of the camera motion and deformation and enables us to represent arbitrary surface topologies, overcoming the limitations of previous methods. To assess the performance of our system in challenging deforming scenarios, we evaluate it in several representative medical datasets. In our experiments, NR-SLAM outperforms previous deformable SLAM systems, achieving millimeter reconstruction accuracy and bringing automated medical intervention closer. For the benefit of the community, we make the source code public. △ Less

Submitted 1 August, 2023; originally announced August 2023.

Comments: 12 pages, 7 figures, submited to the IEEE Transactions on Robotics (T-RO)

arXiv:2305.15118 [pdf, other]

Fairness in Streaming Submodular Maximization over a Matroid Constraint

Authors: Marwa El Halabi, Federico Fusco, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

Abstract: Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in develo** fair machine learning algorithms. Recently, such algorithms have been develope… ▽ More Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in develo** fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks. △ Less

Submitted 19 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted to ICML 23

arXiv:2305.05546 [pdf, other]

ColonMapper: topological map** and localization for colonoscopy

Authors: Javier Morlana, Juan D. Tardós, J. M. M. Montiel

Abstract: We propose a topological map** and localization system able to operate on real human colonoscopies, despite significant shape and illumination changes. The map is a graph where each node codes a colon location by a set of real images, while edges represent traversability between nodes. For close-in-time images, where scene changes are minor, place recognition can be successfully managed with the… ▽ More We propose a topological map** and localization system able to operate on real human colonoscopies, despite significant shape and illumination changes. The map is a graph where each node codes a colon location by a set of real images, while edges represent traversability between nodes. For close-in-time images, where scene changes are minor, place recognition can be successfully managed with the recent transformers-based local feature matching algorithms. However, under long-term changes -- such as different colonoscopies of the same patient -- feature-based matching fails. To address this, we train on real colonoscopies a deep global descriptor achieving high recall with significant changes in the scene. The addition of a Bayesian filter boosts the accuracy of long-term place recognition, enabling relocalization in a previously built map. Our experiments show that ColonMapper is able to autonomously build a map and localize against it in two important use cases: localization within the same colonoscopy or within different colonoscopies of the same patient. Code will be available upon acceptance. △ Less

Submitted 21 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Under review. ICRA 2024

arXiv:2209.03693 [pdf, other]

ExplORB-SLAM: Active Visual SLAM Exploiting the Pose-graph Topology

Authors: Julio A. Placed, Juan J. Gómez Rodríguez, Juan D. Tardós, José A. Castellanos

Abstract: Deploying autonomous robots capable of exploring unknown environments has long been a topic of great relevance to the robotics community. In this work, we take a further step in that direction by presenting an open-source active visual SLAM framework that leverages the accuracy of a state-of-the-art graph-SLAM system and takes advantage of the fast utility computation that exploiting the structure… ▽ More Deploying autonomous robots capable of exploring unknown environments has long been a topic of great relevance to the robotics community. In this work, we take a further step in that direction by presenting an open-source active visual SLAM framework that leverages the accuracy of a state-of-the-art graph-SLAM system and takes advantage of the fast utility computation that exploiting the structure of the underlying pose-graph offers. Through careful estimation of a posteriori weighted pose-graphs, D-optimal decision-making is achieved online with the objective of improving localization and map** uncertainties as exploration occurs. △ Less

Submitted 8 September, 2022; originally announced September 2022.

Comments: 12 pages. To be presented in 5th Iberian Robotics Conference

arXiv:2204.14240 [pdf, other]

doi 10.1038/s41597-023-02564-7

EndoMapper dataset of complete calibrated endoscopy procedures

Authors: Pablo Azagra, Carlos Sostres, Ángel Ferrandez, Luis Riazuelo, Clara Tomasini, Oscar León Barbed, Javier Morlana, David Recasens, Victor M. Batlle, Juan J. Gómez-Rodríguez, Richard Elvira, Julia López, Cristina Oriol, Javier Civera, Juan D. Tardós, Ana Cristina Murillo, Angel Lanas, José M. M. Montiel

Abstract: Computer-assisted systems are becoming broadly used in medicine. In endoscopy, most research focuses on the automatic detection of polyps or other pathologies, but localization and navigation of the endoscope are completely performed manually by physicians. To broaden this research and bring spatial Artificial Intelligence to endoscopies, data from complete procedures is needed. This paper introdu… ▽ More Computer-assisted systems are becoming broadly used in medicine. In endoscopy, most research focuses on the automatic detection of polyps or other pathologies, but localization and navigation of the endoscope are completely performed manually by physicians. To broaden this research and bring spatial Artificial Intelligence to endoscopies, data from complete procedures is needed. This paper introduces the Endomapper dataset, the first collection of complete endoscopy sequences acquired during regular medical practice, making secondary use of medical data. Its main purpose is to facilitate the development and evaluation of Visual Simultaneous Localization and Map** (VSLAM) methods in real endoscopy data. The dataset contains more than 24 hours of video. It is the first endoscopic dataset that includes endoscope calibration as well as the original calibration videos. Meta-data and annotations associated with the dataset vary from the anatomical landmarks, procedure labeling, segmentations, reconstructions, simulated sequences with ground truth and same patient procedures. The software used in this paper is publicly available. △ Less

Submitted 10 October, 2023; v1 submitted 29 April, 2022; originally announced April 2022.

Comments: 17 pages, 14 figures, 8 tables

Journal ref: Sci Data 10, 671 (2023)

arXiv:2204.09951 [pdf, other]

Motif Cut Sparsifiers

Authors: Michael Kapralov, Mikhail Makarov, Sandeep Silwal, Christian Sohler, Jakab Tardos

Abstract: A motif is a frequently occurring subgraph of a given directed or undirected graph $G$. Motifs capture higher order organizational structure of $G$ beyond edge relationships, and, therefore, have found wide applications such as in graph clustering, community detection, and analysis of biological and physical networks to name a few. In these applications, the cut structure of motifs plays a crucial… ▽ More A motif is a frequently occurring subgraph of a given directed or undirected graph $G$. Motifs capture higher order organizational structure of $G$ beyond edge relationships, and, therefore, have found wide applications such as in graph clustering, community detection, and analysis of biological and physical networks to name a few. In these applications, the cut structure of motifs plays a crucial role as vertices are partitioned into clusters by cuts whose conductance is based on the number of instances of a particular motif, as opposed to just the number of edges, crossing the cuts. In this paper, we introduce the concept of a motif cut sparsifier. We show that one can compute in polynomial time a sparse weighted subgraph $G'$ with only $\widetilde{O}(n/ε^2)$ edges such that for every cut, the weighted number of copies of $M$ crossing the cut in $G'$ is within a $1+ε$ factor of the number of copies of $M$ crossing the cut in $G$, for every constant size motif $M$. Our work carefully combines the viewpoints of both graph sparsification and hypergraph sparsification. We sample edges which requires us to extend and strengthen the concept of cut sparsifiers introduced in the seminal work of to the motif setting. We adapt the importance sampling framework through the viewpoint of hypergraph sparsification by deriving the edge sampling probabilities from the strong connectivity values of a hypergraph whose hyperedges represent motif instances. Finally, an iterative sparsification primitive inspired by both viewpoints is used to reduce the number of edges in $G$ to nearly linear. In addition, we present a strong lower bound ruling out a similar result for sparsification with respect to induced occurrences of motifs. △ Less

Submitted 12 September, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

Comments: 48 pages, 3 figures

arXiv:2204.09083 [pdf, other]

Photometric single-view dense 3D reconstruction in endoscopy

Authors: Victor M. Batlle, J. M. M. Montiel, Juan D. Tardos

Abstract: Visual SLAM inside the human body will open the way to computer-assisted navigation in endoscopy. However, due to space limitations, medical endoscopes only provide monocular images, leading to systems lacking true scale. In this paper, we exploit the controlled lighting in colonoscopy to achieve the first in-vivo 3D reconstruction of the human colon using photometric stereo on a calibrated monocu… ▽ More Visual SLAM inside the human body will open the way to computer-assisted navigation in endoscopy. However, due to space limitations, medical endoscopes only provide monocular images, leading to systems lacking true scale. In this paper, we exploit the controlled lighting in colonoscopy to achieve the first in-vivo 3D reconstruction of the human colon using photometric stereo on a calibrated monocular endoscope. Our method works in a real medical environment, providing both a suitable in-place calibration procedure and a depth estimation technique adapted to the colon's tubular geometry. We validate our method on simulated colonoscopies, obtaining a mean error of 7% on depth estimation, which is below 3 mm on average. Our qualitative results on the EndoMapper dataset show that the method is able to correctly estimate the colon shape in real human colonoscopies, paving the ground for true-scale monocular SLAM in endoscopy. △ Less

Submitted 19 April, 2022; originally announced April 2022.

Comments: 7 pages, 7 figures, submitted to IROS 2022

arXiv:2204.08309 [pdf, other]

Tracking monocular camera pose and deformation for SLAM inside the human body

Authors: Juan J. Gomez Rodriguez, J. M. M Montiel, Juan D. Tardos

Abstract: Monocular SLAM in deformable scenes will open the way to multiple medical applications like computer-assisted navigation in endoscopy, automatic drug delivery or autonomous robotic surgery. In this paper we propose a novel method to simultaneously track the camera pose and the 3D scene deformation, without any assumption about environment topology or shape. The method uses an illumination-invarian… ▽ More Monocular SLAM in deformable scenes will open the way to multiple medical applications like computer-assisted navigation in endoscopy, automatic drug delivery or autonomous robotic surgery. In this paper we propose a novel method to simultaneously track the camera pose and the 3D scene deformation, without any assumption about environment topology or shape. The method uses an illumination-invariant photometric method to track image features and estimates camera motion and deformation combining reprojection error with spatial and temporal regularization of deformations. Our results in simulated colonoscopies show the method's accuracy and robustness in complex scenes under increasing levels of deformation. Our qualitative results in human colonoscopies from Endomapper dataset show that the method is able to successfully cope with the challenges of real endoscopies: deformations, low texture and strong illumination changes. We also compare with previous tracking methods in simpler scenarios from Hamlyn dataset where we obtain competitive performance, without needing any topological assumption. △ Less

Submitted 18 April, 2022; originally announced April 2022.

Comments: 8 pages, 3 figures, submitted to IROS 2022

arXiv:2112.00655 [pdf, ps, other]

Efficient and Local Parallel Random Walks

Authors: Michael Kapralov, Silvio Lattanzi, Navid Nouri, Jakab Tardos

Abstract: Random walks are a fundamental primitive used in many machine learning algorithms with several applications in clustering and semi-supervised learning. Despite their relevance, the first efficient parallel algorithm to compute random walks has been introduced very recently (Lacki et al.). Unfortunately their method has a fundamental shortcoming: their algorithm is non-local in that it heavily reli… ▽ More Random walks are a fundamental primitive used in many machine learning algorithms with several applications in clustering and semi-supervised learning. Despite their relevance, the first efficient parallel algorithm to compute random walks has been introduced very recently (Lacki et al.). Unfortunately their method has a fundamental shortcoming: their algorithm is non-local in that it heavily relies on computing random walks out of all nodes in the input graph, even though in many practical applications one is interested in computing random walks only from a small subset of nodes in the graph. In this paper, we present a new algorithm that overcomes this limitation by building random walk efficiently and locally at the same time. We show that our technique is both memory and round efficient, and in particular yields an efficient parallel local clustering algorithm. Finally, we complement our theoretical analysis with experimental results showing that our algorithm is significantly more scalable than previous approaches. △ Less

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2109.10077 [pdf, other]

Scale-aware direct monocular odometry

Authors: Carlos Campos, Juan D. Tardós

Abstract: We present a generic framework for scale-aware direct monocular odometry based on depth prediction from a deep neural network. In contrast with previous methods where depth information is only partially exploited, we formulate a novel depth prediction residual which allows us to incorporate multi-view depth information. In addition, we propose to use a truncated robust cost function which prevents… ▽ More We present a generic framework for scale-aware direct monocular odometry based on depth prediction from a deep neural network. In contrast with previous methods where depth information is only partially exploited, we formulate a novel depth prediction residual which allows us to incorporate multi-view depth information. In addition, we propose to use a truncated robust cost function which prevents considering inconsistent depth estimations. The photometric and depth-prediction measurements are integrated into a tightly-coupled optimization leading to a scale-aware monocular system which does not accumulate scale drift. Our proposal does not particularize for a concrete neural network, being able to work along with the vast majority of the existing depth prediction solutions. We demonstrate the validity and generality of our proposal evaluating it on the KITTI odometry dataset, using two publicly available neural networks and comparing it with similar approaches and the state-of-the-art for monocular and stereo SLAM. Experiments show that our proposal largely outperforms classic monocular SLAM, being 5 to 9 times more precise, beating similar approaches and having an accuracy which is closer to that of stereo systems. △ Less

Submitted 22 July, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

Comments: This paper has been accepted for publication in the IROS2022 conference

arXiv:2109.07370 [pdf, other]

Direct and Sparse Deformable Tracking

Authors: Jose Lamarca, Juan J. Gomez Rodriguez, Juan D. Tardos, J. M. M. Montiel

Abstract: Deformable Monocular SLAM algorithms recover the localization of a camera in an unknown deformable environment. Current approaches use a template-based deformable tracking to recover the camera pose and the deformation of the map. These template-based methods use an underlying global deformation model. In this paper, we introduce a novel deformable camera tracking method with a local deformation m… ▽ More Deformable Monocular SLAM algorithms recover the localization of a camera in an unknown deformable environment. Current approaches use a template-based deformable tracking to recover the camera pose and the deformation of the map. These template-based methods use an underlying global deformation model. In this paper, we introduce a novel deformable camera tracking method with a local deformation model for each point. Each map point is defined as a single textured surfel that moves independently of the other map points. Thanks to a direct photometric error cost function, we can track the position and orientation of the surfel without an explicit global deformation model. In our experiments, we validate the proposed system and observe that our local deformation model estimates more accurately and robustly the targeted deformations of the map in both laboratory-controlled experiments and in-body scenarios undergoing non-isometric deformations, with changing topology or discontinuities. △ Less

Submitted 15 September, 2021; originally announced September 2021.

Comments: 8 pages, 5 figures, submitted to RAL with ICRA

arXiv:2107.02578 [pdf, ps, other]

Noisy Boolean Hidden Matching with Applications

Authors: Michael Kapralov, Amulya Musipatla, Jakab Tardos, David P. Woodruff, Samson Zhou

Abstract: The Boolean Hidden Matching (BHM) problem, introduced in a seminal paper of Gavinsky et. al. [STOC'07], has played an important role in the streaming lower bounds for graph problems such as triangle and subgraph counting, maximum matching, MAX-CUT, Schatten $p$-norm approximation, maximum acyclic subgraph, testing bipartiteness, $k$-connectivity, and cycle-freeness. The one-way communication compl… ▽ More The Boolean Hidden Matching (BHM) problem, introduced in a seminal paper of Gavinsky et. al. [STOC'07], has played an important role in the streaming lower bounds for graph problems such as triangle and subgraph counting, maximum matching, MAX-CUT, Schatten $p$-norm approximation, maximum acyclic subgraph, testing bipartiteness, $k$-connectivity, and cycle-freeness. The one-way communication complexity of the Boolean Hidden Matching problem on a universe of size $n$ is $Θ(\sqrt{n})$, resulting in $Ω(\sqrt{n})$ lower bounds for constant factor approximations to several of the aforementioned graph problems. The related (and, in fact, more general) Boolean Hidden Hypermatching (BHH) problem introduced by Verbin and Yu [SODA'11] provides an approach to proving higher lower bounds of $Ω(n^{1-1/t})$ for integer $t\geq 2$. Reductions based on Boolean Hidden Hypermatching generate distributions on graphs with connected components of diameter about $t$, and basically show that long range exploration is hard in the streaming model of computation with adversarial arrivals. In this paper we introduce a natural variant of the BHM problem, called noisy BHM (and its natural noisy BHH variant), that we use to obtain higher than $Ω(\sqrt{n})$ lower bounds for approximating several of the aforementioned problems in graph streams when the input graphs consist only of components of diameter bounded by a fixed constant. We also use the noisy BHM problem to show that the problem of classifying whether an underlying graph is isomorphic to a complete binary tree in insertion-only streams requires $Ω(n)$ space, which seems challenging to show using BHM or BHH alone. △ Less

Submitted 28 January, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.04805 [pdf, other]

Streaming Belief Propagation for Community Detection

Authors: Yuchen Wu, MohammadHossein Bateni, Andre Linhares, Filipe Miguel Goncalves de Almeida, Andrea Montanari, Ashkan Norouzi-Fard, Jakab Tardos

Abstract: The community detection problem requires to cluster the nodes of a network into a small number of well-connected "communities". There has been substantial recent progress in characterizing the fundamental statistical limits of community detection under simple stochastic block models. However, in real-world applications, the network structure is typically dynamic, with nodes that join over time. In… ▽ More The community detection problem requires to cluster the nodes of a network into a small number of well-connected "communities". There has been substantial recent progress in characterizing the fundamental statistical limits of community detection under simple stochastic block models. However, in real-world applications, the network structure is typically dynamic, with nodes that join over time. In this setting, we would like a detection algorithm to perform only a limited number of updates at each node arrival. While standard voting approaches satisfy this constraint, it is unclear whether they exploit the network information optimally. We introduce a simple model for networks growing over time which we refer to as streaming stochastic block model (StSBM). Within this model, we prove that voting algorithms have fundamental limitations. We also develop a streaming belief-propagation (StreamBP) approach, for which we prove optimality in certain regimes. We validate our theoretical findings on synthetic and real data. △ Less

Submitted 10 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: 36 pages, 13 figures

arXiv:2106.02353 [pdf, ps, other]

Spectral Hypergraph Sparsifiers of Nearly Linear Size

Authors: Michael Kapralov, Robert Krauthgamer, Jakab Tardos, Yuichi Yoshida

Abstract: Graph sparsification has been studied extensively over the past two decades, culminating in spectral sparsifiers of optimal size (up to constant factors). Spectral hypergraph sparsification is a natural analogue of this problem, for which optimal bounds on the sparsifier size are not known, mainly because the hypergraph Laplacian is non-linear, and thus lacks the linear-algebraic structure and too… ▽ More Graph sparsification has been studied extensively over the past two decades, culminating in spectral sparsifiers of optimal size (up to constant factors). Spectral hypergraph sparsification is a natural analogue of this problem, for which optimal bounds on the sparsifier size are not known, mainly because the hypergraph Laplacian is non-linear, and thus lacks the linear-algebraic structure and tools that have been so effective for graphs. Our main contribution is the first algorithm for constructing $ε$-spectral sparsifiers for hypergraphs with $O^*(n)$ hyperedges, where $O^*$ suppresses $(ε^{-1} \log n)^{O(1)}$ factors. This bound is independent of the rank $r$ (maximum cardinality of a hyperedge), and is essentially best possible due to a recent bit complexity lower bound of $Ω(nr)$ for hypergraph sparsification. This result is obtained by introducing two new tools. First, we give a new proof of spectral concentration bounds for sparsifiers of graphs; it avoids linear-algebraic methods, replacing e.g.~the usual application of the matrix Bernstein inequality and therefore applies to the (non-linear) hypergraph setting. To achieve the result, we design a new sequence of hypergraph-dependent $ε$-nets on the unit sphere in $\mathbb{R}^n$. Second, we extend the weight assignment technique of Chen, Khanna and Nagda [FOCS'20] to the spectral sparsification setting. Surprisingly, the number of spanning trees after the weight assignment can serve as a potential function guiding the reweighting process in the spectral setting. △ Less

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2011.06530 [pdf, ps, other]

Towards Tight Bounds for Spectral Sparsification of Hypergraphs

Authors: Michael Kapralov, Robert Krauthgamer, Jakab Tardos, Yuichi Yoshida

Abstract: Cut and spectral sparsification of graphs have numerous applications, including e.g. speeding up algorithms for cuts and Laplacian solvers. These powerful notions have recently been extended to hypergraphs, which are much richer and may offer new applications. However, the current bounds on the size of hypergraph sparsifiers are not as tight as the corresponding bounds for graphs. Our first resu… ▽ More Cut and spectral sparsification of graphs have numerous applications, including e.g. speeding up algorithms for cuts and Laplacian solvers. These powerful notions have recently been extended to hypergraphs, which are much richer and may offer new applications. However, the current bounds on the size of hypergraph sparsifiers are not as tight as the corresponding bounds for graphs. Our first result is a polynomial-time algorithm that, given a hypergraph on $n$ vertices with maximum hyperedge size $r$, outputs an $ε$-spectral sparsifier with $O^*(nr)$ hyperedges, where $O^*$ suppresses $(ε^{-1} \log n)^{O(1)}$ factors. This size bound improves the two previous bounds: $O^*(n^3)$ [Soma and Yoshida, SODA'19] and $O^*(nr^3)$ [Bansal, Svensson and Trevisan, FOCS'19]. Our main technical tool is a new method for proving concentration of the nonlinear analogue of the quadratic form of the Laplacians for hypergraph expanders. We complement this with lower bounds on the bit complexity of any compression scheme that $(1+ε)$-approximates all the cuts in a given hypergraph, and hence also on the bit complexity of every $ε$-cut/spectral sparsifier. These lower bounds are based on Ruzsa-Szemerédi graphs, and a particular instantiation yields an $Ω(nr)$ lower bound on the bit complexity even for fixed constant $ε$. This is tight up to polylogarithmic factors in $n$, due to recent hypergraph cut sparsifiers of [Chen, Khanna and Nagda, FOCS'20]. Finally, for directed hypergraphs, we present an algorithm that computes an $ε$-spectral sparsifier with $O^*(n^2r^3)$ hyperarcs, where $r$ is the maximum size of a hyperarc. For small $r$, this improves over $O^*(n^3)$ known from [Soma and Yoshida, SODA'19], and is getting close to the trivial lower bound of $Ω(n^2)$ hyperarcs. △ Less

Submitted 12 April, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

arXiv:2011.06481 [pdf, ps, other]

Communication Efficient Coresets for Maximum Matching

Authors: Michael Kapralov, Gilbert Maystre, Jakab Tardos

Abstract: In this paper we revisit the problem of constructing randomized composable coresets for bipartite matching. In this problem the input graph is randomly partitioned across $k$ players, each of which sends a single message to a coordinator, who then must output a good approximation to the maximum matching in the input graph. Assadi and Khanna gave the first such coreset, achieving a $1/9$-approximat… ▽ More In this paper we revisit the problem of constructing randomized composable coresets for bipartite matching. In this problem the input graph is randomly partitioned across $k$ players, each of which sends a single message to a coordinator, who then must output a good approximation to the maximum matching in the input graph. Assadi and Khanna gave the first such coreset, achieving a $1/9$-approximation by having every player send a maximum matching, i.e. at most $n/2$ words per player. The approximation factor was improved to $1/3$ by Bernstein et al. In this paper, we show that the matching skeleton construction of Goel, Kapralov and Khanna, which is a carefully chosen (fractional) matching, is a randomized composable coreset that achieves a $1/2-o(1)$ approximation using at most $n-1$ words of communication per player. We also show an upper bound of $2/3+o(1)$ on the approximation ratio achieved by this coreset. △ Less

Submitted 12 November, 2020; originally announced November 2020.

arXiv:2010.09409 [pdf, other]

SD-DefSLAM: Semi-Direct Monocular SLAM for Deformable and Intracorporeal Scenes

Authors: Juan J. Gómez Rodríguez, José Lamarca, Javier Morlana, Juan D. Tardós, José M. M. Montiel

Abstract: Conventional SLAM techniques strongly rely on scene rigidity to solve data association, ignoring dynamic parts of the scene. In this work we present Semi-Direct DefSLAM (SD-DefSLAM), a novel monocular deformable SLAM method able to map highly deforming environments, built on top of DefSLAM. To robustly solve data association in challenging deforming scenes, SD-DefSLAM combines direct and indirect… ▽ More Conventional SLAM techniques strongly rely on scene rigidity to solve data association, ignoring dynamic parts of the scene. In this work we present Semi-Direct DefSLAM (SD-DefSLAM), a novel monocular deformable SLAM method able to map highly deforming environments, built on top of DefSLAM. To robustly solve data association in challenging deforming scenes, SD-DefSLAM combines direct and indirect methods: an enhanced illumination-invariant Lucas-Kanade tracker for data association, geometric Bundle Adjustment for pose and deformable map estimation, and bag-of-words based on feature descriptors for camera relocation. Dynamic objects are detected and segmented-out using a CNN trained for the specific application domain. We thoroughly evaluate our system in two public datasets. The mandala dataset is a SLAM benchmark with increasingly aggressive deformations. The Hamlyn dataset contains intracorporeal sequences that pose serious real-life challenges beyond deformation like weak texture, specular reflections, surgical tools and occlusions. Our results show that SD-DefSLAM outperforms DefSLAM in point tracking, reconstruction accuracy and scale drift thanks to the improvement in all the data association steps, being the first system able to robustly perform SLAM inside the human body. △ Less

Submitted 19 October, 2020; originally announced October 2020.

Comments: 10 pages, 8 figures. Submitted to RA-L with option to ICRA 2021. Associated video: https://youtu.be/gkcC0IR3X6A

ACM Class: I.4.5; I.4.6; I.4.8

arXiv:2010.07820 [pdf, other]

DynaSLAM II: Tightly-Coupled Multi-Object Tracking and SLAM

Authors: Berta Bescos, Carlos Campos, Juan D. Tardós, José Neira

Abstract: The assumption of scene rigidity is common in visual SLAM algorithms. However, it limits their applicability in populated real-world environments. Furthermore, most scenarios including autonomous driving, multi-robot collaboration and augmented/virtual reality, require explicit motion information of the surroundings to help with decision making and scene understanding. We present in this paper Dyn… ▽ More The assumption of scene rigidity is common in visual SLAM algorithms. However, it limits their applicability in populated real-world environments. Furthermore, most scenarios including autonomous driving, multi-robot collaboration and augmented/virtual reality, require explicit motion information of the surroundings to help with decision making and scene understanding. We present in this paper DynaSLAM II, a visual SLAM system for stereo and RGB-D configurations that tightly integrates the multi-object tracking capability. DynaSLAM II makes use of instance semantic segmentation and of ORB features to track dynamic objects. The structure of the static scene and of the dynamic objects is optimized jointly with the trajectories of both the camera and the moving agents within a novel bundle adjustment proposal. The 3D bounding boxes of the objects are also estimated and loosely optimized within a fixed temporal window. We demonstrate that tracking dynamic objects does not only provide rich clues for scene understanding but is also beneficial for camera tracking. The project code will be released upon acceptance. △ Less

Submitted 15 October, 2020; originally announced October 2020.

arXiv:2010.07431 [pdf, other]

Fairness in Streaming Submodular Maximization: Algorithms and Hardness

Authors: Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski

Abstract: Submodular maximization has become established as the method of choice for the task of selecting representative and diverse summaries of data. However, if datapoints have sensitive attributes such as gender or age, such machine learning algorithms, left unchecked, are known to exhibit bias: under- or over-representation of particular groups. This has made the design of fair machine learning algori… ▽ More Submodular maximization has become established as the method of choice for the task of selecting representative and diverse summaries of data. However, if datapoints have sensitive attributes such as gender or age, such machine learning algorithms, left unchecked, are known to exhibit bias: under- or over-representation of particular groups. This has made the design of fair machine learning algorithms increasingly important. In this work we address the question: Is it possible to create fair summaries for massive datasets? To this end, we develop the first streaming approximation algorithms for submodular maximization under fairness constraints, for both monotone and non-monotone functions. We validate our findings empirically on exemplar-based clustering, movie recommendation, DPP-based summarization, and maximum coverage in social networks, showing that fairness constraints do not significantly impact utility. △ Less

Submitted 18 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

Comments: Accepted to NeurIPS 2020

arXiv:2007.11898 [pdf, other]

doi 10.1109/TRO.2021.3075644

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM

Authors: Carlos Campos, Richard Elvira, Juan J. Gómez Rodríguez, José M. M. Montiel, Juan D. Tardós

Abstract: This paper presents ORB-SLAM3, the first system able to perform visual, visual-inertial and multi-map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models. The first main novelty is a feature-based tightly-integrated visual-inertial SLAM system that fully relies on Maximum-a-Posteriori (MAP) estimation, even during the IMU initialization phase. The result is a syst… ▽ More This paper presents ORB-SLAM3, the first system able to perform visual, visual-inertial and multi-map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models. The first main novelty is a feature-based tightly-integrated visual-inertial SLAM system that fully relies on Maximum-a-Posteriori (MAP) estimation, even during the IMU initialization phase. The result is a system that operates robustly in real-time, in small and large, indoor and outdoor environments, and is 2 to 5 times more accurate than previous approaches. The second main novelty is a multiple map system that relies on a new place recognition method with improved recall. Thanks to it, ORB-SLAM3 is able to survive to long periods of poor visual information: when it gets lost, it starts a new map that will be seamlessly merged with previous maps when revisiting mapped areas. Compared with visual odometry systems that only use information from the last few seconds, ORB-SLAM3 is the first system able to reuse in all the algorithm stages all previous information. This allows to include in bundle adjustment co-visible keyframes, that provide high parallax observations boosting accuracy, even if they are widely separated in time or if they come from a previous map** session. Our experiments show that, in all sensor configurations, ORB-SLAM3 is as robust as the best systems available in the literature, and significantly more accurate. Notably, our stereo-inertial SLAM achieves an average accuracy of 3.6 cm on the EuRoC drone and 9 mm under quick hand-held motions in the room of TUM-VI dataset, a setting representative of AR/VR scenarios. For the benefit of the community we make public the source code. △ Less

Submitted 23 April, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

arXiv:2003.05766 [pdf, other]

Inertial-Only Optimization for Visual-Inertial Initialization

Authors: Carlos Campos, José M. M. Montiel, Juan D. Tardós

Abstract: We formulate for the first time visual-inertial initialization as an optimal estimation problem, in the sense of maximum-a-posteriori (MAP) estimation. This allows us to properly take into account IMU measurement uncertainty, which was neglected in previous methods that either solved sets of algebraic equations, or minimized ad-hoc cost functions using least squares. Our exhaustive initialization… ▽ More We formulate for the first time visual-inertial initialization as an optimal estimation problem, in the sense of maximum-a-posteriori (MAP) estimation. This allows us to properly take into account IMU measurement uncertainty, which was neglected in previous methods that either solved sets of algebraic equations, or minimized ad-hoc cost functions using least squares. Our exhaustive initialization tests on EuRoC dataset show that our proposal largely outperforms the best methods in the literature, being able to initialize in less than 4 seconds in almost any point of the trajectory, with a scale error of 5.3% on average. This initialization has been integrated into ORB-SLAM Visual-Inertial boosting its robustness and efficiency while maintaining its excellent accuracy. △ Less

Submitted 12 March, 2020; originally announced March 2020.

Comments: 2020 International Conference on Robotics and Automation

arXiv:1908.11585 [pdf, other]

ORBSLAM-Atlas: a robust and accurate multi-map system

Authors: Richard Elvira, Juan D. Tardós, J. M. M. Montiel

Abstract: We propose ORBSLAM-Atlas, a system able to handle an unlimited number of disconnected sub-maps, that includes a robust map merging algorithm able to detect sub-maps with common regions and seamlessly fuse them. The outstanding robustness and accuracy of ORBSLAM are due to its ability to detect wide-baseline matches between keyframes, and to exploit them by means of non-linear optimization, however… ▽ More We propose ORBSLAM-Atlas, a system able to handle an unlimited number of disconnected sub-maps, that includes a robust map merging algorithm able to detect sub-maps with common regions and seamlessly fuse them. The outstanding robustness and accuracy of ORBSLAM are due to its ability to detect wide-baseline matches between keyframes, and to exploit them by means of non-linear optimization, however it only can handle a single map. ORBSLAM-Atlas brings the wide-baseline matching detection and exploitation to the multiple map arena. The result is a SLAM system significantly more general and robust, able to perform multi-session map**. If tracking is lost during exploration, instead of freezing the map, a new sub-map is launched, and it can be fused with the previous map when common parts are visited. Our criteria to declare the camera lost contrast with previous approaches that simply count the number of tracked points, we propose to discard also inaccurately estimated camera poses due to bad geometrical conditioning. As a result, the map is split into more accurate sub-maps, that are eventually merged in a more accurate global map, thanks to the multi-map** capabilities. We provide extensive experimental validation in the EuRoC datasets, where ORBSLAM-Atlas obtains accurate monocular and stereo results in the difficult sequences where ORBSLAM failed. We also build global maps after multiple sessions in the same room, obtaining the best results to date, between 2 and 3 times more accurate than competing multi-map approaches. We also show the robustness and capability of our system to deal with dynamic scenes, quantitatively in the EuRoC datasets and qualitatively in a densely populated corridor where camera occlusions and tracking losses are frequent. △ Less

Submitted 30 August, 2019; originally announced August 2019.

Comments: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1908.10653 [pdf, other]

doi 10.1109/ICRA.2019.8793718

Fast and Robust Initialization for Visual-Inertial SLAM

Authors: Carlos Campos, J. M. M. Montiel, Juan D. Tardós

Abstract: Visual-inertial SLAM (VI-SLAM) requires a good initial estimation of the initial velocity, orientation with respect to gravity and gyroscope and accelerometer biases. In this paper we build on the initialization method proposed by Martinelli and extended by Kaiser et al. , modifying it to be more general and efficient. We improve accuracy with several rounds of visual-inertial bundle adjustment, a… ▽ More Visual-inertial SLAM (VI-SLAM) requires a good initial estimation of the initial velocity, orientation with respect to gravity and gyroscope and accelerometer biases. In this paper we build on the initialization method proposed by Martinelli and extended by Kaiser et al. , modifying it to be more general and efficient. We improve accuracy with several rounds of visual-inertial bundle adjustment, and robustify the method with novel observability and consensus tests, that discard erroneous solutions. Our results on the EuRoC dataset show that, while the original method produces scale errors up to 156%, our method is able to consistently initialize in less than two seconds with scale errors around 5%, which can be further reduced to less than 1% performing visual-inertial bundle adjustment after ten seconds. △ Less

Submitted 28 August, 2019; originally announced August 2019.

Comments: 2019 International Conference on Robotics and Automation

Journal ref: C. Campos, M. José M.M. and J. D. Tardós, "Fast and Robust Initialization for Visual-Inertial SLAM," 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 2019, pp. 1288-1294

arXiv:1907.05725 [pdf, other]

Space Efficient Approximation to Maximum Matching Size from Uniform Edge Samples

Authors: Michael Kapralov, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos

Abstract: Given a source of iid samples of edges of an input graph $G$ with $n$ vertices and $m$ edges, how many samples does one need to compute a constant factor approximation to the maximum matching size in $G$? Moreover, is it possible to obtain such an estimate in a small amount of space? We show that, on the one hand, this problem cannot be solved using a nontrivially sublinear (in $m$) number of samp… ▽ More Given a source of iid samples of edges of an input graph $G$ with $n$ vertices and $m$ edges, how many samples does one need to compute a constant factor approximation to the maximum matching size in $G$? Moreover, is it possible to obtain such an estimate in a small amount of space? We show that, on the one hand, this problem cannot be solved using a nontrivially sublinear (in $m$) number of samples: $m^{1-o(1)}$ samples are needed. On the other hand, a surprisingly space efficient algorithm for processing the samples exists: $O(\log^2 n)$ bits of space suffice to compute an estimate. Our main technical tool is a new peeling type algorithm for matching that we simulate using a recursive sampling process that crucially ensures that local neighborhood information from `dense' regions of the graph is provided at appropriately higher sampling rates. We show that a delicate balance between exploration depth and sampling rate allows our simulation to not lose precision over a logarithmic number of levels of recursion and achieve a constant factor approximation. The previous best result on matching size estimation from random samples was a $\log^{O(1)} n$ approximation [Kapralov et al'14]. Our algorithm also yields a constant factor approximate local computation algorithm (LCA) for matching with $O(d\log n)$ exploration starting from any vertex. Previous approaches were based on local simulations of randomized greedy, which take $O(d)$ time {\em in expectation over the starting vertex or edge} (Yoshida et al'09, Onak et al'12), and could not achieve a better than $d^2$ runtime. Interestingly, we also show that unlike our algorithm, the local simulation of randomized greedy that is the basis of the most efficient prior results does take $\wtΩ(d^2)\gg O(d\log n)$ time for a worst case edge even for $d=\exp(Θ(\sqrt{\log n}))$. △ Less

Submitted 12 July, 2019; originally announced July 2019.

arXiv:1903.12150 [pdf, ps, other]

Dynamic Streaming Spectral Sparsification in Nearly Linear Time and Space

Authors: Michael Kapralov, Navid Nouri, Aaron Sidford, Jakab Tardos

Abstract: In this paper we consider the problem of computing spectral approximations to graphs in the single pass dynamic streaming model. We provide a linear sketching based solution that given a stream of edge insertions and deletions to a $n$-node undirected graph, uses $\tilde O(n)$ space, processes each update in $\tilde O(1)$ time, and with high probability recovers a spectral sparsifier in… ▽ More In this paper we consider the problem of computing spectral approximations to graphs in the single pass dynamic streaming model. We provide a linear sketching based solution that given a stream of edge insertions and deletions to a $n$-node undirected graph, uses $\tilde O(n)$ space, processes each update in $\tilde O(1)$ time, and with high probability recovers a spectral sparsifier in $\tilde O(n)$ time. Prior to our work, state of the art results either used near optimal $\tilde O(n)$ space complexity, but brute-force $Ω(n^2)$ recovery time [Kapralov et al.'14], or with subquadratic runtime, but polynomially suboptimal space complexity [Ahn et al.'14, Kapralov et al.'19]. Our main technical contribution is a novel method for `bucketing' vertices of the input graph into clusters that allows fast recovery of edges of sufficiently large effective resistance. Our algorithm first buckets vertices of the graph by performing ball-carving using (an approximation to) its effective resistance metric, and then recovers the high effective resistance edges from a sketched version of an electrical flow between vertices in a bucket, taking nearly linear time in the number of vertices overall. This process is performed at different geometric scales to recover a sample of edges with probabilities proportional to effective resistances and obtain an actual sparsifier of the input graph. This work provides both the first efficient $\ell_2$-sparse recovery algorithm for graphs and new primitives for manipulating the effective resistance embedding of a graph, both of which we hope have further applications. △ Less

Submitted 28 March, 2019; originally announced March 2019.

arXiv:1801.10043 [pdf, other]

Simultaneous Deployment and Tracking Multi-Robot Strategies with Connectivity Maintenance

Authors: Javier Tardós, Rosario Aragues, Carlos Sagüés, Carlos Rubio

Abstract: Multi robot teams composed by ground and aerial vehicles have gained attention during the last years. We present a scenario where both types of robots must monitor the same area from different view points. In this paper we propose two Lloyd-based tracking strategies to allow the ground robots (agents) follow the aerial ones (targets), kee** the connectivity between the agents. The first strategy… ▽ More Multi robot teams composed by ground and aerial vehicles have gained attention during the last years. We present a scenario where both types of robots must monitor the same area from different view points. In this paper we propose two Lloyd-based tracking strategies to allow the ground robots (agents) follow the aerial ones (targets), kee** the connectivity between the agents. The first strategy establishes density functions on the environment so that the targets acquire more importance than other zones, while the second one iteratively modifies the virtual limits of the working area depending on the positions of the targets. We consider the connectivity maintenance due to the fact that coverage tasks tend to spread the agents as much as possible, which is addressed by restricting their motions so that they keep the links of a Minimum Spanning Tree of the communication graph. We provide a thorough parametric study of the performance of the proposed strategies under several simulated scenarios. In addition, the methods are implemented and tested using realistic robotic simulation environments and real experiments. △ Less

Submitted 19 March, 2018; v1 submitted 30 January, 2018; originally announced January 2018.

arXiv:1610.06475 [pdf, other]

doi 10.1109/TRO.2017.2705103

ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras

Authors: Raul Mur-Artal, Juan D. Tardos

Abstract: We present ORB-SLAM2 a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real-time on standard CPUs in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city. Our back-end based on bundle adjustment with monocu… ▽ More We present ORB-SLAM2 a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real-time on standard CPUs in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city. Our back-end based on bundle adjustment with monocular and stereo observations allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches to map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields. △ Less

Submitted 19 June, 2017; v1 submitted 20 October, 2016; originally announced October 2016.

Comments: Accepted for publication in IEEE Transactions on Robotics

arXiv:1610.05949 [pdf, other]

doi 10.1109/LRA.2017.2653359

Visual-Inertial Monocular SLAM with Map Reuse

Authors: Raul Mur-Artal, Juan D. Tardos

Abstract: In recent years there have been excellent results in Visual-Inertial Odometry techniques, which aim to compute the incremental motion of the sensor with high accuracy and robustness. However these approaches lack the capability to close loops, and trajectory estimation accumulates drift even if the sensor is continually revisiting the same place. In this work we present a novel tightly-coupled Vis… ▽ More In recent years there have been excellent results in Visual-Inertial Odometry techniques, which aim to compute the incremental motion of the sensor with high accuracy and robustness. However these approaches lack the capability to close loops, and trajectory estimation accumulates drift even if the sensor is continually revisiting the same place. In this work we present a novel tightly-coupled Visual-Inertial Simultaneous Localization and Map** system that is able to close loops and reuse its map to achieve zero-drift localization in already mapped areas. While our approach can be applied to any camera configuration, we address here the most general problem of a monocular camera, with its well-known scale ambiguity. We also propose a novel IMU initialization method, which computes the scale, the gravity direction, the velocity, and gyroscope and accelerometer biases, in a few seconds with high accuracy. We test our system in the 11 sequences of a recent micro-aerial vehicle public dataset achieving a typical scale factor error of 1% and centimeter precision. We compare to the state-of-the-art in visual-inertial odometry in sequences with revisiting, proving the better accuracy of our method due to map reuse and no drift accumulation. △ Less

Submitted 17 January, 2017; v1 submitted 19 October, 2016; originally announced October 2016.

Comments: Accepted for publication in IEEE Robotics and Automation Letters

arXiv:1504.02398 [pdf, other]

Real-time Monocular Object SLAM

Authors: Dorian Gálvez-López, Marta Salas, Juan D. Tardós, J. M. M. Montiel

Abstract: We present a real-time object-based SLAM system that leverages the largest object database to date. Our approach comprises two main components: 1) a monocular SLAM algorithm that exploits object rigidity constraints to improve the map and find its real scale, and 2) a novel object recognition algorithm based on bags of binary words, which provides live detections with a database of 500 3D objects.… ▽ More We present a real-time object-based SLAM system that leverages the largest object database to date. Our approach comprises two main components: 1) a monocular SLAM algorithm that exploits object rigidity constraints to improve the map and find its real scale, and 2) a novel object recognition algorithm based on bags of binary words, which provides live detections with a database of 500 3D objects. The two components work together and benefit each other: the SLAM algorithm accumulates information from the observations of the objects, anchors object features to especial map landmarks and sets constrains on the optimization. At the same time, objects partially or fully located within the map are used as a prior to guide the recognition algorithm, achieving higher recall. We evaluate our proposal on five real environments showing improvements on the accuracy of the map and efficiency with respect to other state-of-the-art techniques. △ Less

Submitted 9 April, 2015; originally announced April 2015.

arXiv:1502.00956 [pdf, other]

doi 10.1109/TRO.2015.2463671

ORB-SLAM: a Versatile and Accurate Monocular SLAM System

Authors: Raul Mur-Artal, J. M. M. Montiel, Juan D. Tardos

Abstract: This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the sa… ▽ More This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, map**, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public. △ Less

Submitted 18 September, 2015; v1 submitted 3 February, 2015; originally announced February 2015.

Comments: 17 pages. 13 figures. IEEE Transactions on Robotics, 2015. Project webpage (videos, code): http://webdiis.unizar.es/~raulmur/orbslam/

Showing 1–33 of 33 results for author: Tardos, J