
LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping

Authors: Kurran Singh, Tim Magoun, John J. Leonard

Abstract: Enabling robots to understand the world in terms of objects is a critical building block towards higher level autonomy. The success of foundation models in vision has created the ability to segment and identify nearly all objects in the world. However, utilizing such objects to localize the robot and build an open-set semantic map of the world remains an open research question. In this work, a system of identifying, localizing, and encoding objects is tightly coupled with probabilistic graphical models for performing open-set semantic simultaneous localization and mapping (SLAM). Results are presented demonstrating that the proposed lightweight object encoding can be used to perform more accurate object-based SLAM than existing open-set methods, closed-set methods, and geometric methods while incurring a lower computational overhead than existing open-set mapping methods.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.12837 [pdf, other]

Opti-Acoustic Semantic SLAM with Unknown Objects in Underwater Environments

Authors: Kurran Singh, Jungseok Hong, Nicholas R. Rypkema, John J. Leonard

Abstract: Despite recent advances in semantic Simultaneous Localization and Mapping (SLAM) for terrestrial and aerial applications, underwater semantic SLAM remains an open and largely unaddressed research problem due to the unique sensing modalities and the object classes found underwater. This paper presents an object-based semantic SLAM method for underwater environments that can identify, localize, classify, and map a wide variety of marine objects without a priori knowledge of the object classes present in the scene. The method performs unsupervised object segmentation and object-level feature aggregation, and then uses opti-acoustic sensor fusion for object localization. Probabilistic data association is used to determine observation to landmark correspondences. Given such correspondences, the method then jointly optimizes landmark and vehicle position estimates. Indoor and outdoor underwater datasets with a wide variety of objects and challenging acoustic and lighting conditions are collected for evaluation and made publicly available. Quantitative and qualitative results show the proposed method achieves reduced trajectory error compared to baseline methods, and is able to obtain comparable map accuracy to a baseline closed-set method that requires hand-labeled data of all objects in the scene.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2303.14283 [pdf, other]

GAPSLAM: Blending Gaussian Approximation and Particle Filters for Real-Time Non-Gaussian SLAM

Authors: Qiangqiang Huang, John J. Leonard

Abstract: Inferring the posterior distribution in SLAM is critical for evaluating the uncertainty in localization and mapping, as well as supporting subsequent planning tasks aiming to reduce uncertainty for safe navigation. However, real-time full posterior inference techniques, such as Gaussian approximation and particle filters, either lack expressiveness for representing non-Gaussian posteriors or suffer from performance degeneracy when estimating high-dimensional posteriors. Inspired by the complementary strengths of Gaussian approximation and particle filters (scalability and non-Gaussian estimation, respectively), we blend these two approaches to infer marginal posteriors in SLAM. Specifically, Gaussian approximation provides robot pose distributions on which particle filters are conditioned to sample landmark marginals. In return, the maximum a posteriori point among these samples can be used to reset linearization points in the nonlinear optimization solver of the Gaussian approximation, facilitating the pursuit of global optima. We demonstrate the scalability, generalizability, and accuracy of our algorithm for real-time full posterior inference on real-world range-only SLAM and object-based bearing-only SLAM datasets.

Submitted 9 August, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: 8 pages, 5 figures. To appear in IROS 2023

arXiv:2303.07308 [pdf, other]

NeuSE: Neural SE(3)-Equivariant Embedding for Consistent Spatial Understanding with Objects

Authors: Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

Abstract: We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes. NeuSE is a set of latent object embeddings created from partial object observations. It serves as a compact point cloud surrogate for complete object models, encoding full shape information while transforming SE(3)-equivariantly in tandem with the object in the physical world. With NeuSE, relative frame transforms can be directly derived from inferred latent codes. Our proposed SLAM paradigm, using NeuSE for object shape and pose characterization, can operate independently or in conjunction with typical SLAM systems. It directly infers SE(3) camera pose constraints that are compatible with general SLAM pose graph optimization, while also maintaining a lightweight object-centric map that adapts to real-world changes. Our approach is evaluated on synthetic and real-world sequences featuring changed objects and shows improved localization accuracy and change-aware mapping capability, when working either standalone or jointly with a common SLAM pipeline.

Submitted 10 July, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: 15 Pages and 12 figures. Accepted to RSS 2023. Project webpage: https://neuse-slam.github.io/neuse/

arXiv:2302.13264 [pdf, other]

Data-Association-Free Landmark-based SLAM

Authors: Yihao Zhang, Odin A. Severinsen, John J. Leonard, Luca Carlone, Kasra Khosoussi

Abstract: We study landmark-based SLAM with unknown data association: our robot navigates in a completely unknown environment and has to simultaneously reason over its own trajectory, the positions of an unknown number of landmarks in the environment, and potential data associations between measurements and landmarks. This setup is interesting since: (i) it arises when recovering from data association failures or from SLAM with information-poor sensors, (ii) it sheds light on fundamental limits (and hardness) of landmark-based SLAM problems irrespective of the front-end data association method, and (iii) it generalizes existing approaches where data association is assumed to be known or partially known. We approach the problem by splitting it into an inner problem of estimating the trajectory, landmark positions and data associations and an outer problem of estimating the number of landmarks. Our approach creates useful and novel connections with existing techniques from discrete-continuous optimization (e.g., k-means clustering), which has the potential to trigger novel research. We demonstrate the proposed approaches in extensive simulations and on real datasets and show that the proposed techniques outperform typical data association baselines and are even competitive against an "oracle" baseline which has access to the number of landmarks and an initial guess for each landmark.

Submitted 4 May, 2023; v1 submitted 26 February, 2023; originally announced February 2023.

Comments: Accepted at ICRA 2023. Correcting a typo (missing parentheses) in eq. (1) and following equations

arXiv:2302.11614 [pdf, other]

Certifiably Correct Range-Aided SLAM

Authors: Alan Papalia, Andrew Fishberg, Brendan W. O'Neill, Jonathan P. How, David M. Rosen, John J. Leonard

Abstract: We present the first algorithm to efficiently compute certifiably optimal solutions to range-aided simultaneous localization and mapping (RA-SLAM) problems. Robotic navigation systems increasingly incorporate point-to-point ranging sensors, leading to state estimation problems in the form of RA-SLAM. However, the RA-SLAM problem is significantly more difficult to solve than traditional pose-graph SLAM: ranging sensor models introduce non-convexity and single range measurements do not uniquely determine the transform between the involved sensors. As a result, RA-SLAM inference is sensitive to initial estimates yet lacks reliable initialization techniques. Our approach, certifiably correct RA-SLAM (CORA), leverages a novel quadratically constrained quadratic programming (QCQP) formulation of RA-SLAM to relax the RA-SLAM problem to a semidefinite program (SDP). CORA solves the SDP efficiently using the Riemannian Staircase methodology; the SDP solution provides both (i) a lower bound on the RA-SLAM problem's optimal value, and (ii) an approximate solution of the RA-SLAM problem, which can be subsequently refined using local optimization. CORA applies to problems with arbitrary pose-pose, pose-landmark, and ranging measurements and, due to using convex relaxation, is insensitive to initialization. We evaluate CORA on several real-world problems. In contrast to state-of-the-art approaches, CORA is able to obtain high-quality solutions on all problems despite being initialized with random values.
Additionally, we study the tightness of the SDP relaxation with respect to important problem parameters: the number of (i) robots, (ii) landmarks, and (iii) range measurements. These experiments demonstrate that the SDP relaxation is often tight and reveal relationships between graph rigidity and the tightness of the SDP relaxation.

Submitted 19 September, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: 17 pages, 9 figures, submitted to T-RO
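
A short worked identity may help explain why a range term, although non-convex, can be written with a quadratic cost and a quadratic constraint, as a QCQP requires. This is an illustration only: the symbols $t_i$, $t_j$ (sensor positions), $r_{ij} \ge 0$ (measured range) and $b_{ij}$ (auxiliary unit vector) are assumed notation, and the formulation in the paper may differ in its details.

```latex
\begin{align*}
\min_{\|b_{ij}\|_2 = 1} \bigl\| t_j - t_i - r_{ij}\, b_{ij} \bigr\|_2^2
  &= \min_{\|b_{ij}\|_2 = 1} \Bigl( \|t_j - t_i\|_2^2
     - 2\, r_{ij}\, (t_j - t_i)^{\top} b_{ij} + r_{ij}^2 \Bigr) \\
  &= \|t_j - t_i\|_2^2 - 2\, r_{ij}\, \|t_j - t_i\|_2 + r_{ij}^2 \\
  &= \bigl( \|t_j - t_i\|_2 - r_{ij} \bigr)^2 .
\end{align*}
```

The minimum is attained when $b_{ij}$ points along $t_j - t_i$. The left-hand side is quadratic in $(t_i, t_j, b_{ij})$ and the constraint $\|b_{ij}\|_2^2 = 1$ is quadratic, which is the general shape that semidefinite relaxations of a QCQP operate on.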

arXiv:2211.01513 [pdf, other]

Optimizing Fiducial Marker Placement for Improved Visual Localization

Authors: Qiangqiang Huang, Joseph DeGol, Victor Fragoso, Sudipta N. Sinha, John J. Leonard

Abstract: Adding fiducial markers to a scene is a well-known strategy for making visual localization algorithms more robust. Traditionally, these marker locations are selected by humans who are familiar with visual localization techniques. This paper explores the problem of automatic marker placement within a scene. Specifically, given a predetermined set of markers and a scene model, we compute optimized marker positions within the scene that can improve accuracy in visual localization. Our main contribution is a novel framework for modeling camera localizability that incorporates both natural scene features and artificial fiducial markers added to the scene. We present optimized marker placement (OMP), a greedy algorithm that is based on the camera localizability framework. We have also designed a simulation framework for testing marker placement algorithms on 3D models and images generated from synthetic scenes. We have evaluated OMP within this testbed and demonstrate an improvement in the localization rate by up to 20 percent on four different scenes.

Submitted 16 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Extended technical report for publication in IEEE Robotics and Automation Letters (RA-L)

arXiv:2210.13641 [pdf, other]

NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields

Authors: Antoni Rosinol, John J. Leonard, Luca Carlone

Abstract: We propose a novel geometric and photometric 3D mapping pipeline for accurate and real-time scene reconstruction from monocular images. To achieve this, we leverage recent advances in dense monocular SLAM and real-time hierarchical volumetric neural radiance fields. Our insight is that dense monocular SLAM provides the right information to fit a neural radiance field of the scene in real-time, by providing accurate pose estimates and depth-maps with associated uncertainty. With our proposed uncertainty-based depth loss, we achieve not only good photometric accuracy, but also great geometric accuracy. In fact, our proposed pipeline achieves better geometric and photometric accuracy than competing approaches (up to 179% better PSNR and 86% better L1 depth), while working in real-time and using only monocular images.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 10 pages, 6 figures

arXiv:2210.03177 [pdf, other]

SCORE: A Second-Order Conic Initialization for Range-Aided SLAM

Authors: Alan Papalia, Joseph Morales, Kevin J. Doherty, David M. Rosen, John J. Leonard

Abstract: We present a novel initialization technique for the range-aided simultaneous localization and mapping (RA-SLAM) problem. In RA-SLAM we consider measurements of point-to-point distances in addition to measurements of rigid transformations to landmark or pose variables. Standard formulations of RA-SLAM approach the problem as non-convex optimization, which requires a good initialization to obtain quality results. The initialization technique proposed here relaxes the RA-SLAM problem to a convex problem which is then solved to determine an initialization for the original, non-convex problem. The relaxation is a second-order cone program (SOCP), which is derived from a quadratically constrained quadratic program (QCQP) formulation of the RA-SLAM problem. As a SOCP, the method is highly scalable. We name this relaxation Second-order COnic RElaxation for RA-SLAM (SCORE). To our knowledge, this work represents the first convex relaxation for RA-SLAM. We present real-world and simulated experiments which show SCORE initialization permits the efficient recovery of quality solutions for a variety of challenging single- and multi-robot RA-SLAM problems with thousands of poses and range measurements.

Submitted 6 October, 2022; originally announced October 2022.

Comments: 9 pages, 8 figures, extended version of paper submitted to ICRA 2023

arXiv:2210.01276 [pdf, other]

Probabilistic Volumetric Fusion for Dense Monocular SLAM

Authors: Antoni Rosinol, John J. Leonard, Luca Carlone

Abstract: We present a novel method to reconstruct 3D scenes from images by leveraging deep dense monocular SLAM and fast uncertainty propagation. The proposed approach reconstructs 3D scenes densely, accurately, and in real-time while being robust to extremely noisy depth estimates coming from dense monocular SLAM. Unlike previous approaches, which either use ad-hoc depth filters or estimate the depth uncertainty from RGB-D cameras' sensor models, our probabilistic depth uncertainty derives directly from the information matrix of the underlying bundle adjustment problem in SLAM. We show that the resulting depth uncertainty provides an excellent signal to weight the depth-maps for volumetric fusion. Without our depth uncertainty, the resulting mesh is noisy and contains artifacts, while our approach generates an accurate 3D mesh with significantly fewer artifacts. We provide results on the challenging EuRoC dataset, and show that our approach achieves 92% better accuracy than directly fusing depths from monocular SLAM, and up to 90% improvements compared to the best competing approach.

Submitted 16 October, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: 9 pages, 6 figures, 2 tables
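
The idea of weighting depth-maps by their uncertainty, mentioned in the abstract above, can be illustrated with a generic inverse-variance average for a single pixel. This is a minimal sketch under stated assumptions, not the paper's pipeline: the function name `fuse_depths` is hypothetical, the variances are simply given as inputs (the paper derives its uncertainty from the bundle adjustment information matrix), and the paper fuses into a volumetric representation rather than averaging raw depths.

```python
def fuse_depths(depths, variances):
    """Inverse-variance weighted average of depth estimates for one pixel.

    Hypothetical helper for illustration only: each depth is weighted by
    1 / variance, so confident estimates dominate the fused value.
    Returns (fused_depth, fused_variance).
    """
    if not depths or len(depths) != len(variances):
        raise ValueError("depths and variances must be non-empty and equal length")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    fused_depth = sum(w * d for w, d in zip(weights, depths)) / total_weight
    # For independent Gaussian estimates the fused variance is 1 / sum(weights).
    fused_variance = 1.0 / total_weight
    return fused_depth, fused_variance


if __name__ == "__main__":
    # A confident estimate (variance 1.0) and a noisy one (variance 3.0).
    print(fuse_depths([2.0, 4.0], [1.0, 3.0]))
```

In this toy case the fused depth lands closer to the confident estimate than a plain average would, which is the behaviour the uncertainty weighting is meant to provide.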

arXiv:2208.01014 [pdf, other]

Robust Change Detection Based on Neural Descriptor Fields

Authors: Jiahui Fu, Yilun Du, Kurran Singh, Joshua B. Tenenbaum, John J. Leonard

Abstract: The ability to reason about changes in the environment is crucial for robots operating over extended periods of time. Agents are expected to capture changes during operation so that actions can be followed to ensure a smooth progression of the working session. However, varying viewing angles and accumulated localization errors make it easy for robots to falsely detect changes in the surrounding world due to low observation overlap and drifted object associations. In this paper, based on the recently proposed category-level Neural Descriptor Fields (NDFs), we develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results. Utilizing the shape completion capability and SE(3)-equivariance of NDFs, we represent objects with compact shape codes encoding full object shapes from partial observations. The objects are then organized in a spatial tree structure based on object centers recovered from NDFs for fast queries of object neighborhoods. By associating objects via shape code similarity and comparing local object-neighbor spatial layout, our proposed approach demonstrates robustness to low observation overlap and localization noises. We conduct experiments on both synthetic and real-world sequences and achieve improved change detection results compared to multiple baseline methods. Project webpage: https://yilundu.github.io/ndf_change

Submitted 1 August, 2022; originally announced August 2022.

Comments: 8 pages, 8 figures, and 2 tables. Accepted to IROS 2022. Project webpage: https://yilundu.github.io/ndf_change

arXiv:2207.08323 [pdf, other]

doi 10.1109/LRA.2022.3191794

PlaneSDF-based Change Detection for Long-term Dense Mapping

Authors: Jiahui Fu, Chengyuan Lin, Yuichi Taguchi, Andrea Cohen, Yifu Zhang, Stephen Mylabathula, John J. Leonard

Abstract: The ability to process environment maps across multiple sessions is critical for robots operating over extended periods of time. Specifically, it is desirable for autonomous agents to detect changes amongst maps of different sessions so as to gain a conflict-free understanding of the current environment. In this paper, we look into the problem of change detection based on a novel map representation, dubbed Plane Signed Distance Fields (PlaneSDF), where dense maps are represented as a collection of planes and their associated geometric components in SDF volumes. Given point clouds of the source and target scenes, we propose a three-step PlaneSDF-based change detection approach: (1) PlaneSDF volumes are instantiated within each scene and registered across scenes using plane poses; 2D height maps and object maps are extracted per volume via height projection and connected component analysis. (2) Height maps are compared and intersected with the object map to produce a 2D change location mask for changed object candidates in the source scene. (3) 3D geometric validation is performed using SDF-derived features per object candidate for change mask refinement. We evaluate our approach on both synthetic and real-world datasets and demonstrate its effectiveness via the task of changed object detection. Supplementary video: https://youtu.be/oh-MQPWTwZI

Submitted 5 October, 2022; v1 submitted 17 July, 2022; originally announced July 2022.

Comments: 8 pages, 7 figures, and 1 table. To be published in Robotics and Automation Letters and IROS 2022. Link to supplementary video added in the abstract: https://youtu.be/oh-MQPWTwZI

arXiv:2204.11936 [pdf, other]

Discrete-Continuous Smoothing and Mapping

Authors: Kevin J. Doherty, Ziqi Lu, Kurran Singh, John J. Leonard

Abstract: We describe a general approach for maximum a posteriori (MAP) inference in a class of discrete-continuous factor graphs commonly encountered in robotics applications. While there are openly available tools providing flexible and easy-to-use interfaces for specifying and solving inference problems formulated in terms of either discrete or continuous graphical models, at present, no similarly general tools exist enabling the same functionality for hybrid discrete-continuous problems. We aim to address this problem. In particular, we provide a library, DC-SAM, extending existing tools for inference problems defined in terms of factor graphs to the setting of discrete-continuous models. A key contribution of our work is a novel solver for efficiently recovering approximate solutions to discrete-continuous inference problems. The key insight to our approach is that while joint inference over continuous and discrete state spaces is often hard, many commonly encountered discrete-continuous problems can naturally be split into a "discrete part" and a "continuous part" that can individually be solved easily. Leveraging this structure, we optimize discrete and continuous variables in an alternating fashion. In consequence, our proposed work enables straightforward representation of and approximate inference in discrete-continuous graphical models. We also provide a method to approximate the uncertainty in estimates of both discrete and continuous variables.
We demonstrate the versatility of our approach through its application to distinct robot perception applications, including robust pose graph optimization, and object-based mapping and localization.

Submitted 17 November, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

Comments: Extended technical report for publication in IEEE Robotics and Automation Letters (RA-L)
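
The alternating scheme described in the abstract above (optimize the discrete variables with the continuous ones fixed, then the reverse) can be shown on a toy problem. The sketch below is not the DC-SAM library or its API; it is a hypothetical, self-contained example that estimates a scalar `x` from measurements while assigning each measurement a discrete inlier/outlier label. The cost model, in which an inlier pays its squared residual and an outlier pays a fixed `outlier_cost`, is an assumption chosen for simplicity.

```python
def alternating_estimate(measurements, outlier_cost=4.0, max_iters=50):
    """Toy alternating minimization over discrete labels and a continuous mean.

    Cost (assumed for illustration): each inlier contributes (m - x)**2 and
    each outlier contributes the constant outlier_cost.
    Returns (x, labels) where labels[i] is True for inliers.
    """
    if not measurements:
        raise ValueError("need at least one measurement")

    # Initialize the continuous variable at the (upper) median.
    x = sorted(measurements)[len(measurements) // 2]
    labels = None
    for _ in range(max_iters):
        # Discrete step: with x fixed, pick the cheaper label per measurement.
        new_labels = [(m - x) ** 2 < outlier_cost for m in measurements]
        if new_labels == labels:
            break  # labels stopped changing, so x would not change either
        labels = new_labels
        # Continuous step: with labels fixed, the best x is the inlier mean.
        inliers = [m for m, keep in zip(measurements, labels) if keep]
        if inliers:
            x = sum(inliers) / len(inliers)
    return x, labels


if __name__ == "__main__":
    x, labels = alternating_estimate([1.0, 1.1, 0.9, 1.05, 10.0])
    print(x, labels)
```

Neither step can increase the total cost, so the iteration settles at a local solution; like any alternating method it depends on the initial guess, which is why the sketch starts from the median.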

arXiv:2203.13897 [pdf, other]

Spectral Measurement Sparsification for Pose-Graph SLAM

Authors: Kevin J. Doherty, David M. Rosen, John J. Leonard

Abstract: Simultaneous localization and mapping (SLAM) is a critical capability in autonomous navigation, but in order to scale SLAM to the setting of "lifelong" SLAM, particularly under memory or computation constraints, a robot must be able to determine what information should be retained and what can safely be forgotten. In graph-based SLAM, the number of edges (measurements) in a pose graph determines both the memory requirements of storing a robot's observations and the computational expense of algorithms deployed for performing state estimation using those observations; both of which can grow unbounded during long-term navigation. To address this, we propose a spectral approach for pose graph sparsification which maximizes the algebraic connectivity of the sparsified measurement graphs, a key quantity which has been shown to control the estimation error of pose graph SLAM solutions. Our algorithm, MAC (for "maximizing algebraic connectivity"), which is based on convex relaxation, is simple and computationally inexpensive, and admits formal post hoc performance guarantees on the quality of the solutions it provides. In experiments on benchmark pose-graph SLAM datasets, we show that our approach quickly produces high-quality sparsification results which retain the connectivity of the graph and, in turn, the quality of corresponding SLAM solutions, as compared to a baseline approach which does not consider graph connectivity.

Submitted 25 March, 2022; originally announced March 2022.

Comments: Submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022
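
For readers unfamiliar with the quantity the abstract above maximizes, the algebraic connectivity of a graph is the second-smallest eigenvalue of its Laplacian matrix. The sketch below only computes that value for a small unweighted graph with NumPy; it does not implement the MAC algorithm or its convex relaxation, and the function name `algebraic_connectivity` is a hypothetical helper. Using unit edge weights is a simplifying assumption.

```python
import numpy as np


def algebraic_connectivity(num_nodes, edges):
    """Second-smallest eigenvalue of the unweighted graph Laplacian.

    edges is an iterable of (i, j) node-index pairs; unit weights are an
    assumption made for this illustration.
    """
    laplacian = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        laplacian[i, i] += 1.0
        laplacian[j, j] += 1.0
        laplacian[i, j] -= 1.0
        laplacian[j, i] -= 1.0
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(laplacian)
    return float(eigenvalues[1])


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    path = [(0, 1), (1, 2), (2, 3)]  # the cycle with one edge removed
    print(algebraic_connectivity(4, cycle))
    print(algebraic_connectivity(4, path))
```

Dropping one edge from the four-node cycle lowers the value from 2 to 2 - sqrt(2), which gives a feel for how removing measurements can weaken the connectivity that a sparsifier tries to preserve.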

arXiv:2201.03773 [pdf, other]

Performance Guarantees for Spectral Initialization in Rotation Averaging and Pose-Graph SLAM

Authors: Kevin J. Doherty, David M. Rosen, John J. Leonard

Abstract: In this work we present the first initialization methods equipped with explicit performance guarantees adapted to the pose-graph simultaneous localization and mapping (SLAM) and rotation averaging (RA) problems. SLAM and rotation averaging are typically formalized as large-scale nonconvex point estimation problems, with many bad local minima that can entrap the smooth optimization methods typically applied to solve them; the performance of standard SLAM and RA algorithms thus crucially depends upon the quality of the estimates used to initialize this local search. While many initialization methods for SLAM and RA have appeared in the literature, these are typically obtained as purely heuristic approximations, making it difficult to determine whether (or under what circumstances) these techniques can be reliably deployed. In contrast, in this work we study the problem of initialization through the lens of spectral relaxation. Specifically, we derive a simple spectral relaxation of SLAM and RA, the form of which enables us to exploit classical linear-algebraic techniques (eigenvector perturbation bounds) to control the distance from our spectral estimate to both the (unknown) ground-truth and the global minimizer of the estimation problem as a function of measurement noise.
Our results reveal the critical role that spectral graph-theoretic properties of the measurement network play in controlling estimation accuracy; moreover, as a by-product of our analysis we obtain new bounds on the estimation error for the maximum likelihood estimators in SLAM and RA, which are likely to be of independent interest. Finally, we show experimentally that our spectral estimator is very effective in practice, producing initializations of comparable or superior quality at lower computational cost compared to existing state-of-the-art techniques.

Submitted 10 January, 2022; originally announced January 2022.

arXiv:2110.09741 [pdf, other]

Trajectory Prediction with Linguistic Representations

Authors: Yen-Ling Kuo, Xin Huang, Andrei Barbu, Stephen G. McGill, Boris Katz, John J. Leonard, Guy Rosman

Abstract: Language allows humans to build mental models that interpret what is happening around them resulting in more accurate long-term predictions. We present a novel trajectory prediction model that uses linguistic intermediate representations to forecast trajectories, and is trained using trajectory samples with partially-annotated captions. The model learns the meaning of each of the words without direct per-word supervision. At inference time, it generates a linguistic description of trajectories which captures maneuvers and interactions over an extended time interval. This generated description is used to refine predictions of the trajectories of multiple agents. We train and validate our model on the Argoverse dataset, and demonstrate improved accuracy results in trajectory prediction. In addition, our model is more interpretable: it presents part of its reasoning in plain language as captions, which can aid model development and can aid in building confidence in the model before deploying it.

Submitted 9 March, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Comments: Accepted in ICRA 2022

arXiv:2110.08750 [pdf, other]

TIP: Task-Informed Motion Prediction for Intelligent Vehicles

Authors: Xin Huang, Guy Rosman, Ashkan Jasour, Stephen G. McGill, John J. Leonard, Brian C. Williams

Abstract: When predicting trajectories of road agents, motion predictors usually approximate the future distribution by a limited number of samples. This constraint requires the predictors to generate samples that best support the task given task specifications. However, existing predictors are often optimized and evaluated via task-agnostic measures without accounting for the use of predictions in downstream tasks, and thus could result in sub-optimal task performance. In this paper, we propose a task-informed motion prediction model that better supports the tasks through its predictions, by jointly reasoning about prediction accuracy and the utility of the downstream tasks, which is commonly used to evaluate the task performance. The task utility function does not require the full task information, but rather a specification of the utility of the task, resulting in predictors that serve a wide range of downstream tasks. We demonstrate our approach on two use cases of common decision making tasks and their utility functions, in the context of autonomous driving and parallel autonomy. Experiment results show that our predictor produces accurate predictions that improve the task performance by a large margin in both tasks when compared to task-agnostic baselines on the Waymo Open Motion dataset.

Submitted 26 May, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

Comments: 9 pages, 5 figures, 5 tables

arXiv:2110.02344 [pdf, other]

HYPER: Learned Hybrid Trajectory Prediction via Factored Inference and Adaptive Sampling

Authors: Xin Huang, Guy Rosman, Igor Gilitschenski, Ashkan Jasour, Stephen G. McGill, John J. Leonard, Brian C. Williams

Abstract: Modeling multi-modal high-level intent is important for ensuring diversity in trajectory prediction. Existing approaches explore the discrete nature of human intent before predicting continuous trajectories, to improve accuracy and support explainability. However, these approaches often assume the intent to remain fixed over the prediction horizon, which is problematic in practice, especially over longer horizons. To overcome this limitation, we introduce HYPER, a general and expressive hybrid prediction framework that models evolving human intent. By modeling traffic agents as a hybrid discrete-continuous system, our approach is capable of predicting discrete intent changes over time. We learn the probabilistic hybrid model via a maximum likelihood estimation problem and leverage neural proposal distributions to sample adaptively from the exponentially growing discrete space. The overall approach affords a better trade-off between accuracy and coverage. We train and validate our model on the Argoverse dataset, and demonstrate its effectiveness through comprehensive ablation studies and comparisons with state-of-the-art models.

Submitted 5 October, 2021; originally announced October 2021.

Comments: 12 pages, 10 figures, 4 tables

arXiv:2110.00876 [pdf, other]

Incremental Non-Gaussian Inference for SLAM Using Normalizing Flows

Authors: Qiangqiang Huang, Can Pu, Kasra Khosoussi, David M. Rosen, Dehann Fourie, Jonathan P. How, John J. Leonard

Abstract: This paper presents normalizing flows for incremental smoothing and mapping (NF-iSAM), a novel algorithm for inferring the full posterior distribution in SLAM problems with nonlinear measurement models and non-Gaussian factors. NF-iSAM exploits the expressive power of neural networks, and trains normalizing flows to model and sample the full posterior. By leveraging the Bayes tree, NF-iSAM enables efficient incremental updates similar to iSAM2, albeit in the more challenging non-Gaussian setting. We demonstrate the advantages of NF-iSAM over state-of-the-art point and distribution estimation algorithms using range-only SLAM problems with data association ambiguity. NF-iSAM presents superior accuracy in describing the posterior beliefs of continuous variables (e.g., position) and discrete variables (e.g., data association).

Submitted 2 July, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

Comments: Extension of work published at arXiv:2105.05045

arXiv:2109.10871 [pdf, other]

doi 10.1109/LRA.2022.3189786

Nested Sampling for Non-Gaussian Inference in SLAM Factor Graphs

Authors: Qiangqiang Huang, Alan Papalia, John J. Leonard

Abstract: We present nested sampling for factor graphs (NSFG), a novel nested sampling approach to approximate inference for posterior distributions expressed over factor-graphs. Performing such inference is a key step in simultaneous localization and mapping (SLAM). Although the Gaussian approximation often works well, in other more challenging SLAM situations, the posterior distribution is non-Gaussian and cannot be explicitly represented with standard distributions. Our technique applies to settings where the posterior distribution is substantially non-Gaussian (e.g., multi-modal) and thus needs a more expressive representation. NSFG exploits nested sampling methods to directly sample the posterior to represent the distribution without parametric density models. While nested sampling methods are known for their powerful capability in sampling multi-modal distributions, the application of the methods to SLAM factor graphs is not straightforward. NSFG leverages the structure of factor graphs to construct informative prior distributions which are efficiently sampled and provide notable computational benefits for nested sampling methods. We present simulated experiments which demonstrate that NSFG is more robust and computes solutions over an order of magnitude faster than state-of-the-art sampling techniques. Similarly, we compare NSFG to state-of-the-art Gaussian and non-Gaussian SLAM approaches and demonstrate that NSFG is notably more robust in describing non-Gaussian posteriors.

Submitted 8 August, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9232-9239, Oct. 2022

arXiv:2108.01225 [pdf, other]

A Multi-Hypothesis Approach to Pose Ambiguity in Object-Based SLAM

Authors: Jiahui Fu, Qiangqiang Huang, Kevin Doherty, Yue Wang, John J. Leonard

Abstract: In object-based Simultaneous Localization and Mapping (SLAM), 6D object poses offer a compact representation of landmark geometry useful for downstream planning and manipulation tasks. However, measurement ambiguity then arises as objects may possess complete or partial object shape symmetries (e.g., due to occlusion), making it difficult or impossible to generate a single consistent object pose estimate. One idea is to generate multiple pose candidates to counteract measurement ambiguity. In this paper, we develop a novel approach that enables an object-based SLAM system to reason about multiple pose hypotheses for an object, and synthesize this locally ambiguous information into a globally consistent robot and landmark pose estimation formulation. In particular, we (1) present a learned pose estimation network that provides multiple hypotheses about the 6D pose of an object; (2) by treating the output of our network as components of a mixture model, we incorporate pose predictions into a SLAM system, which, over successive observations, recovers a globally consistent set of robot and object (landmark) pose estimates. We evaluate our approach on the popular YCB-Video Dataset and a simulated video featuring YCB objects. Experiments demonstrate that our approach is effective in improving the robustness of object-based SLAM in the face of object pose ambiguity.

Submitted 2 August, 2021; originally announced August 2021.

Comments: 8 pages, 8 figures, and 1 table. Accepted to The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021
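
The abstract above describes treating several pose hypotheses as components of a mixture model. As a rough illustration only, and not the authors' implementation, the sketch below evaluates the negative log-likelihood of a candidate estimate under a weighted mixture of isotropic Gaussians. It assumes poses are simplified to plain Euclidean vectors (real 6D poses live on SE(3)), and the function name `mixture_nll` and its parameters are hypothetical.

```python
import math


def mixture_nll(x, means, sigmas, weights):
    """Negative log-likelihood of x under an isotropic Gaussian mixture.

    Assumption for illustration: each hypothesis k is a Euclidean mean
    vector means[k] with standard deviation sigmas[k] and mixing weight
    weights[k] (weights are positive and sum to 1).
    """
    d = len(x)
    log_terms = []
    for mu, sigma, w in zip(means, sigmas, weights):
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
        # log of: w * N(x; mu, sigma^2 I)
        log_terms.append(
            math.log(w)
            - 0.5 * d * math.log(2.0 * math.pi * sigma ** 2)
            - 0.5 * sq_dist / sigma ** 2
        )
    # Log-sum-exp over components, shifted by the maximum for stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))
```

With two equally weighted hypotheses placed symmetrically about the origin, the cost is identical at either hypothesis and higher midway between them, which is the ambiguity that successive observations would have to resolve.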

arXiv:2105.05045 [pdf, other]

NF-iSAM: Incremental Smoothing and Mapping via Normalizing Flows

Authors: Qiangqiang Huang, Can Pu, Dehann Fourie, Kasra Khosoussi, Jonathan P. How, John J. Leonard

Abstract: This paper presents a novel non-Gaussian inference algorithm, Normalizing Flow iSAM (NF-iSAM), for solving SLAM problems with non-Gaussian factors and/or non-linear measurement models. NF-iSAM exploits the expressive power of neural networks, and trains normalizing flows to draw samples from the joint posterior of non-Gaussian factor graphs. By leveraging the Bayes tree, NF-iSAM is able to exploit the sparsity structure of SLAM, thus enabling efficient incremental updates similar to iSAM2, albeit in the more challenging non-Gaussian setting. We demonstrate the performance of NF-iSAM and compare it against the state-of-the-art algorithms such as iSAM2 (Gaussian) and mm-iSAM (non-Gaussian) in synthetic and real range-only SLAM datasets.

Submitted 11 May, 2021; originally announced May 2021.

Comments: 8 pages, 6 figures, to be published in IEEE International Conference on Robotics and Automation (ICRA) 2021

arXiv:2104.02761 [pdf, other]

Lidar-Monocular Surface Reconstruction Using Line Segments

Authors: Victor Amblard, Timothy P. Osedach, Arnaud Croux, Andrew Speck, John J. Leonard

Abstract: Structure from Motion (SfM) often fails to estimate accurate poses in environments that lack suitable visual features. In such cases, the quality of the final 3D mesh, which is contingent on the accuracy of those estimates, is reduced. One way to overcome this problem is to combine data from a monocular camera with that of a LIDAR. This allows fine details and texture to be captured while still accurately representing featureless subjects. However, fusing these two sensor modalities is challenging due to their fundamentally different characteristics. Rather than directly fusing image features and LIDAR points, we propose to leverage common geometric features that are detected in both the LIDAR scans and image data, allowing data from the two sensors to be processed in a higher-level space. In particular, we propose to find correspondences between 3D lines extracted from LIDAR scans and 2D lines detected in images before performing a bundle adjustment to refine poses. We also exploit the detected and optimized line segments to improve the quality of the final mesh. We test our approach on the recently published dataset, Newer College Dataset. We compare the accuracy and the completeness of the 3D mesh to a ground truth obtained with a survey-grade 3D scanner. We show that our method delivers results that are comparable to a state-of-the-art LIDAR survey while not requiring highly accurate ground truth pose estimates.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2104.00562 [pdf, other]

A Front-End for Dense Monocular SLAM using a Learned Outlier Mask Prior

Authors: Yihao Zhang, John J. Leonard

Abstract: Recent achievements in depth prediction from a single RGB image have powered the new research area of combining convolutional neural networks (CNNs) with classical simultaneous localization and mapping (SLAM) algorithms. The depth prediction from a CNN provides a reasonable initial point in the optimization process in the traditional SLAM algorithms, while the SLAM algorithms further improve the CNN prediction online. However, most of the current CNN-SLAM approaches have only taken advantage of the depth prediction but not yet other products from a CNN. In this work, we explore the use of the outlier mask, a by-product from unsupervised learning of depth from video, as a prior in a classical probability model for depth estimate fusion to step up the outlier-resistant tracking performance of a SLAM front-end. On the other hand, some of the previous CNN-SLAM work builds on feature-based sparse SLAM methods, wasting the per-pixel dense prediction from a CNN. In contrast to these sparse methods, we devise a dense CNN-assisted SLAM front-end that is implementable with TensorFlow and evaluate it on both indoor and outdoor datasets.

Submitted 1 April, 2021; originally announced April 2021.

arXiv:2103.11031 [pdf, other]

Bootstrapped Self-Supervised Training with Monocular Video for Semantic Segmentation and Depth Estimation

Authors: Yihao Zhang, John J. Leonard

Abstract: For a robot deployed in the world, it is desirable to have the ability of autonomous learning to improve its initial pre-set knowledge. We formalize this as a bootstrapped self-supervised learning problem where a system is initially bootstrapped with supervised training on a labeled dataset and we look for a self-supervised training method that can subsequently improve the system over the supervised training baseline using only unlabeled data. In this work, we leverage temporal consistency between frames in monocular video to perform this bootstrapped self-supervised training. We show that a well-trained state-of-the-art semantic segmentation network can be further improved through our method. In addition, we show that the bootstrapped self-supervised training framework can help a network learn depth estimation better than pure supervised training or self-supervised training.

Submitted 31 July, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: IROS 2021

arXiv:2103.05041 [pdf, other]

Advances in Inference and Representation for Simultaneous Localization and Mapping

Authors: David M. Rosen, Kevin J. Doherty, Antonio Teran Espinoza, John J. Leonard

Abstract: Simultaneous localization and mapping (SLAM) is the process of constructing a global model of an environment from local observations of it; this is a foundational capability for mobile robots, supporting such core functions as planning, navigation, and control. This article reviews recent progress in SLAM, focusing on advances in the expressive capacity of the environmental models used in SLAM systems (representation) and the performance of the algorithms used to estimate these models from data (inference). A prominent theme of recent SLAM research is the pursuit of environmental representations (including learned representations) that go beyond the classical attributes of geometry and appearance to model properties such as hierarchical organization, affordance, dynamics, and semantics; these advances equip autonomous agents with a more comprehensive understanding of the world, enabling more versatile and intelligent operation. A second major theme is a revitalized interest in the mathematical properties of the SLAM estimation problem itself (including its computational and information-theoretic performance limits); this work has led to the development of novel classes of certifiable and robust inference methods that dramatically improve the reliability of SLAM systems in real-world operation. We survey these advances with an emphasis on their ramifications for achieving robust, long-duration autonomy, and conclude with a discussion of open challenges and a perspective on future research directions.

Submitted 8 March, 2021; originally announced March 2021.

Comments: 30 pages, 4 figures. To appear in Annual Review of Control, Robotics, and Autonomous Systems 2021

arXiv:2003.08003 [pdf, other]

CARPAL: Confidence-Aware Intent Recognition for Parallel Autonomy

Authors: Xin Huang, Stephen G. McGill, Jonathan A. DeCastro, Luke Fletcher, John J. Leonard, Brian C. Williams, Guy Rosman

Abstract: Predicting driver intentions is a difficult and crucial task for advanced driver assistance systems. Traditional confidence measures on predictions often ignore the way predicted trajectories affect downstream decisions for safe driving. In this paper, we propose a novel multi-task intent recognition neural network that predicts not only probabilistic driver trajectories, but also utility statistics associated with the predictions for a given downstream task. We establish a decision criterion for parallel autonomy that takes into account the role of driver trajectory prediction in real-time decision making by reasoning about estimated task-specific utility statistics. We further improve the robustness of our system by considering uncertainties in downstream planning tasks that may lead to unsafe decisions. We test our online system on a realistic urban driving dataset, and demonstrate its advantage in terms of recall and fall-out metrics compared to baseline methods, and demonstrate its effectiveness in intervention and warning use cases.

Submitted 17 March, 2021; v1 submitted 17 March, 2020; originally announced March 2020.

Comments: Accepted at ICRA'21/RA-L'21. Author version with 9 pages, 5 figures, 2 algorithms
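
The CARPAL abstract reports results in terms of recall and fall-out. For readers unfamiliar with the terms, the sketch below computes both from binary decisions using their standard definitions: recall is TP / (TP + FN) and fall-out (the false positive rate) is FP / (FP + TN). The function name and the example data are hypothetical and are not taken from the paper.

```python
def recall_and_fallout(predicted, actual):
    """Return (recall, fall-out) for paired boolean predictions and labels."""
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if a and p:
            tp += 1          # intervention warranted and issued
        elif a and not p:
            fn += 1          # intervention warranted but missed
        elif p:
            fp += 1          # intervention issued but not warranted
        else:
            tn += 1          # correctly stayed silent
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    fallout = fp / (fp + tn) if (fp + tn) else float("nan")
    return recall, fallout
```

A system that intervenes on every sample scores a recall of 1.0 but also a fall-out of 1.0, which is why the two are usually read together.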

arXiv:1911.12736 [pdf, other]

DiversityGAN: Diversity-Aware Vehicle Motion Prediction via Latent Semantic Sampling

Authors: Xin Huang, Stephen G. McGill, Jonathan A. DeCastro, Luke Fletcher, John J. Leonard, Brian C. Williams, Guy Rosman

Abstract: Vehicle trajectory prediction is crucial for autonomous driving and advanced driver assistant systems. While existing approaches may sample from a predicted distribution of vehicle trajectories, they lack the ability to explore it -- a key ability for evaluating safety from a planning and verification perspective. In this work, we devise a novel approach for generating realistic and diverse vehicle trajectories. We extend the generative adversarial network (GAN) framework with a low-dimensional approximate semantic space, and shape that space to capture semantics such as merging and turning. We sample from this space in a way that mimics the predicted distribution, but allows us to control coverage of semantically distinct outcomes. We validate our approach on a publicly available dataset and show results that achieve state-of-the-art prediction performance, while providing improved coverage of the space of predicted trajectory semantics.

Submitted 21 March, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Comments: 8 pages, 5 figures, 1 table

arXiv:1705.10279 [pdf, other]

Towards Visual Ego-motion Learning in Robots

Authors: Sudeep Pillai, John J. Leonard

Abstract: Many model-based Visual Odometry (VO) algorithms have been proposed in the past decade, often restricted to the type of camera optics, or the underlying motion manifold observed. We envision robots to be able to learn and perform these tasks, in a minimally supervised setting, as they gain more experience. To this end, we propose a fully trainable solution to visual ego-motion estimation for varied camera optics. We propose a visual ego-motion learning architecture that maps observed optical flow vectors to an ego-motion density estimate via a Mixture Density Network (MDN). By modeling the architecture as a Conditional Variational Autoencoder (C-VAE), our model is able to provide introspective reasoning and prediction for ego-motion induced scene-flow. Additionally, our proposed model is especially amenable to bootstrapped ego-motion learning in robots where the supervision in ego-motion estimation for a particular camera sensor can be obtained from standard navigation-based sensor fusion strategies (GPS/INS and wheel-odometry fusion). Through experiments, we show the utility of our proposed approach in enabling the concept of self-supervised learning for visual ego-motion estimation in autonomous robots.

Submitted 29 May, 2017; originally announced May 2017.

Comments: Conference paper; Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017, Vancouver CA; 8 pages, 8 figures, 2 tables

arXiv:1612.07386 [pdf, other]

SE-Sync: A Certifiably Correct Algorithm for Synchronization over the Special Euclidean Group

Authors: David M. Rosen, Luca Carlone, Afonso S. Bandeira, John J. Leonard

Abstract: Many important geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given a set of relative measurements between them. This problem is typically formulated as a nonconvex maximum-likelihood estimation that is computationally hard to solve in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of the special Euclidean synchronization problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides an exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and design a truncated-Newton trust-region method to solve this reduction efficiently. Finally, we combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync.
Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is able to recover certifiably globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics and computer vision applications, and does so more than an order of magnitude faster than the Gauss-Newton-based approach that forms the basis of current state-of-the-art techniques.

Submitted 4 February, 2017; v1 submitted 21 December, 2016; originally announced December 2016.

Comments: 49 Pages, 20 figures
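
For orientation, the nonconvex maximum-likelihood problem that the abstract above refers to is commonly written in the form below. This is a sketch under an assumed noise model that the abstract does not state: isotropic Langevin-distributed rotation noise with concentration \kappa_{ij} and isotropic Gaussian translation noise with precision \tau_{ij}. Here (R_i, t_i) are the unknown poses, (\tilde{R}_{ij}, \tilde{t}_{ij}) are the noisy relative measurements, and \mathcal{E} is the set of measured pairs; the symbol names are chosen for illustration.

```latex
\min_{\substack{t_i \in \mathbb{R}^d \\ R_i \in \mathrm{SO}(d)}}
\sum_{(i,j) \in \mathcal{E}}
\kappa_{ij} \, \bigl\lVert R_j - R_i \tilde{R}_{ij} \bigr\rVert_F^2
+ \tau_{ij} \, \bigl\lVert t_j - t_i - R_i \tilde{t}_{ij} \bigr\rVert_2^2
```

The objective is a sum of squares, but the constraint R_i \in \mathrm{SO}(d) makes the feasible set nonconvex, which is what motivates the semidefinite relaxation mentioned in the abstract.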

arXiv:1611.00128 [pdf, other]

A Certifiably Correct Algorithm for Synchronization over the Special Euclidean Group

Authors: David M. Rosen, Luca Carlone, Afonso S. Bandeira, John J. Leonard

Abstract: Many geometric estimation problems take the form of synchronization over the special Euclidean group: estimate the values of a set of poses given noisy measurements of a subset of their pairwise relative transforms. This problem is typically formulated as a maximum-likelihood estimation that requires solving a nonconvex nonlinear program, which is computationally intractable in general. Nevertheless, in this paper we present an algorithm that is able to efficiently recover certifiably globally optimal solutions of this estimation problem in a non-adversarial noise regime. The crux of our approach is the development of a semidefinite relaxation of the maximum-likelihood estimation whose minimizer provides the exact MLE so long as the magnitude of the noise corrupting the available measurements falls below a certain critical threshold; furthermore, whenever exactness obtains, it is possible to verify this fact a posteriori, thereby certifying the optimality of the recovered estimate. We develop a specialized optimization scheme for solving large-scale instances of this semidefinite relaxation by exploiting its low-rank, geometric, and graph-theoretic structure to reduce it to an equivalent optimization problem on a low-dimensional Riemannian manifold, and then design a Riemannian truncated-Newton trust-region method to solve this reduction efficiently. We combine this fast optimization approach with a simple rounding procedure to produce our algorithm, SE-Sync.
Experimental evaluation on a variety of simulated and real-world pose-graph SLAM datasets shows that SE-Sync is capable of recovering globally optimal solutions when the available measurements are corrupted by noise up to an order of magnitude greater than that typically encountered in robotics applications, and does so at a computational cost that scales comparably with that of direct Newton-type local search techniques.

Submitted 9 February, 2017; v1 submitted 1 November, 2016; originally announced November 2016.

Comments: 16 pages, 8 figures, to appear in the International Workshop on the Algorithmic Foundations of Robotics (WAFR), Dec 2016

arXiv:1606.05830 [pdf, other]

doi 10.1109/TRO.2016.2624754

Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age

Authors: Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jose Neira, Ian Reid, John J. Leonard

Abstract: Simultaneous Localization and Mapping (SLAM) consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications, and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues, that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved?

Submitted 30 January, 2017; v1 submitted 18 June, 2016; originally announced June 2016.

Journal ref: IEEE Transactions on Robotics 32 (6) pp 1309-1332, 2016

arXiv:1511.00758 [pdf, other]

High-Performance and Tunable Stereo Reconstruction

Authors: Sudeep Pillai, Srikumar Ramalingam, John J. Leonard

Abstract: Traditional stereo algorithms have focused their efforts on reconstruction quality and have largely avoided prioritizing run-time performance. Robots, on the other hand, require quick maneuverability and effective computation to observe their immediate environment and perform tasks within it. In this work, we propose a high-performance and tunable stereo disparity estimation method, with a peak frame-rate of 120Hz (VGA resolution, on a single CPU-thread), that can potentially enable robots to quickly reconstruct their immediate surroundings and maneuver at high speeds. Our key contribution is a disparity estimation algorithm that iteratively approximates the scene depth via a piece-wise planar mesh from stereo imagery, with a fast depth validation step for semi-dense reconstruction. The mesh is initially seeded with sparsely matched keypoints, and is recursively tessellated and refined as needed (via a resampling stage), to provide the desired stereo disparity accuracy. The inherent simplicity and speed of our approach, with the ability to tune it to a desired reconstruction quality and run-time performance, makes it a compelling solution for applications in high-speed vehicles.

Submitted 17 February, 2016; v1 submitted 2 November, 2015; originally announced November 2015.

Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2016; 8 pages, 5 figures

Showing 1–33 of 33 results for author: Leonard, J J