PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections

Authors: Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

Abstract: Robustly estimating camera poses from a set of images is a fundamental task which remains challenging for differentiable methods, especially in the case of small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose, and in turn, the absolute pose, in a differentiable manner benefiting from the optimization of a sequence of geometrical tasks. We show how our objectness pose-refinement module in PRAGO is able to refine the inherent ambiguities in pairwise relative pose estimation without removing edges and avoiding making early decisions on the viability of graph edges. PRAGO then refines the absolute rotations through iterative graph construction, reweighting the graph edges to compute the final rotational pose, which can be converted into absolute poses using translation averaging. We show that PRAGO is able to outperform non-differentiable solvers on small and sparse scenes extracted from 7-Scenes achieving a relative improvement of 21% for rotations while achieving similar translation estimates.

Submitted 15 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2304.06373 [pdf, other]

3DoF Localization from a Single Image and an Object Map: the Flatlandia Problem and Dataset

Authors: Matteo Toso, Matteo Taiana, Stuart James, Alessio Del Bue

Abstract: Efficient visual localization is crucial to many applications, such as large-scale deployment of autonomous agents and augmented reality. Traditional visual localization, while achieving remarkable accuracy, relies on extensive 3D models of the scene or large collections of geolocalized images, which are often inefficient to store and to scale to novel environments. In contrast, humans orient themselves using very abstract 2D maps, using the location of clearly identifiable landmarks. Drawing on this and on the success of recent works that explored localization on 2D abstract maps, we propose Flatlandia, a novel visual localization challenge. With Flatlandia, we investigate whether it is possible to localize a visual query by comparing the layout of its common objects detected against the known spatial layout of objects in the map. We formalize the challenge as two tasks at different levels of accuracy to investigate the problem and its possible limitations; for each, we propose initial baseline models and compare them against state-of-the-art 6DoF and 3DoF methods. Code and dataset are publicly available at github.com/IIT-PAVIS/Flatlandia.

Submitted 8 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

arXiv:2301.00866 [pdf, other]

3DSGrasp: 3D Shape-Completion for Robotic Grasp

Authors: Seyed S. Mohammadi, Nuno F. Duarte, Dimitris Dimou, Yiming Wang, Matteo Taiana, Pietro Morerio, Atabak Dehban, Plinio Moreno, Alexandre Bernardino, Alessio Del Bue, Jose Santos-Victor

Abstract: Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to the object pose and point's permutation, which generates PCDs that are geometrically consistent and completed properly. Experiments on a wide range of partial PCD show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2207.09445 [pdf, other]

PoserNet: Refining Relative Camera Poses Exploiting Object Detections

Authors: Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

Abstract: The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet) a light-weight Graph Neural Network to refine the approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions - concisely expressed as bounding boxes - across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms improving the median error on the rotation by 62 degrees with respect to the initial estimates obtained based on bounding boxes. Code and data are available at https://github.com/IIT-PAVIS/PoserNet.

Submitted 21 July, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted at ECCV 2022

Showing 1–4 of 4 results for author: Taiana, M