-
SeMLaPS: Real-time Semantic Map** with Latent Prior Networks and Quasi-Planar Segmentation
Authors:
**gwen Wang,
Juan Tarrio,
Lourdes Agapito,
Pablo F. Alcantarilla,
Alexander Vakhitov
Abstract:
The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic map** from RGB-D sequences that combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy map**. When segmenting a new frame we perform latent feature re-proj…
▽ More
The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic map** from RGB-D sequences that combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy map**. When segmenting a new frame we perform latent feature re-projection from previous frames based on differentiable rendering. Fusing re-projected feature maps from previous frames with current-frame features greatly improves image segmentation quality, compared to a baseline that processes images independently. For 3D map processing, we propose a novel geometric quasi-planar over-segmentation method that groups 3D map elements likely to belong to the same semantic classes, relying on surface normals. We also describe a novel neural network design for lightweight semantic map post-processing. Our system achieves state-of-the-art semantic map** quality within 2D-3D networks-based systems and matches the performance of 3D convolutional networks on three real indoor datasets, while working in real-time. Moreover, it shows better cross-sensor generalization abilities compared to 3D CNNs, enabling training and inference with different depth sensors. Code and data will be released on project page: http://**gwenwang95.github.io/SeMLaPS
△ Less
Submitted 13 October, 2023; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Self-Supervised Depth Completion for Active Stereo
Authors:
Frederik Warburg,
Daniel Hernandez-Juarez,
Juan Tarrio,
Alexander Vakhitov,
Ujwal Bonde,
Pablo F. Alcantarilla
Abstract:
Active stereo systems are used in many robotic applications that require 3D information. These depth sensors, however, suffer from stereo artefacts and do not provide dense depth estimates.In this work, we present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps. Our system leverages a feature-based visual inertial SLAM system to p…
▽ More
Active stereo systems are used in many robotic applications that require 3D information. These depth sensors, however, suffer from stereo artefacts and do not provide dense depth estimates.In this work, we present the first self-supervised depth completion method for active stereo systems that predicts accurate dense depth maps. Our system leverages a feature-based visual inertial SLAM system to produce motion estimates and accurate (but sparse) 3D landmarks. The 3D landmarks are used both as model input and as supervision during training. The motion estimates are used in our novel reconstruction loss that relies on a combination of passive and active stereo frames, resulting in significant improvements in textureless areas that are common in indoor environments. Due to the nonexistence of publicly available active stereo datasets, we release a real dataset together with additional information for a publicly available synthetic dataset (TartanAir [42]) needed for active depth completion and prediction. Through rigorous evaluations we show that our method outperforms state of the art on both datasets. Additionally we show how our method obtains more complete, and therefore safer, 3D maps when used in a robotic platform.
△ Less
Submitted 20 January, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Multi-Resolution 3D Map** with Explicit Free Space Representation for Fast and Accurate Mobile Robot Motion Planning
Authors:
Nils Funk,
Juan Tarrio,
Sotiris Papatheodorou,
Marija Popovic,
Pablo F. Alcantarilla,
Stefan Leutenegger
Abstract:
With the aim of bridging the gap between high quality reconstruction and mobile robot motion planning, we propose an efficient system that leverages the concept of adaptive-resolution volumetric map**, which naturally integrates with the hierarchical decomposition of space in an octree data structure. Instead of a Truncated Signed Distance Function (TSDF), we adopt map** of occupancy probabili…
▽ More
With the aim of bridging the gap between high quality reconstruction and mobile robot motion planning, we propose an efficient system that leverages the concept of adaptive-resolution volumetric map**, which naturally integrates with the hierarchical decomposition of space in an octree data structure. Instead of a Truncated Signed Distance Function (TSDF), we adopt map** of occupancy probabilities in log-odds representation, which allows to represent both surfaces, as well as the entire free, i.e. observed space, as opposed to unobserved space. We introduce a method for choosing resolution -- on the fly -- in real-time by means of a multi-scale max-min pooling of the input depth image. The notion of explicit free space map** paired with the spatial hierarchy in the data structure, as well as map resolution, allows for collision queries, as needed for robot motion planning, at unprecedented speed. We quantitatively evaluate map** accuracy, memory, runtime performance, and planning performance showing improvements over the state of the art, particularly in cases requiring high resolution maps.
△ Less
Submitted 30 January, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Towards Bounding-Box Free Panoptic Segmentation
Authors:
Ujwal Bonde,
Pablo F. Alcantarilla,
Stefan Leutenegger
Abstract:
In this work we introduce a new Bounding-Box Free Network (BBFNet) for panoptic segmentation. Panoptic segmentation is an ideal problem for proposal-free methods as it already requires per-pixel semantic class labels. We use this observation to exploit class boundaries from off-the-shelf semantic segmentation networks and refine them to predict instance labels. Towards this goal BBFNet predicts co…
▽ More
In this work we introduce a new Bounding-Box Free Network (BBFNet) for panoptic segmentation. Panoptic segmentation is an ideal problem for proposal-free methods as it already requires per-pixel semantic class labels. We use this observation to exploit class boundaries from off-the-shelf semantic segmentation networks and refine them to predict instance labels. Towards this goal BBFNet predicts coarse watershed levels and uses them to detect large instance candidates where boundaries are well defined. For smaller instances, whose boundaries are less reliable, BBFNet also predicts instance centers by means of Hough voting followed by mean-shift to reliably detect small objects. A novel triplet loss network helps merging fragmented instances while refining boundary pixels. Our approach is distinct from previous works in panoptic segmentation that rely on a combination of a semantic segmentation network with a computationally costly instance segmentation network based on bounding box proposals, such as Mask R-CNN, to guide the prediction of instance labels using a Mixture-of-Expert (MoE) approach. We benchmark our proposal-free method on Cityscapes and Microsoft COCO datasets and show competitive performance with other MoE based approaches while outperforming existing non-proposal based methods on the COCO dataset. We show the flexibility of our method using different semantic segmentation backbones.
△ Less
Submitted 27 July, 2020; v1 submitted 18 February, 2020;
originally announced February 2020.
-
A Continuous Optimization Approach for Efficient and Accurate Scene Flow
Authors:
Zhaoyang Lv,
Chris Beall,
Pablo F. Alcantarilla,
Fuxin Li,
Zsolt Kira,
Frank Dellaert
Abstract:
We propose a continuous optimization method for solving dense 3D scene flow problems from stereo imagery. As in recent work, we represent the dynamic 3D scene as a collection of rigidly moving planar segments. The scene flow problem then becomes the joint estimation of pixel-to-segment assignment, 3D position, normal vector and rigid motion parameters for each segment, leading to a complex and exp…
▽ More
We propose a continuous optimization method for solving dense 3D scene flow problems from stereo imagery. As in recent work, we represent the dynamic 3D scene as a collection of rigidly moving planar segments. The scene flow problem then becomes the joint estimation of pixel-to-segment assignment, 3D position, normal vector and rigid motion parameters for each segment, leading to a complex and expensive discrete-continuous optimization problem. In contrast, we propose a purely continuous formulation which can be solved more efficiently. Using a fine superpixel segmentation that is fixed a-priori, we propose a factor graph formulation that decomposes the problem into photometric, geometric, and smoothing constraints. We initialize the solution with a novel, high-quality initialization method, then independently refine the geometry and motion of the scene, and finally perform a global non-linear refinement using Levenberg-Marquardt. We evaluate our method in the challenging KITTI Scene Flow benchmark, ranking in third position, while being 3 to 30 times faster than the top competitors.
△ Less
Submitted 27 July, 2016;
originally announced July 2016.
-
Noise Models in Feature-based Stereo Visual Odometry
Authors:
Pablo F. Alcantarilla,
Oliver J. Woodford
Abstract:
Feature-based visual structure and motion reconstruction pipelines, common in visual odometry and large-scale reconstruction from photos, use the location of corresponding features in different images to determine the 3D structure of the scene, as well as the camera parameters associated with each image. The noise model, which defines the likelihood of the location of each feature in each image, i…
▽ More
Feature-based visual structure and motion reconstruction pipelines, common in visual odometry and large-scale reconstruction from photos, use the location of corresponding features in different images to determine the 3D structure of the scene, as well as the camera parameters associated with each image. The noise model, which defines the likelihood of the location of each feature in each image, is a key factor in the accuracy of such pipelines, alongside optimization strategy. Many different noise models have been proposed in the literature; in this paper we investigate the performance of several. We evaluate these models specifically w.r.t. stereo visual odometry, as this task is both simple (camera intrinsics are constant and known; geometry can be initialized reliably) and has datasets with ground truth readily available (KITTI Odometry and New Tsukuba Stereo Dataset). Our evaluation shows that noise models which are more adaptable to the varying nature of noise generally perform better.
△ Less
Submitted 1 July, 2016;
originally announced July 2016.
-
Training Constrained Deconvolutional Networks for Road Scene Semantic Segmentation
Authors:
German Ros,
Simon Stent,
Pablo F. Alcantarilla,
Tomoki Watanabe
Abstract:
In this work we investigate the problem of road scene semantic segmentation using Deconvolutional Networks (DNs). Several constraints limit the practical performance of DNs in this context: firstly, the paucity of existing pixel-wise labelled training data, and secondly, the memory constraints of embedded hardware, which rule out the practical use of state-of-the-art DN architectures such as fully…
▽ More
In this work we investigate the problem of road scene semantic segmentation using Deconvolutional Networks (DNs). Several constraints limit the practical performance of DNs in this context: firstly, the paucity of existing pixel-wise labelled training data, and secondly, the memory constraints of embedded hardware, which rule out the practical use of state-of-the-art DN architectures such as fully convolutional networks (FCN). To address the first constraint, we introduce a Multi-Domain Road Scene Semantic Segmentation (MDRS3) dataset, aggregating data from six existing densely and sparsely labelled datasets for training our models, and two existing, separate datasets for testing their generalisation performance. We show that, while MDRS3 offers a greater volume and variety of data, end-to-end training of a memory efficient DN does not yield satisfactory performance. We propose a new training strategy to overcome this, based on (i) the creation of a best-possible source network (S-Net) from the aggregated data, ignoring time and memory constraints; and (ii) the transfer of knowledge from S-Net to the memory-efficient target network (T-Net). We evaluate different techniques for S-Net creation and T-Net transferral, and demonstrate that training a constrained deconvolutional network in this manner can unlock better performance than existing training approaches. Specifically, we show that a target network can be trained to achieve improved accuracy versus an FCN despite using less than 1\% of the memory. We believe that our approach can be useful beyond automotive scenarios where labelled data is similarly scarce or fragmented and where practical constraints exist on the desired model size. We make available our network models and aggregated multi-domain dataset for reproducibility.
△ Less
Submitted 6 April, 2016;
originally announced April 2016.