Search | arXiv e-print repository

doi 10.1109/LRA.2024.3410164

Beyond the Frontier: Predicting Unseen Walls from Occupancy Grids by Learning from Floor Plans

Authors: Ludvig Ericson, Patric Jensfelt

Abstract: In this paper, we tackle the challenge of predicting the unseen walls of a partially observed environment as a set of 2D line segments, conditioned on occupancy grids integrated along the trajectory of a 360° LIDAR sensor. A dataset of such occupancy grids and their corresponding target wall segments is collected by navigating a virtual robot between a set of randomly sampled waypoints in a collec… ▽ More In this paper, we tackle the challenge of predicting the unseen walls of a partially observed environment as a set of 2D line segments, conditioned on occupancy grids integrated along the trajectory of a 360° LIDAR sensor. A dataset of such occupancy grids and their corresponding target wall segments is collected by navigating a virtual robot between a set of randomly sampled waypoints in a collection of office-scale floor plans from a university campus. The line segment prediction task is formulated as an autoregressive sequence prediction task, and an attention-based deep network is trained on the dataset. The sequence-based autoregressive formulation is evaluated through predicted information gain, as in frontier-based autonomous exploration, demonstrating significant improvements over both non-predictive estimation and convolution-based image prediction found in the literature. Ablations on key components are evaluated, as well as sensor range and the occupancy grid's metric area. Finally, model generality is validated by predicting walls in a novel floor plan reconstructed on-the-fly in a real-world office environment. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: RA-L, 8 pages

Journal ref: IEEE Robotics and Automation Letters (2024) pp. 2377-3766

arXiv:2405.07283 [pdf, other]

BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps

Authors: Mingkai Jia, Qingwen Zhang, Bowen Yang, ** Wu, Ming Liu, Patric Jensfelt

Abstract: Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired ghost tracks that are mixed up with the static environment. Existing dynamic removal methods normally fail to balance the performance in computational efficiency and accuracy. In response, we present BeautyMap to ef… ▽ More Global point clouds that correctly represent the static environment features can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired ghost tracks that are mixed up with the static environment. Existing dynamic removal methods normally fail to balance the performance in computational efficiency and accuracy. In response, we present BeautyMap to efficiently remove the dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract the environment features. With a bit-wise comparison between matrices of each frame and the corresponding map region, we can extract potential dynamic regions. Then we use coarse to fine hierarchical segmentation of the $z$-axis to handle terrain variations. The final static restoration module accounts for the range-visibility of each single scan and protects static points out of sight. Comparative experiments underscore BeautyMap's superior performance in both accuracy and efficiency against other dynamic points removal methods. The code is open-sourced at https://github.com/MKJia/BeautyMap. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: The first two authors are co-first authors. 8 pages, accepted by RA-L

arXiv:2405.03633 [pdf, other]

Neural Graph Map** for Dense SLAM with Efficient Loop Closure

Authors: Leonard Bruns, Jun Zhang, Patric Jensfelt

Abstract: Existing neural field-based SLAM methods typically employ a single monolithic field as their scene representation. This prevents efficient incorporation of loop closure constraints and limits scalability. To address these shortcomings, we propose a neural map** framework which anchors lightweight neural fields to the pose graph of a sparse visual SLAM system. Our approach shows the ability to in… ▽ More Existing neural field-based SLAM methods typically employ a single monolithic field as their scene representation. This prevents efficient incorporation of loop closure constraints and limits scalability. To address these shortcomings, we propose a neural map** framework which anchors lightweight neural fields to the pose graph of a sparse visual SLAM system. Our approach shows the ability to integrate large-scale loop closures, while limiting necessary reintegration. Furthermore, we verify the scalability of our approach by demonstrating successful building-scale map** taking multiple loop closures into account during the optimization, and show that our method outperforms existing state-of-the-art approaches on large scenes in terms of quality and runtime. Our code is available at https://kth-rpl.github.io/neural_graph_map**/. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: Project page: https://kth-rpl.github.io/neural_graph_map**/

arXiv:2403.18649 [pdf, other]

Addressing Data Annotation Challenges in Multiple Sensors: A Solution for Scania Collected Datasets

Authors: A**kya Khoche, Aron Asefaw, Alejandro Gonzalez, Bogdan Timus, Sina Sharif Mansouri, Patric Jensfelt

Abstract: Data annotation in autonomous vehicles is a critical step in the development of Deep Neural Network (DNN) based models or the performance evaluation of the perception system. This often takes the form of adding 3D bounding boxes on time-sequential and registered series of point-sets captured from active sensors like Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR). When… ▽ More Data annotation in autonomous vehicles is a critical step in the development of Deep Neural Network (DNN) based models or the performance evaluation of the perception system. This often takes the form of adding 3D bounding boxes on time-sequential and registered series of point-sets captured from active sensors like Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR). When annotating multiple active sensors, there is a need to motion compensate and translate the points to a consistent coordinate frame and timestamp respectively. However, highly dynamic objects pose a unique challenge, as they can appear at different timestamps in each sensor's data. Without knowing the speed of the objects, their position appears to be different in different sensor outputs. Thus, even after motion compensation, highly dynamic objects are not matched from multiple sensors in the same frame, and human annotators struggle to add unique bounding boxes that capture all objects. This article focuses on addressing this challenge, primarily within the context of Scania collected datasets. The proposed solution takes a track of an annotated object as input and uses the Moving Horizon Estimation (MHE) to robustly estimate its speed. The estimated speed profile is utilized to correct the position of the annotated box and add boxes to object clusters missed by the original annotation. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to European Control Conference 2024

arXiv:2403.17633 [pdf, other]

UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps

Authors: Maciej K Wozniak, Mattias Hansson, Marko Thiel, Patric Jensfelt

Abstract: In this study, we address a gap in existing unsupervised domain adaptation approaches on LiDAR-based 3D object detection, which have predominantly concentrated on adapting between established, high-density autonomous driving datasets. We focus on sparser point clouds, capturing scenarios from different perspectives: not just from vehicles on the road but also from mobile robots on sidewalks, which… ▽ More In this study, we address a gap in existing unsupervised domain adaptation approaches on LiDAR-based 3D object detection, which have predominantly concentrated on adapting between established, high-density autonomous driving datasets. We focus on sparser point clouds, capturing scenarios from different perspectives: not just from vehicles on the road but also from mobile robots on sidewalks, which encounter significantly different environmental conditions and sensor configurations. We introduce Unsupervised Adversarial Domain Adaptation for 3D Object Detection (UADA3D). UADA3D does not depend on pre-trained source models or teacher-student architectures. Instead, it uses an adversarial approach to directly learn domain-invariant features. We demonstrate its efficacy in various adaptation scenarios, showing significant improvements in both self-driving car and mobile robot domains. Our code is open-source and will be available soon. △ Less

Submitted 12 June, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.11496 [pdf, other]

MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception

Authors: Thien-Minh Nguyen, Shenghai Yuan, Thien Hoang Nguyen, Pengyu Yin, Haozhi Cao, Lihua Xie, Maciej Wozniak, Patric Jensfelt, Marko Thiel, Justin Ziegenbein, Noel Blunder

Abstract: Perception plays a crucial role in various robot applications. However, existing well-annotated datasets are biased towards autonomous driving scenarios, while unlabelled SLAM datasets are quickly over-fitted, and often lack environment and domain variations. To expand the frontier of these fields, we introduce a comprehensive dataset named MCD (Multi-Campus Dataset), featuring a wide range of sen… ▽ More Perception plays a crucial role in various robot applications. However, existing well-annotated datasets are biased towards autonomous driving scenarios, while unlabelled SLAM datasets are quickly over-fitted, and often lack environment and domain variations. To expand the frontier of these fields, we introduce a comprehensive dataset named MCD (Multi-Campus Dataset), featuring a wide range of sensing modalities, high-accuracy ground truth, and diverse challenging environments across three Eurasian university campuses. MCD comprises both CCS (Classical Cylindrical Spinning) and NRE (Non-Repetitive Epicyclic) lidars, high-quality IMUs (Inertial Measurement Units), cameras, and UWB (Ultra-WideBand) sensors. Furthermore, in a pioneering effort, we introduce semantic annotations of 29 classes over 59k sparse NRE lidar scans across three domains, thus providing a novel challenge to existing semantic segmentation research upon this largely unexplored lidar modality. Finally, we propose, for the first time to the best of our knowledge, continuous-time ground truth based on optimization-based registration of lidar-inertial data on large survey-grade prior maps, which are also publicly released, each several times the size of existing ones. We conduct a rigorous evaluation of numerous state-of-the-art algorithms on MCD, report their performance, and highlight the challenges awaiting solutions from the research community. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted by The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

arXiv:2403.01449 [pdf, other]

DUFOMap: Efficient Dynamic Awareness Map**

Authors: Daniel Duberg, Qingwen Zhang, MingKai Jia, Patric Jensfelt

Abstract: The dynamic nature of the real world is one of the main challenges in robotics. The first step in dealing with it is to detect which parts of the world are dynamic. A typical benchmark task is to create a map that contains only the static part of the world to support, for example, localization and planning. Current solutions are often applied in post-processing, where parameter tuning allows the u… ▽ More The dynamic nature of the real world is one of the main challenges in robotics. The first step in dealing with it is to detect which parts of the world are dynamic. A typical benchmark task is to create a map that contains only the static part of the world to support, for example, localization and planning. Current solutions are often applied in post-processing, where parameter tuning allows the user to adjust the setting for a specific dataset. In this paper, we propose DUFOMap, a novel dynamic awareness map** framework designed for efficient online processing. Despite having the same parameter settings for all scenarios, it performs better or is on par with state-of-the-art methods. Ray casting is utilized to identify and classify fully observed empty regions. Since these regions have been observed empty, it follows that anything inside them at another time must be dynamic. Evaluation is carried out in various scenarios, including outdoor environments in KITTI and Argoverse 2, open areas on the KTH campus, and with different sensor types. DUFOMap outperforms the state of the art in terms of accuracy and computational efficiency. The source code, benchmarks, and links to the datasets utilized are provided. See https://kth-rpl.github.io/dufomap for more details. △ Less

Submitted 12 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

Comments: The first two authors hold equal contribution. 8 pages, 7 figures, project page https://kth-rpl.github.io/dufomap

arXiv:2401.16122 [pdf, other]

DeFlow: Decoder of Scene Flow Network in Autonomous Driving

Authors: Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt

Abstract: Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving. Many networks with large-scale point clouds as input use voxelization to create a pseudo-image for real-time running. However, the voxelization process often results in the loss of point-specific features. This gives rise to a challenge in… ▽ More Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving. Many networks with large-scale point clouds as input use voxelization to create a pseudo-image for real-time running. However, the voxelization process often results in the loss of point-specific features. This gives rise to a challenge in recovering those features for scene flow tasks. Our paper introduces DeFlow which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement. To further enhance scene flow estimation performance, we formulate a novel loss function that accounts for the data imbalance between static and dynamic points. Evaluations on the Argoverse 2 scene flow task reveal that DeFlow achieves state-of-the-art results on large-scale point cloud data, demonstrating that our network has better performance and efficiency compared to others. The code is open-sourced at https://github.com/KTH-RPL/deflow. △ Less

Submitted 29 January, 2024; originally announced January 2024.

Comments: 7 pages, 4 figures, Code check https://github.com/KTH-RPL/deflow, accepted by ICRA 2024

arXiv:2401.06518 [pdf, other]

Transitional Grid Maps: Efficient Analytical Inference of Dynamic Environments under Limited Sensing

Authors: José Manuel Gaspar Sánchez, Leonard Bruns, Jana Tumova, Patric Jensfelt, Martin Törngren

Abstract: Autonomous agents rely on sensor data to construct representations of their environment, essential for predicting future events and planning their own actions. However, sensor measurements suffer from limited range, occlusions, and sensor noise. These challenges become more evident in dynamic environments, where efficiently inferring the state of the environment based on sensor readings from diffe… ▽ More Autonomous agents rely on sensor data to construct representations of their environment, essential for predicting future events and planning their own actions. However, sensor measurements suffer from limited range, occlusions, and sensor noise. These challenges become more evident in dynamic environments, where efficiently inferring the state of the environment based on sensor readings from different times is still an open problem. This work focuses on inferring the state of the dynamic part of the environment, i.e., where dynamic objects might be, based on previous observations and constraints on their dynamics. We formalize the problem and introduce Transitional Grid Maps (TGMs), an efficient analytical solution. TGMs are based on a set of novel assumptions that hold in many practical scenarios. They significantly reduce the complexity of the problem, enabling continuous prediction and updating of the entire dynamic map based on the known static map (see Fig.1), differentiating them from other alternatives. We compare our approach with a state-of-the-art particle filter, obtaining more prudent predictions in occluded scenarios and on-par results on unoccluded tracking. △ Less

Submitted 12 January, 2024; originally announced January 2024.

arXiv:2310.04800 [pdf, other]

Towards Long-Range 3D Object Detection for Autonomous Vehicles

Authors: A**kya Khoche, Laura Pereira Sánchez, Nazre Batool, Sina Sharif Mansouri, Patric Jensfelt

Abstract: 3D object detection at long range is crucial for ensuring the safety and efficiency of self driving vehicles, allowing them to accurately perceive and react to objects, obstacles, and potential hazards from a distance. But most current state of the art LiDAR based methods are range limited due to sparsity at long range, which generates a form of domain gap between points closer to and farther away… ▽ More 3D object detection at long range is crucial for ensuring the safety and efficiency of self driving vehicles, allowing them to accurately perceive and react to objects, obstacles, and potential hazards from a distance. But most current state of the art LiDAR based methods are range limited due to sparsity at long range, which generates a form of domain gap between points closer to and farther away from the ego vehicle. Another related problem is the label imbalance for faraway objects, which inhibits the performance of Deep Neural Networks at long range. To address the above limitations, we investigate two ways to improve long range performance of current LiDAR based 3D detectors. First, we combine two 3D detection networks, referred to as range experts, one specializing at near to mid range objects, and one at long range 3D detection. To train a detector at long range under a scarce label regime, we further weigh the loss according to the labelled point's distance from ego vehicle. Second, we augment LiDAR scans with virtual points generated using Multimodal Virtual Points (MVP), a readily available image-based depth completion algorithm. Our experiments on the long range Argoverse2 (AV2) dataset indicate that MVP is more effective in improving long range performance, while maintaining a straightforward implementation. On the other hand, the range experts offer a computationally efficient and simpler alternative, avoiding dependency on image-based segmentation networks and perfect camera-LiDAR calibration. △ Less

Submitted 20 May, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

Comments: Accepted to Intelligent Vehicle Symposium (IV) 2024

arXiv:2307.07260 [pdf, other]

A Dynamic Points Removal Benchmark in Point Cloud Maps

Authors: Qingwen Zhang, Daniel Duberg, Ruoyu Geng, Mingkai Jia, Lujia Wang, Patric Jensfelt

Abstract: In the field of robotics, the point cloud has become an essential map representation. From the perspective of downstream tasks like localization and global path planning, points corresponding to dynamic objects will adversely affect their performance. Existing methods for removing dynamic points in point clouds often lack clarity in comparative evaluations and comprehensive analysis. Therefore, we… ▽ More In the field of robotics, the point cloud has become an essential map representation. From the perspective of downstream tasks like localization and global path planning, points corresponding to dynamic objects will adversely affect their performance. Existing methods for removing dynamic points in point clouds often lack clarity in comparative evaluations and comprehensive analysis. Therefore, we propose an easy-to-extend unified benchmarking framework for evaluating techniques for removing dynamic points in maps. It includes refactored state-of-art methods and novel metrics to analyze the limitations of these approaches. This enables researchers to dive deep into the underlying reasons behind these limitations. The benchmark makes use of several datasets with different sensor types. All the code and datasets related to our study are publicly available for further development and utilization. △ Less

Submitted 14 July, 2023; originally announced July 2023.

Comments: Code check https://github.com/KTH-RPL/DynamicMap_Benchmark.git , 7 pages, accepted by ITSC 2023

arXiv:2306.14589 [pdf, other]

Happily Error After: Framework Development and User Study for Correcting Robot Perception Errors in Virtual Reality

Authors: Maciej K. Wozniak, Rebecca Stower, Patric Jensfelt, Andre Pereira

Abstract: While we can see robots in more areas of our lives, they still make errors. One common cause of failure stems from the robot perception module when detecting objects. Allowing users to correct such errors can help improve the interaction and prevent the same errors in the future. Consequently, we investigate the effectiveness of a virtual reality (VR) framework for correcting perception errors of… ▽ More While we can see robots in more areas of our lives, they still make errors. One common cause of failure stems from the robot perception module when detecting objects. Allowing users to correct such errors can help improve the interaction and prevent the same errors in the future. Consequently, we investigate the effectiveness of a virtual reality (VR) framework for correcting perception errors of a Franka Panda robot. We conducted a user study with 56 participants who interacted with the robot using both VR and screen interfaces. Participants learned to collaborate with the robot faster in the VR interface compared to the screen interface. Additionally, participants found the VR interface more immersive, enjoyable, and expressed a preference for using it again. These findings suggest that VR interfaces may offer advantages over screen interfaces for human-robot interaction in erroneous environments. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted for IEEE RO-MAN 2023

arXiv:2306.07344 [pdf, other]

Towards a Robust Sensor Fusion Step for 3D Object Detection on Corrupted Data

Authors: Maciej K. Wozniak, Viktor Karefjards, Marko Thiel, Patric Jensfelt

Abstract: Multimodal sensor fusion methods for 3D object detection have been revolutionizing the autonomous driving research field. Nevertheless, most of these methods heavily rely on dense LiDAR data and accurately calibrated sensors which is often not the case in real-world scenarios. Data from LiDAR and cameras often come misaligned due to the miscalibration, decalibration, or different frequencies of th… ▽ More Multimodal sensor fusion methods for 3D object detection have been revolutionizing the autonomous driving research field. Nevertheless, most of these methods heavily rely on dense LiDAR data and accurately calibrated sensors which is often not the case in real-world scenarios. Data from LiDAR and cameras often come misaligned due to the miscalibration, decalibration, or different frequencies of the sensors. Additionally, some parts of the LiDAR data may be occluded and parts of the data may be missing due to hardware malfunction or weather conditions. This work presents a novel fusion step that addresses data corruptions and makes sensor fusion for 3D object detection more robust. Through extensive experiments, we demonstrate that our method performs on par with state-of-the-art approaches on normal data and outperforms them on misaligned data. △ Less

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2301.08147 [pdf, other]

RGB-D-Based Categorical Object Pose and Shape Estimation: Methods, Datasets, and Evaluation

Authors: Leonard Bruns, Patric Jensfelt

Abstract: Recently, various methods for 6D pose and shape estimation of objects at a per-category level have been proposed. This work provides an overview of the field in terms of methods, datasets, and evaluation protocols. First, an overview of existing works and their commonalities and differences is provided. Second, we take a critical look at the predominant evaluation protocol, including metrics and d… ▽ More Recently, various methods for 6D pose and shape estimation of objects at a per-category level have been proposed. This work provides an overview of the field in terms of methods, datasets, and evaluation protocols. First, an overview of existing works and their commonalities and differences is provided. Second, we take a critical look at the predominant evaluation protocol, including metrics and datasets. Based on the findings, we propose a new set of metrics, contribute new annotations for the Redwood dataset, and evaluate state-of-the-art methods in a fair comparison. The results indicate that existing methods do not generalize well to unconstrained orientations and are actually heavily biased towards objects being upright. We provide an easy-to-use evaluation toolbox with well-defined metrics, methods, and dataset interfaces, which allows evaluation and comparison with various state-of-the-art approaches (https://github.com/roym899/pose_and_shape_evaluation). △ Less

Submitted 19 January, 2023; originally announced January 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.10346

arXiv:2301.04919 [pdf, other]

doi 10.1145/3568294.3580081

What you see is (not) what you get: A VR Framework for Correcting Robot Errors

Authors: Maciej K. Wozniak, Rebecca Stower, Patric Jensfelt, Andre Pereira

Abstract: Many solutions tailored for intuitive visualization or teleoperation of virtual, augmented and mixed (VAM) reality systems are not robust to robot failures, such as the inability to detect and recognize objects in the environment or planning unsafe trajectories. In this paper, we present a novel virtual reality (VR) framework where users can (i) recognize when the robot has failed to detect a real… ▽ More Many solutions tailored for intuitive visualization or teleoperation of virtual, augmented and mixed (VAM) reality systems are not robust to robot failures, such as the inability to detect and recognize objects in the environment or planning unsafe trajectories. In this paper, we present a novel virtual reality (VR) framework where users can (i) recognize when the robot has failed to detect a real-world object, (ii) correct the error in VR, (iii) modify proposed object trajectories and, (iv) implement behaviors on a real-world robot. Finally, we propose a user study aimed at testing the efficacy of our framework. Project materials can be found in the OSF repository. △ Less

Submitted 17 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

Journal ref: HRI '23: Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction

arXiv:2301.02086 [pdf, other]

A Probabilistic Framework for Visual Localization in Ambiguous Scenes

Authors: Fereidoon Zangeneh, Leonard Bruns, Amit Dekel, Alessandro Pieropan, Patric Jensfelt

Abstract: Visual localization allows autonomous robots to relocalize when losing track of their pose by matching their current observation with past ones. However, ambiguous scenes pose a challenge for such systems, as repetitive structures can be viewed from many distinct, equally likely camera poses, which means it is not sufficient to produce a single best pose hypothesis. In this work, we propose a prob… ▽ More Visual localization allows autonomous robots to relocalize when losing track of their pose by matching their current observation with past ones. However, ambiguous scenes pose a challenge for such systems, as repetitive structures can be viewed from many distinct, equally likely camera poses, which means it is not sufficient to produce a single best pose hypothesis. In this work, we propose a probabilistic framework that for a given image predicts the arbitrarily shaped posterior distribution of its camera pose. We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution. Our method outperforms existing methods on localization in ambiguous scenes. Code and data will be released at https://github.com/efreidun/vapor. △ Less

Submitted 5 January, 2023; originally announced January 2023.

arXiv:2211.03900 [pdf, other]

SLICT: Multi-input Multi-scale Surfel-Based Lidar-Inertial Continuous-Time Odometry and Map**

Authors: Thien-Minh Nguyen, Daniel Duberg, Patric Jensfelt, Shenghai Yuan, Lihua Xie

Abstract: While feature association to a global map has significant benefits, to keep the computations from growing exponentially, most lidar-based odometry and map** methods opt to associate features with local maps at one voxel scale. Taking advantage of the fact that surfels (surface elements) at different voxel scales can be organized in a tree-like structure, we propose an octree-based global map of… ▽ More While feature association to a global map has significant benefits, to keep the computations from growing exponentially, most lidar-based odometry and map** methods opt to associate features with local maps at one voxel scale. Taking advantage of the fact that surfels (surface elements) at different voxel scales can be organized in a tree-like structure, we propose an octree-based global map of multi-scale surfels that can be updated incrementally. This alleviates the need for recalculating, for example, a k-d tree of the whole map repeatedly. The system can also take input from a single or a number of sensors, reinforcing the robustness in degenerate cases. We also propose a point-to-surfel (PTS) association scheme, continuous-time optimization on PTS and IMU preintegration factors, along with loop closure and bundle adjustment, making a complete framework for Lidar-Inertial continuous-time odometry and map**. Experiments on public and in-house datasets demonstrate the advantages of our system compared to other state-of-the-art methods. To benefit the community, we release the source code and dataset at https://github.com/brytsknguyen/slict. △ Less

Submitted 9 November, 2022; v1 submitted 7 November, 2022; originally announced November 2022.

arXiv:2211.01700 [pdf, other]

doi 10.1109/ITSC55140.2022.9922537

Semantic 3D Grid Maps for Autonomous Driving

Authors: A**kya Khoche, Maciej K Wozniak, Daniel Duberg, Patric Jensfelt

Abstract: Maps play a key role in rapidly develo** area of autonomous driving. We survey the literature for different map representations and find that while the world is three-dimensional, it is common to rely on 2D map representations in order to meet real-time constraints. We believe that high levels of situation awareness require a 3D representation as well as the inclusion of semantic information. We… ▽ More Maps play a key role in rapidly develo** area of autonomous driving. We survey the literature for different map representations and find that while the world is three-dimensional, it is common to rely on 2D map representations in order to meet real-time constraints. We believe that high levels of situation awareness require a 3D representation as well as the inclusion of semantic information. We demonstrate that our recently presented hierarchical 3D grid map** framework UFOMap meets the real-time constraints. Furthermore, we show how it can be used to efficiently support more complex functions such as calculating the occluded parts of space and accumulating the output from a semantic segmentation network. △ Less

Submitted 9 November, 2022; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: Submitted, accepted and presented at the 25th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2022)

arXiv:2207.04880 [pdf, other]

SDFEst: Categorical Pose and Shape Estimation of Objects from RGB-D using Signed Distance Fields

Authors: Leonard Bruns, Patric Jensfelt

Abstract: Rich geometric understanding of the world is an important component of many robotic applications such as planning and manipulation. In this paper, we present a modular pipeline for pose and shape estimation of objects from RGB-D images given their category. The core of our method is a generative shape model, which we integrate with a novel initialization network and a differentiable renderer to en… ▽ More Rich geometric understanding of the world is an important component of many robotic applications such as planning and manipulation. In this paper, we present a modular pipeline for pose and shape estimation of objects from RGB-D images given their category. The core of our method is a generative shape model, which we integrate with a novel initialization network and a differentiable renderer to enable 6D pose and shape estimation from a single or multiple views. We investigate the use of discretized signed distance fields as an efficient shape representation for fast analysis-by-synthesis optimization. Our modular framework enables multi-view optimization and extensibility. We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data. We open-source our approach at https://github.com/roym899/sdfest. △ Less

Submitted 11 July, 2022; originally announced July 2022.

Comments: Accepted to IEEE Robotics and Automation Letters (and IROS 2022). Project page: https://github.com/roym899/sdfest

arXiv:2205.02079 [pdf, other]

SDF-based RGB-D Camera Tracking in Neural Scene Representations

Authors: Leonard Bruns, Fereidoon Zangeneh, Patric Jensfelt

Abstract: We consider the problem of tracking the 6D pose of a moving RGB-D camera in a neural scene representation. Different such representations have recently emerged, and we investigate the suitability of them for the task of camera tracking. In particular, we propose to track an RGB-D camera using a signed distance field-based representation and show that compared to density-based representations, trac… ▽ More We consider the problem of tracking the 6D pose of a moving RGB-D camera in a neural scene representation. Different such representations have recently emerged, and we investigate the suitability of them for the task of camera tracking. In particular, we propose to track an RGB-D camera using a signed distance field-based representation and show that compared to density-based representations, tracking can be sped up, which enables more robust and accurate pose estimates when computation time is limited. △ Less

Submitted 4 May, 2022; originally announced May 2022.

Comments: Accepted to the "Motion Planning with Implicit Neural Representations of Geometry" Workshop at ICRA 2022

arXiv:2203.03385 [pdf, other]

FloorGenT: Generative Vector Graphic Model of Floor Plans for Robotics

Authors: Ludvig Ericson, Patric Jensfelt

Abstract: Floor plans are the basis of reasoning in and communicating about indoor environments. In this paper, we show that by modelling floor plans as sequences of line segments seen from a particular point of view, recent advances in autoregressive sequence modelling can be leveraged to model and predict floor plans. The line segments are canonicalized and translated to sequence of tokens and an attentio… ▽ More Floor plans are the basis of reasoning in and communicating about indoor environments. In this paper, we show that by modelling floor plans as sequences of line segments seen from a particular point of view, recent advances in autoregressive sequence modelling can be leveraged to model and predict floor plans. The line segments are canonicalized and translated to sequence of tokens and an attention-based neural network is used to fit a one-step distribution over next tokens. We fit the network to sequences derived from a set of large-scale floor plans, and demonstrate the capabilities of the model in four scenarios: novel floor plan generation, completion of partially observed floor plans, generation of floor plans from simulated sensor data, and finally, the applicability of a floor plan model in predicting the shortest distance with partial knowledge of the environment. △ Less

Submitted 7 March, 2022; originally announced March 2022.

Comments: Submitted to IROS 2022. 7 pages, 6 figures

arXiv:2202.10346 [pdf, other]

On the Evaluation of RGB-D-based Categorical Pose and Shape Estimation

Authors: Leonard Bruns, Patric Jensfelt

Abstract: Recently, various methods for 6D pose and shape estimation of objects have been proposed. Typically, these methods evaluate their pose estimation in terms of average precision, and reconstruction quality with chamfer distance. In this work we take a critical look at this predominant evaluation protocol including metrics and datasets. We propose a new set of metrics, contribute new annotations for… ▽ More Recently, various methods for 6D pose and shape estimation of objects have been proposed. Typically, these methods evaluate their pose estimation in terms of average precision, and reconstruction quality with chamfer distance. In this work we take a critical look at this predominant evaluation protocol including metrics and datasets. We propose a new set of metrics, contribute new annotations for the Redwood dataset and evaluate state-of-the-art methods in a fair comparison. We find that existing methods do not generalize well to unconstrained orientations, and are actually heavily biased towards objects being upright. We contribute an easy-to-use evaluation toolbox with well-defined metrics, method and dataset interfaces, which readily allows evaluation and comparison with various state-of-the-art approaches (see https://github.com/roym899/pose_and_shape_evaluation ). △ Less

Submitted 21 February, 2022; originally announced February 2022.

Comments: 17 pages, 8 figures, submitted to IAS-17

arXiv:2003.04749 [pdf, other]

UFOMap: An Efficient Probabilistic 3D Map** Framework That Embraces the Unknown

Authors: Daniel Duberg, Patric Jensfelt

Abstract: 3D models are an essential part of many robotic applications. In applications where the environment is unknown a-priori, or where only a part of the environment is known, it is important that the 3D model can handle the unknown space efficiently. Path planning, exploration, and reconstruction all fall into this category. In this paper we present an extension to OctoMap which we call UFOMap. UFOMap… ▽ More 3D models are an essential part of many robotic applications. In applications where the environment is unknown a-priori, or where only a part of the environment is known, it is important that the 3D model can handle the unknown space efficiently. Path planning, exploration, and reconstruction all fall into this category. In this paper we present an extension to OctoMap which we call UFOMap. UFOMap uses an explicit representation of all three states in the map, i.e., occupied, free, and unknown. This gives, surprisingly, a more memory efficient representation. Furthermore, we provide methods that allow for significantly faster insertions into the octree. This enables real-time colored volumetric map** at high resolution (below 1 cm). UFOMap is contributed as a C++ library that can be used standalone but is also integrated into ROS. △ Less

Submitted 10 March, 2020; originally announced March 2020.

Comments: Project page: https://github.com/danielduberg/UFOMap

arXiv:1912.03426 [pdf, other]

Self-Supervised 3D Keypoint Learning for Ego-motion Estimation

Authors: Jiexiong Tang, Rares Ambrus, Vitor Guizilini, Sudeep Pillai, Hanme Kim, Patric Jensfelt, Adrien Gaidon

Abstract: Detecting and matching robust viewpoint-invariant keypoints is critical for visual SLAM and Structure-from-Motion. State-of-the-art learning-based methods generate training samples via homography adaptation to create 2D synthetic views with known keypoint matches from a single image. This approach, however, does not generalize to non-planar 3D scenes with illumination variations commonly seen in r… ▽ More Detecting and matching robust viewpoint-invariant keypoints is critical for visual SLAM and Structure-from-Motion. State-of-the-art learning-based methods generate training samples via homography adaptation to create 2D synthetic views with known keypoint matches from a single image. This approach, however, does not generalize to non-planar 3D scenes with illumination variations commonly seen in real-world videos. In this work, we propose self-supervised learning of depth-aware keypoints directly from unlabeled videos. We jointly learn keypoint and depth estimation networks by combining appearance and geometric matching via a differentiable structure-from-motion module based on Procrustean residual pose correction. We describe how our self-supervised keypoints can be integrated into state-of-the-art visual odometry frameworks for robust and accurate ego-motion estimation of autonomous vehicles in real-world conditions. △ Less

Submitted 17 November, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

arXiv:1909.08812 [pdf, other]

Flexible Disaster Response of Tomorrow -- Final Presentation and Evaluation of the CENTAURO System

Authors: Tobias Klamt, Diego Rodriguez, Lorenzo Baccelliere, Xi Chen, Domenico Chiaradia, Torben Cichon, Massimiliano Gabardi, Paolo Guria, Karl Holmquist, Malgorzata Kamedula, Hakan Karaoguz, Navvab Kashiri, Arturo Laurenzi, Christian Lenz, Daniele Leonardis, Enrico Mingo Hoffman, Luca Muratore, Dmytro Pavlichenko, Francesco Porcini, Zeyu Ren, Fabian Schilling, Max Schwarz, Massimiliano Solazzi, Michael Felsberg, Antonio Frisoli , et al. (7 additional authors not shown)

Abstract: Mobile manipulation robots have high potential to support rescue forces in disaster-response missions. Despite the difficulties imposed by real-world scenarios, robots are promising to perform mission tasks from a safe distance. In the CENTAURO project, we developed a disaster-response system which consists of the highly flexible Centauro robot and suitable control interfaces including an immersiv… ▽ More Mobile manipulation robots have high potential to support rescue forces in disaster-response missions. Despite the difficulties imposed by real-world scenarios, robots are promising to perform mission tasks from a safe distance. In the CENTAURO project, we developed a disaster-response system which consists of the highly flexible Centauro robot and suitable control interfaces including an immersive tele-presence suit and support-operator controls on different levels of autonomy. In this article, we give an overview of the final CENTAURO system. In particular, we explain several high-level design decisions and how those were derived from requirements and extensive experience of Kerntechnische Hilfsdienst GmbH, Karlsruhe, Germany (KHG). We focus on components which were recently integrated and report about a systematic evaluation which demonstrated system capabilities and revealed valuable insights. △ Less

Submitted 19 September, 2019; originally announced September 2019.

Comments: Accepted for IEEE Robotics and Automation Magazine (RAM), to appear December 2019

arXiv:1909.07745 [pdf, other]

Adversarial Feature Training for Generalizable Robotic Visuomotor Control

Authors: Xi Chen, Ali Ghadirzadeh, Mårten Björkman, Patric Jensfelt

Abstract: Deep reinforcement learning (RL) has enabled training action-selection policies, end-to-end, by learning a function which maps image pixels to action outputs. However, it's application to visuomotor robotic policy training has been limited because of the challenge of large-scale data collection when working with physical hardware. A suitable visuomotor policy should perform well not just for the t… ▽ More Deep reinforcement learning (RL) has enabled training action-selection policies, end-to-end, by learning a function which maps image pixels to action outputs. However, it's application to visuomotor robotic policy training has been limited because of the challenge of large-scale data collection when working with physical hardware. A suitable visuomotor policy should perform well not just for the task-setup it has been trained for, but also for all varieties of the task, including novel objects at different viewpoints surrounded by task-irrelevant objects. However, it is impractical for a robotic setup to sufficiently collect interactive samples in a RL framework to generalize well to novel aspects of a task. In this work, we demonstrate that by using adversarial training for domain transfer, it is possible to train visuomotor policies based on RL frameworks, and then transfer the acquired policy to other novel task domains. We propose to leverage the deep RL capabilities to learn complex visuomotor skills for uncomplicated task setups, and then exploit transfer learning to generalize to new task domains provided only still images of the task in the target domain. We evaluate our method on two real robotic tasks, picking and pouring, and compare it to a number of prior works, demonstrating its superiority. △ Less

Submitted 17 September, 2019; originally announced September 2019.

arXiv:1906.01258 [pdf, other]

Knowledge is Never Enough: Towards Web Aided Deep Open World Recognition

Authors: Massimiliano Mancini, Hakan Karaoguz, Elisa Ricci, Patric Jensfelt, Barbara Caputo

Abstract: While today's robots are able to perform sophisticated tasks, they can only act on objects they have been trained to recognize. This is a severe limitation: any robot will inevitably see new objects in unconstrained settings, and thus will always have visual knowledge gaps. However, standard visual modules are usually built on a limited set of classes and are based on the strong prior that an obje… ▽ More While today's robots are able to perform sophisticated tasks, they can only act on objects they have been trained to recognize. This is a severe limitation: any robot will inevitably see new objects in unconstrained settings, and thus will always have visual knowledge gaps. However, standard visual modules are usually built on a limited set of classes and are based on the strong prior that an object must belong to one of those classes. Identifying whether an instance does not belong to the set of known categories (i.e. open set recognition), only partially tackles this problem, as a truly autonomous agent should be able not only to detect what it does not know, but also to extend dynamically its knowledge about the world. We contribute to this challenge with a deep learning architecture that can dynamically update its known classes in an end-to-end fashion. The proposed deep network, based on a deep extension of a non-parametric model, detects whether a perceived object belongs to the set of categories known by the system and learns it without the need to retrain the whole system from scratch. Annotated images about the new category can be provided by an 'oracle' (i.e. human supervision), or by autonomous mining of the Web. Experiments on two different databases and on a robot platform demonstrate the promise of our approach. △ Less

Submitted 4 June, 2019; originally announced June 2019.

Comments: ICRA 2019

arXiv:1903.09199 [pdf, other]

Sparse2Dense: From direct sparse odometry to dense 3D reconstruction

Authors: Jiexiong Tang, John Folkesson, Patric Jensfelt

Abstract: In this paper, we proposed a new deep learning based dense monocular SLAM method. Compared to existing methods, the proposed framework constructs a dense 3D model via a sparse to dense map** using learned surface normals. With single view learned depth estimation as prior for monocular visual odometry, we obtain both accurate positioning and high quality depth reconstruction. The depth and norma… ▽ More In this paper, we proposed a new deep learning based dense monocular SLAM method. Compared to existing methods, the proposed framework constructs a dense 3D model via a sparse to dense map** using learned surface normals. With single view learned depth estimation as prior for monocular visual odometry, we obtain both accurate positioning and high quality depth reconstruction. The depth and normal are predicted by a single network trained in a tightly coupled manner.Experimental results show that our method significantly improves the performance of visual tracking and depth prediction in comparison to the state-of-the-art in deep monocular dense SLAM. △ Less

Submitted 21 March, 2019; originally announced March 2019.

Comments: Accepted to ICRA 2019 (RA-L option), video demo available at https://www.youtube.com/watch?v=3pbSHX72JC8&t=22s

arXiv:1902.11046 [pdf, other]

GCNv2: Efficient Correspondence Prediction for Real-Time SLAM

Authors: Jiexiong Tang, Ludvig Ericson, John Folkesson, Patric Jensfelt

Abstract: In this paper, we present a deep learning-based network, GCNv2, for generation of keypoints and descriptors. GCNv2 is built on our previous method, GCN, a network trained for 3D projective geometry. GCNv2 is designed with a binary descriptor vector as the ORB feature so that it can easily replace ORB in systems such as ORB-SLAM2. GCNv2 significantly improves the computational efficiency over GCN t… ▽ More In this paper, we present a deep learning-based network, GCNv2, for generation of keypoints and descriptors. GCNv2 is built on our previous method, GCN, a network trained for 3D projective geometry. GCNv2 is designed with a binary descriptor vector as the ORB feature so that it can easily replace ORB in systems such as ORB-SLAM2. GCNv2 significantly improves the computational efficiency over GCN that was only able to run on desktop hardware. We show how a modified version of ORB-SLAM2 using GCNv2 features runs on a Jetson TX2, an embedded low-power platform. Experimental results show that GCNv2 retains comparable accuracy as GCN and that it is robust enough to use for control of a flying drone. △ Less

Submitted 18 June, 2019; v1 submitted 28 February, 2019; originally announced February 2019.

Comments: Project page: https://github.com/jiexiong2016/GCNv2_SLAM

arXiv:1811.03376 [pdf, other]

Meta-Learning for Multi-objective Reinforcement Learning

Authors: Xi Chen, Ali Ghadirzadeh, Mårten Björkman, Patric Jensfelt

Abstract: Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to solve sequential decision making problems that consist of several, possibly conflicting, objectives. Generally, in such formulations, there is no single optimal policy which optimizes all the objectives simultaneously, and instead, a number of policies has to be found each opti… ▽ More Multi-objective reinforcement learning (MORL) is the generalization of standard reinforcement learning (RL) approaches to solve sequential decision making problems that consist of several, possibly conflicting, objectives. Generally, in such formulations, there is no single optimal policy which optimizes all the objectives simultaneously, and instead, a number of policies has to be found each optimizing a preference of the objectives. In other words, the MORL is framed as a meta-learning problem, with the task distribution given by a distribution over the preferences. We demonstrate that such a formulation results in a better approximation of the Pareto optimal solutions in terms of both the optimality and the computational efficiency. We evaluated our method on obtaining Pareto optimal policies using a number of continuous control problems with high degrees of freedom. △ Less

Submitted 7 October, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

arXiv:1807.01028 [pdf, other]

Kitting in the Wild through Online Domain Adaptation

Authors: Massimiliano Mancini, Hakan Karaoguz, Elisa Ricci, Patric Jensfelt, Barbara Caputo

Abstract: Technological developments call for increasing perception and action capabilities of robots. Among other skills, vision systems that can adapt to any possible change in the working conditions are needed. Since these conditions are unpredictable, we need benchmarks which allow to assess the generalization and robustness capabilities of our visual recognition algorithms. In this work we focus on rob… ▽ More Technological developments call for increasing perception and action capabilities of robots. Among other skills, vision systems that can adapt to any possible change in the working conditions are needed. Since these conditions are unpredictable, we need benchmarks which allow to assess the generalization and robustness capabilities of our visual recognition algorithms. In this work we focus on robotic kitting in unconstrained scenarios. As a first contribution, we present a new visual dataset for the kitting task. Differently from standard object recognition datasets, we provide images of the same objects acquired under various conditions where camera, illumination and background are changed. This novel dataset allows for testing the robustness of robot visual recognition algorithms to a series of different domain shifts both in isolation and unified. Our second contribution is a novel online adaptation algorithm for deep models, based on batch-normalization layers, which allows to continuously adapt a model to the current working conditions. Differently from standard domain adaptation algorithms, it does not require any image from the target domain at training time. We benchmark the performance of the algorithm on the proposed dataset, showing its capability to fill the gap between the performances of a standard architecture and its counterpart adapted offline to the given target domain. △ Less

Submitted 3 July, 2018; originally announced July 2018.

Comments: Accepted to IROS 2018

arXiv:1804.10500 [pdf, other]

Deep Reinforcement Learning to Acquire Navigation Skills for Wheel-Legged Robots in Complex Environments

Authors: Xi Chen, Ali Ghadirzadeh, John Folkesson, Patric Jensfelt

Abstract: Mobile robot navigation in complex and dynamic environments is a challenging but important problem. Reinforcement learning approaches fail to solve these tasks efficiently due to reward sparsities, temporal complexities and high-dimensionality of sensorimotor spaces which are inherent in such problems. We present a novel approach to train action policies to acquire navigation skills for wheel-legg… ▽ More Mobile robot navigation in complex and dynamic environments is a challenging but important problem. Reinforcement learning approaches fail to solve these tasks efficiently due to reward sparsities, temporal complexities and high-dimensionality of sensorimotor spaces which are inherent in such problems. We present a novel approach to train action policies to acquire navigation skills for wheel-legged robots using deep reinforcement learning. The policy maps height-map image observations to motor commands to navigate to a target position while avoiding obstacles. We propose to acquire the multifaceted navigation skill by learning and exploiting a number of manageable navigation behaviors. We also introduce a domain randomization technique to improve the versatility of the training samples. We demonstrate experimentally a significant improvement in terms of data-efficiency, success rate, robustness against irrelevant sensory data, and also the quality of the maneuver skills. △ Less

Submitted 27 April, 2018; originally announced April 2018.

Comments: Submitted to IROS 2018

arXiv:1804.03905 [pdf, other]

Fusing Saliency Maps with Region Proposals for Unsupervised Object Localization

Authors: Hakan Karaoguz, Patric Jensfelt

Abstract: In this paper we address the problem of unsupervised localization of objects in single images. Compared to previous state-of-the-art method our method is fully unsupervised in the sense that there is no prior instance level or category level information about the image. Furthermore, we treat each image individually and do not rely on any neighboring image similarity. We employ deep-learning based… ▽ More In this paper we address the problem of unsupervised localization of objects in single images. Compared to previous state-of-the-art method our method is fully unsupervised in the sense that there is no prior instance level or category level information about the image. Furthermore, we treat each image individually and do not rely on any neighboring image similarity. We employ deep-learning based generation of saliency maps and region proposals to tackle this problem. First salient regions in the image are determined using an encoder/decoder architecture. The resulting saliency map is matched with region proposals from a class agnostic region proposal network to roughly localize the candidate object regions. These regions are further refined based on the overlap and similarity ratios. Our experimental evaluations on a benchmark dataset show that the method gets close to current state-of-the-art methods in terms of localization accuracy even though these make use of multiple frames. Furthermore, we created a more challenging and realistic dataset with multiple object categories and varying viewpoint and illumination conditions for evaluating the method's performance in real world scenarios. △ Less

Submitted 11 April, 2018; originally announced April 2018.

Comments: 7 pages, 4 figures, 6 tables, submitted to IEEE IROS2018 conference

arXiv:1801.09292 [pdf, other]

Multiple Object Detection, Tracking and Long-Term Dynamics Learning in Large 3D Maps

Authors: Nils Bore, Patric Jensfelt, John Folkesson

Abstract: In this work, we present a method for tracking and learning the dynamics of all objects in a large scale robot environment. A mobile robot patrols the environment and visits the different locations one by one. Movable objects are discovered by change detection, and tracked throughout the robot deployment. For tracking, we extend the Rao-Blackwellized particle filter of previous work with birth and… ▽ More In this work, we present a method for tracking and learning the dynamics of all objects in a large scale robot environment. A mobile robot patrols the environment and visits the different locations one by one. Movable objects are discovered by change detection, and tracked throughout the robot deployment. For tracking, we extend the Rao-Blackwellized particle filter of previous work with birth and death processes, enabling the method to handle an arbitrary number of objects. Target births and associations are sampled using Gibbs sampling. The parameters of the system are then learnt using the Expectation Maximization algorithm in an unsupervised fashion. The system therefore enables learning of the dynamics of one particular environment, and of its objects. The algorithm is evaluated on data collected autonomously by a mobile robot in an office environment during a real-world deployment. We show that the algorithm automatically identifies and tracks the moving objects within 3D maps and infers plausible dynamics models, significantly decreasing the modeling bias of our previous work. The proposed method represents an improvement over previous methods for environment dynamics learning as it allows for learning of fine grained processes. △ Less

Submitted 28 January, 2018; originally announced January 2018.

Comments: Submitted for peer review

arXiv:1712.08409 [pdf, other]

Detection and Tracking of General Movable Objects in Large 3D Maps

Authors: Nils Bore, Johan Ekekrantz, Patric Jensfelt, John Folkesson

Abstract: This paper studies the problem of detection and tracking of general objects with long-term dynamics, observed by a mobile robot moving in a large environment. A key problem is that due to the environment scale, it can only observe a subset of the objects at any given time. Since some time passes between observations of objects in different places, the objects might be moved when the robot is not t… ▽ More This paper studies the problem of detection and tracking of general objects with long-term dynamics, observed by a mobile robot moving in a large environment. A key problem is that due to the environment scale, it can only observe a subset of the objects at any given time. Since some time passes between observations of objects in different places, the objects might be moved when the robot is not there. We propose a model for this movement in which the objects typically only move locally, but with some small probability they jump longer distances, through what we call global motion. For filtering, we decompose the posterior over local and global movements into two linked processes. The posterior over the global movements and measurement associations is sampled, while we track the local movement analytically using Kalman filters. This novel filter is evaluated on point cloud data gathered autonomously by a mobile robot over an extended period of time. We show that tracking jum** objects is feasible, and that the proposed probabilistic treatment outperforms previous methods when applied to real world data. The key to efficient probabilistic tracking in this scenario is focused sampling of the object posteriors. △ Less

Submitted 30 January, 2018; v1 submitted 22 December, 2017; originally announced December 2017.

Comments: Submitted for peer review

arXiv:1710.06929 [pdf, other]

Unsupervised Object Discovery and Segmentation of RGBD-images

Authors: Johan Ekekrantz, Nils Bore, Rares Ambrus, John Folkesson, Patric Jensfelt

Abstract: In this paper we introduce a system for unsupervised object discovery and segmentation of RGBD-images. The system models the sensor noise directly from data, allowing accurate segmentation without sensor specific hand tuning of measurement noise models making use of the recently introduced Statistical Inlier Estimation (SIE) method. Through a fully probabilistic formulation, the system is able to… ▽ More In this paper we introduce a system for unsupervised object discovery and segmentation of RGBD-images. The system models the sensor noise directly from data, allowing accurate segmentation without sensor specific hand tuning of measurement noise models making use of the recently introduced Statistical Inlier Estimation (SIE) method. Through a fully probabilistic formulation, the system is able to apply probabilistic inference, enabling reliable segmentation in previously challenging scenarios. In addition, we introduce new methods for filtering out false positives, significantly improving the signal to noise ratio. We show that the system significantly outperform state-of-the-art in on a challenging real-world dataset. △ Less

Submitted 18 October, 2017; originally announced October 2017.

Comments: 15 pages, 6 figures

arXiv:1704.07910 [pdf, ps, other]

Adaptive Cost Function for Pointcloud Registration

Authors: Johan Ekekrantz, John Folkesson, Patric Jensfelt

Abstract: In this paper we introduce an adaptive cost function for pointcloud registration. The algorithm automatically estimates the sensor noise, which is important for generalization across different sensors and environments. Through experiments on real and synthetic data, we show significant improvements in accuracy and robustness over state-of-the-art solutions. In this paper we introduce an adaptive cost function for pointcloud registration. The algorithm automatically estimates the sensor noise, which is important for generalization across different sensors and environments. Through experiments on real and synthetic data, we show significant improvements in accuracy and robustness over state-of-the-art solutions. △ Less

Submitted 25 April, 2017; originally announced April 2017.

Comments: 10 pages, 7 figures, 1 table

arXiv:1604.04384 [pdf, other]

doi 10.1109/MRA.2016.2636359

The STRANDS Project: Long-Term Autonomy in Everyday Environments

Authors: Nick Hawes, Chris Burbridge, Ferdian Jovan, Lars Kunze, Bruno Lacerda, Lenka Mudrová, Jay Young, Jeremy Wyatt, Denise Hebesberger, Tobias Körtner, Rares Ambrus, Nils Bore, John Folkesson, Patric Jensfelt, Lucas Beyer, Alexander Hermans, Bastian Leibe, Aitor Aldoma, Thomas Fäulhammer, Michael Zillich, Markus Vincze, Eris Chinellato, Muhannad Al-Omari, Paul Duckworth, Yiannis Gatsoulis , et al. (8 additional authors not shown)

Abstract: Thanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. There is also an increasing demand from end-users for autonomous service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence and robotics research into mobile… ▽ More Thanks to the efforts of the robotics and autonomous systems community, robots are becoming ever more capable. There is also an increasing demand from end-users for autonomous service robots that can operate in real environments for extended periods. In the STRANDS project we are tackling this demand head-on by integrating state-of-the-art artificial intelligence and robotics research into mobile service robots, and deploying these systems for long-term installations in security and care environments. Over four deployments, our robots have been operational for a combined duration of 104 days autonomously performing end-user defined tasks, covering 116km in the process. In this article we describe the approach we have used to enable long-term autonomous operation in everyday environments, and how our robots are able to use their long run times to improve their own performance. △ Less

Submitted 14 October, 2016; v1 submitted 15 April, 2016; originally announced April 2016.

Showing 1–38 of 38 results for author: Jensfelt, P