-
Introspective Robot Perception using Smoothed Predictions from Bayesian Neural Networks
Authors:
Jianxiang Feng,
Maximilian Durner,
Zoltan-Csaba Marton,
Ferenc Balint-Benczedi,
Rudolph Triebel
Abstract:
This work focuses on improving uncertainty estimation in the field of object classification from RGB images and demonstrates its benefits in two robotic applications. We employ a (BNN), and evaluate two practical inference techniques to obtain better uncertainty estimates, namely Concrete Dropout (CDP) and Kronecker-factored Laplace Approximation (LAP). We show a performance increase using more re…
▽ More
This work focuses on improving uncertainty estimation in the field of object classification from RGB images and demonstrates its benefits in two robotic applications. We employ a (BNN), and evaluate two practical inference techniques to obtain better uncertainty estimates, namely Concrete Dropout (CDP) and Kronecker-factored Laplace Approximation (LAP). We show a performance increase using more reliable uncertainty estimates as unary potentials within a Conditional Random Field (CRF), which is able to incorporate contextual information as well. Furthermore, the obtained uncertainties are exploited to achieve domain adaptation in a semi-supervised manner, which requires less manual efforts in annotating data. We evaluate our approach on two public benchmark datasets that are relevant for robot perception tasks.
△ Less
Submitted 27 September, 2021;
originally announced September 2021.
-
Unknown Object Segmentation from Stereo Images
Authors:
Maximilian Durner,
Wout Boerdijk,
Martin Sundermeyer,
Werner Friedl,
Zoltan-Csaba Marton,
Rudolph Triebel
Abstract:
Although instance-aware perception is a key prerequisite for many autonomous robotic applications, most of the methods only partially solve the problem by focusing solely on known object categories. However, for robots interacting in dynamic and cluttered environments, this is not realistic and severely limits the range of potential applications. Therefore, we propose a novel object instance segme…
▽ More
Although instance-aware perception is a key prerequisite for many autonomous robotic applications, most of the methods only partially solve the problem by focusing solely on known object categories. However, for robots interacting in dynamic and cluttered environments, this is not realistic and severely limits the range of potential applications. Therefore, we propose a novel object instance segmentation approach that does not require any semantic or geometric information of the objects beforehand. In contrast to existing works, we do not explicitly use depth data as input, but rely on the insight that slight viewpoint changes, which for example are provided by stereo image pairs, are often sufficient to determine object boundaries and thus to segment objects. Focusing on the versatility of stereo sensors, we employ a transformer-based architecture that maps directly from the pair of input images to the object instances. This has the major advantage that instead of a noisy, and potentially incomplete depth map as an input, on which the segmentation is computed, we use the original image pair to infer the object instances and a dense depth map. In experiments in several different application domains, we show that our Instance Stereo Transformer (INSTR) algorithm outperforms current state-of-the-art methods that are based on depth maps. Training code and pretrained models will be made available.
△ Less
Submitted 11 March, 2021;
originally announced March 2021.
-
RoboSherlock: Cognition-enabled Robot Perception for Everyday Manipulation Tasks
Authors:
Ferenc Bálint-Benczédi,
Jan-Hendrik Worch,
Daniel Nyga,
Nico Blodow,
Patrick Mania,
Zoltán-Csaba Márton,
Michael Beetz
Abstract:
A pressing question when designing intelligent autonomous systems is how to integrate the various subsystems concerned with complementary tasks. More specifically, robotic vision must provide task-relevant information about the environment and the objects in it to various planning related modules. In most implementations of the traditional Perception-Cognition-Action paradigm these tasks are treat…
▽ More
A pressing question when designing intelligent autonomous systems is how to integrate the various subsystems concerned with complementary tasks. More specifically, robotic vision must provide task-relevant information about the environment and the objects in it to various planning related modules. In most implementations of the traditional Perception-Cognition-Action paradigm these tasks are treated as quasi-independent modules that function as black boxes for each other. It is our view that perception can benefit tremendously from a tight collaboration with cognition. We present RoboSherlock, a knowledge-enabled cognitive perception systems for mobile robots performing human-scale everyday manipulation tasks. In RoboSherlock, perception and interpretation of realistic scenes is formulated as an unstructured information management(UIM) problem. The application of the UIM principle supports the implementation of perception systems that can answer task-relevant queries about objects in a scene, boost object recognition performance by combining the strengths of multiple perception algorithms, support knowledge-enabled reasoning about objects and enable automatic and knowledge-driven generation of processing pipelines. We demonstrate the potential of the proposed framework through feasibility studies of systems for real-world scene perception that have been built on top of the framework.
△ Less
Submitted 22 November, 2019;
originally announced November 2019.
-
Multi-path Learning for Object Pose Estimation Across Domains
Authors:
Martin Sundermeyer,
Maximilian Durner,
En Yen Puang,
Zoltan-Csaba Marton,
Narunas Vaskevicius,
Kai O. Arras,
Rudolph Triebel
Abstract:
We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that does not only describe an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning":…
▽ More
We introduce a scalable approach for object pose estimation trained on simulated RGB views of multiple 3D models together. We learn an encoding of object views that does not only describe an implicit orientation of all objects seen during training, but can also relate views of untrained objects. Our single-encoder-multi-decoder network is trained using a technique we denote "multi-path learning": While the encoder is shared by all objects, each decoder only reconstructs views of a single object. Consequently, views of different instances do not have to be separated in the latent space and can share common features. The resulting encoder generalizes well from synthetic to real data and across various instances, categories, model types and datasets. We systematically investigate the learned encodings, their generalization, and iterative refinement strategies on the ModelNet40 and T-LESS dataset. Despite training jointly on multiple objects, our 6D Object Detection pipeline achieves state-of-the-art results on T-LESS at much lower runtimes than competing approaches.
△ Less
Submitted 3 April, 2020; v1 submitted 31 July, 2019;
originally announced August 2019.
-
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images
Authors:
Martin Sundermeyer,
Zoltan-Csaba Marton,
Maximilian Durner,
Manuel Brucker,
Rudolph Triebel
Abstract:
We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalize…
▽ More
We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit map** from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results.
△ Less
Submitted 17 July, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.