-
ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation
Authors:
Catherine Capellen,
Max Schwarz,
Sven Behnke
Abstract:
6D object pose estimation is a prerequisite for many applications. In recent years, monocular pose estimation has attracted much research interest because it does not need depth measurements. In this work, we introduce ConvPoseCNN, a fully convolutional architecture that avoids cutting out individual objects. Instead we propose pixel-wise, dense prediction of both translation and orientation components of the object pose, where the dense orientation is represented in Quaternion form. We present different approaches for aggregation of the dense orientation predictions, including averaging and clustering schemes. We evaluate ConvPoseCNN on the challenging YCB-Video Dataset, where we show that the approach has far fewer parameters and trains faster than comparable methods without sacrificing accuracy. Furthermore, our results indicate that the dense orientation prediction implicitly learns to attend to trustworthy, occlusion-free, and feature-rich object regions.
Submitted 16 December, 2019;
originally announced December 2019.
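The averaging-based aggregation of dense orientation predictions mentioned in the ConvPoseCNN abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_quaternions`, the (w, x, y, z) component order, and the per-pixel `weights` are assumptions made for illustration, and the sign-aligned normalized mean used here is only a reasonable approximation when the predicted quaternions lie close together.

```python
import math


def average_quaternions(quats, weights=None):
    """Approximate weighted mean of nearby unit quaternions given as (w, x, y, z).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not quats:
        raise ValueError("need at least one quaternion")
    if weights is None:
        weights = [1.0] * len(quats)

    ref = quats[0]
    acc = [0.0, 0.0, 0.0, 0.0]
    for q, wt in zip(quats, weights):
        # q and -q encode the same rotation, so flip each sample into the
        # hemisphere of the reference before accumulating.
        dot = sum(a * b for a, b in zip(q, ref))
        sign = -1.0 if dot < 0.0 else 1.0
        for i in range(4):
            acc[i] += sign * wt * q[i]

    norm = math.sqrt(sum(c * c for c in acc))
    if norm == 0.0:
        raise ValueError("weights sum to a degenerate quaternion")
    # Re-normalize so the result is again a unit quaternion.
    return tuple(c / norm for c in acc)


# Two per-pixel predictions: no rotation and a 90 degree rotation about z.
identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
mean_q = average_quaternions([identity, quarter_turn_z])
```

With equal weights the result corresponds to a 45 degree rotation about z, halfway between the two inputs. Predictions that are spread widely over the rotation space would call for a clustering scheme or an eigenvector-based mean instead of this simple average.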
-
Bonn Activity Maps: Dataset Description
Authors:
Julian Tanke,
Oh-Hun Kwon,
Patrick Stotko,
Radu Alexandru Rosu,
Michael Weinmann,
Hassan Errami,
Sven Behnke,
Maren Bennewitz,
Reinhard Klein,
Andreas Weber,
Angela Yao,
Juergen Gall
Abstract:
The key prerequisite for accessing the huge potential of current machine learning techniques is the availability of large databases that capture the complex relations of interest. Previous datasets are focused on either 3D scene representations with semantic information, tracking of multiple persons and recognition of their actions, or activity recognition of a single person in captured 3D environments. We present Bonn Activity Maps, a large-scale dataset for human tracking, activity recognition and anticipation of multiple persons. Our dataset comprises four different scenes that have been recorded by time-synchronized cameras each only capturing the scene partially, the reconstructed 3D models with semantic annotations, motion trajectories for individual people including 3D human poses as well as human activity annotations. We utilize the annotations to generate activity likelihoods on the 3D models called activity maps.
Submitted 13 December, 2019;
originally announced December 2019.
-
LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices
Authors:
Radu Alexandru Rosu,
Peer Schütt,
Jan Quenzel,
Sven Behnke
Abstract:
Deep convolutional neural networks (CNNs) have shown outstanding performance in the task of semantically segmenting images. However, applying the same methods on 3D data still poses challenges due to the heavy memory requirements and the lack of structured data. Here, we propose LatticeNet, a novel approach for 3D semantic segmentation, which takes as input raw point clouds. A PointNet describes the local geometry which we embed into a sparse permutohedral lattice. The lattice allows for fast convolutions while keeping a low memory footprint. Further, we introduce DeformSlice, a novel learned data-dependent interpolation for projecting lattice features back onto the point cloud. We present results of 3D segmentation on various datasets where our method achieves state-of-the-art performance.
Submitted 16 August, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Refining 6D Object Pose Predictions using Abstract Render-and-Compare
Authors:
Arul Selvam Periyasamy,
Max Schwarz,
Sven Behnke
Abstract:
Robotic systems often require precise scene analysis capabilities, especially in unstructured, cluttered situations, as occurring in human-made environments. While current deep-learning based methods yield good estimates of object poses, they often struggle with large amounts of occlusion and do not take inter-object effects into account. Vision as inverse graphics is a promising concept for detailed scene analysis. A key element for this idea is a method for inferring scene parameter updates from the rasterized 2D scene. However, the rasterization process is notoriously difficult to invert, both due to the projection and occlusion process, but also due to secondary effects such as lighting or reflections. We propose to remove the latter from the process by mapping the rasterized image into an abstract feature space learned in a self-supervised way from pixel correspondences. Using only a light-weight inverse rendering module, this allows us to refine 6D object pose estimations in highly cluttered scenes by optimizing a simple pixel-wise difference in the abstract image representation. We evaluate our approach on the challenging YCB-Video dataset, where it yields large improvements and demonstrates a large basin of attraction towards the correct object poses.
Submitted 8 October, 2019;
originally announced October 2019.
-
Autonomous Bimanual Functional Regrasping of Novel Object Class Instances
Authors:
Dmytro Pavlichenko,
Diego Rodriguez,
Christian Lenz,
Max Schwarz,
Sven Behnke
Abstract:
In human-made scenarios, robots need to be able to fully operate objects in their surroundings, i.e., objects are required to be functionally grasped rather than only picked. This imposes very strict constraints on the object pose such that a direct grasp can be performed. Inspired by the anthropomorphic nature of humanoid robots, we propose an approach that first grasps an object with one hand, obtaining full control over its pose, and performs the functional grasp with the second hand subsequently. Thus, we develop a fully autonomous pipeline for dual-arm functional regrasping of novel familiar objects, i.e., objects never seen before that belong to a known object category, e.g., spray bottles. This process involves semantic segmentation, object pose estimation, non-rigid mesh registration, grasp sampling, handover pose generation and in-hand pose refinement. The latter is used to compensate for the unpredictable object movement during the first grasp. The approach is applied to a human-like upper body. To the best knowledge of the authors, this is the first system that exhibits autonomous bimanual functional regrasping capabilities. We demonstrate that our system yields reliable success rates and can be applied on-line to real-world tasks using only one off-the-shelf RGB-D sensor.
Submitted 1 October, 2019;
originally announced October 2019.
-
Flexible Disaster Response of Tomorrow -- Final Presentation and Evaluation of the CENTAURO System
Authors:
Tobias Klamt,
Diego Rodriguez,
Lorenzo Baccelliere,
Xi Chen,
Domenico Chiaradia,
Torben Cichon,
Massimiliano Gabardi,
Paolo Guria,
Karl Holmquist,
Malgorzata Kamedula,
Hakan Karaoguz,
Navvab Kashiri,
Arturo Laurenzi,
Christian Lenz,
Daniele Leonardis,
Enrico Mingo Hoffman,
Luca Muratore,
Dmytro Pavlichenko,
Francesco Porcini,
Zeyu Ren,
Fabian Schilling,
Max Schwarz,
Massimiliano Solazzi,
Michael Felsberg,
Antonio Frisoli
, et al. (7 additional authors not shown)
Abstract:
Mobile manipulation robots have high potential to support rescue forces in disaster-response missions. Despite the difficulties imposed by real-world scenarios, robots are promising to perform mission tasks from a safe distance. In the CENTAURO project, we developed a disaster-response system which consists of the highly flexible Centauro robot and suitable control interfaces including an immersive tele-presence suit and support-operator controls on different levels of autonomy.
In this article, we give an overview of the final CENTAURO system. In particular, we explain several high-level design decisions and how those were derived from requirements and extensive experience of Kerntechnische Hilfsdienst GmbH, Karlsruhe, Germany (KHG). We focus on components which were recently integrated and report about a systematic evaluation which demonstrated system capabilities and revealed valuable insights.
Submitted 19 September, 2019;
originally announced September 2019.
-
Utilizing Temporal Information in Deep Convolutional Network for Efficient Soccer Ball Detection and Tracking
Authors:
Anna Kukleva,
Mohammad Asif Khan,
Hafez Farazi,
Sven Behnke
Abstract:
Soccer ball detection is identified as one of the critical challenges in the RoboCup competition. It requires an efficient vision system capable of handling the task of detection with high precision and recall and providing robust and low inference time. In this work, we present a novel convolutional neural network (CNN) approach to detect the soccer ball in an image sequence. In contrast to the existing methods where only the current frame or an image is used for the detection, we make use of the history of frames. Using history allows to efficiently track the ball in situations where the ball disappears or gets partially occluded in some of the frames. Our approach exploits spatio-temporal correlation and detects the ball based on the trajectory of its movements. We present our results with three convolutional methods, namely temporal convolutional networks (TCN), ConvLSTM, and ConvGRU. We first solve the detection task for an image using fully convolutional encoder-decoder architecture, and later, we use it as an input to our temporal models and jointly learn the detection task in sequences of images. We evaluate all our experiments on a novel dataset prepared as a part of this work. Furthermore, we present empirical results to support the effectiveness of using the history of the ball in challenging scenarios.
Submitted 6 September, 2019; v1 submitted 5 September, 2019;
originally announced September 2019.
-
NimbRo Robots Winning RoboCup 2018 Humanoid AdultSize Soccer Competitions
Authors:
Hafez Farazi,
Grzegorz Ficht,
Philipp Allgeuer,
Dmytro Pavlichenko,
Diego Rodriguez,
Andre Brandenburger,
Mojtaba Hosseini,
Sven Behnke
Abstract:
Over the past few years, the Humanoid League rules have changed towards more realistic and challenging game environments, which encourage teams to advance their robot soccer performances. In this paper, we present the software and hardware designs that led our team NimbRo to win the competitions in the AdultSize league -- including the soccer tournament, the drop-in games, and the technical challenges at RoboCup 2018 in Montreal. Altogether, this resulted in NimbRo winning the Best Humanoid Award. In particular, we describe our deep-learning approaches for visual perception and our new fully 3D printed robot NimbRo-OP2X.
Submitted 5 September, 2019;
originally announced September 2019.
-
Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews
Authors:
Michael Gref,
Christoph Schmidt,
Sven Behnke,
Joachim Köhler
Abstract:
In automatic speech recognition, often little training data is available for specific challenging tasks, but training of state-of-the-art automatic speech recognition systems requires large amounts of annotated speech. To address this issue, we propose a two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly address challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people. We evaluate our approach using the example of German oral history interviews, where a relative average reduction of the word error rate by 19.3% is achieved.
Submitted 19 August, 2019;
originally announced August 2019.
-
Directional TSDF: Modeling Surface Orientation for Coherent Meshes
Authors:
Malte Splietker,
Sven Behnke
Abstract:
Real-time 3D reconstruction from RGB-D sensor data plays an important role in many robotic applications, such as object modeling and mapping. The popular method of fusing depth information into a truncated signed distance function (TSDF) and applying the marching cubes algorithm for mesh extraction has severe issues with thin structures: not only does it lead to loss of accuracy, but it can generate completely wrong surfaces. To address this, we propose the directional TSDF - a novel representation that stores opposite surfaces separate from each other. The marching cubes algorithm is modified accordingly to retrieve a coherent mesh representation. We further increase the accuracy by using surface gradient-based ray casting for fusing new measurements. We show that our method outperforms state-of-the-art TSDF reconstruction algorithms in mesh accuracy.
Submitted 14 August, 2019;
originally announced August 2019.
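For readers unfamiliar with the baseline that the Directional TSDF abstract refers to, the conventional per-voxel TSDF fusion step can be sketched as a weighted running average. This is the common textbook update, not the directional variant proposed in the paper; the function name `fuse_tsdf`, the normalization to [-1, 1], and the `max_weight` cap are illustrative assumptions.

```python
def fuse_tsdf(tsdf, weight, sdf_obs, trunc, obs_weight=1.0, max_weight=64.0):
    """Fuse one signed-distance observation into a single voxel.

    tsdf, weight : current voxel state (normalized distance, accumulated weight)
    sdf_obs      : observed signed distance to the surface, in metres
    trunc        : truncation distance, in metres (must be positive)

    Hypothetical helper for illustration; a real system would also skip
    voxels that lie far behind the observed surface.
    """
    # Truncate and normalize the observation to the range [-1, 1].
    d = max(-1.0, min(1.0, sdf_obs / trunc))
    # Weighted running average of all observations seen so far.
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    # Cap the weight so that the voxel can still adapt to scene changes.
    return new_tsdf, min(new_weight, max_weight)


# An empty voxel first sees a surface 5 cm ahead, then a far-away reading.
state = fuse_tsdf(0.0, 0.0, sdf_obs=0.05, trunc=0.1)
state = fuse_tsdf(state[0], state[1], sdf_obs=0.5, trunc=0.1)
```

Because each voxel stores only one averaged distance, observations of the front and back of a thin structure are blended into the same value, which is the failure mode the abstract describes.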
-
A VR System for Immersive Teleoperation and Live Exploration with a Mobile Robot
Authors:
Patrick Stotko,
Stefan Krumpen,
Max Schwarz,
Christian Lenz,
Sven Behnke,
Reinhard Klein,
Michael Weinmann
Abstract:
Applications like disaster management and industrial inspection often require experts to enter contaminated places. To circumvent the need for physical presence, it is desirable to generate a fully immersive individual live teleoperation experience. However, standard video-based approaches suffer from a limited degree of immersion and situation awareness due to the restriction to the camera view, which impacts the navigation. In this paper, we present a novel VR-based practical system for immersive robot teleoperation and scene exploration. While being operated through the scene, a robot captures RGB-D data that is streamed to a SLAM-based live multi-client telepresence system. Here, a global 3D model of the already captured scene parts is reconstructed and streamed to the individual remote user clients where the rendering for e.g. head-mounted display devices (HMDs) is performed. We introduce a novel lightweight robot client component which transmits robot-specific data and enables a quick integration into existing robotic systems. This way, in contrast to first-person exploration systems, the operators can explore and navigate in the remote site completely independent of the current position and view of the capturing robot, complementing traditional input devices for teleoperation. We provide a proof-of-concept implementation and demonstrate the capabilities as well as the performance of our system regarding interactive object measurements and bandwidth-efficient data streaming and visualization. Furthermore, we show its benefits over purely video-based teleoperation in a user study revealing a higher degree of situation awareness and a more precise navigation in challenging environments.
Submitted 3 February, 2020; v1 submitted 8 August, 2019;
originally announced August 2019.
-
Interpretable and Fine-Grained Visual Explanations for Convolutional Neural Networks
Authors:
Jörg Wagner,
Jan Mathias Köhler,
Tobias Gindele,
Leon Hetzel,
Jakob Thaddäus Wiedemer,
Sven Behnke
Abstract:
To verify and validate networks, it is essential to gain insight into their decisions, limitations as well as possible shortcomings of training data. In this work, we propose a post-hoc, optimization based visual explanation method, which highlights the evidence in the input image for a specific prediction. Our approach is based on a novel technique to defend against adversarial evidence (i.e. faulty evidence due to artefacts) by filtering gradients during optimization. The defense does not depend on human-tuned parameters. It enables explanations which are both fine-grained and preserve the characteristics of images, such as edges and colors. The explanations are interpretable, suited for visualizing detailed evidence and can be tested as they are valid model inputs. We qualitatively and quantitatively evaluate our approach on a multitude of models and datasets.
Submitted 7 August, 2019;
originally announced August 2019.
-
Fast Time-optimal Avoidance of Moving Obstacles for High-Speed MAV Flight
Authors:
Marius Beul,
Sven Behnke
Abstract:
In this work, we propose a method to efficiently compute smooth, time-optimal trajectories for micro aerial vehicles (MAVs) evading a moving obstacle. Our approach first computes an n-dimensional trajectory from the start- to an arbitrary target state including position, velocity and acceleration. It respects input- and state-constraints and is thus dynamically feasible. The trajectory is then efficiently checked for collisions, exploiting the piecewise polynomial formulation. If collisions occur, viastates are inserted into the trajectory to circumvent the obstacle and still maintain time-optimality. These viastates are described by position, velocity, and acceleration. The evaluation shows that the computational demands of the proposed method are minimal such that obstacle avoidance can begin within few milliseconds. Optimality of generated trajectories, combined with the ability for frequent online re-planning from non-hover initial conditions, make the approach well suited for evasion of suddenly perceived obstacles during fast flight.
Submitted 6 August, 2019;
originally announced August 2019.
-
Remote Mobile Manipulation with the Centauro Robot: Full-body Telepresence and Autonomous Operator Assistance
Authors:
Tobias Klamt,
Max Schwarz,
Christian Lenz,
Lorenzo Baccelliere,
Domenico Buongiorno,
Torben Cichon,
Antonio Di Guardo,
David Droeschel,
Massimiliano Gabardi,
Malgorzata Kamedula,
Navvab Kashiri,
Arturo Laurenzi,
Daniele Leonardis,
Luca Muratore,
Dmytro Pavlichenko,
Arul Selvam Periyasamy,
Diego Rodriguez,
Massimiliano Solazzi,
Antonio Frisoli,
Michael Gustmann,
Jürgen Roßmann,
Uwe Süss,
Nikos G. Tsagarakis,
Sven Behnke
Abstract:
Solving mobile manipulation tasks in inaccessible and dangerous environments is an important application of robots to support humans. Example domains are construction and maintenance of manned and unmanned stations on the moon and other planets. Suitable platforms require flexible and robust hardware, a locomotion approach that allows for navigating a wide variety of terrains, dexterous manipulation capabilities, and respective user interfaces. We present the CENTAURO system which has been designed for these requirements and consists of the Centauro robot and a set of advanced operator interfaces with complementary strength enabling the system to solve a wide range of realistic mobile manipulation tasks. The robot possesses a centaur-like body plan and is driven by torque-controlled compliant actuators. Four articulated legs ending in steerable wheels allow for omnidirectional driving as well as for making steps. An anthropomorphic upper body with two arms ending in five-finger hands enables human-like manipulation. The robot perceives its environment through a suite of multimodal sensors. The resulting platform complexity goes beyond the complexity of most known systems which puts the focus on a suitable operator interface. An operator can control the robot through a telepresence suit, which allows for flexibly solving a large variety of mobile manipulation tasks. Locomotion and manipulation functionalities on different levels of autonomy support the operation. The proposed user interfaces enable solving a wide variety of tasks without previous task-specific training. The integrated system is evaluated in numerous teleoperated experiments that are described along with lessons learned.
Submitted 5 August, 2019;
originally announced August 2019.
-
Semi-Supervised Semantic Mapping through Label Propagation with Semantic Texture Meshes
Authors:
Radu Alexandru Rosu,
Jan Quenzel,
Sven Behnke
Abstract:
Scene understanding is an important capability for robots acting in unstructured environments. While most SLAM approaches provide a geometrical representation of the scene, a semantic map is necessary for more complex interactions with the surroundings. Current methods treat the semantic map as part of the geometry which limits scalability and accuracy. We propose to represent the semantic map as a geometrical mesh and a semantic texture coupled at independent resolution. The key idea is that in many environments the geometry can be greatly simplified without losing fidelity, while semantic information can be stored at a higher resolution, independent of the mesh. We construct a mesh from depth sensors to represent the scene geometry and fuse information into the semantic texture from segmentations of individual RGB views of the scene. Making the semantics persistent in a global mesh enables us to enforce temporal and spatial consistency of the individual view predictions. For this, we propose an efficient method of establishing consensus between individual segmentations by iteratively retraining semantic segmentation with the information stored within the map and using the retrained segmentation to re-fuse the semantics. We demonstrate the accuracy and scalability of our approach by reconstructing semantic maps of scenes from NYUv2 and a scene spanning large buildings.
Submitted 17 June, 2019;
originally announced June 2019.
-
Value Iteration Networks on Multiple Levels of Abstraction
Authors:
Daniel Schleich,
Tobias Klamt,
Sven Behnke
Abstract:
Learning-based methods are promising to plan robot motion without performing extensive search, which is needed by many non-learning approaches. Recently, Value Iteration Networks (VINs) received much interest since, in contrast to standard CNN-based architectures, they learn goal-directed behaviors which generalize well to unseen domains. However, VINs are restricted to small and low-dimensional domains, limiting their applicability to real-world planning problems.
To address this issue, we propose to extend VINs to representations with multiple levels of abstraction. While the vicinity of the robot is represented in sufficient detail, the representation gets spatially coarser with increasing distance from the robot. The information loss caused by the decreasing resolution is compensated by increasing the number of features representing a cell. We show that our approach is capable of solving significantly larger 2D grid world planning tasks than the original VIN implementation. In contrast to a multiresolution coarse-to-fine VIN implementation which does not employ additional descriptive features, our approach is capable of solving challenging environments, which demonstrates that the proposed method learns to encode useful information in the additional features. As an application for solving real-world planning tasks, we successfully employ our method to plan omnidirectional driving for a search-and-rescue robot in cluttered terrain.
Submitted 1 July, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
-
SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
Authors:
Jens Behley,
Martin Garbade,
Andres Milioto,
Jan Quenzel,
Sven Behnke,
Cyrill Stachniss,
Juergen Gall
Abstract:
Semantic scene understanding is important for various applications. In particular, self-driving cars need a fine-grained understanding of the surfaces and objects in their vicinity. Light detection and ranging (LiDAR) provides precise geometric information about the environment and is thus a part of the sensor suites of almost all self-driving cars. Despite the relevance of semantic scene understanding for this application, there is a lack of a large dataset for this task which is based on an automotive LiDAR.
In this paper, we introduce a large dataset to propel research on laser-based semantic segmentation. We annotated all sequences of the KITTI Vision Odometry Benchmark and provide dense point-wise annotations for the complete $360^{\circ}$ field-of-view of the employed automotive LiDAR. We propose three benchmark tasks based on this dataset: (i) semantic segmentation of point clouds using a single scan, (ii) semantic segmentation using multiple past scans, and (iii) semantic scene completion, which requires to anticipate the semantic scene in the future. We provide baseline experiments and show that there is a need for more sophisticated models to efficiently tackle these tasks. Our dataset opens the door for the development of more advanced methods, but also provides plentiful data to investigate new research directions.
Submitted 16 August, 2019; v1 submitted 2 April, 2019;
originally announced April 2019.
-
Learning Super-resolution 3D Segmentation of Plant Root MRI Images from Few Examples
Authors:
Ali Oguz Uzman,
Jannis Horn,
Sven Behnke
Abstract:
Analyzing plant roots is crucial to understand plant performance in different soil environments. While magnetic resonance imaging (MRI) can be used to obtain 3D images of plant roots, extracting the root structural model is challenging due to highly noisy soil environments and low-resolution of MRI images. To improve both contrast and resolution, we adapt the state-of-the-art method RefineNet for 3D segmentation of the plant root MRI images in super-resolution. The networks are trained from few manual segmentations that are augmented by geometric transformations, realistic noise, and other variabilities. The resulting segmentations contain most root structures, including branches not extracted by the human annotator.
Submitted 15 March, 2019;
originally announced March 2019.
-
Detection and Tracking of Small Objects in Sparse 3D Laser Range Data
Authors:
Jan Razlaw,
Jan Quenzel,
Sven Behnke
Abstract:
Detection and tracking of dynamic objects is a key feature for autonomous behavior in a continuously changing environment. With the increasing popularity and capability of micro aerial vehicles (MAVs), efficient algorithms have to be utilized to enable multi-object tracking on limited hardware and data provided by lightweight sensors. We present a novel segmentation approach based on a combination of median filters and an efficient pipeline for detection and tracking of small objects within sparse point clouds generated by a Velodyne VLP-16 sensor. We achieve real-time performance on a single core of our MAV hardware by exploiting the inherent structure of the data. Our approach is evaluated on simulated and real scans of in- and outdoor environments, obtaining results comparable to the state of the art. Additionally, we provide an application for filtering the dynamic and mapping the static part of the data, generating further insights into the performance of the pipeline on unlabeled data.
Submitted 14 March, 2019;
originally announced March 2019.
-
Search-based 3D Planning and Trajectory Optimization for Safe Micro Aerial Vehicle Flight Under Sensor Visibility Constraints
Authors:
Matthias Nieuwenhuisen,
Sven Behnke
Abstract:
Safe navigation of Micro Aerial Vehicles (MAVs) requires not only obstacle-free flight paths according to a static environment map, but also the perception of and reaction to previously unknown and dynamic objects. This implies that the onboard sensors cover the current flight direction. Due to the limited payload of MAVs, full sensor coverage of the environment has to be traded off with flight time. Thus, often only a part of the environment is covered.
We present a combined allocentric complete planning and trajectory optimization approach taking these sensor visibility constraints into account. The optimized trajectories yield flight paths within the apex angle of a Velodyne Puck Lite 3D laser scanner enabling low-level collision avoidance to perceive obstacles in the flight direction. Furthermore, the optimized trajectories take the flight dynamics into account and contain the velocities and accelerations along the path.
We evaluate our approach with a DJI Matrice 600 MAV and in simulation employing hardware-in-the-loop.
Submitted 28 August, 2019; v1 submitted 12 March, 2019;
originally announced March 2019.
-
Complex Valued Gated Auto-encoder for Video Frame Prediction
Authors:
Niloofar Azizi,
Nils Wandel,
Sven Behnke
Abstract:
In recent years, complex-valued artificial neural networks have gained increasing interest as they allow neural networks to learn richer representations while potentially incorporating fewer parameters. Especially in the domain of computer graphics, many traditional operations rely heavily on computations in the complex domain, thus complex-valued neural networks apply naturally. In this paper, we perform frame predictions in video sequences using a complex-valued gated auto-encoder. First, we motivate our method by showing how the Fourier transform can be seen as the basis for translational operations. Then, we present how a complex neural network can learn such transformations and compare its performance and parameter efficiency to a real-valued gated autoencoder. Furthermore, we show how extending both the real-valued and the complex-valued neural networks with convolutional units can significantly improve prediction performance and parameter efficiency. The networks are assessed on a moving noise and a bouncing ball dataset.
Submitted 8 March, 2019;
originally announced March 2019.
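The abstract above motivates the method by noting that the Fourier transform can be seen as the basis for translational operations. A minimal sketch of that idea (the Fourier shift theorem) in plain Python follows, assuming a 1D signal and a naive DFT. The function names `dft`, `idft`, and `shift_via_phase` are hypothetical and are not taken from the paper; the paper's network learns such transformations rather than computing them in closed form.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a 1D sequence (O(n^2))."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def shift_via_phase(signal, shift):
    """Circularly shift `signal` by `shift` samples.

    The shift is done purely in the frequency domain: each frequency
    component k is multiplied by the phase factor exp(-2*pi*i*k*shift/n).
    """
    n = len(signal)
    spectrum = dft(signal)
    shifted = [
        spectrum[k] * cmath.exp(-2j * cmath.pi * k * shift / n) for k in range(n)
    ]
    # The input is real, so the imaginary parts are numerical noise only.
    return [round(value.real, 6) for value in idft(shifted)]
```

For instance, `shift_via_phase([1, 0, 0, 0], 1)` moves the impulse one sample to the right. A translation in the spatial domain thus becomes an element-wise complex multiplication in the frequency domain, which is the kind of operation a complex-valued unit can represent directly.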
-
Towards Learning Abstract Representations for Locomotion Planning in High-dimensional State Spaces
Authors:
Tobias Klamt,
Sven Behnke
Abstract:
Ground robots which are able to navigate a variety of terrains are needed in many domains. One of the key aspects is the capability to adapt to the ground structure, which can be realized through movable body parts coming along with additional degrees of freedom (DoF). However, planning the respective locomotion is challenging since suitable representations result in large state spaces. Employing an additional abstract representation that is coarser, lower-dimensional, and semantically enriched can support the planning.
While a desired robot representation and action set of such an abstract representation can be easily defined, the cost function requires large tuning efforts. We propose a method to represent the cost function as a CNN. Training of the network is done on generated artificial data, while it generalizes well to the abstraction of real world scenes. We further apply our method to the problem of search-based planning of hybrid driving-stepping locomotion. The abstract representation is used as a powerful informed heuristic which accelerates planning by multiple orders of magnitude.
Submitted 6 March, 2019;
originally announced March 2019.
-
Frequency Domain Transformer Networks for Video Prediction
Authors:
Hafez Farazi,
Sven Behnke
Abstract:
The task of video prediction is forecasting the next frames given some previous frames. Despite much recent progress, this task is still challenging mainly due to high nonlinearity in the spatial domain. To address this issue, we propose a novel architecture, Frequency Domain Transformer Network (FDTN), which is an end-to-end learnable model that estimates and uses the transformations of the signal in the frequency domain. Experimental evaluations show that this approach can outperform some widely used video prediction methods like Video Ladder Network (VLN) and Predictive Gated Pyramids (PGP).
Submitted 1 March, 2019;
originally announced March 2019.
-
Autonomous Dual-Arm Manipulation of Familiar Objects
Authors:
Dmytro Pavlichenko,
Diego Rodriguez,
Max Schwarz,
Christian Lenz,
Arul Selvam Periyasamy,
Sven Behnke
Abstract:
Autonomous dual-arm manipulation is an essential skill to deploy robots in unstructured scenarios. However, this is a challenging undertaking, particularly in terms of perception and planning. Unstructured scenarios are full of objects with different shapes and appearances that have to be grasped in a very specific manner so they can be functionally used. In this paper we present an integrated approach to perform dual-arm pick tasks autonomously. Our method consists of semantic segmentation, object pose estimation, deformable model registration, grasp planning and arm trajectory optimization. The entire pipeline can be executed on-board and is suitable for on-line grasping scenarios. For this, our approach makes use of accumulated knowledge expressed as convolutional neural network models and low-dimensional latent shape spaces. For manipulating objects, we propose a stochastic trajectory optimization that includes a kinematic chain closure constraint. Evaluation in simulation and on the real robot corroborates the feasibility and applicability of the proposed methods on a task of picking up unknown watering cans and drills using both arms.
Submitted 21 November, 2018;
originally announced November 2018.
-
Team NimbRo at MBZIRC 2017: Fast Landing on a Moving Target and Treasure Hunting with a Team of MAVs
Authors:
Marius Beul,
Matthias Nieuwenhuisen,
Jan Quenzel,
Radu Alexandru Rosu,
Jannis Horn,
Dmytro Pavlichenko,
Sebastian Houben,
Sven Behnke
Abstract:
The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2017 has defined ambitious new benchmarks to advance the state-of-the-art in autonomous operation of ground-based and flying robots. This article covers our approaches to solve the two challenges that involved micro aerial vehicles (MAVs). Challenge 1 required reliable target perception, fast trajectory planning, and stable control of an MAV in order to land on a moving vehicle. Challenge 3 demanded a team of MAVs to perform a search and transportation task, coined "Treasure Hunt", which required mission planning and multi-robot coordination as well as adaptive control to account for the additional object weight. We describe our base MAV setup and the challenge-specific extensions, cover the camera-based perception, explain control and trajectory-planning in detail, and elaborate on mission planning and team coordination. We evaluated our systems in simulation as well as with real-robot experiments during the competition in Abu Dhabi. With our system, we, as part of the larger team NimbRo, won the MBZIRC Grand Challenge and achieved third place in both subchallenges involving flying robots.
Submitted 2 January, 2019; v1 submitted 13 November, 2018;
originally announced November 2018.
-
NimbRo-OP2X: Adult-sized Open-source 3D Printed Humanoid Robot
Authors:
Grzegorz Ficht,
Hafez Farazi,
André Brandenburger,
Diego Rodriguez,
Dmytro Pavlichenko,
Philipp Allgeuer,
Mojtaba Hosseini,
Sven Behnke
Abstract:
Humanoid robotics research depends on capable robot platforms, but recently developed advanced platforms are often not available to other research groups, expensive, dangerous to operate, or closed-source. The lack of available platforms forces researchers to work with smaller robots, which have less strict dynamic constraints, or with simulations, which lack many real-world effects. We developed NimbRo-OP2X to address this need. At a height of 135 cm, our robot is large enough to interact in a human environment. Its low weight of only 19 kg makes the operation of the robot safe and easy, as no special operational equipment is necessary. Our robot is equipped with a fast onboard computer and a GPU to accelerate parallel computations. We extend our already open-source software with a deep-learning based vision system and gait parameter optimisation. The NimbRo-OP2X was evaluated during RoboCup 2018 in Montréal, Canada, where it won all possible awards in the Humanoid AdultSize class.
Submitted 19 October, 2018;
originally announced October 2018.
-
Online Balanced Motion Generation for Humanoid Robots
Authors:
Grzegorz Ficht,
Sven Behnke
Abstract:
Reducing the complexity of higher order problems can enable solving them in analytical ways. In this paper, we propose an analytic whole body motion generator for humanoid robots. Our approach targets inexpensive platforms that possess position controlled joints and have limited feedback capabilities. By analysing the mass distribution in a humanoid-like body, we find relations between limb movement and their respective CoM positions. A full pose of a humanoid robot is then described with five point-masses, with one attached to the trunk and the remaining four assigned to the limbs, one each. The weighted sum of these masses in combination with a contact point forms an inverted pendulum. We then generate statically stable poses by specifying a desired upright pendulum orientation, and any desired trunk orientation. Limb and trunk placement strategies are utilised to meet the reference CoM position. A set of these poses is interpolated to achieve stable whole body motions. The approach is evaluated by performing several motions with an igus Humanoid Open Platform robot. We demonstrate the extendability of the approach by applying basic feedback mechanisms for disturbance rejection and tracking error minimisation.
Submitted 19 October, 2018;
originally announced October 2018.
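The five-point-mass description in the abstract above can be illustrated with a short sketch: the mass-weighted sum of the point masses gives the whole-body CoM, and the line from a contact point to that CoM defines the inverted pendulum. This is only an illustration under stated assumptions. The function names `center_of_mass` and `pendulum_tilt`, the coordinate convention (z pointing up), and all masses and positions used with them are hypothetical and not taken from the paper.

```python
import math


def center_of_mass(point_masses):
    """Mass-weighted mean position of (mass, (x, y, z)) point masses."""
    total_mass = sum(mass for mass, _ in point_masses)
    return tuple(
        sum(mass * position[axis] for mass, position in point_masses) / total_mass
        for axis in range(3)
    )


def pendulum_tilt(com, contact_point):
    """Angle in radians between the contact-to-CoM line and the vertical z axis.

    An angle of zero corresponds to an upright pendulum.
    """
    dx, dy, dz = (c - p for c, p in zip(com, contact_point))
    return math.atan2(math.hypot(dx, dy), dz)


# Hypothetical pose: one trunk mass plus four limb masses (values are made up).
pose = [
    (2.0, (0.0, 0.0, 1.0)),   # trunk
    (1.0, (0.0, 0.5, 1.0)),   # left arm
    (1.0, (0.0, -0.5, 1.0)),  # right arm
    (1.0, (0.0, 0.2, 0.0)),   # left leg
    (1.0, (0.0, -0.2, 0.0)),  # right leg
]
com = center_of_mass(pose)
tilt = pendulum_tilt(com, contact_point=(0.0, 0.0, 0.0))
```

Because the hypothetical pose is symmetric about the contact point, the CoM lies directly above it and the pendulum tilt is zero. Moving a limb mass sideways would tilt the pendulum, which is the kind of relation the abstract says is exploited to meet a reference CoM position.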
-
Learning Postural Synergies for Categorical Grasping through Shape Space Registration
Authors:
Diego Rodriguez,
Antonio Di Guardo,
Antonio Frisoli,
Sven Behnke
Abstract:
Every time a person encounters an object with a given degree of familiarity, he/she immediately knows how to grasp it. Adaptation of the movement of the hand according to the object geometry happens effortlessly because of the accumulated knowledge of previous experiences grasping similar objects. In this paper, we present a novel method for inferring grasp configurations based on the object shape. Grasping knowledge is gathered in a synergy space of the robotic hand built by following a human grasping taxonomy. The synergy space is constructed through human demonstrations employing an exoskeleton that provides force feedback, which provides the advantage of evaluating the quality of the grasp. The shape descriptor is obtained by means of a categorical non-rigid registration that encodes typical intra-class variations. This approach is especially suitable for on-line scenarios where only a portion of the object's surface is observable. This method is demonstrated through simulation and real robot experiments by grasping objects never seen before by the robot.
Submitted 18 October, 2018;
originally announced October 2018.
-
Feature-based visual odometry prior for real-time semi-dense stereo SLAM
Authors:
Nicola Krombach,
David Droeschel,
Sebastian Houben,
Sven Behnke
Abstract:
Robust and fast motion estimation and mapping is a key prerequisite for autonomous operation of mobile robots. The goal of performing this task solely on a stereo pair of video cameras is highly demanding and bears conflicting objectives: on one hand, the motion has to be tracked fast and reliably, on the other hand, high-level functions like navigation and obstacle avoidance depend crucially on a complete and accurate environment representation. In this work, we propose a two-layer approach for visual odometry and SLAM with stereo cameras that runs in real-time and combines feature-based matching with semi-dense direct image alignment. Our method initializes semi-dense depth estimation, which is computationally expensive, from motion that is tracked by a fast but robust keypoint-based method. Experiments on public benchmark and proprietary datasets show that our approach is faster than state-of-the-art methods without losing accuracy and yields comparable map building capabilities. Moreover, our approach is shown to handle large inter-frame motion and illumination changes much more robustly than its direct counterparts.
Submitted 17 October, 2018;
originally announced October 2018.
-
Efficient Continuous-Time SLAM for 3D Lidar-Based Online Mapping
Authors:
David Droeschel,
Sven Behnke
Abstract:
Modern 3D laser-range scanners have a high data rate, making online simultaneous localization and mapping (SLAM) computationally challenging. Recursive state estimation techniques are efficient but commit to a state estimate immediately after a new scan is made, which may lead to misalignments of measurements. We present a 3D SLAM approach that allows for refining alignments during online mapping. Our method is based on efficient local mapping and a hierarchical optimization back-end. Measurements of a 3D laser scanner are aggregated in local multiresolution maps by means of surfel-based registration. The local maps are used in a multi-level graph for allocentric mapping and localization. In order to incorporate corrections when refining the alignment, the individual 3D scans in the local map are modeled as a sub-graph and graph optimization is performed to account for drift and misalignments in the local maps. Furthermore, in each sub-graph, a continuous-time representation of the sensor trajectory allows correcting measurements between scan poses. We evaluate our approach in multiple experiments by showing qualitative results. Furthermore, we quantify the map quality by an entropy-based measure.
Submitted 16 October, 2018;
originally announced October 2018.
-
Real-Time Visual Tracking and Identification for a Team of Homogeneous Humanoid Robots
Authors:
Hafez Farazi,
Sven Behnke
Abstract:
The use of a team of humanoid robots to collaborate in completing a task is an increasingly important field of research. One of the challenges in achieving collaboration is mutual identification and tracking of the robots. This work presents a real-time vision-based approach to the detection and tracking of robots of known appearance, based on the images captured by a stationary robot. A Histogram of Oriented Gradients descriptor is used to detect the robots and the robot headings are estimated by a multiclass classifier. The tracked robots report their own heading estimate from magnetometer readings. For tracking, a cost function based on position and heading is applied to each of the tracklets, and a globally optimal labeling of the detected robots is found using the Hungarian algorithm. The complete identification and tracking system was tested using two igus Humanoid Open Platform robots on a soccer field. We expect that a similar system can be used with other humanoid robots, such as Nao and DARwIn-OP.
Submitted 16 October, 2018; v1 submitted 15 October, 2018;
originally announced October 2018.
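The data association step described in the abstract above (a position-and-heading cost per tracklet, followed by a globally optimal labeling) can be sketched as follows. This is not the authors' implementation. The exact cost form, the `heading_weight` parameter, and the function names are assumptions made for illustration, and an exhaustive search over permutations stands in for the Hungarian algorithm. For the handful of robots in such a team, both find the same minimum-cost assignment.

```python
import itertools
import math


def heading_difference(a, b):
    """Smallest absolute difference between two angles in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def association_cost(tracklet, detection, heading_weight=1.0):
    """Hypothetical cost: Euclidean distance plus weighted heading mismatch.

    Both arguments are (x, y, heading) tuples.
    """
    tx, ty, t_heading = tracklet
    dx, dy, d_heading = detection
    distance = math.hypot(tx - dx, ty - dy)
    return distance + heading_weight * heading_difference(t_heading, d_heading)


def best_labeling(tracklets, detections):
    """Return a tuple whose i-th entry is the detection index for tracklet i.

    Exhaustive search over all assignments; assumes there are at least as
    many detections as tracklets. The Hungarian algorithm yields the same
    optimum in polynomial time.
    """
    return min(
        itertools.permutations(range(len(detections)), len(tracklets)),
        key=lambda assignment: sum(
            association_cost(tracklets[i], detections[j])
            for i, j in enumerate(assignment)
        ),
    )


# Two tracklets and two detections listed in swapped order (made-up values).
tracklets = [(0.0, 0.0, 0.0), (2.0, 0.0, math.pi)]
detections = [(2.1, 0.0, 3.0), (0.1, 0.0, 0.1)]
labeling = best_labeling(tracklets, detections)
```

Here `labeling` pairs the first tracklet with the second detection and the second tracklet with the first, since that assignment minimizes the summed cost. Including heading in the cost is what lets identical-looking robots be told apart when their positions are close.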
-
Bipedal Walking with Corrective Actions in the Tilt Phase Space
Authors:
Philipp Allgeuer,
Sven Behnke
Abstract:
Many methods exist for a bipedal robot to keep its balance while walking. In addition to step size and timing, other strategies are possible that influence the stability of the robot without interfering with the target direction and speed of locomotion. This paper introduces a multifaceted feedback controller that uses numerous different feedback mechanisms, collectively termed corrective actions, to stabilise a core keypoint-based gait. The feedback controller is experimentally effective, yet free of any physical model of the robot, very computationally inexpensive, and requires only a single 6-axis IMU sensor. Due to these low requirements, the approach is deemed to be highly portable between robots, and was also specifically designed to target lower-cost robots that have suboptimal sensing, actuation and computational resources. The IMU data is used to estimate the yaw-independent tilt orientation of the robot, expressed in the so-called tilt phase space, and is the source of all feedback provided by the controller. Experimental validation is performed in simulation as well as on real robot hardware.
Submitted 12 October, 2018;
originally announced October 2018.
-
Tilt Rotations and the Tilt Phase Space
Authors:
Philipp Allgeuer,
Sven Behnke
Abstract:
In this paper, the intuitive idea of tilt is formalised into the rigorous concept of tilt rotations. This is motivated by the high relevance that pure tilt rotations have in the analysis of balancing bodies in 3D, and their applicability to the analysis of certain types of contacts. The notion of a 'tilt rotation' is first precisely defined, before multiple parameterisations thereof are presented for mathematical analysis. It is demonstrated how such rotations can be represented in the so-called tilt phase space, which as a vector space allows for a meaningful definition of commutative addition. The properties of both tilt rotations and the tilt phase space are also extensively explored, including in the areas of spherical linear interpolation, rotational velocities, rotation composition and rotation decomposition.
Submitted 12 October, 2018;
originally announced October 2018.
-
Online Visual Robot Tracking and Identification using Deep LSTM Networks
Authors:
Hafez Farazi,
Sven Behnke
Abstract:
Collaborative robots working on a common task are necessary for many applications. One of the challenges for achieving collaboration in a team of robots is mutual tracking and identification. We present a novel pipeline for online vision-based detection, tracking and identification of robots with a known and identical appearance. Our method runs in real-time on the limited hardware of the observer robot. Unlike previous works addressing robot tracking and identification, we use a data-driven approach based on recurrent neural networks to learn relations between sequential inputs and outputs. We formulate the data association problem as multiple classification problems. A deep LSTM network was trained on a simulated dataset and fine-tuned on a small set of real data. Experiments on two challenging datasets, one synthetic and one real, which include long-term occlusions, show promising results.
Submitted 16 October, 2018; v1 submitted 11 October, 2018;
originally announced October 2018.
-
Location Dependency in Video Prediction
Authors:
Niloofar Azizi,
Hafez Farazi,
Sven Behnke
Abstract:
Deep convolutional neural networks are used to address many computer vision problems, including video prediction. The task of video prediction requires analyzing the video frames, temporally and spatially, and constructing a model of how the environment evolves. Convolutional neural networks are spatially invariant, though, which prevents them from modeling location-dependent patterns. In this work, we propose location-biased convolutional layers to overcome this limitation. The effectiveness of location bias is evaluated on two architectures: Video Ladder Network (VLN) and Convolutional Predictive Gating Pyramid (Conv-PGP). The results indicate that encoding location-dependent features is crucial for the task of video prediction. Our proposed methods significantly outperform spatially invariant models.
Submitted 16 October, 2018; v1 submitted 11 October, 2018;
originally announced October 2018.
-
Functionally Modular and Interpretable Temporal Filtering for Robust Segmentation
Authors:
Jörg Wagner,
Volker Fischer,
Michael Herman,
Sven Behnke
Abstract:
The performance of autonomous systems heavily relies on their ability to generate a robust representation of the environment. Deep neural networks have greatly improved vision-based perception systems but still fail in challenging situations, e.g., sensor outages or heavy weather. These failures are often introduced by data-inherent perturbations, which significantly reduce the information provided to the perception system. We propose a functionally modularized temporal filter, which stabilizes an abstract feature representation of a single-frame segmentation model using information from previous time steps. Our filter module splits the filter task into multiple less complex and more interpretable subtasks. The basic structure of the filter is inspired by a Bayes estimator consisting of a prediction and an update step. To make the prediction more transparent, we implement it using a geometric projection and estimate its parameters. This additionally enables the decomposition of the filter task into static representation filtering and low-dimensional motion filtering. Our model can cope with missing frames and is trainable in an end-to-end fashion. Using photorealistic, synthetic video data, we show the ability of the proposed architecture to overcome data-inherent perturbations. The experiments especially highlight advantages introduced by an interpretable and explicit filter module.
Submitted 15 October, 2018; v1 submitted 9 October, 2018;
originally announced October 2018.
-
Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks
Authors:
Arul Selvam Periyasamy,
Max Schwarz,
Sven Behnke
Abstract:
Object pose estimation is a crucial prerequisite for robots to perform autonomous manipulation in clutter. Real-world bin-picking settings such as warehouses present additional challenges, e.g., new objects are added constantly. Most of the existing object pose estimation methods assume that 3D models of the objects are available beforehand. We present a pipeline that requires minimal human intervention and circumvents the reliance on the availability of 3D models by a fast data acquisition method and a synthetic data generation procedure. This work builds on previous work on semantic segmentation of cluttered bin-picking scenes to isolate individual objects in clutter. An additional network is trained on synthetic scenes to estimate object poses from a cropped object-centered encoding extracted from the segmentation results. The proposed method is evaluated on a synthetic validation dataset and cluttered real-world scenes.
Submitted 8 October, 2018;
originally announced October 2018.
-
Team NimbRo at MBZIRC 2017: Autonomous Valve Stem Turning using a Wrench
Authors:
Max Schwarz,
David Droeschel,
Christian Lenz,
Arul Selvam Periyasamy,
En Yen Puang,
Jan Razlaw,
Diego Rodriguez,
Sebastian Schüller,
Michael Schreiber,
Sven Behnke
Abstract:
The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) 2017 has defined ambitious new benchmarks to advance the state-of-the-art in autonomous operation of ground-based and flying robots. In this article, we describe our winning entry to MBZIRC Challenge 2: the mobile manipulation robot Mario. It is capable of autonomously solving a valve manipulation task using a wrench tool detected, grasped, and finally employed to turn a valve stem. Mario's omnidirectional base allows both fast locomotion and precise close approach to the manipulation panel. We describe an efficient detector for medium-sized objects in 3D laser scans and apply it to detect the manipulation panel. An object detection architecture based on deep neural networks is used to find and select the correct tool from grayscale images. Parametrized motion primitives are adapted online to percepts of the tool and valve stem in order to turn the stem. We report in detail on our winning performance at the challenge and discuss lessons learned.
Submitted 6 October, 2018;
originally announced October 2018.
-
Fast Object Learning and Dual-arm Coordination for Cluttered Stowing, Picking, and Packing
Authors:
Max Schwarz,
Christian Lenz,
Germán Martín García,
Seongyong Koo,
Arul Selvam Periyasamy,
Michael Schreiber,
Sven Behnke
Abstract:
Robotic picking from cluttered bins is a demanding task, for which Amazon Robotics holds challenges. The 2017 Amazon Robotics Challenge (ARC) required stowing items into a storage system, picking specific items, and packing them into boxes. In this paper, we describe the entry of team NimbRo Picking. Our deep object perception pipeline can be quickly and efficiently adapted to new items using a custom turntable capture system and transfer learning. It produces high-quality item segments, on which grasp poses are found. A planning component coordinates manipulation actions between two robot arms, minimizing execution time. The system has been demonstrated successfully at ARC, where our team reached second place in both the picking task and the final stow-and-pick task. We also evaluate individual components.
Submitted 6 October, 2018;
originally announced October 2018.
-
Hierarchical Recurrent Filtering for Fully Convolutional DenseNets
Authors:
Jörg Wagner,
Volker Fischer,
Michael Herman,
Sven Behnke
Abstract:
Generating a robust representation of the environment is a crucial ability of learning agents. Deep learning based methods have greatly improved perception systems but still fail in challenging situations. These failures are often not solvable on the basis of a single image. In this work, we present a parameter-efficient temporal filtering concept which extends an existing single-frame segmentation model to work with multiple frames. The resulting recurrent architecture temporally filters representations on all abstraction levels in a hierarchical manner, while decoupling temporal dependencies from scene representation. Using a synthetic dataset, we show the ability of our model to cope with data perturbations and highlight the importance of recurrent and hierarchical filtering.
Submitted 15 October, 2018; v1 submitted 5 October, 2018;
originally announced October 2018.
-
NimbRo Rescue: Solving Disaster-Response Tasks through Mobile Manipulation Robot Momaro
Authors:
Max Schwarz,
Tobias Rodehutskors,
David Droeschel,
Marius Beul,
Michael Schreiber,
Nikita Araslanov,
Ivan Ivanov,
Christian Lenz,
Jan Razlaw,
Sebastian Schüller,
David Schwarz,
Angeliki Topalidou-Kyniazopoulou,
Sven Behnke
Abstract:
Robots that solve complex tasks in environments too dangerous for humans to enter are desperately needed, e.g. for search and rescue applications. We describe our mobile manipulation robot Momaro, with which we participated successfully in the DARPA Robotics Challenge. It features a unique locomotion design with four legs ending in steerable wheels, which allows it both to drive omnidirectionally and to step over obstacles or climb. Furthermore, we present advanced communication and teleoperation approaches, which include immersive 3D visualization, and 6D tracking of operator head and arm motions. The proposed system is evaluated in the DARPA Robotics Challenge, the DLR SpaceBot Cup Qualification and lab experiments. We also discuss the lessons learned from the competitions.
Submitted 15 October, 2018; v1 submitted 2 October, 2018;
originally announced October 2018.
-
First International HARTING Open Source Prize Winner: The igus Humanoid Open Platform
Authors:
Philipp Allgeuer,
Grzegorz Ficht,
Hafez Farazi,
Michael Schreiber,
Sven Behnke
Abstract:
The use of standard platforms in the field of humanoid robotics can lower the entry barrier for new research groups, and accelerate research by the facilitation of code sharing. Numerous humanoid standard platforms exist in the lower size ranges of up to 60cm, but beyond that humanoid robots scale up quickly in weight and price, becoming less affordable and more difficult to operate, maintain and modify. The igus Humanoid Open Platform is an affordable, fully open-source platform for humanoid research. At 92cm, the robot is capable of acting in an environment meant for humans, and is equipped with enough sensors, actuators and computing power to support researchers in many fields. The structure of the robot is entirely 3D printed, leading to a lightweight and visually appealing design. This paper covers the mechanical and electrical aspects of the robot, as well as the main features of the corresponding open-source ROS software. At RoboCup 2016, the platform was awarded the first International HARTING Open Source Prize.
Submitted 28 September, 2018;
originally announced October 2018.
-
RGB-D Object Detection and Semantic Segmentation for Autonomous Manipulation in Clutter
Authors:
Max Schwarz,
Anton Milan,
Arul Selvam Periyasamy,
Sven Behnke
Abstract:
Autonomous robotic manipulation in clutter is challenging. A large variety of objects must be perceived in complex scenes, where they are partially occluded and embedded among many distractors, often in restricted spaces. To tackle these challenges, we developed a deep-learning approach that combines object detection and semantic segmentation. The manipulation scenes are captured with RGB-D cameras, for which we developed a depth fusion method. Employing pretrained features makes learning from small annotated robotic data sets possible. We evaluate our approach on two challenging data sets: one captured for the Amazon Picking Challenge 2016, where our team NimbRo came in second in the Stowing and third in the Picking task, and one captured in disaster-response scenarios. The experiments show that object detection and semantic segmentation complement each other and can be combined to yield reliable object perception.
Submitted 1 October, 2018;
originally announced October 2018.
-
NimbRo-OP2: Grown-up 3D Printed Open Humanoid Platform for Research
Authors:
Grzegorz Ficht,
Philipp Allgeuer,
Hafez Farazi,
Sven Behnke
Abstract:
The versatility of humanoid robots in locomotion, full-body motion, interaction with unmodified human environments, and intuitive human-robot interaction has led to increased research interest. Multiple smaller platforms are available for research, but these require a miniaturized environment to interact with, and often the small scale of the robot diminishes the influence of factors which would have affected larger robots. Unfortunately, many research platforms in the larger size range are less affordable, more difficult to operate, maintain and modify, and very often closed-source. In this work, we introduce NimbRo-OP2, an affordable, fully open-source platform in terms of both hardware and software. Being almost 135cm tall and only 18kg in weight, the robot is not only capable of interacting in an environment meant for humans, but also easy and safe to operate and does not require a gantry when doing so. The exoskeleton of the robot is 3D printed, which produces a lightweight and visually appealing design. We present all mechanical and electrical aspects of the robot, as well as some of the software features of our well-established open-source ROS software. The NimbRo-OP2 performed at RoboCup 2017 in Nagoya, Japan, where it won the Humanoid League AdultSize Soccer competition and Technical Challenge.
Submitted 28 September, 2018;
originally announced September 2018.
-
RoboCup 2016 Humanoid TeenSize Winner NimbRo: Robust Visual Perception and Soccer Behaviors
Authors:
Hafez Farazi,
Philipp Allgeuer,
Grzegorz Ficht,
André Brandenburger,
Dmytro Pavlichenko,
Michael Schreiber,
Sven Behnke
Abstract:
The trend in the RoboCup Humanoid League rules over the past few years has been towards a more realistic and challenging game environment. Elementary skills such as visual perception and walking, which had become mature enough for exciting gameplay, are now once again core challenges. The field goals are both white, and the walking surface is artificial grass, which constitutes a much more irregular surface than the carpet used before. In this paper, team NimbRo TeenSize, the winner of the TeenSize class of the RoboCup 2016 Humanoid League, presents its robotic platforms, the adaptations that had to be made to them, and the newest developments in visual perception and soccer behaviour.
Submitted 28 September, 2018;
originally announced September 2018.
-
The igus Humanoid Open Platform: A Child-sized 3D Printed Open-Source Robot for Research
Authors:
Philipp Allgeuer,
Hafez Farazi,
Grzegorz Ficht,
Michael Schreiber,
Sven Behnke
Abstract:
The use of standard robotic platforms can accelerate research and lower the entry barrier for new research groups. There exist many affordable humanoid standard platforms in the lower size ranges of up to 60cm, but larger humanoid robots quickly become less affordable and more difficult to operate, maintain and modify. The igus Humanoid Open Platform is a new and affordable, fully open-source humanoid platform. At 92cm in height, the robot is capable of interacting in an environment meant for humans, and is equipped with enough sensors, actuators and computing power to support researchers in many fields. The structure of the robot is entirely 3D printed, leading to a lightweight and visually appealing design. The main features of the platform are described in this article.
Submitted 28 September, 2018;
originally announced September 2018.
-
A Monocular Vision System for Playing Soccer in Low Color Information Environments
Authors:
Hafez Farazi,
Philipp Allgeuer,
Sven Behnke
Abstract:
Humanoid soccer robots perceive their environment exclusively through cameras. This paper presents a monocular vision system that was originally developed for use in the RoboCup Humanoid League, but is expected to be transferable to other soccer leagues. Recent changes in the Humanoid League rules resulted in a soccer environment with less color coding than in previous years, which makes perception of the game situation more challenging. The proposed vision system addresses these challenges by using brightness and texture for the detection of the required field features and objects. Our system is robust to changes in lighting conditions, and is designed for real-time use on a humanoid soccer robot. This paper describes the main components of the detection algorithms in use, and presents experimental results from the soccer field, using ROS and the igus Humanoid Open Platform as a testbed. The proposed vision system was used successfully at RoboCup 2015.
Submitted 28 September, 2018;
originally announced September 2018.
-
Learning to Improve Capture Steps for Disturbance Rejection in Humanoid Soccer
Authors:
Marcell Missura,
Cedrick Münstermann,
Philipp Allgeuer,
Max Schwarz,
Julio Pastrana,
Sebastian Schueller,
Michael Schreiber,
Sven Behnke
Abstract:
Over the past few years, soccer-playing humanoid robots have advanced significantly. Elementary skills, such as bipedal walking, visual perception, and collision avoidance have matured enough to allow for dynamic and exciting games. When two robots are fighting for the ball, they frequently push each other and balance recovery becomes crucial. In this paper, we report on insights we gained from systematic push experiments performed on a bipedal model and outline an online learning method we used to improve its push-recovery capabilities. In addition, we describe how the localization ambiguity introduced by the uniform goal color was resolved and report on the results of the RoboCup 2013 competition.
Submitted 28 September, 2018;
originally announced September 2018.
-
Hierarchical and State-based Architectures for Robot Behavior Planning and Control
Authors:
Philipp Allgeuer,
Sven Behnke
Abstract:
In this paper, two behavior control architectures for autonomous agents in the form of cross-platform C++ frameworks are presented, the State Controller Library and the Behavior Control Framework. While the former is state-based and generalizes the notion of states and finite state machines to allow for multi-action planning, the latter is behavior-based and exploits a hierarchical structure and the concept of inhibitions to allow for dynamic transitioning. The two frameworks have completely independent implementations, but can be used effectively in tandem to solve behavior control problems on all levels of granularity. Both frameworks have been used to control the NimbRo-OP, a humanoid soccer robot developed by team NimbRo of the University of Bonn.
Submitted 28 September, 2018;
originally announced September 2018.
-
A ROS-based Software Framework for the NimbRo-OP Humanoid Open Platform
Authors:
Philipp Allgeuer,
Max Schwarz,
Julio Pastrana,
Sebastian Schueller,
Marcell Missura,
Sven Behnke
Abstract:
Over the past few years, a number of successful humanoid platforms have been developed, including the Nao and the DARwIn-OP, both of which are used by many research groups for the investigation of bipedal walking, full-body motions, and human-robot interaction. The NimbRo-OP is an open humanoid platform under development by team NimbRo of the University of Bonn. Significantly larger than the two aforementioned humanoids, this platform has the potential to interact with a more human-scale environment. This paper describes a software framework for the NimbRo-OP that is based on the Robot Operating System (ROS) middleware. The software provides functionality for hardware abstraction, visual perception, and behavior generation, and has been used to implement basic soccer skills. These were demonstrated at RoboCup 2013, as part of the winning team of the Humanoid League competition.
Submitted 28 September, 2018;
originally announced September 2018.