Search | arXiv e-print repository

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Authors: Jost Tobias Springenberg, Abbas Abdolmaleki, **gwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner, Nicolas Heess, Martin Riedmiller

Abstract: We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We… ▽ More We show that offline actor-critic reinforcement learning can scale to large models - such as transformers - and follows similar scaling laws as supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2312.11374 [pdf, other]

Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

Authors: Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert, Markus Wulfmeier, **gwei Zhang, Francesco Nori, Nicolas Heess, Martin Riedmiller

Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and… ▽ More Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and their embedding in an iterative online/offline scheme (``collect and infer'') can drastically improve data-efficiency by using all the collected experience, which empowers learning from real robot experience only. Moreover, the resulting policy improves significantly over the state of the art on a recently proposed real robot manipulation benchmark. Our approach learns end-to-end, directly from pixels, and does not rely on additional human domain knowledge such as a simulator or demonstrations. △ Less

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2304.13653 [pdf, other]

doi 10.1126/scirobotics.adi8022

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley , et al. (3 additional authors not shown)

Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust… ▽ More We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way -- well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. △ Less

Submitted 11 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Project website: https://sites.google.com/view/op3-soccer

arXiv:2207.10940 [pdf, other]

Dense RGB-D-Inertial SLAM with Map Deformations

Authors: Tristan Laidlow, Michael Bloesch, Wenbin Li, Stefan Leutenegger

Abstract: While dense visual SLAM methods are capable of estimating dense reconstructions of the environment, they suffer from a lack of robustness in their tracking step, especially when the optimisation is poorly initialised. Sparse visual SLAM systems have attained high levels of accuracy and robustness through the inclusion of inertial measurements in a tightly-coupled fusion. Inspired by this performan… ▽ More While dense visual SLAM methods are capable of estimating dense reconstructions of the environment, they suffer from a lack of robustness in their tracking step, especially when the optimisation is poorly initialised. Sparse visual SLAM systems have attained high levels of accuracy and robustness through the inclusion of inertial measurements in a tightly-coupled fusion. Inspired by this performance, we propose the first tightly-coupled dense RGB-D-inertial SLAM system. Our system has real-time capability while running on a GPU. It jointly optimises for the camera pose, velocity, IMU biases and gravity direction while building up a globally consistent, fully dense surfel-based 3D reconstruction of the environment. Through a series of experiments on both synthetic and real world datasets, we show that our dense visual-inertial SLAM system is more robust to fast motions and periods of low texture and low geometric variation than a related RGB-D-only SLAM system. △ Less

Submitted 22 July, 2022; originally announced July 2022.

Comments: Accepted at IROS 2017; supplementary video available at https://youtu.be/-gUdQ0cxDh0

arXiv:2201.11861 [pdf, other]

The Challenges of Exploration for Offline Reinforcement Learning

Authors: Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

Abstract: Offline Reinforcement Learning (ORL) enablesus to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but just as critical to data-efficient RL is the collection of informative data. The task-agnostic setting for data collection, where the task is… ▽ More Offline Reinforcement Learning (ORL) enablesus to separately study the two interlinked processes of reinforcement learning: collecting informative experience and inferring optimal behaviour. The second step has been widely studied in the offline setting, but just as critical to data-efficient RL is the collection of informative data. The task-agnostic setting for data collection, where the task is not known a priori, is of particular interest due to the possibility of collecting a single dataset and using it to solve several downstream tasks as they arise. We investigate this setting via curiosity-based intrinsic motivation, a family of exploration methods which encourage the agent to explore those states or transitions it has not yet learned to model. With Explore2Offline, we propose to evaluate the quality of collected data by transferring the collected data and inferring policies with reward relabelling and standard offline RL algorithms. We evaluate a wide variety of data collection strategies, including a new exploration agent, Intrinsic Model Predictive Control (IMPC), using this scheme and demonstrate their performance on various tasks. We use this decoupled framework to strengthen intuitions about exploration and the data prerequisites for effective offline RL. △ Less

Submitted 18 February, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

arXiv:2101.09458 [pdf, other]

Decoupled Exploration and Exploitation Policies for Sample-Efficient Reinforcement Learning

Authors: William F. Whitney, Michael Bloesch, Jost Tobias Springenberg, Abbas Abdolmaleki, Kyunghyun Cho, Martin Riedmiller

Abstract: Despite the close connection between exploration and sample efficiency, most state of the art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work we address this seeming missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers… ▽ More Despite the close connection between exploration and sample efficiency, most state of the art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work we address this seeming missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers from bias and slow coverage in the few-sample regime. This causes BBE to be actively detrimental to policy learning in many control tasks. We show that by decoupling the task policy from the exploration policy, directed exploration can be highly effective for sample-efficient continuous control. Our method, Decoupled Exploration and Exploitation Policies (DEEP), can be combined with any off-policy RL algorithm without modification. When used in conjunction with soft actor-critic, DEEP incurs no performance penalty in densely-rewarding environments. On sparse environments, DEEP gives a several-fold improvement in data efficiency due to better exploration. △ Less

Submitted 1 July, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

arXiv:2008.12228 [pdf, other]

Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion

Authors: Roland Hafner, Tim Hertweck, Philipp Klöppner, Michael Bloesch, Michael Neunert, Markus Wulfmeier, Saran Tunyasuvunakool, Nicolas Heess, Martin Riedmiller

Abstract: Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to the fact that they can represent a general class of methods that allow to learn a solution with a reasonably set reward and minimal prior knowledge, even in situations where it is difficult or expensive for a human expert. For RL to tr… ▽ More Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to the fact that they can represent a general class of methods that allow to learn a solution with a reasonably set reward and minimal prior knowledge, even in situations where it is difficult or expensive for a human expert. For RL to truly make good on this promise, however, we need algorithms and learning setups that can work across a broad range of problems with minimal problem specific adjustments or engineering. In this paper, we study this idea of generality in the locomotion domain. We develop a learning framework that can learn sophisticated locomotion behavior for a wide spectrum of legged robots, such as bipeds, tripeds, quadrupeds and hexapods, including wheeled variants. Our learning framework relies on a data-efficient, off-policy multi-task RL algorithm and a small set of reward functions that are semantically identical across robots. To underline the general applicability of the method, we keep the hyper-parameter settings and reward definitions constant across experiments and rely exclusively on on-board sensing. For nine different types of robots, including a real-world quadruped robot, we demonstrate that the same algorithm can rapidly learn diverse and reusable locomotion skills without any platform specific adjustments or additional instrumentation of the learning setup. △ Less

Submitted 6 August, 2020; originally announced August 2020.

arXiv:2005.07541 [pdf, other]

Simple Sensor Intentions for Exploration

Authors: Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

Abstract: Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reduce the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic sy… ▽ More Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reduce the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic systems without expensive instrumentation. In this paper we focus on a setting in which goal tasks are defined via simple sparse rewards, and exploration is facilitated via agent-internal auxiliary tasks. We introduce the idea of simple sensor intentions (SSIs) as a generic way to define auxiliary tasks. SSIs reduce the amount of prior knowledge that is required to define suitable rewards. They can further be computed directly from raw sensor streams and thus do not require expensive and possibly brittle state estimation on real systems. We demonstrate that a learning system based on these rewards can solve complex robotic tasks in simulation and in real world settings. In particular, we show that a real robotic arm can learn to grasp and lift and solve a Ball-in-a-Cup task from scratch, when only raw sensor streams are used for both controller input and in the auxiliary reward definition. △ Less

Submitted 15 May, 2020; originally announced May 2020.

arXiv:2002.10342 [pdf, other]

Comparing View-Based and Map-Based Semantic Labelling in Real-Time SLAM

Authors: Zoe Landgraf, Fabian Falck, Michael Bloesch, Stefan Leutenegger, Andrew Davison

Abstract: Generally capable Spatial AI systems must build persistent scene representations where geometric models are combined with meaningful semantic labels. The many approaches to labelling scenes can be divided into two clear groups: view-based which estimate labels from the input view-wise data and then incrementally fuse them into the scene model as it is built; and map-based which label the generated… ▽ More Generally capable Spatial AI systems must build persistent scene representations where geometric models are combined with meaningful semantic labels. The many approaches to labelling scenes can be divided into two clear groups: view-based which estimate labels from the input view-wise data and then incrementally fuse them into the scene model as it is built; and map-based which label the generated scene model. However, there has so far been no attempt to quantitatively compare view-based and map-based labelling. Here, we present an experimental framework and comparison which uses real-time height map fusion as an accessible platform for a fair comparison, opening up the route to further systematic research in this area. △ Less

Submitted 24 February, 2020; originally announced February 2020.

Comments: ICRA 2020

arXiv:1903.06482 [pdf, other]

SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations

Authors: Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison

Abstract: Systems which incrementally create 3D semantic maps from image sequences must store and update representations of both geometry and semantic entities. However, while there has been much work on the correct formulation for geometrical estimation, state-of-the-art systems usually rely on simple semantic representations which store and update independent label estimates for each surface element (dept… ▽ More Systems which incrementally create 3D semantic maps from image sequences must store and update representations of both geometry and semantic entities. However, while there has been much work on the correct formulation for geometrical estimation, state-of-the-art systems usually rely on simple semantic representations which store and update independent label estimates for each surface element (depth pixels, surfels, or voxels). Spatial correlation is discarded, and fused label maps are incoherent and noisy. We introduce a new compact and optimisable semantic representation by training a variational auto-encoder that is conditioned on a colour image. Using this learned latent space, we can tackle semantic label fusion by jointly optimising the low-dimenional codes associated with each of a set of overlap** images, producing consistent fused label maps which preserve spatial correlation. We also show how this approach can be used within a monocular keyframe based semantic map** system where a similar code approach is used for geometry. The probabilistic formulation allows a flexible formulation where we can jointly estimate motion, geometry and semantics in a unified optimisation. △ Less

Submitted 18 March, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

Comments: To be published in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019)

arXiv:1812.07976 [pdf, other]

MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM

Authors: Binbin Xu, Wenbin Li, Dimos Tzoumanikas, Michael Bloesch, Andrew Davison, Stefan Leutenegger

Abstract: We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask bou… ▽ More We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method can run at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences. △ Less

Submitted 21 March, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2019. 7 (6 + 1) pages. Please also see video Link: https://youtu.be/gturboNl9gg

arXiv:1811.10681 [pdf, other]

Matching Features without Descriptors: Implicitly Matched Interest Points

Authors: Titus Cieslewski, Michael Bloesch, Davide Scaramuzza

Abstract: The extraction and matching of interest points is a prerequisite for many geometric computer vision problems. Traditionally, matching has been achieved by assigning descriptors to interest points and matching points that have similar descriptors. In this paper, we propose a method by which interest points are instead already implicitly matched at detection time. With this, descriptors do not need… ▽ More The extraction and matching of interest points is a prerequisite for many geometric computer vision problems. Traditionally, matching has been achieved by assigning descriptors to interest points and matching points that have similar descriptors. In this paper, we propose a method by which interest points are instead already implicitly matched at detection time. With this, descriptors do not need to be calculated, stored, communicated, or matched any more. This is achieved by a convolutional neural network with multiple output channels and can be thought of as a collection of a variety of detectors, each specialized to specific visual features. This paper describes how to design and train such a network in a way that results in successful relative pose estimation performance despite the limitation on interest point count. While the overall matching score is slightly lower than with traditional methods, the approach is descriptor free and thus enables localization systems with a significantly smaller memory footprint and multi-agent localization systems with lower bandwidth requirements. The network also outputs the confidence for a specific interest point resulting in a valid match. We evaluate performance relative to state-of-the-art alternatives. △ Less

Submitted 5 August, 2019; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: 10 pages without references, accepted for publication at the British Machine Vision Conference (BMVC), Cardiff, 2019. v2 contains additional results, and a bug in the evaluation of LF-NET has been fixed

Journal ref: British Machine Vision Conference (BMVC), Cardiff, 2019

arXiv:1810.03237 [pdf, other]

Task-Embedded Control Networks for Few-Shot Imitation Learning

Authors: Stephen James, Michael Bloesch, Andrew J. Davison

Abstract: Much like humans, robots should have the ability to leverage knowledge from previously learned tasks in order to learn new tasks quickly in new and unfamiliar environments. Despite this, most robot learning approaches have focused on learning a single task, from scratch, with a limited notion of generalisation, and no way of leveraging the knowledge to learn other tasks more efficiently. One possi… ▽ More Much like humans, robots should have the ability to leverage knowledge from previously learned tasks in order to learn new tasks quickly in new and unfamiliar environments. Despite this, most robot learning approaches have focused on learning a single task, from scratch, with a limited notion of generalisation, and no way of leveraging the knowledge to learn other tasks more efficiently. One possible solution is meta-learning, but many of the related approaches are limited in their ability to scale to a large number of tasks and to learn further tasks without forgetting previously learned ones. With this in mind, we introduce Task-Embedded Control Networks, which employ ideas from metric learning in order to create a task embedding that can be used by a robot to learn new tasks from one or more demonstrations. In the area of visually-guided manipulation, we present simulation results in which we surpass the performance of a state-of-the-art method when using only visual information from each demonstration. Additionally, we demonstrate that our approach can also be used in conjunction with domain randomisation to train our few-shot learning ability in simulation and then deploy in the real world without any additional training. Once deployed, the robot can learn new tasks from a single real-world demonstration. △ Less

Submitted 7 October, 2018; originally announced October 2018.

Comments: Published at the Conference on Robot Learning (CoRL) 2018

arXiv:1809.02966 [pdf, other]

LS-Net: Learning to Solve Nonlinear Least Squares for Monocular Stereo

Authors: Ronald Clark, Michael Bloesch, Jan Czarnowski, Stefan Leutenegger, Andrew J. Davison

Abstract: Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose LS-Net, a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost… ▽ More Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose LS-Net, a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo ie. jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem. △ Less

Submitted 9 September, 2018; originally announced September 2018.

Comments: ECCV 2018. Video: https://youtu.be/5bZbMm8UqbA

arXiv:1808.08378 [pdf, other]

Fusion++: Volumetric Object-Level SLAM

Authors: John McCormac, Ronald Clark, Michael Bloesch, Andrew J. Davison, Stefan Leutenegger

Abstract: We propose an online object-level SLAM system which builds a persistent and accurate 3D graph map of arbitrary reconstructed objects. As an RGB-D camera browses a cluttered indoor scene, Mask-RCNN instance segmentations are used to initialise compact per-object Truncated Signed Distance Function (TSDF) reconstructions with object size-dependent resolutions and a novel 3D foreground mask. Reconstru… ▽ More We propose an online object-level SLAM system which builds a persistent and accurate 3D graph map of arbitrary reconstructed objects. As an RGB-D camera browses a cluttered indoor scene, Mask-RCNN instance segmentations are used to initialise compact per-object Truncated Signed Distance Function (TSDF) reconstructions with object size-dependent resolutions and a novel 3D foreground mask. Reconstructed objects are stored in an optimisable 6DoF pose graph which is our only persistent map representation. Objects are incrementally refined via depth fusion, and are used for tracking, relocalisation and loop closure detection. Loop closures cause adjustments in the relative pose estimates of object instances, but no intra-object war**. Each object also carries semantic information which is refined over time and an existence probability to account for spurious instance predictions. We demonstrate our approach on a hand-held RGB-D sequence from a cluttered office scene with a large number and variety of object instances, highlighting how the system closes loops and makes good use of existing objects on repeated loops. We quantitatively evaluate the trajectory error of our system against a baseline approach on the RGB-D SLAM benchmark, and qualitatively compare reconstruction quality of discovered objects on the YCB video dataset. Performance evaluation shows our approach is highly memory efficient and runs online at 4-8Hz (excluding relocalisation) despite not being optimised at the software level. △ Less

Submitted 28 August, 2018; v1 submitted 25 August, 2018; originally announced August 2018.

arXiv:1804.00874 [pdf, other]

CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM

Authors: Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J. Davison

Abstract: The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capt… ▽ More The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation only. We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: While each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlap** keyframes to attain global consistency. Conditioning the depth map on the image allows the code to only represent aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM. △ Less

Submitted 14 April, 2019; v1 submitted 3 April, 2018; originally announced April 2018.

Comments: Published in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018)

arXiv:1801.07478 [pdf, other]

Why and How to Avoid the Flipped Quaternion Multiplication

Authors: Hannes Sommer, Igor Gilitschenski, Michael Bloesch, Stephan Weiss, Roland Siegwart, Juan Nieto

Abstract: Over the last decades quaternions have become a crucial and very successful tool for attitude representation in robotics and aerospace. However, there is a major problem that is continuously causing trouble in practice when it comes to exchanging formulas or implementations: there are two quaternion multiplications in common use, Hamilton's original multiplication and its flipped version, which is… ▽ More Over the last decades quaternions have become a crucial and very successful tool for attitude representation in robotics and aerospace. However, there is a major problem that is continuously causing trouble in practice when it comes to exchanging formulas or implementations: there are two quaternion multiplications in common use, Hamilton's original multiplication and its flipped version, which is often associated with NASA's Jet Propulsion Laboratory. We believe that this particular issue is completely avoidable and only exists today due to a lack of understanding. This paper explains the underlying problem for the popular passive world to body usage of rotation quaternions, and derives an alternative solution compatible with Hamilton's multiplication. Furthermore, it argues for entirely discontinuing the flipped multiplication. Additionally, it provides recipes for efficiently detecting relevant conventions and migrating formulas or algorithms between them. △ Less

Submitted 5 February, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

Comments: 16 pages, 1 figure, 2 tables (minor improvements and fixes over v1, smaller page margins)

arXiv:1708.06652 [pdf, other]

doi 10.1109/MRA.2017.2771326

Build Your Own Visual-Inertial Drone: A Cost-Effective and Open-Source Autonomous Drone

Authors: Inkyu Sa, Mina Kamel, Michael Burri, Michael Bloesch, Raghav Khanna, Marija Popovic, Juan Nieto, Roland Siegwart

Abstract: This paper describes an approach to building a cost-effective and research grade visual-inertial odometry aided vertical taking-off and landing (VTOL) platform. We utilize an off-the-shelf visual-inertial sensor, an onboard computer, and a quadrotor platform that are factory-calibrated and mass-produced, thereby sharing similar hardware and sensor specifications (e.g., mass, dimensions, intrinsic… ▽ More This paper describes an approach to building a cost-effective and research grade visual-inertial odometry aided vertical taking-off and landing (VTOL) platform. We utilize an off-the-shelf visual-inertial sensor, an onboard computer, and a quadrotor platform that are factory-calibrated and mass-produced, thereby sharing similar hardware and sensor specifications (e.g., mass, dimensions, intrinsic and extrinsic of camera-IMU systems, and signal-to-noise ratio). We then perform a system calibration and identification enabling the use of our visual-inertial odometry, multi-sensor fusion, and model predictive control frameworks with the off-the-shelf products. This implies that we can partially avoid tedious parameter tuning procedures for building a full system. The complete system is extensively evaluated both indoors using a motion capture system and outdoors using a laser tracker while performing hover and step responses, and trajectory following tasks in the presence of external wind disturbances. We achieve root-mean-square (RMS) pose errors between a reference and actual trajectories of 0.036m, while performing hover. We also conduct relatively long distance flight (~180m) experiments on a farm site and achieve 0.82% drift error of the total distance flight. This paper conveys the insights we acquired about the platform and sensor module and returns to the community as open-source code with tutorial documentation. △ Less

Submitted 6 September, 2018; v1 submitted 22 August, 2017; originally announced August 2017.

Comments: 21 pages, 10 figures, accepted to IEEE Robotics & Automation Magazine

Journal ref: IEEE Robotics & Automation Magazine 2017

arXiv:1607.07405 [pdf, other]

gvnn: Neural Network Library for Geometric Computer Vision

Authors: Ankur Handa, Michael Bloesch, Viorica Patraucean, Simon Stent, John McCormac, Andrew Davison

Abstract: We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spi… ▽ More We introduce gvnn, a neural network library in Torch aimed towards bridging the gap between classic geometric computer vision and deep learning. Inspired by the recent success of Spatial Transformer Networks, we propose several new layers which are often used as parametric transformations on the data in geometric computer vision. These layers can be inserted within a neural network much in the spirit of the original spatial transformers and allow backpropagation to enable end-to-end learning of a network involving any domain knowledge in geometric computer vision. This opens up applications in learning invariance to 3D geometric transformation for place recognition, end-to-end visual odometry, depth estimation and unsupervised learning through war** with a parametric transformation for image reconstruction error. △ Less

Submitted 12 August, 2016; v1 submitted 25 July, 2016; originally announced July 2016.

Comments: Submitted to ECCV Workshop on Deep Geometry

arXiv:1606.05285 [pdf, other]

A Primer on the Differential Calculus of 3D Orientations

Authors: Michael Bloesch, Hannes Sommer, Tristan Laidlow, Michael Burri, Gabriel Nuetzi, Péter Fankhauser, Dario Bellicoso, Christian Gehring, Stefan Leutenegger, Marco Hutter, Roland Siegwart

Abstract: The proper handling of 3D orientations is a central element in many optimization problems in engineering. Unfortunately many researchers and engineers struggle with the formulation of such problems and often fall back to suboptimal solutions. The existence of many different conventions further complicates this issue, especially when interfacing multiple differing implementations. This document dis… ▽ More The proper handling of 3D orientations is a central element in many optimization problems in engineering. Unfortunately many researchers and engineers struggle with the formulation of such problems and often fall back to suboptimal solutions. The existence of many different conventions further complicates this issue, especially when interfacing multiple differing implementations. This document discusses an alternative approach which makes use of a more abstract notion of 3D orientations. The relative orientation between two coordinate systems is primarily identified by the coordinate map** it induces. This is combined with the standard exponential map in order to introduce representation-independent and minimal differentials, which are very convenient in optimization based methods. △ Less

Submitted 31 October, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

arXiv:1507.02081 [pdf, other]

An Open Source, Fiducial Based, Visual-Inertial Motion Capture System

Authors: Michael Neunert, Michael Bloesch, Jonas Buchli

Abstract: Many robotic tasks rely on the accurate localization of moving objects within a given workspace. This information about the objects' poses and velocities are used for control,motion planning, navigation, interaction with the environment or verification. Often motion capture systems are used to obtain such a state estimate. However, these systems are often costly, limited in workspace size and not… ▽ More Many robotic tasks rely on the accurate localization of moving objects within a given workspace. This information about the objects' poses and velocities are used for control,motion planning, navigation, interaction with the environment or verification. Often motion capture systems are used to obtain such a state estimate. However, these systems are often costly, limited in workspace size and not suitable for outdoor usage. Therefore, we propose a lightweight and easy to use, visual-inertial Simultaneous Localization and Map** approach that leverages cost-efficient, paper printable artificial landmarks, socalled fiducials. Results show that by fusing visual and inertial data, the system provides accurate estimates and is robust against fast motions and changing lighting conditions. Tight integration of the estimation of sensor and fiducial pose as well as extrinsics ensures accuracy, map consistency and avoids the requirement for precalibration. By providing an open source implementation and various datasets, partially with ground truth information, we enable community members to run, test, modify and extend the system either using these datasets or directly running the system on their own robotic setups. △ Less

Submitted 13 June, 2016; v1 submitted 8 July, 2015; originally announced July 2015.

Comments: To appear in The International Conference on Information Fusion (FUSION) 2016

arXiv:1402.5450 [pdf, ps, other]

doi 10.1109/IROS.2014.6942674

State Estimation for a Humanoid Robot

Authors: Nicholas Rotella, Michael Bloesch, Ludovic Righetti, Stefan Schaal

Abstract: This paper introduces a framework for state estimation on a humanoid robot platform using only common proprioceptive sensors and knowledge of leg kinematics. The presented approach extends that detailed in [1] on a quadruped platform by incorporating the rotational constraints imposed by the humanoid's flat feet. As in previous work, the proposed Extended Kalman Filter (EKF) accommodates contact s… ▽ More This paper introduces a framework for state estimation on a humanoid robot platform using only common proprioceptive sensors and knowledge of leg kinematics. The presented approach extends that detailed in [1] on a quadruped platform by incorporating the rotational constraints imposed by the humanoid's flat feet. As in previous work, the proposed Extended Kalman Filter (EKF) accommodates contact switching and makes no assumptions about gait or terrain, making it applicable on any humanoid platform for use in any task. The filter employs a sensor-based prediction model which uses inertial data from an IMU and corrects for integrated error using a kinematics-based measurement model which relies on joint encoders and a kinematic model to determine the relative position and orientation of the feet. A nonlinear observability analysis is performed on both the original and updated filters and it is concluded that the new filter significantly simplifies singular cases and improves the observability characteristics of the system. Results on simulated walking and squatting datasets demonstrate the performance gain of the flat-foot filter as well as confirm the results of the presented observability analysis. △ Less

Submitted 10 December, 2014; v1 submitted 21 February, 2014; originally announced February 2014.

Comments: IROS 2014 Submission, IEEE/RSJ International Conference on Intelligent Robots and Systems (2014) 952-958

Showing 1–22 of 22 results for author: Bloesch, M