Search | arXiv e-print repository

QuadricSLAM: Dual Quadrics from Object Detections as Landmarks in Object-oriented SLAM

Authors: Lachlan Nicholson, Michael Milford, Niko Sünderhauf

Abstract: In this paper, we use 2D object detections from multiple views to simultaneously estimate a 3D quadric surface for each object and localize the camera position. We derive a SLAM formulation that uses dual quadrics as 3D landmark representations, exploiting their ability to compactly represent the size, position and orientation of an object, and show how 2D object detections can directly constrain… ▽ More In this paper, we use 2D object detections from multiple views to simultaneously estimate a 3D quadric surface for each object and localize the camera position. We derive a SLAM formulation that uses dual quadrics as 3D landmark representations, exploiting their ability to compactly represent the size, position and orientation of an object, and show how 2D object detections can directly constrain the quadric parameters via a novel geometric error formulation. We develop a sensor model for object detectors that addresses the challenge of partially visible objects, and demonstrate how to jointly estimate the camera pose and constrained dual quadric parameters in factor graph based SLAM with a general perspective camera. △ Less

Submitted 16 August, 2018; v1 submitted 10 April, 2018; originally announced April 2018.

Comments: Accepted at IEEE Robotics and Automation Letters (RA-L). arXiv admin note: text overlap with arXiv:1708.00965

arXiv:1801.05078 [pdf, other]

Don't Look Back: Robustifying Place Categorization for Viewpoint- and Condition-Invariant Place Recognition

Authors: Sourav Garg, Niko Suenderhauf, Michael Milford

Abstract: When a human drives a car along a road for the first time, they later recognize where they are on the return journey typically without needing to look in their rear-view mirror or turn around to look back, despite significant viewpoint and appearance change. Such navigation capabilities are typically attributed to our semantic visual understanding of the environment [1] beyond geometry to recogniz… ▽ More When a human drives a car along a road for the first time, they later recognize where they are on the return journey typically without needing to look in their rear-view mirror or turn around to look back, despite significant viewpoint and appearance change. Such navigation capabilities are typically attributed to our semantic visual understanding of the environment [1] beyond geometry to recognizing the types of places we are passing through such as "passing a shop on the left" or "moving through a forested area". Humans are in effect using place categorization [2] to perform specific place recognition even when the viewpoint is 180 degrees reversed. Recent advances in deep neural networks have enabled high-performance semantic understanding of visual places and scenes, opening up the possibility of emulating what humans do. In this work, we develop a novel methodology for using the semantics-aware higher-order layers of deep neural networks for recognizing specific places from within a reference database. To further improve the robustness to appearance change, we develop a descriptor normalization scheme that builds on the success of normalization schemes for pure appearance-based techniques such as SeqSLAM [3]. Using two different datasets - one road-based, one pedestrian-based, we evaluate the performance of the system in performing place recognition on reverse traversals of a route with a limited field of view camera and no turn-back-and-look behaviours, and compare to existing state-of-the-art techniques and vanilla off-the-shelf features. The results demonstrate significant improvements over the existing state of the art, especially for extreme perceptual challenges that involve both great viewpoint change and environmental appearance change. We also provide experimental analyses of the contributions of the various system components. △ Less

Submitted 15 January, 2018; originally announced January 2018.

Comments: 9 pages, 11 figures, ICRA 2018

arXiv:1711.10137 [pdf, other]

One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay

Authors: Jake Bruce, Niko Suenderhauf, Piotr Mirowski, Raia Hadsell, Michael Milford

Abstract: Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fi… ▽ More Recently, model-free reinforcement learning algorithms have been shown to solve challenging problems by learning from extensive interaction with the environment. A significant issue with transferring this success to the robotics domain is that interaction with the real world is costly, but training on limited experience is prone to overfitting. We present a method for learning to navigate, to a fixed goal and in a known environment, on a mobile robot. The robot leverages an interactive world model built from a single traversal of the environment, a pre-trained visual feature encoder, and stochastic environmental augmentation, to demonstrate successful zero-shot transfer under real-world environmental variations without fine-tuning. △ Less

Submitted 28 November, 2017; v1 submitted 28 November, 2017; originally announced November 2017.

Comments: NIPS Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning

Journal ref: Bruce, Jake, et al. "One-Shot Reinforcement Learning for Robot Navigation with Interactive Replay." Proceedings of the NIPS Workshop on Acting and Interacting in the Real World: Challenges in Robot Learning. 2017

arXiv:1711.07280 [pdf, other]

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

Authors: Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel

Abstract: A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a… ▽ More A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded sequence-to-sequence translation problems, and many of the same methods are applicable. To enable and encourage the application of vision and language methods to the problem of interpreting visually-grounded navigation instructions, we present the Matterport3D Simulator -- a large-scale reinforcement learning environment based on real imagery. Using this simulator, which can in future support a range of embodied vision and language tasks, we provide the first benchmark dataset for visually-grounded natural language navigation in real buildings -- the Room-to-Room (R2R) dataset. △ Less

Submitted 5 April, 2018; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: CVPR 2018 Spotlight presentation

arXiv:1710.06677 [pdf, other]

Dropout Sampling for Robust Object Detection in Open-Set Conditions

Authors: Dimity Miller, Lachlan Nicholson, Feras Dayoub, Niko Sünderhauf

Abstract: Dropout Variational Inference, or Dropout Sampling, has been recently proposed as an approximation technique for Bayesian Deep Learning and evaluated for image classification and regression tasks. This paper investigates the utility of Dropout Sampling for object detection for the first time. We demonstrate how label uncertainty can be extracted from a state-of-the-art object detection system via… ▽ More Dropout Variational Inference, or Dropout Sampling, has been recently proposed as an approximation technique for Bayesian Deep Learning and evaluated for image classification and regression tasks. This paper investigates the utility of Dropout Sampling for object detection for the first time. We demonstrate how label uncertainty can be extracted from a state-of-the-art object detection system via Dropout Sampling. We evaluate this approach on a large synthetic dataset of 30,000 images, and a real-world dataset captured by a mobile robot in a versatile campus environment. We show that this uncertainty can be utilized to increase object detection performance under the open-set conditions that are typically encountered in robotic vision. A Dropout Sampling network is shown to achieve a 12.3% increase in recall (for the same precision score as a standard network) and a 15.1% increase in precision (for the same recall score as the standard network). △ Less

Submitted 18 April, 2018; v1 submitted 18 October, 2017; originally announced October 2017.

Comments: to appear in IEEE International Conference on Robotics and Automation 2018 (ICRA 2018)

arXiv:1709.07158 [pdf, other]

SceneCut: Joint Geometric and Object Segmentation for Indoor Scenes

Authors: Trung Pham, Thanh-Toan Do, Niko Sünderhauf, Ian Reid

Abstract: This paper presents SceneCut, a novel approach to jointly discover previously unseen objects and non-object surfaces using a single RGB-D image. SceneCut's joint reasoning over scene semantics and geometry allows a robot to detect and segment object instances in complex scenes where modern deep learning-based methods either fail to separate object instances, or fail to detect objects that were not… ▽ More This paper presents SceneCut, a novel approach to jointly discover previously unseen objects and non-object surfaces using a single RGB-D image. SceneCut's joint reasoning over scene semantics and geometry allows a robot to detect and segment object instances in complex scenes where modern deep learning-based methods either fail to separate object instances, or fail to detect objects that were not seen during training. SceneCut automatically decomposes a scene into meaningful regions which either represent objects or scene surfaces. The decomposition is qualified by an unified energy function over objectness and geometric fitting. We show how this energy function can be optimized efficiently by utilizing hierarchical segmentation trees. Moreover, we leverage a pre-trained convolutional oriented boundary network to predict accurate boundaries from images, which are used to construct high-quality region hierarchies. We evaluate SceneCut on several different indoor environments, and the results show that SceneCut significantly outperforms all the existing methods. △ Less

Submitted 24 May, 2018; v1 submitted 21 September, 2017; originally announced September 2017.

Comments: Published in ICRA 2018

Journal ref: ICRA 2018

arXiv:1708.00965 [pdf, other]

Dual Quadrics from Object Detection BoundingBoxes as Landmark Representations in SLAM

Authors: Niko Sünderhauf, Michael Milford

Abstract: Research in Simultaneous Localization And Map** (SLAM) is increasingly moving towards richer world representations involving objects and high level features that enable a semantic model of the world for robots, potentially leading to a more meaningful set of robot-world interactions. Many of these advances are grounded in state-of-the-art computer vision techniques primarily developed in the con… ▽ More Research in Simultaneous Localization And Map** (SLAM) is increasingly moving towards richer world representations involving objects and high level features that enable a semantic model of the world for robots, potentially leading to a more meaningful set of robot-world interactions. Many of these advances are grounded in state-of-the-art computer vision techniques primarily developed in the context of image-based benchmark datasets, leaving several challenges to be addressed in adapting them for use in robotics. In this paper, we derive a formulation for Simultaneous Localization And Map** (SLAM) that uses dual quadrics as 3D landmark representations, and show how 2D bounding boxes (such as those typically obtained from visual object detection systems) can directly constrain the quadric parameters. Our paper demonstrates how to jointly estimate the robot pose and dual quadric parameters in factor graph based SLAM with a general perspective camera, and covers the use-cases of a robot moving with a monocular camera with and without the availability of additional depth information. △ Less

Submitted 2 August, 2017; originally announced August 2017.

arXiv:1706.06718 [pdf, other]

Multi-Modal Trip Hazard Affordance Detection On Construction Sites

Authors: Sean McMahon, Niko Sünderhauf, Ben Upcroft, Michael Milford

Abstract: Trip hazards are a significant contributor to accidents on construction and manufacturing sites, where over a third of Australian workplace injuries occur [1]. Current safety inspections are labour intensive and limited by human fallibility,making automation of trip hazard detection appealing from both a safety and economic perspective. Trip hazards present an interesting challenge to modern learn… ▽ More Trip hazards are a significant contributor to accidents on construction and manufacturing sites, where over a third of Australian workplace injuries occur [1]. Current safety inspections are labour intensive and limited by human fallibility,making automation of trip hazard detection appealing from both a safety and economic perspective. Trip hazards present an interesting challenge to modern learning techniques because they are defined as much by affordance as by object type; for example wires on a table are not a trip hazard, but can be if lying on the ground. To address these challenges, we conduct a comprehensive investigation into the performance characteristics of 11 different colour and depth fusion approaches, including 4 fusion and one non fusion approach; using colour and two types of depth images. Trained and tested on over 600 labelled trip hazards over 4 floors and 2000m$\mathrm{^{2}}$ in an active construction site,this approach was able to differentiate between identical objects in different physical configurations (see Figure 1). Outperforming a colour-only detector, our multi-modal trip detector fuses colour and depth information to achieve a 4% absolute improvement in F1-score. These investigative results and the extensive publicly available dataset moves us one step closer to assistive or fully automated safety inspection systems on construction sites. △ Less

Submitted 20 June, 2017; originally announced June 2017.

Comments: 9 Pages, 12 Figures, 2 Tables, Accepted to Robotics and Automation Letters (RA-L)

arXiv:1703.07473 [pdf, other]

Episode-Based Active Learning with Bayesian Neural Networks

Authors: Feras Dayoub, Niko Sünderhauf, Peter Corke

Abstract: We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, such as commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final t… ▽ More We investigate different strategies for active learning with Bayesian deep neural networks. We focus our analysis on scenarios where new, unlabeled data is obtained episodically, such as commonly encountered in mobile robotics applications. An evaluation of different strategies for acquisition, updating, and final training on the CIFAR-10 dataset shows that incremental network updates with final training on the accumulated acquisition set are essential for best performance, while limiting the amount of required human labeling labor. △ Less

Submitted 21 March, 2017; originally announced March 2017.

arXiv:1703.02658 [pdf, other]

What Would You Do? Acting by Learning to Predict

Authors: Adam Tow, Niko Sünderhauf, Sareh Shirazi, Michael Milford, Jürgen Leitner

Abstract: We propose to learn tasks directly from visual demonstrations by learning to predict the outcome of human and robot actions on an environment. We enable a robot to physically perform a human demonstrated task without knowledge of the thought processes or actions of the human, only their visually observable state transitions. We evaluate our approach on two table-top, object manipulation tasks and… ▽ More We propose to learn tasks directly from visual demonstrations by learning to predict the outcome of human and robot actions on an environment. We enable a robot to physically perform a human demonstrated task without knowledge of the thought processes or actions of the human, only their visually observable state transitions. We evaluate our approach on two table-top, object manipulation tasks and demonstrate generalisation to previously unseen states. Our approach reduces the priors required to implement a robot task learning system compared with the existing approaches of Learning from Demonstration, Reinforcement Learning and Inverse Reinforcement Learning. △ Less

Submitted 7 March, 2017; originally announced March 2017.

Comments: Submitted to International Conference on Intelligent Robots and Systems (IROS 2017)

arXiv:1701.05105 [pdf]

Deep Learning Features at Scale for Visual Place Recognition

Authors: Zetao Chen, Adam Jacobson, Niko Sunderhauf, Ben Upcroft, Lingqiao Liu, Chunhua Shen, Ian Reid, Michael Milford

Abstract: The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature… ▽ More The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of place appearance change at thousands of different places, as opposed to the semantic place type datasets currently available. This new dataset enables us to set up a training regime that interprets place recognition as a classification problem. We comprehensively evaluate our trained networks on several challenging benchmark place recognition datasets and demonstrate that they achieve an average 10% increase in performance over other place recognition algorithms and pre-trained CNNs. By analyzing the network responses and their differences from pre-trained networks, we provide insights into what a network learns when training for place recognition, and what these results signify for future research in this area. △ Less

Submitted 18 January, 2017; originally announced January 2017.

Comments: 8 pages, 10 figures. Accepted by International Conference on Robotics and Automation (ICRA) 2017. This is the submitted version. The final published version may be slightly different

arXiv:1609.07849 [pdf, other]

Meaningful Maps With Object-Oriented Semantic Map**

Authors: Niko Sünderhauf, Trung T. Pham, Yasir Latif, Michael Milford, Ian Reid

Abstract: For intelligent robots to interact in meaningful ways with their environment, they must understand both the geometric and semantic properties of the scene surrounding them. The majority of research to date has addressed these map** challenges separately, focusing on either geometric or semantic map**. In this paper we address the problem of building environmental maps that include both semanti… ▽ More For intelligent robots to interact in meaningful ways with their environment, they must understand both the geometric and semantic properties of the scene surrounding them. The majority of research to date has addressed these map** challenges separately, focusing on either geometric or semantic map**. In this paper we address the problem of building environmental maps that include both semantically meaningful, object-level entities and point- or mesh-based geometrical representations. We simultaneously build geometric point cloud models of previously unseen instances of known object classes and create a map that contains these object models as central entities. Our system leverages sparse, feature-based RGB-D SLAM, image-based deep-learning object detection and 3D unsupervised segmentation. △ Less

Submitted 2 August, 2017; v1 submitted 26 September, 2016; originally announced September 2016.

arXiv:1609.05258 [pdf, other]

The ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to Foster Reproducible Research

Authors: Jürgen Leitner, Adam W. Tow, Jake E. Dean, Niko Suenderhauf, Joseph W. Durham, Matthew Cooper, Markus Eich, Christopher Lehnert, Ruben Mangels, Christopher McCool, Peter Kujala, Lachlan Nicholson, Trung Pham, James Sergeant, Liao Wu, Fangyi Zhang, Ben Upcroft, Peter Corke

Abstract: Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficul… ▽ More Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of \emph{complete} robotic systems -- including perception and manipulation -- instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot. △ Less

Submitted 14 December, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

Comments: 8 pages, submitted to RA:Letters

arXiv:1507.02428 [pdf, other]

Place Categorization and Semantic Map** on a Mobile Robot

Authors: Niko Sünderhauf, Feras Dayoub, Sean McMahon, Ben Talbot, Ruth Schulz, Peter Corke, Gordon Wyeth, Ben Upcroft, Michael Milford

Abstract: In this paper we focus on the challenging problem of place categorization and semantic map** on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can… ▽ More In this paper we focus on the challenging problem of place categorization and semantic map** on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real-time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot's behaviour during navigation tasks. The system is made available to the community as a ROS module. △ Less

Submitted 9 July, 2015; originally announced July 2015.

arXiv:1501.04158 [pdf, other]

On the Performance of ConvNet Features for Place Recognition

Authors: Niko Sünderhauf, Feras Dayoub, Sareh Shirazi, Ben Upcroft, Michael Milford

Abstract: After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essenti… ▽ More After the incredible success of deep learning in the computer vision domain, there has been much interest in applying Convolutional Network (ConvNet) features in robotic fields such as visual navigation and SLAM. Unfortunately, there are fundamental differences and challenges involved. Computer vision datasets are very different in character to robotic camera data, real-time performance is essential, and performance priorities can be different. This paper comprehensively evaluates and compares the utility of three state-of-the-art ConvNets on the problems of particular relevance to navigation for robots; viewpoint-invariance and condition-invariance, and for the first time enables real-time place recognition performance using ConvNets with large maps by integrating a variety of existing (locality-sensitive hashing) and novel (semantic search space partitioning) optimization techniques. We present extensive experiments on four real world datasets cultivated to evaluate each of the specific challenges in place recognition. The results demonstrate that speed-ups of two orders of magnitude can be achieved with minimal accuracy degradation, enabling real-time performance. We confirm that networks trained for semantic place categorization also perform better at (specific) place recognition when faced with severe appearance changes and provide a reference for which networks and layers are optimal for different aspects of the place recognition problem. △ Less

Submitted 28 July, 2015; v1 submitted 17 January, 2015; originally announced January 2015.

arXiv:1402.3213 [pdf, other]

Proceedings of the 1st Workshop on Robotics Challenges and Vision (RCV2013)

Authors: Aitor Aladren, Sasa Bodiroza, Hamidreza Chitsaz, J. J. Guerrero, Verena Hafner, Kris Hauser, Aleksandar Jevtic, Moslem Kazemi, Bruno Lara, Gonzalo Lopez-Nicolas, Peer Neubert, Peter Protzel, Laurel D. Riek, Niko Sunderhauf, Chee Yap

Abstract: Proceedings of the 1st Workshop on Robotics Challenges and Vision (RCV2013) Proceedings of the 1st Workshop on Robotics Challenges and Vision (RCV2013) △ Less

Submitted 13 February, 2014; originally announced February 2014.

Comments: http://compbio.cs.wayne.edu/robotics/rcv2013/proceedings-emb.pdf

Showing 51–66 of 66 results for author: Sünderhauf, N