Semi-Autonomous Mobile Search and Rescue Robot for Radiation Disaster Scenarios

Authors: Simon Schwaiger, Lucas Muster, Georg Novotny, Michael Schebek, Wilfried Wöber, Stefan Thalhammer, Christoph Böhm

Abstract: This paper describes a novel semi-autonomous mobile robot system designed to assist search and rescue (SAR) first responders in disaster scenarios. While robots offer significant potential in SAR missions, current solutions are limited in their ability to handle a diverse range of tasks. This gap is addressed by presenting a system capable of (1) autonomous navigation and mapping, allowing the robot to autonomously explore and map areas affected by catastrophic events, (2) radiation mapping, enabling the system to triangulate a radiation map from discrete radiation measurements to aid in identifying hazardous areas, (3) semi-autonomous substance sampling, allowing the robot to collect samples of suspicious substances and analyze them onboard with immediate classification, and (4) valve manipulation, enabling teleoperated closing of valves that control hazardous material flow. This semi-autonomous approach balances human control over critical tasks like substance sampling with efficient robot navigation in low-risk areas. The system is evaluated during three trials that simulate possible disaster scenarios, two of which have been recorded during the European Robotics Hackathon (EnRicH). Furthermore, we provide recorded sensor data as well as the implemented software system as supplemental material through a GitHub repository: https://github.com/TW-Robotics/search-and-rescue-robot-IROS2024.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2402.06436 [pdf, other]

Improving 2D-3D Dense Correspondences with Diffusion Models for 6D Object Pose Estimation

Authors: Peter Hönig, Stefan Thalhammer, Markus Vincze

Abstract: Estimating 2D-3D correspondences between RGB images and 3D space is a fundamental problem in 6D object pose estimation. Recent pose estimators use dense correspondence maps and Point-to-Point algorithms to estimate object poses. The accuracy of pose estimation depends heavily on the quality of the dense correspondence maps and their ability to withstand occlusion, clutter, and challenging material properties. Currently, dense correspondence maps are estimated using image-to-image translation models based on GANs, Autoencoders, or direct regression models. However, recent advancements in image-to-image translation have led to diffusion models being the superior choice when evaluated on benchmarking datasets. In this study, we compare image-to-image translation networks based on GANs and diffusion models for the downstream task of 6D object pose estimation. Our results demonstrate that the diffusion-based image-to-image translation model outperforms the GAN, revealing potential for further improvements in 6D object pose estimation models.

Submitted 9 February, 2024; originally announced February 2024.

Comments: Submitted to the First Austrian Symposium on AI, Robotics, and Vision 2024

arXiv:2402.04878 [pdf, other]

STAR: Shape-focused Texture Agnostic Representations for Improved Object Detection and 6D Pose Estimation

Authors: Peter Hönig, Stefan Thalhammer, Jean-Baptiste Weibel, Matthias Hirschmanner, Markus Vincze

Abstract: Recent advances in machine learning have greatly benefited object detection and 6D pose estimation for robotic grasping. However, textureless and metallic objects still pose a significant challenge due to fewer visual cues and the texture bias of CNNs. To address this issue, we propose a texture-agnostic approach that focuses on learning from CAD models and emphasizes object shape features. To achieve a focus on learning shape features, the textures are randomized during the rendering of the training data. By treating the texture as noise, the need for real-world object instances or their final appearance during training data generation is eliminated. The TLESS and ITODD datasets, specifically created for industrial settings in robotics and featuring textureless and metallic objects, were used for evaluation. Texture agnosticity also increases the robustness against image perturbations such as imaging noise, motion blur, and brightness changes, which are common in robotics applications. Code and datasets are publicly available at github.com/hoenigpeter/randomized_texturing.

Submitted 7 February, 2024; originally announced February 2024.

Comments: Submitted to IEEE Robotics and Automation Letters

arXiv:2309.11986 [pdf, other]

ZS6D: Zero-shot 6D Object Pose Estimation using Vision Transformers

Authors: Philipp Ausserlechner, David Haberger, Stefan Thalhammer, Jean-Baptiste Weibel, Markus Vincze

Abstract: As robotic systems increasingly encounter complex and unconstrained real-world scenarios, there is a demand to recognize diverse objects. The state-of-the-art 6D object pose estimation methods rely on object-specific training and therefore do not generalize to unseen objects. Recent novel object pose estimation methods are solving this issue using task-specific fine-tuned CNNs for deep template matching. This adaptation for pose estimation still requires expensive data rendering and training procedures. MegaPose, for example, is trained on a dataset consisting of two million images showing 20,000 different objects to reach such generalization capabilities. To overcome this shortcoming, we introduce ZS6D, for zero-shot novel object 6D pose estimation. Visual descriptors, extracted using pre-trained Vision Transformers (ViT), are used for matching rendered templates against query images of objects and for establishing local correspondences. These local correspondences enable deriving geometric correspondences and are used for estimating the object's 6D pose with RANSAC-based PnP. This approach showcases that the image descriptors extracted by pre-trained ViTs are well-suited to achieve a notable improvement over two state-of-the-art novel object 6D pose estimation methods, without the need for task-specific fine-tuning. Experiments are performed on LMO, YCBV, and TLESS. In comparison to one of the two methods we improve the Average Recall on all three datasets, and compared to the second method we improve on two datasets.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2307.12172 [pdf, ps, other]

Challenges for Monocular 6D Object Pose Estimation in Robotics

Authors: Stefan Thalhammer, Dominik Bauer, Peter Hönig, Jean-Baptiste Weibel, José García-Rodríguez, Markus Vincze

Abstract: Object pose estimation is a core perception task that enables, for example, object grasping and scene understanding. The widely available, inexpensive and high-resolution RGB sensors and CNNs that allow for fast inference based on this modality make monocular approaches especially well suited for robotics applications. We observe that previous surveys on object pose estimation establish the state of the art for varying modalities, single- and multi-view settings, and datasets and metrics that consider a multitude of applications. We argue, however, that those works' broad scope hinders the identification of open challenges that are specific to monocular approaches and the derivation of promising future challenges for their application in robotics. By providing a unified view on recent publications from both robotics and computer vision, we find that occlusion handling, novel pose representations, and formalizing and improving category-level pose estimation are still fundamental challenges that are highly relevant for robotics. Moreover, to further improve robotic performance, large object sets, novel objects, refractive materials, and uncertainty estimates are central, largely unsolved open challenges. In order to address them, ontological reasoning, deformability handling, scene-level reasoning, realistic datasets, and the ecological footprint of algorithms need to be improved.

Submitted 22 July, 2023; originally announced July 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2302.11827

arXiv:2306.00129 [pdf, ps, other]

Self-supervised Vision Transformers for 3D Pose Estimation of Novel Objects

Authors: Stefan Thalhammer, Jean-Baptiste Weibel, Markus Vincze, Jose Garcia-Rodriguez

Abstract: Object pose estimation is important for object manipulation and scene understanding. In order to improve the general applicability of pose estimators, recent research focuses on providing estimates for novel objects, that is, objects unseen during training. Such works use deep template matching strategies to retrieve the closest template connected to a query image. This template retrieval implicitly provides object class and pose. Despite the recent success and improvements of Vision Transformers over CNNs for many vision tasks, the state of the art uses CNN-based approaches for novel object pose estimation. This work evaluates and demonstrates the differences between self-supervised CNNs and Vision Transformers for deep template matching. In detail, both types of approaches are trained using contrastive learning to match training images against rendered templates of isolated objects. At test time, such templates are matched against query images of known and novel objects under challenging settings, such as clutter, occlusion and object symmetries, using masked cosine similarity. The presented results not only demonstrate that Vision Transformers improve in matching accuracy over CNNs, but also that for some cases pre-trained Vision Transformers do not need fine-tuning to do so. Furthermore, we highlight the differences in optimization and network architecture when comparing these two types of network for deep template matching.

Submitted 31 May, 2023; originally announced June 2023.
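
The abstract above mentions matching templates against query images using masked cosine similarity. The following is a minimal sketch of that idea on flat descriptor vectors, not the authors' implementation: the function names `masked_cosine_similarity` and `retrieve_template`, the list-based descriptor layout, and the per-template binary mask are all assumptions made for illustration.

```python
import math


def masked_cosine_similarity(query, template, mask):
    """Cosine similarity restricted to entries where `mask` is truthy.

    Hypothetical helper: entries with mask == 0 (e.g. background or
    occluded regions) are ignored in the dot product and in both norms.
    """
    dot = sum(q * t for q, t, m in zip(query, template, mask) if m)
    norm_q = math.sqrt(sum(q * q for q, m in zip(query, mask) if m))
    norm_t = math.sqrt(sum(t * t for t, m in zip(template, mask) if m))
    if norm_q == 0.0 or norm_t == 0.0:
        return 0.0  # no usable overlap, treat as no similarity
    return dot / (norm_q * norm_t)


def retrieve_template(query, templates):
    """Return (index, score) of the best-matching template.

    `templates` is a list of (descriptor, mask) pairs; this layout is an
    assumption of the sketch.
    """
    scores = [masked_cosine_similarity(query, d, m) for d, m in templates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Usage: the third entry of template 0 disagrees with the query,
# but it is masked out, so template 0 still matches perfectly.
query = [1.0, 0.0, 5.0]
templates = [
    ([1.0, 0.0, -5.0], [1, 1, 0]),
    ([0.0, 1.0, 5.0], [1, 1, 1]),
]
best_index, best_score = retrieve_template(query, templates)
```

In a template-matching setting, the retrieved index would stand for the object class and viewpoint of the closest rendered template, which is how retrieval can implicitly provide class and pose.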

arXiv:2302.11827

Open Challenges for Monocular Single-shot 6D Object Pose Estimation

Authors: Stefan Thalhammer, Peter Hönig, Jean-Baptiste Weibel, Markus Vincze

Abstract: Object pose estimation is a non-trivial task that enables robotic manipulation, bin picking, augmented reality, and scene understanding, to name a few use cases. Monocular object pose estimation gained considerable momentum with the rise of high-performing deep learning-based solutions and is particularly interesting for the community since sensors are inexpensive and inference is fast. Prior works establish the comprehensive state of the art for diverse pose estimation problems. Their broad scopes make it difficult to identify promising future directions. We narrow down the scope to the problem of single-shot monocular 6D object pose estimation, which is commonly used in robotics, and thus are able to identify such trends. By reviewing recent publications in robotics and computer vision, the state of the art is established at the union of both fields. Following that, we identify promising research directions in order to help researchers to formulate relevant research ideas and effectively advance the state of the art. Findings include that methods are sophisticated enough to overcome the domain shift and that occlusion handling is a fundamental challenge. We also highlight problems such as novel object pose estimation and challenging materials handling as central challenges to advance robotics.

Submitted 20 July, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Revised version in the making

arXiv:2211.08182 [pdf, other]

Grasping the Inconspicuous

Authors: Hrishikesh Gupta, Stefan Thalhammer, Markus Leitner, Markus Vincze

Abstract: Transparent objects are common in day-to-day life and hence find many applications that require robot grasping. Many solutions toward object grasping exist for non-transparent objects. However, due to the unique visual properties of transparent objects, standard 3D sensors produce noisy or distorted measurements. Modern approaches tackle this problem by either refining the noisy depth measurements or using some intermediate representation of the depth. Towards this, we study deep learning 6D pose estimation from RGB images only for transparent object grasping. To train and test the suitability of RGB-based object pose estimation, we construct a dataset of RGB-only images with 6D pose annotations. The experiments demonstrate the effectiveness of RGB image space for grasping transparent objects.

Submitted 15 November, 2022; originally announced November 2022.

arXiv:2208.08807 [pdf, other]

COPE: End-to-end trainable Constant Runtime Object Pose Estimation

Authors: Stefan Thalhammer, Timothy Patten, Markus Vincze

Abstract: State-of-the-art object pose estimation handles multiple instances in a test image by using multi-model formulations: detection as a first stage and then separately trained networks per object for 2D-3D geometric correspondence prediction as a second stage. Poses are subsequently estimated using the Perspective-n-Points algorithm at runtime. Unfortunately, multi-model formulations are slow and do not scale well with the number of object instances involved. Recent approaches show that direct 6D object pose estimation is feasible when derived from the aforementioned geometric correspondences. We present an approach that learns an intermediate geometric representation of multiple objects to directly regress 6D poses of all instances in a test image. The inherent end-to-end trainability overcomes the requirement of separately processing individual object instances. By calculating the mutual Intersection-over-Unions, pose hypotheses are clustered into distinct instances, which achieves negligible runtime overhead with respect to the number of object instances. Results on multiple challenging standard datasets show that the pose estimation performance is superior to single-model state-of-the-art approaches despite being more than ~35 times faster. We additionally provide an analysis showing real-time applicability (>24 fps) for images where more than 90 object instances are present. Further results show the advantage of supervising geometric-correspondence-based object pose estimation with the 6D pose.

Submitted 22 August, 2022; v1 submitted 18 August, 2022; originally announced August 2022.
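
The abstract above describes clustering pose hypotheses into distinct instances via their mutual Intersection-over-Unions. A minimal sketch of that general idea follows, assuming each hypothesis carries an axis-aligned 2D box `(x1, y1, x2, y2)` and using a simple greedy grouping against each cluster's first member. The greedy scheme, the `0.5` threshold, and the names `iou` and `cluster_by_iou` are illustrative assumptions, not the procedure from the paper.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cluster_by_iou(boxes, threshold=0.5):
    """Greedily group hypothesis indices into instance clusters.

    A hypothesis joins the first cluster whose seed box (its first
    member) overlaps it with IoU >= threshold; otherwise it starts a
    new cluster. The threshold is an assumed value.
    """
    clusters = []
    for idx, box in enumerate(boxes):
        for cluster in clusters:
            if iou(boxes[cluster[0]], box) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Usage: three hypotheses around one object and one around another.
boxes = [
    (0.0, 0.0, 10.0, 10.0),
    (1.0, 1.0, 11.0, 11.0),
    (50.0, 50.0, 60.0, 60.0),
    (0.0, 0.0, 9.0, 10.0),
]
clusters = cluster_by_iou(boxes)
```

Each resulting cluster could then be reduced to a single pose per instance, for example by averaging or picking the most confident hypothesis; that reduction step is left out of this sketch.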

arXiv:2010.16117 [pdf, other]

PyraPose: Feature Pyramids for Fast and Accurate Object Pose Estimation under Domain Shift

Authors: Stefan Thalhammer, Markus Leitner, Timothy Patten, Markus Vincze

Abstract: Object pose estimation enables robots to understand and interact with their environments. Training with synthetic data is necessary in order to adapt to novel situations. Unfortunately, pose estimation under domain shift, i.e., training on synthetic data and testing in the real world, is challenging. Deep learning-based approaches currently perform best when using encoder-decoder networks but typically do not generalize to new scenarios with different scene characteristics. We argue that patch-based approaches, instead of encoder-decoder networks, are more suited for synthetic-to-real transfer because local to global object information is better represented. To that end, we present a novel approach based on a specialized feature pyramid network to compute multi-scale features for creating pose hypotheses on different feature map resolutions in parallel. Our single-shot pose estimation approach is evaluated on multiple standard datasets and outperforms the state of the art by up to 35%. We also perform grasping experiments in the real world to demonstrate the advantage of using synthetic data to generalize to novel environments.

Submitted 30 October, 2020; originally announced October 2020.

Showing 1–10 of 10 results for author: Thalhammer, S