Search | arXiv e-print repository

PerAct2: A Perceiver Actor Framework for Bimanual Manipulation Tasks

Authors: Markus Grotz, Mohit Shridhar, Tamim Asfour, Dieter Fox

Abstract: Bimanual manipulation is challenging due to the precise spatial and temporal coordination required between two arms. While there exist several real-world bimanual systems, there is a lack of simulated benchmarks with a large task diversity for systematically studying bimanual capabilities across a wide range of tabletop tasks. This paper addresses the gap by extending RLBench to bimanual manipulation. We open-source our code and benchmark comprising 13 new tasks with 23 unique task variations, each requiring a high degree of coordination and adaptability. To kickstart the benchmark, we extended several state-of-the-art methods to bimanual manipulation and also present a language-conditioned behavioral cloning agent -- PerAct2, which enables the learning and execution of bimanual 6-DoF manipulation tasks. Our novel network architecture efficiently integrates language processing with action prediction, allowing robots to understand and perform complex bimanual tasks in response to user-specified goals. Project website with code is available at: http://bimanual.github.io

Submitted 28 June, 2024; originally announced July 2024.

arXiv:2406.10421 [pdf, other]

SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx - a benchmark consisting of university computer science exam questions, to evaluate LLMs' ability on solving scientific tasks. SciEx is (1) multilingual, containing both English and German exams, (2) multi-modal, containing questions that involve images, and (3) contains various types of freeform questions with different difficulty levels, due to the nature of university exams. We evaluate the performance of various state-of-the-art LLMs on our new benchmark. Since SciEx questions are freeform, it is not straightforward to evaluate LLM performance. Therefore, we provide human expert grading of the LLM outputs on SciEx. We show that the free-form exams in SciEx remain challenging for the current LLMs, where the best LLM only achieves 59.4% exam grade on average. We also provide detailed comparisons between LLM performance and student performance on SciEx. To enable future evaluation of new LLMs, we propose using LLM-as-a-judge to grade the LLM answers on SciEx. Our experiments show that, although they do not perform perfectly on solving the exams, LLMs are decent as graders, achieving 0.948 Pearson correlation with expert grading.

Submitted 14 June, 2024; originally announced June 2024.

ACM Class: I.2.7
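
The SciEx abstract above reports agreement between LLM graders and expert graders as a Pearson correlation. As a minimal sketch of how such a figure could be computed, the following Python snippet correlates two lists of grades. The grade values and the names `pearson`, `expert_grades`, and `judge_grades` are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grades for five exam answers (illustration only).
expert_grades = [2.0, 5.5, 7.0, 9.0, 4.0]
judge_grades = [2.5, 5.0, 7.5, 8.5, 4.5]

if __name__ == "__main__":
    print(f"Pearson r = {pearson(expert_grades, judge_grades):.3f}")
```

A value close to 1 means the automatic grader ranks and spaces the answers much like the expert does; it says nothing about a constant offset between the two grading scales.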

arXiv:2406.06054 [pdf, other]

Influence of Motion Restrictions in an Ankle Exoskeleton on Gait Kinematics and Stability in Straight Walking

Authors: Miha Dezman, Charlotte Marquardt, Adnan Ugur, Tamim Asfour

Abstract: Exoskeleton devices impose kinematic constraints on a user's motion and affect their stability due to added mass but also due to the simplified mechanical design. This paper investigates how these constraints resulting from simplified mechanical designs impact the gait kinematics and stability of users wearing an ankle exoskeleton with changeable degrees of freedom (DoF). The exoskeleton used in this paper allows one, two, or three DoF at the ankle, simulating different levels of mechanical complexity. This effect was evaluated in a pilot study consisting of six participants walking on a straight path. The results show that increasing the exoskeleton DoF results in an improvement of several metrics, including kinematics and gait parameters. The transition from 1 DoF to 2 DoF is shown to have a larger effect than the transition from 2 DoF to 3 DoF for an ankle exoskeleton. However, an exoskeleton with 3 DoF at the ankle featured the best results. Increasing the number of DoF resulted in stability values closer to the values when walking without the exoskeleton, despite the added weight of the exoskeleton.

Submitted 10 June, 2024; originally announced June 2024.

Comments: This document is the non-revised version of the paper submitted to IEEE RAS EMBS BioRob 2024. The revised version has been submitted to IEEE Transactions on Medical Robotics and Bionics (IEEE T-MRB)

arXiv:2403.16953 [pdf, other]

Learning Symbolic and Subsymbolic Temporal Task Constraints from Bimanual Human Demonstrations

Authors: Christian Dreher, Tamim Asfour

Abstract: Learning task models of bimanual manipulation from human demonstration and their execution on a robot should take temporal constraints between actions into account. This includes constraints on (i) the symbolic level such as precedence relations or temporal overlap in the execution, and (ii) the subsymbolic level such as the duration of different actions, or their starting and end points in time. Such temporal constraints are crucial for temporal planning, reasoning, and the exact timing for the execution of bimanual actions on a bimanual robot. In our previous work, we addressed the learning of temporal task constraints on the symbolic level and demonstrated how a robot can leverage this knowledge to respond to failures during execution. In this work, we propose a novel model-driven approach for the combined learning of symbolic and subsymbolic temporal task constraints from multiple bimanual human demonstrations. Our main contributions are a subsymbolic foundation of a temporal task model that describes temporal nexuses of actions in the task based on distributions of temporal differences between semantic action keypoints, as well as a method based on fuzzy logic to derive symbolic temporal task constraints from this representation. This complements our previous work on learning comprehensive temporal task models by integrating symbolic and subsymbolic information based on a subsymbolic foundation, while still maintaining the symbolic expressiveness of our previous approach. We compare our proposed approach with our previous pure-symbolic approach and show that we can reproduce and even outperform it. Additionally, we show how the subsymbolic temporal task constraints can synchronize otherwise unimanual movement primitives for bimanual behavior on a humanoid robot.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 8 pages, submitted to IROS 2024

arXiv:2403.16238 [pdf, other]

KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments

Authors: Abdelrahman Younes, Tamim Asfour

Abstract: Despite the recent progress on 6D object pose estimation methods for robotic grasping, a substantial performance gap persists between the capabilities of these methods on existing datasets and their efficacy in real-world mobile manipulation tasks, particularly when robots rely solely on their monocular egocentric field of view (FOV). Existing real-world datasets primarily focus on table-top grasping scenarios, where a robotic arm is placed in a fixed position and the objects are centralized within the FOV of fixed external camera(s). Assessing performance on such datasets may not accurately reflect the challenges encountered in everyday mobile manipulation tasks within kitchen environments such as retrieving objects from higher shelves, sinks, dishwashers, ovens, refrigerators, or microwaves. To address this gap, we present KITchen, a novel benchmark designed specifically for estimating the 6D poses of objects located in diverse positions within kitchen settings. For this purpose, we recorded a comprehensive dataset comprising around 205k real-world RGBD images for 111 kitchen objects captured in two distinct kitchens, utilizing one humanoid robot with its egocentric perspectives. Subsequently, we developed a semi-automated annotation pipeline to streamline the labeling process of such datasets, resulting in the generation of 2D object labels, 2D object segmentation masks, and 6D object poses with minimized human effort. The benchmark, the dataset, and the annotation pipeline are available at https://kitchen-dataset.github.io/KITchen.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.14000 [pdf, other]

Visual Imitation Learning of Task-Oriented Object Grasping and Rearrangement

Authors: Yichen Cai, Jianfeng Gao, Christoph Pohl, Tamim Asfour

Abstract: Task-oriented object grasping and rearrangement are critical skills for robots to accomplish different real-world manipulation tasks. However, they remain challenging due to partial observations of the objects and shape variations in categorical objects. In this paper, we propose the Multi-feature Implicit Model (MIMO), a novel object representation that encodes multiple spatial features between a point and an object in an implicit neural field. Training such a model on multiple features ensures that it embeds the object shapes consistently in different aspects, thus improving its performance in object shape reconstruction from partial observation, shape similarity measure, and modeling spatial relations between objects. Based on MIMO, we propose a framework to learn task-oriented object grasping and rearrangement from single or multiple human demonstration videos. The evaluations in simulation show that our approach outperforms the state-of-the-art methods for multi- and single-view observations. Real-world experiments demonstrate the efficacy of our approach in one- and few-shot imitation learning of manipulation tasks.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.10672 [pdf, other]

Riemannian Flow Matching Policy for Robot Motion Learning

Authors: Max Braun, Noémie Jaquier, Leonel Rozo, Tamim Asfour

Abstract: We introduce Riemannian Flow Matching Policies (RFMP), a novel model for learning and synthesizing robot visuomotor policies. RFMP leverages the efficient training and inference capabilities of flow matching methods. By design, RFMP inherits the strengths of flow matching: the ability to encode high-dimensional multimodal distributions, commonly encountered in robotic tasks, and a very simple and fast inference process. We demonstrate the applicability of RFMP to both state-based and vision-conditioned robot motion policies. Notably, as the robot state resides on a Riemannian manifold, RFMP inherently incorporates geometric awareness, which is crucial for realistic robotic tasks. To evaluate RFMP, we conduct two proof-of-concept experiments, comparing its performance against Diffusion Policies. Although both approaches successfully learn the considered tasks, our results show that RFMP provides smoother action trajectories with significantly lower inference times.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 8 pages, 5 figures, 4 tables

arXiv:2403.03270 [pdf, other]

Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks

Authors: Jianfeng Gao, Xiaoshu Jin, Franziska Krebs, Noémie Jaquier, Tamim Asfour

Abstract: Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes remain unsolved challenges. In this paper, we extend our previous work on keypoints-based visual imitation learning (K-VIL) to bimanual manipulation tasks. The proposed Bi-KVIL jointly extracts so-called Hybrid Master-Slave Relationships (HMSR) among objects and hands, bimanual coordination strategies, and sub-symbolic task representations. Our bimanual task representation is object-centric, embodiment-independent, and viewpoint-invariant, thus generalizing well to categorical objects in novel scenes. We evaluate our approach in various real-world applications, showcasing its ability to learn fine-grained bimanual manipulation tasks from a small number of human demonstration videos. Videos and source code are available at https://sites.google.com/view/bi-kvil.

Submitted 22 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.10778 [pdf, other]

AutoGPT+P: Affordance-based Task Planning with Large Language Models

Authors: Timo Birr, Christoph Pohl, Abdelrahman Younes, Tamim Asfour

Abstract: Recent advances in task planning leverage Large Language Models (LLMs) to improve generalizability by combining such models with classical planning algorithms to address their inherent limitations in reasoning capabilities. However, these approaches face the challenge of dynamically capturing the initial state of the task planning problem. To alleviate this issue, we propose AutoGPT+P, a system that combines an affordance-based scene representation with a planning system. Affordances encompass the action possibilities of an agent on the environment and objects present in it. Thus, deriving the planning domain from an affordance-based scene representation allows symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. In addition to solving planning tasks under a closed-world assumption, AutoGPT+P can also handle planning with incomplete information, e.g., tasks with missing objects by exploring the scene, suggesting alternatives, or providing a partial plan. The affordance-based scene representation combines object detection with an automatically generated object-affordance-mapping using ChatGPT. The core planning tool extends existing work by automatically correcting semantic and syntactic errors. Our approach achieves a success rate of 98%, surpassing the 81% success rate of the current state-of-the-art LLM-based planning method SayCan on the SayCan instruction set. Furthermore, we evaluated our approach on our newly created dataset with 150 scenarios covering a wide range of complex tasks with missing objects, achieving a success rate of 79% on our dataset. The dataset and the code are publicly available at https://git.h2t.iar.kit.edu/birr/autogpt-p-standalone.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 12 pages, 16 pages including references and appendix, 5 figures

ACM Class: I.2

arXiv:2401.16899 [pdf, other]

MAkEable: Memory-centered and Affordance-based Task Execution Framework for Transferable Mobile Manipulation Skills

Authors: Christoph Pohl, Fabian Reister, Fabian Peller-Konrad, Tamim Asfour

Abstract: To perform versatile mobile manipulation tasks in human-centered environments, the ability to efficiently transfer learned tasks and experiences from one robot to another or across different environments is key. In this paper, we present MAkEable, a versatile uni- and multi-manual mobile manipulation framework that facilitates the transfer of capabilities and knowledge across different tasks, environments, and robots. Our framework integrates an affordance-based task description into the memory-centric cognitive architecture of the ARMAR humanoid robot family, which supports the sharing of experiences and demonstrations for transfer learning. By representing mobile manipulation actions through affordances, i.e., interaction possibilities of the robot with its environment, we provide a unifying framework for the autonomous uni- and multi-manual manipulation of known and unknown objects in various environments. We demonstrate the applicability of the framework in real-world experiments for multiple robots, tasks, and environments. This includes grasping known and unknown objects, object placing, bimanual object grasping, memory-enabled skill transfer in a drawer opening scenario across two different humanoid robots, and a pouring task learned from human demonstration.

Submitted 21 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2312.08820 [pdf, other]

How to Raise a Robot -- A Case for Neuro-Symbolic AI in Constrained Task Planning for Humanoid Assistive Robots

Authors: Niklas Hemken, Florian Jacob, Fabian Peller-Konrad, Rainer Kartmann, Tamim Asfour, Hannes Hartenstein

Abstract: Humanoid robots will be able to assist humans in their daily life, in particular due to their versatile action capabilities. However, while these robots need a certain degree of autonomy to learn and explore, they also should respect various constraints, for access control and beyond. We explore the novel field of incorporating privacy, security, and access control constraints with robot task planning approaches. We report preliminary results on the classical symbolic approach, deep-learned neural networks, and modern ideas using large language models as knowledge base. From analyzing their trade-offs, we conclude that a hybrid approach is necessary, and thereby present a new use case for the emerging field of neuro-symbolic artificial intelligence.

Submitted 27 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 8 pages, follow-up extended version of our SACMAT 2023 poster abstract: "Poster: How to Raise a Robot - Beyond Access Control Constraints in Assistive Humanoid Robots" https://dl.acm.org/doi/abs/10.1145/3589608.3595078

arXiv:2312.08030 [pdf, other]

Incremental Learning of Full-Pose Via-Point Movement Primitives on Riemannian Manifolds

Authors: Tilman Daab, Noémie Jaquier, Christian Dreher, Andre Meixner, Franziska Krebs, Tamim Asfour

Abstract: Movement primitives (MPs) are compact representations of robot skills that can be learned from demonstrations and combined into complex behaviors. However, merely equipping robots with a fixed set of innate MPs is insufficient to deploy them in dynamic and unpredictable environments. Instead, the full potential of MPs remains to be attained via adaptable, large-scale MP libraries. In this paper, we propose a set of seven fundamental operations to incrementally learn, improve, and re-organize MP libraries. To showcase their applicability, we provide explicit formulations of the spatial operations for libraries composed of Via-Point Movement Primitives (VMPs). By building on Riemannian manifold theory, our approach enables the incremental learning of all parameters of position and orientation VMPs within a library. Moreover, our approach stores a fixed number of parameters, thus complying with the essential principles of incremental learning. We evaluate our approach to incrementally learn a VMP library from motion capture data provided sequentially.

Submitted 13 December, 2023; originally announced December 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. 7 pages, 7 figures and 2 tables

arXiv:2311.18044 [pdf, other]

Transfer Learning in Robotics: An Upcoming Breakthrough? A Review of Promises and Challenges

Authors: Noémie Jaquier, Michael C. Welle, Andrej Gams, Kunpeng Yao, Bernardo Fichera, Aude Billard, Aleš Ude, Tamim Asfour, Danica Kragic

Abstract: Transfer learning is a conceptually-enticing paradigm in pursuit of truly intelligent embodied agents. The core concept -- reusing prior knowledge to learn in and from novel situations -- is successfully leveraged by humans to handle novel situations. In recent years, transfer learning has received renewed interest from the community from different perspectives, including imitation learning, domain adaptation, and transfer of experience from simulation to the real world, among others. In this paper, we unify the concept of transfer learning in robotics and provide the first taxonomy of its kind considering the key concepts of robot, task, and environment. Through a review of the promises and challenges in the field, we identify the need of transferring at different abstraction levels, the need of quantifying the transfer gap and the quality of transfer, as well as the dangers of negative transfer. Via this position paper, we hope to channel the effort of the community towards the most significant roadblocks to realize the full potential of transfer learning in robotics.

Submitted 2 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 21 pages, 7 figures

arXiv:2311.02907 [pdf, other]

Reinforcement Learning for Safety Testing: Lessons from A Mobile Robot Case Study

Authors: Tom P. Huck, Martin Kaiser, Constantin Cronrath, Bengt Lennartson, Torsten Kröger, Tamim Asfour

Abstract: Safety-critical robot systems need thorough testing to expose design flaws and software bugs which could endanger humans. Testing in simulation is becoming increasingly popular, as it can be applied early in the development process and does not endanger any real-world operators. However, not all safety-critical flaws become immediately observable in simulation. Some may only become observable under certain critical conditions. If these conditions are not covered, safety flaws may remain undetected. Creating critical tests is therefore crucial. In recent years, there has been a trend towards using Reinforcement Learning (RL) for this purpose. Guided by domain-specific reward functions, RL algorithms are used to learn critical test strategies. This paper presents a case study in which the collision avoidance behavior of a mobile robot is subjected to RL-based testing. The study confirms prior research which shows that RL can be an effective testing tool. However, the study also highlights certain challenges associated with RL-based testing, namely (i) a possible lack of diversity in test conditions and (ii) the phenomenon of reward hacking where the RL agent behaves in undesired ways due to a misalignment of reward and test specification. The challenges are illustrated with data and examples from the experiments, and possible mitigation strategies are discussed.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.07902 [pdf, other]

Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning

Authors: Noémie Jaquier, Leonel Rozo, Tamim Asfour

Abstract: In the realm of robotics, numerous downstream robotics tasks leverage machine learning methods for processing, modeling, or synthesizing data. Often, this data comprises variables that inherently carry geometric constraints, such as the unit-norm condition of quaternions representing rigid-body orientations or the positive definiteness of stiffness and manipulability ellipsoids. Handling such geometric constraints effectively requires the incorporation of tools from differential geometry into the formulation of machine learning methods. In this context, Riemannian manifolds emerge as a powerful mathematical framework to handle such geometric constraints. Nevertheless, their recent adoption in robot learning has been largely characterized by a mathematically-flawed simplification, hereinafter referred to as the "single tangent space fallacy". This approach involves merely projecting the data of interest onto a single tangent (Euclidean) space, over which an off-the-shelf learning algorithm is applied. This paper provides a theoretical elucidation of various misconceptions surrounding this approach and offers experimental evidence of its shortcomings. Finally, it presents valuable insights to promote best practices when employing Riemannian geometry within robot learning applications.

Submitted 29 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: Accepted for publication in ICRA'24. 8 pages, 5 figures, 3 tables

arXiv:2309.04316 [pdf, other]

Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

Authors: Leonard Bärmann, Rainer Kartmann, Fabian Peller-Konrad, Jan Niehues, Alex Waibel, Tamim Asfour

Abstract: Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents, but also to communicate instructions for improvement if a robot does not understand a command correctly. Of great importance is to endow robots with the ability to learn from such interaction experience in an incremental way to allow them to improve their behaviors or avoid mistakes in the future. In this paper, we propose a system to achieve incremental learning of complex behavior from natural interaction, and demonstrate its implementation on a humanoid robot. Building on recent advances, we present a system that deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior, based on the idea of enabling the LLM to generate Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding back human instructions, environment observations, and execution results to the LLM, thus informing the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to interactively learn from its mistakes. For that purpose, the LLM can call another LLM responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory, and thus retrieved on similar requests. We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and real-world) by demonstrating generalized incrementally-learned knowledge.

Submitted 16 May, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: This version (v3) adds further quantitative evaluation and many improvements. v2 was presented at the Workshop on Language and Robot Learning (LangRob) at the Conference on Robot Learning (CoRL) 2023. Supplementary video available at https://youtu.be/y5O2mRGtsLM

arXiv:2308.14068 [pdf, other]

Uncertainty-aware Risk Assessment of Robotic Systems via Importance Sampling

Authors: Woo-Jeong Baek, Tom P. Huck, Joschka Haas, Jonas Lewandrowski, Tamim Asfour, Torsten Kröger

Abstract: In this paper, we introduce a probabilistic approach to risk assessment of robot systems by focusing on the impact of uncertainties. While various approaches to identifying systematic hazards (e.g., bugs, design flaws, etc.) can be found in current literature, little attention has been devoted to evaluating risks in robot systems in a probabilistic manner. Existing methods rely on discrete notions for dangerous events and assume that the consequences of these can be described by simple logical operations. In this work, we consider measurement uncertainties as one main contributor to the evolvement of risks. Specifically, we study the impact of temporal and spatial uncertainties on the occurrence probability of dangerous failures, thereby deriving an approach for an uncertainty-aware risk assessment. Secondly, we introduce a method to improve the statistical significance of our results: While the rare occurrence of hazardous events makes it challenging to draw conclusions with reliable accuracy, we show that importance sampling -- a technique that successively generates samples in regions with sparse probability densities -- allows for overcoming this issue. We demonstrate the validity of our novel uncertainty-aware risk assessment method in three simulation scenarios from the domain of human-robot collaboration. Finally, we show how the results can be used to evaluate arbitrary safety limits of robot systems.

Submitted 27 August, 2023; originally announced August 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2307.15440 [pdf, other]

On the Design of Region-Avoiding Metrics for Collision-Safe Motion Generation on Riemannian Manifolds

Authors: Holger Klein, Noémie Jaquier, Andre Meixner, Tamim Asfour

Abstract: The generation of energy-efficient and dynamic-aware robot motions that satisfy constraints such as joint limits, self-collisions, and collisions with the environment remains a challenge. In this context, Riemannian geometry offers promising solutions by identifying robot motions with geodesics on the so-called configuration space manifold. While this manifold naturally considers the intrinsic robot dynamics, constraints such as joint limits, self-collisions, and collisions with the environment remain overlooked. In this paper, we propose a modification of the Riemannian metric of the configuration space manifold allowing for the generation of robot motions as geodesics that efficiently avoid given regions. We introduce a class of Riemannian metrics based on barrier functions that guarantee strict region avoidance by systematically generating accelerations away from no-go regions in joint and task space. We evaluate the proposed Riemannian metric to generate energy-efficient, dynamic-aware, and collision-free motions of a humanoid robot as geodesics and sequences thereof.

Submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted for publication in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS) 2023. 8 pages, 7 figures, accompanying video at https://youtu.be/qT43XgYOlU0

arXiv:2305.09551 [pdf, other]

doi 10.3389/frobt.2023.1151303

Interactive and Incremental Learning of Spatial Object Relations from Human Demonstrations

Authors: Rainer Kartmann, Tamim Asfour

Abstract: Humans use semantic concepts such as spatial relations between objects to describe scenes and communicate tasks such as "Put the tea to the right of the cup" or "Move the plate between the fork and the spoon." Just as children, assistive robots must be able to learn the sub-symbolic meaning of such concepts from human demonstrations and instructions. We address the problem of incrementally learning geometric models of spatial relations from few demonstrations collected online during interaction with a human. Such models enable a robot to manipulate objects in order to fulfill desired spatial relations specified by verbal instructions. At the start, we assume the robot has no geometric model of spatial relations. Given a task as above, the robot requests the user to demonstrate the task once in order to create a model from a single demonstration, leveraging cylindrical probability distribution as generative representation of spatial relations. We show how this model can be updated incrementally with each new demonstration without access to past examples in a sample-efficient way using incremental maximum likelihood estimation, and demonstrate the approach on a real humanoid robot.

Submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted for publication in Frontiers in Robotics and AI, Sec. Robot Learning and Evolution

arXiv:2210.09678 [pdf, other]

Virtual Reality via Object Pose Estimation and Active Learning: Realizing Telepresence Robots with Aerial Manipulation Capabilities

Authors: Jongseok Lee, Ribin Balachandran, Konstantin Kondak, Andre Coelho, Marco De Stefano, Matthias Humt, Jianxiang Feng, Tamim Asfour, Rudolph Triebel

Abstract: This article presents a novel telepresence system for advancing aerial manipulation in dynamic and unstructured environments. The proposed system not only features a haptic device, but also a virtual reality (VR) interface that provides real-time 3D displays of the robot's workspace as well as a haptic guidance to its remotely located operator. To realize this, multiple sensors, namely a LiDAR, cameras and IMUs, are utilized. For processing of the acquired sensory data, pose estimation pipelines are devised for industrial objects of both known and unknown geometries. We further propose an active learning pipeline in order to increase the sample efficiency of a pipeline component that relies on Deep Neural Networks (DNNs) based object detection. All these algorithms jointly address various challenges encountered during the execution of perception tasks in industrial scenarios. In the experiments, exhaustive ablation studies are provided to validate the proposed pipelines. Methodologically, these results commonly suggest how an awareness of the algorithms' own failures and uncertainty ('introspection') can be used to tackle the encountered problems. Moreover, outdoor experiments are conducted to evaluate the effectiveness of the overall system in enhancing aerial manipulation capabilities. In particular, with flight campaigns over days and nights, from spring to winter, and with different users and locations, we demonstrate over 70 robust executions of pick-and-place, force application and peg-in-hole tasks with the DLR cable-Suspended Aerial Manipulator (SAM). As a result, we show the viability of the proposed system in future industrial applications.

Submitted 10 February, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: Accepted to Field Robotics

arXiv:2210.01672 [pdf, other]

Bringing motion taxonomies to continuous domains via GPLVM on hyperbolic manifolds

Authors: Noémie Jaquier, Leonel Rozo, Miguel González-Duque, Viacheslav Borovitskiy, Tamim Asfour

Abstract: Human motion taxonomies serve as high-level hierarchical abstractions that classify how humans move and interact with their environment. They have proven useful to analyse grasps, manipulation skills, and whole-body support poses. Despite substantial efforts devoted to design their hierarchy and underlying categories, their use remains limited. This may be attributed to the lack of computational models that fill the gap between the discrete hierarchical structure of the taxonomy and the high-dimensional heterogeneous data associated to its categories. To overcome this problem, we propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure. We achieve this by formulating a novel Gaussian process hyperbolic latent variable model that incorporates the taxonomy structure through graph-based priors on the latent space and distance-preserving back constraints. We validate our model on three different human motion taxonomies to learn hyperbolic embeddings that faithfully preserve the original graph structure. We show that our model properly encodes unseen data from existing or new taxonomy categories, and outperforms its Euclidean and VAE-based counterparts. Finally, through proof-of-concept experiments, we show that our model may be used to generate realistic trajectories between the learned embeddings.

Submitted 4 June, 2024; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: Intl. Conference on Machine Learning (ICML), 2024

arXiv:2209.15539 [pdf, other]

Riemannian geometry as a unifying theory for robot motion learning and control

Authors: Noémie Jaquier, Tamim Asfour

Abstract: Riemannian geometry is a mathematical field which has been the cornerstone of revolutionary scientific discoveries such as the theory of general relativity. Despite early uses in robot design and recent applications for exploiting data with specific geometries, it mostly remains overlooked in robotics. With this blue sky paper, we argue that Riemannian geometry provides the most suitable tools to analyze and generate well-coordinated, energy-efficient motions of robots with many degrees of freedom. Via preliminary solutions and novel research directions, we discuss how Riemannian geometry may be leveraged to design and combine physically-meaningful synergies for robotics, and how this theory also opens the door to coupling motion synergies with perceptual inputs.

Submitted 30 September, 2022; originally announced September 2022.

Comments: Published as a blue sky paper at ISRR'22. 8 pages, 2 figures. Video at https://youtu.be/XblzcKRRITE

arXiv:2209.03277 [pdf, other]

doi 10.1109/TRO.2023.3286074

K-VIL: Keypoints-based Visual Imitation Learning

Authors: Jianfeng Gao, Zhi Tao, Noémie Jaquier, Tamim Asfour

Abstract: Visual imitation learning provides efficient and intuitive solutions for robotic systems to acquire novel manipulation skills. However, simultaneously learning geometric task constraints and control policies from visual inputs alone remains a challenging problem. In this paper, we propose an approach for keypoint-based visual imitation (K-VIL) that automatically extracts sparse, object-centric, and embodiment-independent task representations from a small number of human demonstration videos. The task representation is composed of keypoint-based geometric constraints on principal manifolds, their associated local frames, and the movement primitives that are then needed for the task execution. Our approach is capable of extracting such task representations from a single demonstration video, and of incrementally updating them when new demonstrations become available. To reproduce manipulation skills using the learned set of prioritized geometric constraints in novel scenes, we introduce a novel keypoint-based admittance controller. We evaluate our approach in several real-world applications, showcasing its ability to deal with cluttered scenes, viewpoint mismatch, new instances of categorical objects, and large object pose and shape variations, as well as its efficiency and robustness in both one-shot and few-shot imitation learning settings. Videos and source code are available at https://sites.google.com/view/k-vil.

Submitted 25 July, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Journal ref: IEEE Transactions on Robotics, (2023) 1-21

arXiv:2208.10552 [pdf, other]

SpeedFolding: Learning Efficient Bimanual Folding of Garments

Authors: Yahav Avigal, Lars Berscheid, Tamim Asfour, Torsten Kröger, Ken Goldberg

Abstract: Folding garments reliably and efficiently is a long standing challenge in robotic manipulation due to the complex dynamics and high dimensional configuration space of garments. An intuitive approach is to initially manipulate the garment to a canonical smooth configuration before folding. In this work, we develop SpeedFolding, a reliable and efficient bimanual system, which given user-defined instructions as folding lines, manipulates an initially crumpled garment to (1) a smoothed and (2) a folded configuration. Our primary contribution is a novel neural network architecture that is able to predict pairs of gripper poses to parameterize a diverse set of bimanual action primitives. After learning from 4300 human-annotated and self-supervised actions, the robot is able to fold garments from a random initial configuration in under 120s on average with a success rate of 93%. Real-world experiments show that the system is able to generalize to unseen garments of different color, shape, and stiffness. While prior work achieved 3-6 Folds Per Hour (FPH), SpeedFolding achieves 30-40 FPH.

Submitted 9 September, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2208.01372 [pdf, other]

A Riemannian Take on Human Motion Analysis and Retargeting

Authors: Holger Klein, Noémie Jaquier, Andre Meixner, Tamim Asfour

Abstract: Dynamic motions of humans and robots are widely driven by posture-dependent nonlinear interactions between their degrees of freedom. However, these dynamical effects remain mostly overlooked when studying the mechanisms of human movement generation. Inspired by recent works, we hypothesize that human motions are planned as sequences of geodesic synergies, and thus correspond to coordinated joint movements achieved with piecewise minimum energy. The underlying computational model is built on Riemannian geometry to account for the inertial characteristics of the body. Through the analysis of various human arm motions, we find that our model segments motions into geodesic synergies, and successfully predicts observed arm postures, hand trajectories, as well as their respective velocity profiles. Moreover, we show that our analysis can further be exploited to transfer arm motions to robots by reproducing individual human synergies as geodesic paths in the robot configuration space.

Submitted 2 August, 2022; originally announced August 2022.

Comments: Accepted for publication in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2207.02556 [pdf, other]

Deep Learning Approaches to Grasp Synthesis: A Review

Authors: Rhys Newbury, Morris Gu, Lachlan Chumbley, Arsalan Mousavian, Clemens Eppner, Jürgen Leitner, Jeannette Bohg, Antonio Morales, Tamim Asfour, Danica Kragic, Dieter Fox, Akansel Cosgun

Abstract: Grasping is the process of picking up an object by applying forces and torques at a set of contacts. Recent advances in deep-learning methods have allowed rapid progress in robotic object grasping. In this systematic review, we surveyed the publications over the last decade, with a particular interest in grasping an object using all 6 degrees of freedom of the end-effector pose. Our review found four common methodologies for robotic grasping: sampling-based approaches, direct regression, reinforcement learning, and exemplar approaches. Additionally, we found two 'supporting methods' around grasping that use deep-learning to support the grasping process, shape approximation, and affordances. We have distilled the publications found in this systematic review (85 papers) into ten key takeaways we consider crucial for future robotic grasping and manipulation research. An online version of the survey is available at https://rhys-newbury.github.io/projects/6dof/

Submitted 4 May, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: 20 pages. Accepted to T-RO

arXiv:2206.02241 [pdf, other]

doi 10.1016/j.robot.2023.104415

A Memory System of a Robot Cognitive Architecture and its Implementation in ArmarX

Authors: Fabian Peller-Konrad, Rainer Kartmann, Christian R. G. Dreher, Andre Meixner, Fabian Reister, Markus Grotz, Tamim Asfour

Abstract: Cognitive agents such as humans and robots perceive their environment through an abundance of sensors producing streams of data that need to be processed to generate intelligent behavior. A key question of cognition-enabled and AI-driven robotics is how to organize and manage knowledge efficiently in a cognitive robot control architecture. We argue that memory is a central active component of such architectures that mediates between semantic and sensorimotor representations, orchestrates the flow of data streams and events between different processes and provides the components of a cognitive architecture with data-driven services for the abstraction of semantics from sensorimotor data, the parametrization of symbolic plans for execution and prediction of action effects. Based on related work, and the experience gained in developing our ARMAR humanoid robot systems, we identified conceptual and technical requirements of a memory system as central component of cognitive robot control architecture that facilitate the realization of high-level cognitive abilities such as explaining, reasoning, prospection, simulation and augmentation. Conceptually, a memory should be active, support multi-modal data representations, associate knowledge, be introspective, and have an inherently episodic structure. Technically, the memory should support a distributed design, be access-efficient and capable of long-term data storage. We introduce the memory system for our cognitive robot control architecture and its implementation in the robot software framework ArmarX. We evaluate the efficiency of the memory system with respect to transfer speeds, compression, reproduction and prediction capabilities.

Submitted 31 January, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

Comments: 35 pages, 19 figures, submitted to RAS

Report number: ROBOT: 104415

Journal ref: Robotics and Autonomous Systems (2023)

arXiv:2206.00559 [pdf, other]

Learning to Sequence and Blend Robot Skills via Differentiable Optimization

Authors: Noémie Jaquier, You Zhou, Julia Starke, Tamim Asfour

Abstract: In contrast to humans and animals who naturally execute seamless motions, learning and smoothly executing sequences of actions remains a challenge in robotics. This paper introduces a novel skill-agnostic framework that learns to sequence and blend skills based on differentiable optimization. Our approach encodes sequences of previously-defined skills as quadratic programs (QP), whose parameters determine the relative importance of skills along the task. Seamless skill sequences are then learned from demonstrations by exploiting differentiable optimization layers and a tailored loss formulated from the QP optimality conditions. Via the use of differentiable optimization, our work offers novel perspectives on multitask control. We validate our approach in a pick-and-place scenario with planar robots, a pouring experiment with a real humanoid robot, and a bimanual sweeping task with a human model.

Submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted for publication in IEEE Robotics and Automation Letters. Video: https://youtu.be/00NXvTpL-YU, code: https://github.com/NoemieJaquier/sequencing-blending/

arXiv:2111.01460 [pdf, other]

Geometry-aware Bayesian Optimization in Robotics using Riemannian Matérn Kernels

Authors: Noémie Jaquier, Viacheslav Borovitskiy, Andrei Smolensky, Alexander Terenin, Tamim Asfour, Leonel Rozo

Abstract: Bayesian optimization is a data-efficient technique which can be used for control parameter tuning, parametric policy adaptation, and structure design in robotics. Many of these problems require optimization of functions defined on non-Euclidean domains like spheres, rotation groups, or spaces of positive-definite matrices. To do so, one must place a Gaussian process prior, or equivalently define a kernel, on the space of interest. Effective kernels typically reflect the geometry of the spaces they are defined on, but designing them is generally non-trivial. Recent work on the Riemannian Matérn kernels, based on stochastic partial differential equations and spectral theory of the Laplace-Beltrami operator, offers promising avenues towards constructing such geometry-aware kernels. In this paper, we study techniques for implementing these kernels on manifolds of interest in robotics, demonstrate their performance on a set of artificial benchmark functions, and illustrate geometry-aware Bayesian optimization for a variety of robotic applications, covering orientation control, manipulability optimization, and motion planning, while showing its improved performance.

Submitted 17 March, 2023; v1 submitted 2 November, 2021; originally announced November 2021.

Comments: Source code: https://github.com/NoemieJaquier/MaternGaBO, Video: https://youtu.be/6awfFRqP7wA

Journal ref: Conference on Robot Learning, 2021

arXiv:2103.02932 [pdf, other]

Graph-based Task-specific Prediction Models for Interactions between Deformable and Rigid Objects

Authors: Zehang Weng, Fabian Paus, Anastasiia Varava, Hang Yin, Tamim Asfour, Danica Kragic

Abstract: Capturing scene dynamics and predicting the future scene state is challenging but essential for robotic manipulation tasks, especially when the scene contains both rigid and deformable objects. In this work, we contribute a simulation environment and generate a novel dataset for task-specific manipulation, involving interactions between rigid objects and a deformable bag. The dataset incorporates a rich variety of scenarios including different object sizes, object numbers and manipulation actions. We approach dynamics learning by proposing an object-centric graph representation and two modules which are Active Prediction Module (APM) and Position Prediction Module (PPM) based on graph neural networks with an encode-process-decode architecture. At the inference stage, we build a two-stage model based on the learned modules for single time step prediction. We combine modules with different prediction horizons into a mixed-horizon model which addresses long-term prediction. In an ablation study, we show the benefits of the two-stage model for single time step prediction and the effectiveness of the mixed-horizon model for long-term prediction tasks. Supplementary material is available at https://github.com/wengzehang/deformable_rigid_interaction_prediction

Submitted 4 March, 2021; originally announced March 2021.

Comments: IROS 2021 submission, Zehang Weng and Fabian Paus have equal contribution to this paper

arXiv:2102.12141 [pdf, other]

Learning to Shift Attention for Motion Generation

Authors: You Zhou, Jianfeng Gao, Tamim Asfour

Abstract: One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query. Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories. The other difficulty is the small number of demonstrations that cannot cover the entire working space. To overcome this problem, a motion generation model with extrapolation ability is needed. Previous works restrict task queries as local frames and learn representations in local frames. We propose a model to solve both problems. For multiple modes, we suggest to learn local latent representations of motion trajectories with a density estimation method based on real-valued non-volume preserving (RealNVP) transformations that provides a set of powerful, stably invertible, and learnable transformations. To improve the extrapolation ability, we propose to shift the attention of the robot from one local frame to another during the task execution. In experiments, we consider the docking problem used also in previous works where a trajectory has to be generated to connect two dockers without collision. We increase complexity of the task and show that the proposed method outperforms other approaches. In addition, we evaluate the approach in real robot experiments.

Submitted 24 February, 2021; originally announced February 2021.

arXiv:2102.02100 [pdf, other]

doi 10.1016/j.robot.2024.104632

Object and Relation Centric Representations for Push Effect Prediction

Authors: Ahmet E. Tekden, Aykut Erdem, Erkut Erdem, Tamim Asfour, Emre Ugur

Abstract: Pushing is an essential non-prehensile manipulation skill used for tasks ranging from pre-grasp manipulation to scene rearrangement, reasoning about object relations in the scene, and thus pushing actions have been widely studied in robotics. The effective use of pushing actions often requires an understanding of the dynamics of the manipulated objects and adaptation to the discrepancies between prediction and reality. For this reason, effect prediction and parameter estimation with pushing actions have been heavily investigated in the literature. However, current approaches are limited because they either model systems with a fixed number of objects or use image-based representations whose outputs are not very interpretable and quickly accumulate errors. In this paper, we propose a graph neural network based framework for effect prediction and parameter estimation of pushing actions by modeling object relations based on contacts or articulations. Our framework is validated both in real and simulated environments containing different shaped multi-part objects connected via different types of joints and objects with different masses, and it outperforms image-based representations on physics prediction. Our approach enables the robot to predict and adapt the effect of a pushing action as it observes the scene. It can also be used for tool manipulation with never-seen tools. Further, we demonstrate 6D effect prediction in the lever-up action in the context of robot-based hard-disk disassembly.

Submitted 22 February, 2023; v1 submitted 3 February, 2021; originally announced February 2021.

Comments: Project Page: https://fzaero.github.io/push_learning/

arXiv:2010.08169 [pdf, other]

Uncertainty-aware Contact-safe Model-based Reinforcement Learning

Authors: Cheng-Yu Kuo, Andreas Schaarschmidt, Yunduan Cui, Tamim Asfour, Takamitsu Matsubara

Abstract: This letter presents contact-safe Model-based Reinforcement Learning (MBRL) for robot applications that achieves contact-safe behaviors in the learning process. In typical MBRL, we cannot expect the data-driven model to generate accurate and reliable policies to the intended robotic tasks during the learning process due to sample scarcity. Operating these unreliable policies in a contact-rich environment could cause damage to the robot and its surroundings. To alleviate the risk of causing damage through unexpected intensive physical contacts, we present the contact-safe MBRL that associates the probabilistic Model Predictive Control's (pMPC) control limits with the model uncertainty so that the allowed acceleration of controlled behavior is adjusted according to learning progress. Control planning with such uncertainty-aware control limits is formulated as a deterministic MPC problem using a computation-efficient approximated GP dynamics and an approximated inference technique. Our approach's effectiveness is evaluated through bowl mixing tasks with simulated and real robots and scooping tasks with a real robot as examples of contact-rich manipulation skills. (video: https://youtu.be/sdhHP3NhYi0)

Submitted 9 March, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: 8 pages, Accepted by Robotics and Automation Letters with ICRA 2021 option

arXiv:2006.03537 [pdf, other]

A Soft Humanoid Hand with In-Finger Visual Perception

Authors: Felix Hundhausen, Julia Starke, Tamim Asfour

Abstract: We present a novel underactuated humanoid five-finger soft hand, the KIT \softhand, which is equipped with cameras in the fingertips and integrates a high performance embedded system for visual processing and control. We describe the actuation mechanism of the hand and the tendon-driven soft finger design with internally routed high-bandwidth flat-flex cables. For efficient on-board parallel processing of visual data from the cameras in each fingertip, we present a hybrid embedded architecture consisting of a field programmable logic array (FPGA) and a microcontroller that allows the realization of visual object segmentation based on convolutional neural networks. We evaluate the hand design by conducting durability experiments with one finger and quantify the grasp performance in terms of grasping force, speed and grasp success. The results show that the hand exhibits a grasp force of 31.8 N and a mechanical durability of the finger of more than 15,000 closing cycles. Finally, we evaluate the accuracy of visual object segmentation during the different phases of the grasping process using five different objects. Hereby, an accuracy above 90 % can be achieved.

Submitted 5 June, 2020; originally announced June 2020.

Comments: 7 pages

arXiv:2005.00227 [pdf, other]

Learning Compliance Adaptation in Contact-Rich Manipulation

Authors: Jianfeng Gao, You Zhou, Tamim Asfour

Abstract: Compliant robot behavior is crucial for the realization of contact-rich manipulation tasks. In such tasks, it is important to ensure a high stiffness and force tracking accuracy during normal task execution as well as rapid adaptation and compliant behavior to react to abnormal situations and changes. In this paper, we propose a novel approach for learning predictive models of force profiles required for contact-rich tasks. Such models allow detecting unexpected situations and facilitate better adaptive control. The approach combines an anomaly detection based on Bidirectional Gated Recurrent Units (Bi-GRU) and an adaptive force/impedance controller. We evaluated the approach in simulated and real-world experiments on a humanoid robot. The results show that the approach allows simultaneous high tracking accuracy of desired motions and force profile as well as the adaptation to force perturbations due to physical human interaction.

Submitted 1 May, 2020; originally announced May 2020.

arXiv:1909.03749 [pdf, other]

Learning Visual Dynamics Models of Rigid Objects using Relational Inductive Biases

Authors: Fabio Ferreira, Lin Shao, Tamim Asfour, Jeannette Bohg

Abstract: Endowing robots with human-like physical reasoning abilities remains challenging. We argue that existing methods often disregard spatio-temporal relations and by using Graph Neural Networks (GNNs) that incorporate a relational inductive bias, we can shift the learning process towards exploiting relations. In this work, we learn action-conditional forward dynamics models of a simulated manipulation task from visual observations involving cluttered and irregularly shaped objects. We investigate two GNN approaches and empirically assess their capability to generalize to scenarios with novel and an increasing number of objects. The first, Graph Networks (GN) based approach, considers explicitly defined edge attributes and not only does it consistently underperform an auto-encoder baseline that we modified to predict future states, our results indicate how different edge attributes can significantly influence the predictions. Consequently, we develop the Auto-Predictor that does not rely on explicitly defined edge attributes. It outperforms the baseline and the GN-based models. Overall, our results show the sensitivity of GNN-based approaches to the task representation, the efficacy of relational inductive biases and advocate choosing lightweight approaches that implicitly reason about relations over ones that leave these decisions to human designers.

Submitted 23 October, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: short paper (4 pages, two figures), accepted to NeurIPS 2019 Graph Representation Learning workshop

arXiv:1908.08391 [pdf, other]

Learning Object-Action Relations from Bimanual Human Demonstration Using Graph Networks

Authors: Christian R. G. Dreher, Mirko Wächter, Tamim Asfour

Abstract: Recognizing human actions is a vital task for a humanoid robot, especially in domains like programming by demonstration. Previous approaches on action recognition primarily focused on the overall prevalent action being executed, but we argue that bimanual human motion cannot always be described sufficiently with a single action label. We present a system for frame-wise action classification and segmentation in bimanual human demonstrations. The system extracts symbolic spatial object relations from raw RGB-D video data captured from the robot's point of view in order to build graph-based scene representations. To learn object-action relations, a graph network classifier is trained using these representations together with ground truth action labels to predict the action executed by each hand. We evaluated the proposed classifier on a new RGB-D video dataset showing daily action sequences focusing on bimanual manipulation actions. It consists of 6 subjects performing 9 tasks with 10 repetitions each, which leads to 540 video recordings with 2 hours and 18 minutes total playtime and per-hand ground truth action labels for each frame. We show that the classifier is able to reliably identify (action classification macro F1-score of 0.86) the true executed action of each hand within its top 3 predictions on a frame-by-frame basis without prior temporal action segmentation.

Submitted 12 September, 2019; v1 submitted 22 August, 2019; originally announced August 2019.

Comments: Submitted to IEEE Robotics and Automation Letters

arXiv:1907.08982 [pdf, other]

Noise Regularization for Conditional Density Estimation

Authors: Jonas Rothfuss, Fabio Ferreira, Simon Boehm, Simon Walther, Maxim Ulrich, Tamim Asfour, Andreas Krause

Abstract: Modelling statistical relationships beyond the conditional mean is crucial in many settings. Conditional density estimation (CDE) aims to learn the full conditional probability density from data. Though highly expressive, neural network based CDE models can suffer from severe over-fitting when trained with the maximum likelihood objective. Due to the inherent structure of such models, classical regularization approaches in the parameter space are rendered ineffective. To address this issue, we develop a model-agnostic noise regularization method for CDE that adds random perturbations to the data during training. We demonstrate that the proposed approach corresponds to a smoothness regularization and prove its asymptotic consistency. In our experiments, noise regularization significantly and consistently outperforms other regularization methods across seven data sets and three CDE models. The effectiveness of noise regularization makes neural network based CDE the preferable method over previous non- and semi-parametric approaches, even when training data is scarce.

Submitted 14 February, 2020; v1 submitted 21 July, 2019; originally announced July 2019.

arXiv:1810.06784 [pdf, other]

ProMP: Proximal Meta-Policy Search

Authors: Jonas Rothfuss, Dennis Lee, Ignasi Clavera, Tamim Asfour, Pieter Abbeel

Abstract: Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.

Submitted 11 February, 2022; v1 submitted 15 October, 2018; originally announced October 2018.

Comments: The first three authors contributed equally. Published at ICLR 2019

arXiv:1810.00357 [pdf, other]

doi 10.1109/HUMANOIDS.2017.8239541

A Framework for Evaluating Motion Segmentation Algorithms

Authors: Christian R. G. Dreher, Nicklas Kulp, Christian Mandery, Mirko Wächter, Tamim Asfour

Abstract: There have been many proposals for algorithms segmenting human whole-body motion in the literature. However, the wide range of use cases, datasets, and quality measures that were used for the evaluation render the comparison of algorithms challenging. In this paper, we introduce a framework that puts motion segmentation algorithms on a unified testing ground and provides a possibility to allow comparing them. The testing ground features both a set of quality measures known from the literature and a novel approach tailored to the evaluation of motion segmentation algorithms, termed Integrated Kernel approach. Datasets of motion recordings, provided with a ground truth, are included as well. They are labelled in a new way, which hierarchically organises the ground truth, to cover different use cases that segmentation algorithms can possess. The framework and datasets are publicly available and are intended to represent a service for the community regarding the comparison and evaluation of existing and new motion segmentation algorithms.

Submitted 30 September, 2018; originally announced October 2018.

Journal ref: Humanoid Robotics (Humanoids), 2017 IEEE-RAS 17th International Conference on. IEEE, 2017. p. 83-90

arXiv:1809.05214 [pdf, other]

Model-Based Reinforcement Learning via Meta-Policy Optimization

Authors: Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel

Abstract: Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance as model-free methods. We propose Model-Based Meta-Policy-Optimization (MB-MPO), an approach that foregoes the strong reliance on accurate learned dynamics models. Using an ensemble of learned dynamic models, MB-MPO meta-learns a policy that can quickly adapt to any model in the ensemble with one policy gradient step. This steers the meta-policy towards internalizing consistent dynamics predictions among the ensemble while shifting the burden of behaving optimally w.r.t. the model discrepancies towards the adaptation step. Our experiments show that MB-MPO is more robust to model imperfections than previous model-based approaches. Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.

Submitted 13 September, 2018; originally announced September 2018.

Comments: First 2 authors contributed equally. Accepted for Conference on Robot Learning (CoRL)

arXiv:1807.00703 [pdf, other]

Introducing the Simulated Flying Shapes and Simulated Planar Manipulator Datasets

Authors: Fabio Ferreira, Jonas Rothfuss, Eren Erdal Aksoy, You Zhou, Tamim Asfour

Abstract: We release two artificial datasets, Simulated Flying Shapes and Simulated Planar Manipulator that allow to test the learning ability of video processing systems. In particular, the dataset is meant as a tool which allows to easily assess the sanity of deep neural network models that aim to encode, reconstruct or predict video frame sequences. The datasets each consist of 90000 videos. The Simulated Flying Shapes dataset comprises scenes showing two objects of equal shape (rectangle, triangle and circle) and size in which one object approaches its counterpart. The Simulated Planar Manipulator shows a 3-DOF planar manipulator that executes a pick-and-place task in which it has to place a size-varying circle on a squared platform. Different from other widely used datasets such as moving MNIST [1], [2], the two presented datasets involve goal-oriented tasks (e.g. the manipulator grasping an object and placing it on a platform), rather than showing random movements. This makes our datasets more suitable for testing prediction capabilities and the learning of sophisticated motions by a machine learning model. This technical document aims at providing an introduction into the usage of both datasets.

Submitted 2 July, 2018; originally announced July 2018.

Comments: technical documentation, 2 figures, links to repositories

arXiv:1801.04134 [pdf, other]

Deep Episodic Memory: Encoding, Recalling, and Predicting Episodic Experiences for Robot Action Execution

Authors: Jonas Rothfuss, Fabio Ferreira, Eren Erdal Aksoy, You Zhou, Tamim Asfour

Abstract: We present a novel deep neural network architecture for representing robot experiences in an episodic-like memory which facilitates encoding, recalling, and predicting action experiences. Our proposed unsupervised deep episodic memory model 1) encodes observed actions in a latent vector space and, based on this latent encoding, 2) infers most similar episodes previously experienced, 3) reconstructs original episodes, and 4) predicts future frames in an end-to-end fashion. Results show that conceptually similar actions are mapped into the same region of the latent vector space. Based on these results, we introduce an action matching and retrieval mechanism, benchmark its performance on two large-scale action datasets, 20BN-something-something and ActivityNet and evaluate its generalization capability in a real-world scenario on a humanoid robot.

Submitted 14 July, 2018; v1 submitted 12 January, 2018; originally announced January 2018.

arXiv:1710.02418 [pdf, other]

Planning High-Quality Grasps using Mean Curvature Object Skeletons

Authors: Nikolaus Vahrenkamp, Eduard Koch, Mirko Waechter, Tamim Asfour

Abstract: In this work, we present a grasp planner which integrates two sources of information to generate robust grasps for a robotic hand. First, the topological information of the object model is incorporated by building the mean curvature skeleton and segmenting the object accordingly in order to identify object regions which are suitable for applying a grasp. Second, the local surface structure is investigated to construct feasible and robust grasping poses by aligning the hand according to the local object shape. We show how this information can be used to derive different grasping strategies, which also allows to distinguish between precision and power grasps. We applied the approach to a wide variety of object models of the KIT and the YCB real-world object model databases and evaluated the approach with several robotic hands. The results show that the skeleton-based grasp planner is capable to autonomously generate high-quality grasps in an efficient manner. In addition, we evaluate how robust the planned grasps are against hand positioning errors as they occur in real-world applications due to perception and actuation inaccuracies. The evaluation shows that the majority of the generated grasps are of high quality since they can be successfully applied even when the hand is not exactly positioned.

Submitted 6 October, 2017; originally announced October 2017.

arXiv:1706.01905 [pdf, other]

Parameter Space Noise for Exploration

Authors: Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz

Abstract: Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods allows to combine the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.

Submitted 31 January, 2018; v1 submitted 6 June, 2017; originally announced June 2017.

Comments: Updated to camera-ready ICLR submission

arXiv:1705.06400 [pdf, other]

doi 10.1016/j.robot.2018.07.006

Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks

Authors: Matthias Plappert, Christian Mandery, Tamim Asfour

Abstract: Linking human whole-body motion and natural language is of great interest for the generation of semantic representations of observed human behaviors as well as for the generation of robot behaviors based on natural language input. While there has been a large body of research in this area, most approaches that exist today require a symbolic representation of motions (e.g. in the form of motion primitives), which have to be defined a-priori or require complex segmentation algorithms. In contrast, recent advances in the field of neural networks and especially deep learning have demonstrated that sub-symbolic representations that can be learned end-to-end usually outperform more traditional approaches, for applications such as machine translation. In this paper we propose a generative model that learns a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks (RNNs) and sequence-to-sequence learning. Our approach does not require any segmentation or manual feature engineering and learns a distributed representation, which is shared for all motions and descriptions. We evaluate our approach on 2,846 human whole-body motions and 6,187 natural language descriptions thereof from the KIT Motion-Language Dataset. Our results clearly demonstrate the effectiveness of the proposed model: We show that our model generates a wide variety of realistic motions only from descriptions thereof in form of a single sentence. Conversely, our model is also capable of generating correct and detailed natural language descriptions from human motions.

Submitted 2 August, 2018; v1 submitted 17 May, 2017; originally announced May 2017.

arXiv:1703.00390 [pdf, other]

Multimodal Gaze Stabilization of a Humanoid Robot based on Reafferences

Authors: Timothee Habra, Markus Grotz, David Sippel, Tamim Asfour, Renaud Ronsse

Abstract: Gaze stabilization is fundamental for humanoid robots. By stabilizing vision, it enhances perception of the environment and keeps points of interest in the field of view. In this contribution, a multimodal gaze stabilization combining classic inverse kinematic control with vestibulo-ocular and optokinetic reflexes is introduced. Inspired by neuroscience, it implements a forward model that can modulate the reflexes based on the reafference principle. This principle filters self-generated movements out of the reflexive feedback loop. The versatility and effectiveness of this method are experimentally validated on the Armar-III humanoid robot. It is first demonstrated that each stabilization mechanism (inverse kinematics and reflexes) performs better than the others as a function of the type of perturbation to be stabilized. Furthermore, combining these three modalities by reafference provides a universal gaze stabilizer which can handle any kind of perturbation.

Submitted 1 March, 2017; originally announced March 2017.

arXiv:1607.03827 [pdf, other]

doi 10.1089/big.2016.0028

The KIT Motion-Language Dataset

Authors: Matthias Plappert, Christian Mandery, Tamim Asfour

Abstract: Linking human motion and natural language is of great interest for the generation of semantic representations of human activities as well as for the generation of robot activities based on natural language input. However, while there have been years of research in this area, no standardized and openly available dataset exists to support the development and evaluation of such systems. We therefore propose the KIT Motion-Language Dataset, which is large, open, and extensible. We aggregate data from multiple motion capture databases and include them in our dataset using a unified representation that is independent of the capture system or marker set, making it easy to work with the data regardless of its origin. To obtain motion annotations in natural language, we apply a crowd-sourcing approach and a web-based tool that was specifically built for this purpose, the Motion Annotation Tool. We thoroughly document the annotation process itself and discuss gamification methods that we used to keep annotators motivated. We further propose a novel method, perplexity-based selection, which systematically selects motions for further annotation that are either under-represented in our dataset or that have erroneous annotations. We show that our method mitigates the two aforementioned problems and ensures a systematic annotation process. We provide an in-depth analysis of the structure and contents of our resulting dataset, which, as of October 10, 2016, contains 3911 motions with a total duration of 11.23 hours and 6278 annotations in natural language that contain 52,903 words. We believe this makes our dataset an excellent choice that enables more transparent and comparable research in this important area.

Submitted 9 August, 2018; v1 submitted 13 July, 2016; originally announced July 2016.

Comments: 5 figures, 4 tables, submitted to Big Data journal, Special Issue on Robotics

arXiv:1507.08799 [pdf, other]

Analyzing Whole-Body Pose Transitions in Multi-Contact Motions

Authors: Christian Mandery, Júlia Borràs, Mirjam Jöchner, Tamim Asfour

Abstract: When executing whole-body motions, humans are able to use a large variety of support poses which not only utilize the feet, but also hands, knees and elbows to enhance stability. While there are many works analyzing the transitions involved in walking, very few works analyze human motion where more complex supports occur. In this work, we analyze complex support pose transitions in human motion involving locomotion and manipulation tasks (loco-manipulation). We have applied a method for the detection of human support contacts from motion capture data to a large-scale dataset of loco-manipulation motions involving multi-contact supports, providing a semantic representation of them. Our results provide a statistical analysis of the used support poses, their transitions and the time spent in each of them. In addition, our data partially validates our taxonomy of whole-body support poses presented in our previous work. We believe that this work extends our understanding of human motion for humanoids, with a long-term objective of developing methods for autonomous multi-contact motion planning.

Submitted 30 September, 2015; v1 submitted 31 July, 2015; originally announced July 2015.

Comments: 8 pages, IEEE-RAS International Conference on Humanoid Robots (Humanoids) 2015

arXiv:1503.06839 [pdf, other]

A Whole-Body Pose Taxonomy for Loco-Manipulation Tasks

Authors: Júlia Borràs, Tamim Asfour

Abstract: Exploiting interaction with the environment is a promising and powerful way to enhance stability of humanoid robots and robustness while executing locomotion and manipulation tasks. Recently some works have started to show advances in this direction considering humanoid locomotion with multi-contacts, but to be able to fully develop such abilities in a more autonomous way, we need to first understand and classify the variety of possible poses a humanoid robot can achieve to balance. To this end, we propose the adaptation of a successful idea widely used in the field of robot grasping to the field of humanoid balance with multi-contacts: a whole-body pose taxonomy classifying the set of whole-body robot configurations that use the environment to enhance stability. We have revised criteria of classification used to develop grasping taxonomies, focusing on structuring and simplifying the large number of possible poses the human body can adopt. We propose a taxonomy with 46 poses, containing three main categories, considering number and type of supports as well as possible transitions between poses. The taxonomy induces a classification of motion primitives based on the pose used for support, and a set of rules to store and generate new motions. We present preliminary results that apply known segmentation techniques to motion data from the KIT whole-body motion database. Using motion capture data with multi-contacts, we can identify support poses providing a segmentation that can distinguish between locomotion and manipulation parts of an action.

Submitted 22 September, 2015; v1 submitted 23 March, 2015; originally announced March 2015.

Comments: 8 pages, 7 figures, 1 table with full page figure that appears in landscape page, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems

Showing 1–50 of 53 results for author: Asfour, T