
Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks

Authors: Jianfeng Gao, Xiaoshu **, Franziska Krebs, Noémie Jaquier, Tamim Asfour

Abstract: Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes remain unsolved challenges. In this paper, we extend our previous work on keypoints-based visual imitation learning (K-VIL) to bimanual manipulation tasks. The proposed Bi-KVIL jointly extracts so-called Hybrid Master-Slave Relationships (HMSR) among objects and hands, bimanual coordination strategies, and sub-symbolic task representations. Our bimanual task representation is object-centric, embodiment-independent, and viewpoint-invariant, thus generalizing well to categorical objects in novel scenes. We evaluate our approach in various real-world applications, showcasing its ability to learn fine-grained bimanual manipulation tasks from a small number of human demonstration videos. Videos and source code are available at https://sites.google.com/view/bi-kvil.

Submitted 22 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2312.08030

Incremental Learning of Full-Pose Via-Point Movement Primitives on Riemannian Manifolds

Authors: Tilman Daab, Noémie Jaquier, Christian Dreher, Andre Meixner, Franziska Krebs, Tamim Asfour

Abstract: Movement primitives (MPs) are compact representations of robot skills that can be learned from demonstrations and combined into complex behaviors. However, merely equipping robots with a fixed set of innate MPs is insufficient to deploy them in dynamic and unpredictable environments. Instead, the full potential of MPs remains to be attained via adaptable, large-scale MP libraries. In this paper, we propose a set of seven fundamental operations to incrementally learn, improve, and re-organize MP libraries. To showcase their applicability, we provide explicit formulations of the spatial operations for libraries composed of Via-Point Movement Primitives (VMPs). By building on Riemannian manifold theory, our approach enables the incremental learning of all parameters of position and orientation VMPs within a library. Moreover, our approach stores a fixed number of parameters, thus complying with the essential principles of incremental learning. We evaluate our approach to incrementally learn a VMP library from motion capture data provided sequentially.

Submitted 13 December, 2023; originally announced December 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. 7 pages, 7 figures and 2 tables

arXiv:2005.12651

What the HoloLens Maps Is Your Workspace: Fast Mapping and Set-up of Robot Cells via Head Mounted Displays and Augmented Reality

Authors: David Puljiz, Franziska Krebs, Fabian Bösing, Björn Hein

Abstract: Classical methods of modelling and mapping robot work cells are time-consuming, expensive and involve expert knowledge. We present a novel approach to mapping and cell setup using modern Head Mounted Displays (HMDs) that possess self-localisation and mapping capabilities. We leveraged these capabilities to create a point cloud of the environment and build an OctoMap, a voxel occupancy grid representation of the robot's workspace for path planning. Through the use of Augmented Reality (AR) interactions, the user can edit the created OctoMap and add security zones. We perform comprehensive tests of the HoloLens' depth sensing capabilities and the quality of the resultant point cloud. A high-end laser scanner is used to provide the ground truth for the evaluation of the point cloud quality. The number of false-positive and false-negative voxels in the OctoMap is also tested.

Submitted 26 May, 2020; originally announced May 2020.

Comments: As submitted to IROS 2020

arXiv:1712.03249

Social Emotion Mining Techniques for Facebook Posts Reaction Prediction

Authors: Florian Krebs, Bruno Lubascher, Tobias Moers, Pieter Schaap, Gerasimos Spanakis

Abstract: As of February 2016, Facebook allows users to express their experienced emotions about a post by using five so-called 'reactions'. This research paper proposes and evaluates alternative methods for predicting these reactions to user posts on public pages of firms/companies (like supermarket chains). For this purpose, we collected posts (and their reactions) from Facebook pages of large supermarket chains and constructed a dataset which is available for other researchers. In order to predict the distribution of reactions of a new post, neural network architectures (convolutional and recurrent neural networks) were tested using pretrained word embeddings. Results of the neural networks were improved by introducing a bootstrapping approach for sentiment and emotion mining on the comments for each post. The final model (a combination of neural network and a baseline emotion miner) is able to predict the reaction distribution on Facebook posts with a mean squared error (or misclassification rate) of 0.135.

Submitted 8 December, 2017; originally announced December 2017.

Comments: 10 pages, 13 figures and accepted at ICAART 2018. (Dataset: https://github.com/jerryspan/FacebookR)

arXiv:1605.07008

madmom: a new Python Audio and Music Signal Processing Library

Authors: Sebastian Böck, Filip Korzeniowski, Jan Schlüter, Florian Krebs, Gerhard Widmer

Abstract: In this paper, we present madmom, an open-source audio processing and music information retrieval (MIR) library written in Python. madmom features a concise, NumPy-compatible, object-oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications. Prototypes can be seamlessly converted into callable processing pipelines through madmom's concept of Processors, callable objects that run transparently on multiple cores. Processors can also be serialised, saved, and re-run to allow results to be easily reproduced anywhere. Apart from low-level audio processing, madmom puts emphasis on musically meaningful high-level features. Many of these incorporate machine learning techniques, and madmom provides a module that implements some methods commonly used in MIR, such as hidden Markov models and neural networks. Additionally, madmom comes with several state-of-the-art MIR algorithms for onset detection, beat, downbeat and meter tracking, tempo estimation, and piano transcription. These can easily be incorporated into bigger MIR systems or run as stand-alone programs.

Submitted 23 May, 2016; originally announced May 2016.

ACM Class: H.5.5

Showing 1–5 of 5 results for author: Krebs, F