
Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping

Authors: Anas Gouda, Max Schwarz, Christopher Reining, Sven Behnke, Alice Kirchheim

Abstract: Foundation models are a strong trend in deep learning and computer vision. These models serve as a base for applications as they require minor or no further fine-tuning by developers to integrate into their applications. Foundation models for zero-shot object segmentation such as Segment Anything (SAM) output segmentation masks from images without any further object information. When they are followed in a pipeline by an object identification model, they can perform object detection without training. Here, we focus on training such an object identification model. A crucial practical aspect for an object identification model is to be flexible in input size. As object identification is an image retrieval problem, a suitable method should handle multi-query multi-gallery situations without constraining the number of input images (e.g. by having fixed-size aggregation layers). The key solution to train such a model is the centroid triplet loss (CTL), which aggregates image features to their centroids. CTL yields high accuracy, avoids misleading training signals and keeps the model input size flexible. In our experiments, we establish a new state of the art on the ArmBench object identification task, which shows general applicability of our model. We furthermore demonstrate an integrated unseen object detection pipeline on the challenging HOPE dataset, which requires fine-grained detection. There, our pipeline matches and surpasses related methods which have been trained on dataset-specific data.

Submitted 8 July, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted to CASE 2024
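
The abstract above describes the centroid triplet loss only at a high level: image features are aggregated to their centroids, so the number of input images stays flexible. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the squared Euclidean distance, the margin value, and the choice to average the query features as well are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def centroid(features: Sequence[Vector]) -> List[float]:
    """Mean of a non-empty set of equally sized feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_triplet_loss(
    query: Sequence[Vector],
    positive: Sequence[Vector],
    negative: Sequence[Vector],
    margin: float = 0.5,  # hypothetical value, not taken from the paper
) -> float:
    """Triplet loss computed on centroids instead of single images.

    Each argument may hold any number of feature vectors, because every
    set is reduced to one centroid before distances are compared.
    """
    q = centroid(query)
    c_pos = centroid(positive)
    c_neg = centroid(negative)
    return max(squared_distance(q, c_pos) - squared_distance(q, c_neg) + margin, 0.0)


# Toy 2-D features: two query views, three gallery views of the same
# object, and two gallery views of a different object.
query = [[1.0, 0.0], [0.8, 0.2]]
same_object = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.0]]
other_object = [[0.0, 1.0], [0.2, 0.8]]

loss = centroid_triplet_loss(query, same_object, other_object)
```

In this toy example the query centroid already lies much closer to the matching gallery centroid than to the other one, so the loss is zero; swapping the two galleries yields a positive loss. Because each set is averaged first, the same function accepts one query image or many without any fixed-size aggregation layer.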

arXiv:2402.05480 [pdf]

Kontextbasierte Aktivitätserkennung -- Synergie von Mensch und Technik in der Social Networked Industry

Authors: Friedrich Niemann, Christopher Reining

Abstract: In a social networked industry, the focus is on collaboration between humans and technology. Communication is the basic prerequisite for synergetic collaboration between all players. It includes non-verbal as well as verbal interactions. To enable non-verbal interaction, machines must be able to detect and understand human movements. This article presents the ongoing fundamental research on the analysis of human movements using sensor-based activity recognition and identifies potential for a transfer to industrial applications. The focus is on the practical feasibility of activity recognition by adding further data streams such as the position data of logistical objects and tools, meaning the context in which a certain activity is carried out.

Submitted 8 February, 2024; originally announced February 2024.

Comments: in German language. 30. Deutscher Materialfluss-Kongress 2023

arXiv:2310.14914 [pdf, other]

Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings

Authors: Hazem Youssef, Frederik Polachowski, Jérôme Rutinowski, Moritz Roidl, Christopher Reining

Abstract: Object localization, and more specifically object pose estimation, in large industrial spaces such as warehouses and production facilities, is essential for material flow operations. Traditional approaches rely on artificial artifacts installed in the environment or excessively expensive equipment that is not suitable at scale. A more practical approach is to utilize existing cameras in such spaces in order to address the underlying pose estimation problem and to localize objects of interest. In order to leverage state-of-the-art methods in deep learning for object pose estimation, large amounts of data need to be collected and annotated. In this work, we provide an approach to the annotation of large datasets of monocular images without the need for manual labor. Our approach localizes cameras in space, unifies their location with a motion capture system, and uses a set of linear mappings to project 3D models of objects of interest at their ground truth 6D pose locations. We test our pipeline on a custom dataset collected from a system of eight cameras in an industrial setting that mimics the intended area of operation. Our approach was able to provide consistent quality annotations for our dataset with 26,482 object instances at a fraction of the time required by human annotators.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2304.01585 [pdf, other]

Multi-Channel Time-Series Person and Soft-Biometric Identification

Authors: Nilah Ravi Nair, Fernando Moya Rueda, Christopher Reining, Gernot A. Fink

Abstract: Multi-channel time-series datasets are popular in the context of human activity recognition (HAR). On-body device (OBD) recordings of human movements are often preferred for HAR applications not only for their reliability but also as an approach for identity protection, e.g., in industrial settings. Contradictorily, the gait activity is a biometric, as the cyclic movement is distinctive and collectable. In addition, the gait cycle has proven to contain soft-biometric information of human groups, such as age and height. Though general human movements have not been considered a biometric, they might contain identity information. This work investigates person and soft-biometrics identification from OBD recordings of humans performing different activities using deep architectures. Furthermore, we propose the use of attribute representation for soft-biometric identification. We evaluate the method on four datasets of multi-channel time-series HAR, measuring the performance of person and soft-biometrics identification and its relation to the performed activities. We find that person identification is not limited to gait activity. The impact of activities on the identification performance was found to be training and dataset specific. Soft-biometric based attribute representation shows promising results and emphasizes the necessity of larger datasets.

Submitted 4 April, 2023; originally announced April 2023.

Comments: Accepted at the ICPR 2022 workshop: 12th International Workshop on Human Behavior Understanding

arXiv:2304.00950 [pdf, other]

Semi-Automated Computer Vision based Tracking of Multiple Industrial Entities -- A Framework and Dataset Creation Approach

Authors: Jérôme Rutinowski, Hazem Youssef, Sven Franke, Irfan Fachrudin Priyanta, Frederik Polachowski, Moritz Roidl, Christopher Reining

Abstract: This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in this example, six RGB cameras. This framework makes use of multiple sensors, data pipelines and data annotation procedures, and is described in detail in this contribution. With the vision of a fully automated tracking system for industrial entities in mind, it enables researchers to efficiently capture high quality data in an industrial setting. Using this framework, an image dataset, the TOMIE dataset, is created, which at the same time is used to gauge the framework's validity. This dataset contains annotation files for 112,860 frames and 640,936 entity instances that are captured from a set of six cameras that perceive a large indoor space. This dataset out-scales comparable datasets by a factor of four and is made up of scenarios drawn from industrial applications from the sector of warehousing. Three tracking algorithms, namely ByteTrack, Bot-Sort and SiamMOT, are applied to this dataset, serving as a proof-of-concept and providing tracking results that are comparable to the state of the art.

Submitted 3 April, 2023; originally announced April 2023.

arXiv:2301.10161 [pdf, other]

Dataset Bias in Human Activity Recognition

Authors: Nilah Ravi Nair, Lena Schmid, Fernando Moya Rueda, Markus Pauly, Gernot A. Fink, Christopher Reining

Abstract: When creating multi-channel time-series datasets for Human Activity Recognition (HAR), researchers are faced with the issue of subject selection criteria. It is unknown what physical characteristics and/or soft-biometrics, such as age, height, and weight, need to be taken into account to train a classifier to achieve robustness towards heterogeneous populations in the training and testing data. This contribution statistically curates the training data to assess to what degree the physical characteristics of humans influence HAR performance. We evaluate the performance of a state-of-the-art convolutional neural network on two HAR datasets that vary in the sensors, activities, and recording for time-series HAR. The training data is intentionally biased with respect to human characteristics to determine the features that impact motion behaviour. The evaluations brought forth the impact of the subjects' characteristics on HAR, thus providing insights regarding the robustness of the classifier with respect to heterogeneous populations. The study is a step forward in the direction of fair and trustworthy artificial intelligence by attempting to quantify representation bias in multi-channel time series HAR data.

Submitted 19 January, 2023; originally announced January 2023.

Comments: Submitted for review to the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23)

arXiv:2212.10105 [pdf, other]

On the Applicability of Synthetic Data for Re-Identification

Authors: Jérôme Rutinowski, Bhargav Vankayalapati, Nils Schwenzfeier, Maribel Acosta, Christopher Reining

Abstract: This contribution demonstrates the feasibility of applying Generative Adversarial Networks (GANs) on images of EPAL pallet blocks for dataset enhancement in the context of re-identification. For many industrial applications of re-identification methods, datasets of sufficient volume would otherwise be unattainable in non-laboratory settings. Using a state-of-the-art GAN architecture, namely CycleGAN, images of pallet blocks rotated to their left-hand side were generated from images of visually centered pallet blocks, based on images of rotated pallet blocks that were recorded as part of a previously recorded and published dataset. In this process, the unique chipwood pattern of the pallet block surface structure was retained, only changing the orientation of the pallet block itself. By doing so, synthetic data for re-identification testing and training purposes was generated, in a manner that is distinct from ordinary data augmentation. In total, 1,004 new images of pallet blocks were generated. The quality of the generated images was gauged using a perspective classifier that was trained on the original images and then applied to the synthetic ones, comparing the accuracy between the two sets of images. The classification accuracy was 98% for the original images and 92% for the synthetic images. In addition, the generated images were also used in a re-identification task, in order to re-identify original images based on synthetic ones. The accuracy in this scenario was up to 88% for synthetic images, compared to 96% for original images. Through this evaluation, it is established whether or not a generated pallet block image closely resembles its original counterpart.

Submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted as a non-archival paper in AAAI23 workshop AI2SE

arXiv:2212.04721 [pdf, other]

A Grid-based Sensor Floor Platform for Robot Localization using Machine Learning

Authors: Anas Gouda, Danny Heinrich, Mirco Hünnefeld, Irfan Fachrudin Priyanta, Christopher Reining, Moritz Roidl

Abstract: Wireless Sensor Network (WSN) applications reshape the trend of warehouse monitoring systems, allowing them to track and locate massive numbers of logistic entities in real-time. To support these tasks, classic Radio Frequency (RF)-based localization approaches (e.g. triangulation and trilateration) confront challenges due to multi-path fading and signal loss in noisy warehouse environments. In this paper, we investigate machine learning methods using a new grid-based WSN platform called Sensor Floor that can overcome these issues. Sensor Floor consists of 345 nodes installed across the floor of our logistic research hall with dual-band RF and Inertial Measurement Unit (IMU) sensors. Our goal is to localize all logistic entities; for this study, we use a mobile robot. We record distributed sensing measurements of Received Signal Strength Indicator (RSSI) and IMU values as the dataset and position tracking from a Vicon system as the ground truth. The asynchronously collected data is pre-processed and used to train a Random Forest and a Convolutional Neural Network (CNN). The CNN model with regularization outperforms the Random Forest in terms of localization accuracy, with approximately 15 cm. Moreover, the CNN architecture can be configured flexibly depending on the scenario in the warehouse. The hardware, software and the CNN architecture of the Sensor Floor are open-source under https://github.com/FLW-TUDO/sensorfloor.

Submitted 9 December, 2022; originally announced December 2022.

Comments: This is a preprint version for IEEE I2MTC 2023

arXiv:2212.03346 [pdf]

doi 10.48550/arXiv.2212.03346

UAVs for Industries and Supply Chain Management

Authors: Shrutarv Awasthi, Nils Gramse, Dr. Christopher Reining, Dr. Moritz Roidl

Abstract: This work aims at showing that it is feasible and safe to use a swarm of Unmanned Aerial Vehicles (UAVs) indoors alongside humans. UAVs are increasingly being integrated under the Industry 4.0 framework. UAV swarms are primarily deployed outdoors in civil and military applications, but the opportunities for using them in manufacturing and supply chain management are immense. There is extensive research on UAV technology, e.g., localization, control, and computer vision, but less research on the practical application of UAVs in industry. UAV technology could improve data collection and monitoring, enhance decision-making in an Internet of Things framework and automate time-consuming and redundant tasks in the industry. However, there is a gap between the technological developments of UAVs and their integration into the supply chain. Therefore, this work focuses on automating the task of transporting packages utilizing a swarm of small UAVs operating alongside humans. A MoCap system, ROS, and Unity are used for localization, inter-process communication and visualization. Multiple experiments are performed with the UAVs in wander and swarm mode in a warehouse-like environment.

Submitted 14 March, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: Accepted at the XXIV International Conference on "Material Handling, Constructions and Logistics"

arXiv:2212.02266 [pdf]

Applications of human activity recognition in industrial processes -- Synergy of human and technology

Authors: Friedrich Niemann, Christopher Reining, Hülya Bas, Sven Franke

Abstract: Human-technology collaboration relies on verbal and non-verbal communication. Machines must be able to detect and understand the movements of humans to facilitate non-verbal communication. In this article, we introduce ongoing research on human activity recognition in intralogistics, and show how it can be applied in industrial settings. We show how semantic attributes can be used to describe human activities flexibly and how context information increases the performance of classifiers to recognise them automatically. Beyond that, we present a concept based on a cyber-physical twin that can reduce the effort and time necessary to create a training dataset for human activity recognition. In the future, it will be possible to train a classifier solely with realistic simulation data, while maintaining or even increasing the classification performance.

Submitted 5 December, 2022; originally announced December 2022.

Comments: Accepted at XXIV International Conference on Material Handling, Constructions and Logistics, MHCL 2022, Belgrade, Serbia

Report number: ISBN 978-86-6060-134-8

arXiv:2204.13613 [pdf, other]

DoPose-6D dataset for object segmentation and 6D pose estimation

Authors: Anas Gouda, Abraham Ghanem, Christopher Reining

Abstract: Scene understanding is essential in determining how intelligent robotic grasping and manipulation could get. It is a problem that can be approached using different techniques: seen object segmentation, unseen object segmentation, or 6D pose estimation. These techniques can even be extended to multi-view. Most of the work on these problems depends on synthetic datasets due to the lack of real datasets that are big enough for training, and merely uses the available real datasets for evaluation. This encourages us to introduce a new dataset (called DoPose-6D). The dataset contains annotations for 6D pose estimation, object segmentation, and multi-view annotations, which serve all the aforementioned techniques. The dataset contains two types of scenes, bin picking and tabletop, with the primary motive for this dataset collection being bin picking. We illustrate the effect of this dataset in the context of unseen object segmentation and provide some insights on mixing synthetic and real data for the training. We train a Mask R-CNN model that is practical to be used in industry and robotic grasping applications. Finally, we show how our dataset boosted the performance of a Mask R-CNN model. Our DoPose-6D dataset, trained network models, pipeline code, and ROS driver are available online.

Submitted 28 November, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted for IEEE ICMLA 2022

arXiv:2105.13850 [pdf, other]

pRSL: Interpretable Multi-label Stacking by Learning Probabilistic Rules

Authors: Michael Kirchhof, Lena Schmid, Christopher Reining, Michael ten Hompel, Markus Pauly

Abstract: A key task in multi-label classification is modeling the structure between the involved classes. Modeling this structure by probabilistic and interpretable means enables application in a broad variety of tasks such as zero-shot learning or learning from incomplete data. In this paper, we present the probabilistic rule stacking learner (pRSL) which uses probabilistic propositional logic rules and belief propagation to combine the predictions of several underlying classifiers. We derive algorithms for exact and approximate inference and learning, and show that pRSL reaches state-of-the-art performance on various benchmark datasets. In the process, we introduce a novel multicategorical generalization of the noisy-or gate. Additionally, we report simulation results on the quality of loopy belief propagation algorithms for approximate inference in bipartite noisy-or networks.

Submitted 28 May, 2021; originally announced May 2021.

Showing 1–12 of 12 results for author: Reining, C