-
Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Gras**
Authors:
Anas Gouda,
Max Schwarz,
Christopher Reining,
Sven Behnke,
Alice Kirchheim
Abstract:
Foundation models are a strong trend in deep learning and computer vision. These models serve as a base for applications as they require minor or no further fine-tuning by developers to integrate into their applications. Foundation models for zero-shot object segmentation such as Segment Anything (SAM) output segmentation masks from images without any further object information. When they are foll…
▽ More
Foundation models are a strong trend in deep learning and computer vision. These models serve as a base for applications as they require minor or no further fine-tuning by developers to integrate into their applications. Foundation models for zero-shot object segmentation such as Segment Anything (SAM) output segmentation masks from images without any further object information. When they are followed in a pipeline by an object identification model, they can perform object detection without training. Here, we focus on training such an object identification model. A crucial practical aspect for an object identification model is to be flexible in input size. As object identification is an image retrieval problem, a suitable method should handle multi-query multi-gallery situations without constraining the number of input images (e.g. by having fixed-size aggregation layers). The key solution to train such a model is the centroid triplet loss (CTL), which aggregates image features to their centroids. CTL yields high accuracy, avoids misleading training signals and keeps the model input size flexible. In our experiments, we establish a new state of the art on the ArmBench object identification task, which shows general applicability of our model. We furthermore demonstrate an integrated unseen object detection pipeline on the challenging HOPE dataset, which requires fine-grained detection. There, our pipeline matches and surpasses related methods which have been trained on dataset-specific data.
△ Less
Submitted 8 July, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
Kontextbasierte Aktivitätserkennung -- Synergie von Mensch und Technik in der Social Networked Industry
Authors:
Friedrich Niemann,
Christopher Reining
Abstract:
In a social networked industry, the focus is on collaboration between humans and technology. Communication is the basic prerequisite for synergetic collaboration between all players. It includes non-verbal as well as verbal interactions. To enable non-verbal interaction, machines must be able to detect and understand human movements. This article presents the ongoing fundamental research on the an…
▽ More
In a social networked industry, the focus is on collaboration between humans and technology. Communication is the basic prerequisite for synergetic collaboration between all players. It includes non-verbal as well as verbal interactions. To enable non-verbal interaction, machines must be able to detect and understand human movements. This article presents the ongoing fundamental research on the analysis of human movements using sensor-based activity recognition and identifies potential for a transfer to industrial applications. The focus is on the practical feasibility of activity recognition by adding further data streams such as the position data of logistical objects and tools, meaning the context in which a certain activity is carried out.
--
In der Social Networked Industry steht die Zusammenarbeit von Mensch und Technik im Vordergrund. Grundvoraussetzung für eine synergetische Zusammenarbeit aller Akteure ist die Kommunikation, welche neben verbalen auch nonverbale Interaktionen umfasst. Um eine nonverbale Interaktion zu ermöglichen, müssen Maschinen in der Lage sein, menschliche Bewegungen zu erfassen und zu verstehen. Dieser Beitrag stellt die laufende Grundlagenforschung zur Analyse menschlicher Bewegungen mittels sensorgestützter Aktivitätserkennung vor und zeigt Anknüpfungspunkte für einen Transfer in industrielle Anwendungen. Im Fokus steht die Praxistauglichkeit der Aktivitätserkennung durch die Hinzunahme weiterer Datenströme wie beispielsweise den Positionsdaten logistischer Objekte und Hilfsmitteln, d. h. dem Kontext, in dem eine gewisse Aktivität ausgeführt wird.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Object Pose Estimation Annotation Pipeline for Multi-view Monocular Camera Systems in Industrial Settings
Authors:
Hazem Youssef,
Frederik Polachowski,
Jérôme Rutinowski,
Moritz Roidl,
Christopher Reining
Abstract:
Object localization, and more specifically object pose estimation, in large industrial spaces such as warehouses and production facilities, is essential for material flow operations. Traditional approaches rely on artificial artifacts installed in the environment or excessively expensive equipment, that is not suitable at scale. A more practical approach is to utilize existing cameras in such spac…
▽ More
Object localization, and more specifically object pose estimation, in large industrial spaces such as warehouses and production facilities, is essential for material flow operations. Traditional approaches rely on artificial artifacts installed in the environment or excessively expensive equipment, that is not suitable at scale. A more practical approach is to utilize existing cameras in such spaces in order to address the underlying pose estimation problem and to localize objects of interest. In order to leverage state-of-the-art methods in deep learning for object pose estimation, large amounts of data need to be collected and annotated. In this work, we provide an approach to the annotation of large datasets of monocular images without the need for manual labor. Our approach localizes cameras in space, unifies their location with a motion capture system, and uses a set of linear map**s to project 3D models of objects of interest at their ground truth 6D pose locations. We test our pipeline on a custom dataset collected from a system of eight cameras in an industrial setting that mimics the intended area of operation. Our approach was able to provide consistent quality annotations for our dataset with 26, 482 object instances at a fraction of the time required by human annotators.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
Multi-Channel Time-Series Person and Soft-Biometric Identification
Authors:
Nilah Ravi Nair,
Fernando Moya Rueda,
Christopher Reining,
Gernot A. Fink
Abstract:
Multi-channel time-series datasets are popular in the context of human activity recognition (HAR). On-body device (OBD) recordings of human movements are often preferred for HAR applications not only for their reliability but as an approach for identity protection, e.g., in industrial settings. Contradictory, the gait activity is a biometric, as the cyclic movement is distinctive and collectable.…
▽ More
Multi-channel time-series datasets are popular in the context of human activity recognition (HAR). On-body device (OBD) recordings of human movements are often preferred for HAR applications not only for their reliability but as an approach for identity protection, e.g., in industrial settings. Contradictory, the gait activity is a biometric, as the cyclic movement is distinctive and collectable. In addition, the gait cycle has proven to contain soft-biometric information of human groups, such as age and height. Though general human movements have not been considered a biometric, they might contain identity information. This work investigates person and soft-biometrics identification from OBD recordings of humans performing different activities using deep architectures. Furthermore, we propose the use of attribute representation for soft-biometric identification. We evaluate the method on four datasets of multi-channel time-series HAR, measuring the performance of a person and soft-biometrics identification and its relation concerning performed activities. We find that person identification is not limited to gait activity. The impact of activities on the identification performance was found to be training and dataset specific. Soft-biometric based attribute representation shows promising results and emphasis the necessity of larger datasets.
△ Less
Submitted 4 April, 2023;
originally announced April 2023.
-
Semi-Automated Computer Vision based Tracking of Multiple Industrial Entities -- A Framework and Dataset Creation Approach
Authors:
Jérôme Rutinowski,
Hazem Youssef,
Sven Franke,
Irfan Fachrudin Priyanta,
Frederik Polachowski,
Moritz Roidl,
Christopher Reining
Abstract:
This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in this example, six RGB cameras. This framework, makes use of multiple sensors, data pipelines and data annotation procedures, and is described in detail in this contribution. With the vision o…
▽ More
This contribution presents the TOMIE framework (Tracking Of Multiple Industrial Entities), a framework for the continuous tracking of industrial entities (e.g., pallets, crates, barrels) over a network of, in this example, six RGB cameras. This framework, makes use of multiple sensors, data pipelines and data annotation procedures, and is described in detail in this contribution. With the vision of a fully automated tracking system for industrial entities in mind, it enables researchers to efficiently capture high quality data in an industrial setting. Using this framework, an image dataset, the TOMIE dataset, is created, which at the same time is used to gauge the framework's validity. This dataset contains annotation files for 112,860 frames and 640,936 entity instances that are captured from a set of six cameras that perceive a large indoor space. This dataset out-scales comparable datasets by a factor of four and is made up of scenarios, drawn from industrial applications from the sector of warehousing. Three tracking algorithms, namely ByteTrack, Bot-Sort and SiamMOT are applied to this dataset, serving as a proof-of-concept and providing tracking results that are comparable to the state of the art.
△ Less
Submitted 3 April, 2023;
originally announced April 2023.
-
Dataset Bias in Human Activity Recognition
Authors:
Nilah Ravi Nair,
Lena Schmid,
Fernando Moya Rueda,
Markus Pauly,
Gernot A. Fink,
Christopher Reining
Abstract:
When creating multi-channel time-series datasets for Human Activity Recognition (HAR), researchers are faced with the issue of subject selection criteria. It is unknown what physical characteristics and/or soft-biometrics, such as age, height, and weight, need to be taken into account to train a classifier to achieve robustness towards heterogeneous populations in the training and testing data. Th…
▽ More
When creating multi-channel time-series datasets for Human Activity Recognition (HAR), researchers are faced with the issue of subject selection criteria. It is unknown what physical characteristics and/or soft-biometrics, such as age, height, and weight, need to be taken into account to train a classifier to achieve robustness towards heterogeneous populations in the training and testing data. This contribution statistically curates the training data to assess to what degree the physical characteristics of humans influence HAR performance. We evaluate the performance of a state-of-the-art convolutional neural network on two HAR datasets that vary in the sensors, activities, and recording for time-series HAR. The training data is intentionally biased with respect to human characteristics to determine the features that impact motion behaviour. The evaluations brought forth the impact of the subjects' characteristics on HAR. Thus, providing insights regarding the robustness of the classifier with respect to heterogeneous populations. The study is a step forward in the direction of fair and trustworthy artificial intelligence by attempting to quantify representation bias in multi-channel time series HAR data.
△ Less
Submitted 19 January, 2023;
originally announced January 2023.
-
On the Applicability of Synthetic Data for Re-Identification
Authors:
Jérôme Rutinowski,
Bhargav Vankayalapati,
Nils Schwenzfeier,
Maribel Acosta,
Christopher Reining
Abstract:
This contribution demonstrates the feasibility of applying Generative Adversarial Networks (GANs) on images of EPAL pallet blocks for dataset enhancement in the context of re-identification. For many industrial applications of re-identification methods, datasets of sufficient volume would otherwise be unattainable in non-laboratory settings. Using a state-of-the-art GAN architecture, namely CycleG…
▽ More
This contribution demonstrates the feasibility of applying Generative Adversarial Networks (GANs) on images of EPAL pallet blocks for dataset enhancement in the context of re-identification. For many industrial applications of re-identification methods, datasets of sufficient volume would otherwise be unattainable in non-laboratory settings. Using a state-of-the-art GAN architecture, namely CycleGAN, images of pallet blocks rotated to their left-hand side were generated from images of visually centered pallet blocks, based on images of rotated pallet blocks that were recorded as part of a previously recorded and published dataset. In this process, the unique chipwood pattern of the pallet block surface structure was retained, only changing the orientation of the pallet block itself. By doing so, synthetic data for re-identification testing and training purposes was generated, in a manner that is distinct from ordinary data augmentation. In total, 1,004 new images of pallet blocks were generated. The quality of the generated images was gauged using a perspective classifier that was trained on the original images and then applied to the synthetic ones, comparing the accuracy between the two sets of images. The classification accuracy was 98% for the original images and 92% for the synthetic images. In addition, the generated images were also used in a re-identification task, in order to re-identify original images based on synthetic ones. The accuracy in this scenario was up to 88% for synthetic images, compared to 96% for original images. Through this evaluation, it is established, whether or not a generated pallet block image closely resembles its original counterpart.
△ Less
Submitted 20 December, 2022;
originally announced December 2022.
-
A Grid-based Sensor Floor Platform for Robot Localization using Machine Learning
Authors:
Anas Gouda,
Danny Heinrich,
Mirco Hünnefeld,
Irfan Fachrudin Priyanta,
Christopher Reining,
Moritz Roidl
Abstract:
Wireless Sensor Network (WSN) applications reshape the trend of warehouse monitoring systems allowing them to track and locate massive numbers of logistic entities in real-time. To support the tasks, classic Radio Frequency (RF)-based localization approaches (e.g. triangulation and trilateration) confront challenges due to multi-path fading and signal loss in noisy warehouse environment. In this p…
▽ More
Wireless Sensor Network (WSN) applications reshape the trend of warehouse monitoring systems allowing them to track and locate massive numbers of logistic entities in real-time. To support the tasks, classic Radio Frequency (RF)-based localization approaches (e.g. triangulation and trilateration) confront challenges due to multi-path fading and signal loss in noisy warehouse environment. In this paper, we investigate machine learning methods using a new grid-based WSN platform called Sensor Floor that can overcome the issues. Sensor Floor consists of 345 nodes installed across the floor of our logistic research hall with dual-band RF and Inertial Measurement Unit (IMU) sensors. Our goal is to localize all logistic entities, for this study we use a mobile robot. We record distributed sensing measurements of Received Signal Strength Indicator (RSSI) and IMU values as the dataset and position tracking from Vicon system as the ground truth. The asynchronous collected data is pre-processed and trained using Random Forest and Convolutional Neural Network (CNN). The CNN model with regularization outperforms the Random Forest in terms of localization accuracy with aproximate 15 cm. Moreover, the CNN architecture can be configured flexibly depending on the scenario in the warehouse. The hardware, software and the CNN architecture of the Sensor Floor are open-source under https://github.com/FLW-TUDO/sensorfloor.
△ Less
Submitted 9 December, 2022;
originally announced December 2022.
-
UAVs for Industries and Supply Chain Management
Authors:
Shrutarv Awasthi,
Nils Gramse,
Dr. Christopher Reining,
Dr. Moritz Roidl
Abstract:
This work aims at showing that it is feasible and safe to use a swarm of Unmanned Aerial Vehicles (UAVs) indoors alongside humans. UAVs are increasingly being integrated under the Industry 4.0 framework. UAV swarms are primarily deployed outdoors in civil and military applications, but the opportunities for using them in manufacturing and supply chain management are immense. There is extensive res…
▽ More
This work aims at showing that it is feasible and safe to use a swarm of Unmanned Aerial Vehicles (UAVs) indoors alongside humans. UAVs are increasingly being integrated under the Industry 4.0 framework. UAV swarms are primarily deployed outdoors in civil and military applications, but the opportunities for using them in manufacturing and supply chain management are immense. There is extensive research on UAV technology, e.g., localization, control, and computer vision, but less research on the practical application of UAVs in industry. UAV technology could improve data collection and monitoring, enhance decision-making in an Internet of Things framework and automate time-consuming and redundant tasks in the industry. However, there is a gap between the technological developments of UAVs and their integration into the supply chain. Therefore, this work focuses on automating the task of transporting packages utilizing a swarm of small UAVs operating alongside humans. MoCap system, ROS, and unity are used for localization, inter-process communication and visualization. Multiple experiments are performed with the UAVs in wander and swarm mode in a warehouse like environment.
△ Less
Submitted 14 March, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Applications of human activity recognition in industrial processes -- Synergy of human and technology
Authors:
Friedrich Niemann,
Christopher Reining,
Hülya Bas,
Sven Franke
Abstract:
Human-technology collaboration relies on verbal and non-verbal communication. Machines must be able to detect and understand the movements of humans to facilitate non-verbal communication. In this article, we introduce ongoing research on human activity recognition in intralogistics, and show how it can be applied in industrial settings. We show how semantic attributes can be used to describe huma…
▽ More
Human-technology collaboration relies on verbal and non-verbal communication. Machines must be able to detect and understand the movements of humans to facilitate non-verbal communication. In this article, we introduce ongoing research on human activity recognition in intralogistics, and show how it can be applied in industrial settings. We show how semantic attributes can be used to describe human activities flexibly and how context informantion increases the performance of classifiers to recognise them automatically. Beyond that, we present a concept based on a cyber-physical twin that can reduce the effort and time necessary to create a training dataset for human activity recognition. In the future, it will be possible to train a classifier solely with realistic simulation data, while maintaining or even increasing the classification performance.
△ Less
Submitted 5 December, 2022;
originally announced December 2022.
-
DoPose-6D dataset for object segmentation and 6D pose estimation
Authors:
Anas Gouda,
Abraham Ghanem,
Christopher Reining
Abstract:
Scene understanding is essential in determining how intelligent robotic gras** and manipulation could get. It is a problem that can be approached using different techniques: seen object segmentation, unseen object segmentation, or 6D pose estimation. These techniques can even be extended to multi-view. Most of the work on these problems depends on synthetic datasets due to the lack of real datas…
▽ More
Scene understanding is essential in determining how intelligent robotic gras** and manipulation could get. It is a problem that can be approached using different techniques: seen object segmentation, unseen object segmentation, or 6D pose estimation. These techniques can even be extended to multi-view. Most of the work on these problems depends on synthetic datasets due to the lack of real datasets that are big enough for training and merely use the available real datasets for evaluation.
This encourages us to introduce a new dataset (called DoPose-6D). The dataset contains annotations for 6D Pose estimation, object segmentation, and multi-view annotations, which serve all the pre-mentioned techniques. The dataset contains two types of scenes bin picking and tabletop, with the primary motive for this dataset collection being bin picking.
We illustrate the effect of this dataset in the context of unseen object segmentation and provide some insights on mixing synthetic and real data for the training. We train a Mask R-CNN model that is practical to be used in industry and robotic gras** applications. Finally, we show how our dataset boosted the performance of a Mask R-CNN model.
Our DoPose-6D dataset, trained network models, pipeline code, and ROS driver are available online.
△ Less
Submitted 28 November, 2022; v1 submitted 28 April, 2022;
originally announced April 2022.
-
pRSL: Interpretable Multi-label Stacking by Learning Probabilistic Rules
Authors:
Michael Kirchhof,
Lena Schmid,
Christopher Reining,
Michael ten Hompel,
Markus Pauly
Abstract:
A key task in multi-label classification is modeling the structure between the involved classes. Modeling this structure by probabilistic and interpretable means enables application in a broad variety of tasks such as zero-shot learning or learning from incomplete data. In this paper, we present the probabilistic rule stacking learner (pRSL) which uses probabilistic propositional logic rules and b…
▽ More
A key task in multi-label classification is modeling the structure between the involved classes. Modeling this structure by probabilistic and interpretable means enables application in a broad variety of tasks such as zero-shot learning or learning from incomplete data. In this paper, we present the probabilistic rule stacking learner (pRSL) which uses probabilistic propositional logic rules and belief propagation to combine the predictions of several underlying classifiers. We derive algorithms for exact and approximate inference and learning, and show that pRSL reaches state-of-the-art performance on various benchmark datasets.
In the process, we introduce a novel multicategorical generalization of the noisy-or gate. Additionally, we report simulation results on the quality of loopy belief propagation algorithms for approximate inference in bipartite noisy-or networks.
△ Less
Submitted 28 May, 2021;
originally announced May 2021.