Search | arXiv e-print repository

Deep learning empowered sensor fusion to improve infant movement classification

Authors: Tomas Kulvicius, Dajie Zhang, Luise Poustka, Sven Bölte, Lennart Jahn, Sarah Flügge, Marc Kraft, Markus Zweckstetter, Karin Nielsen-Saines, Florentin Wörgötter, Peter B Marschik

Abstract: There is a recent boom in the development of AI solutions to facilitate and enhance diagnostic procedures for established clinical tools. To assess the integrity of the developing nervous system, the Prechtl general movement assessment (GMA) is recognized for its clinical value in diagnosing neurological impairments in early infancy. GMA has been increasingly augmented through machine learning approaches intending to scale-up its application, circumvent costs in the training of human assessors and further standardize classification of spontaneous motor patterns. Available deep learning tools, all of which are based on single sensor modalities, are however still considerably inferior to that of well-trained human assessors. These approaches are hardly comparable as all models are designed, trained and evaluated on proprietary/silo-data sets. With this study we propose a sensor fusion approach for assessing fidgety movements (FMs) comparing three different sensor modalities (pressure, inertial, and visual sensors). Various combinations and two sensor fusion approaches (late and early fusion) for infant movement classification were tested to evaluate whether a multi-sensor system outperforms single modality assessments. The performance of the three-sensor fusion (classification accuracy of 94.5%) was significantly higher than that of any single modality evaluated, suggesting the sensor fusion approach is a promising avenue for automated classification of infant motor patterns. The development of a robust sensor fusion system may significantly enhance AI-based early recognition of neurofunctions, ultimately facilitating automated early detection of neurodevelopmental conditions.

Submitted 21 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.
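
The abstract above contrasts early and late fusion without detailing either. As a rough illustration only, the sketch below shows the generic distinction: early fusion merges per-sensor feature vectors before a single classifier sees them, while late fusion combines the outputs of per-sensor classifiers. The function names, the sensor keys, and the choice of plain probability averaging are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def early_fusion(features_by_sensor: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-sensor feature vectors into one classifier input.

    Sensors are visited in sorted-name order so the layout is deterministic.
    (Hypothetical helper; the paper's actual fusion layer is not specified here.)
    """
    fused: List[float] = []
    for name in sorted(features_by_sensor):
        fused.extend(features_by_sensor[name])
    return fused


def late_fusion(probs_by_sensor: Dict[str, Sequence[float]]) -> List[float]:
    """Average the class probabilities predicted by one classifier per sensor.

    Plain averaging is an assumption; weighted voting is another common choice.
    """
    if not probs_by_sensor:
        raise ValueError("need at least one sensor")
    n_sensors = len(probs_by_sensor)
    n_classes = len(next(iter(probs_by_sensor.values())))
    return [
        sum(p[c] for p in probs_by_sensor.values()) / n_sensors
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Toy feature vectors for the three modalities named in the abstract.
    print(early_fusion({"pressure": [1.0, 2.0], "inertial": [3.0], "visual": [4.0, 5.0]}))
    # Toy per-sensor probabilities for two classes (e.g. FM+ and FM-).
    print(late_fusion({"pressure": [0.25, 0.75], "visual": [0.75, 0.25]}))
```

With early fusion a single model can learn cross-sensor interactions, whereas late fusion keeps each per-sensor model independent, which makes it easier to drop or add a modality.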

arXiv:2401.16424 [pdf, other]

Computer Vision for Primate Behavior Analysis in the Wild

Authors: Richard Vogg, Timo Lüddecke, Jonathan Henrich, Sharmita Dey, Matthias Nuske, Valentin Hassler, Derek Murphy, Julia Fischer, Julia Ostner, Oliver Schülke, Peter M. Kappeler, Claudia Fichtel, Alexander Gail, Stefan Treue, Hansjörg Scherberger, Florentin Wörgötter, Alexander S. Ecker

Abstract: Advances in computer vision as well as increasingly widespread video-based behavioral monitoring have great potential for transforming how we study animal cognition and behavior. However, there is still a fairly large gap between the exciting prospects and what can actually be achieved in practice today, especially in videos from the wild. With this perspective paper, we want to contribute towards closing this gap, by guiding behavioral scientists in what can be expected from current methods and steering computer vision researchers towards problems that are relevant to advance research in animal behavior. We start with a survey of the state-of-the-art methods for computer vision problems that are directly relevant to the video-based study of animal behavior, including object detection, multi-individual tracking, (inter)action recognition and individual identification. We then review methods for effort-efficient learning, which is one of the biggest challenges from a practical perspective. Finally, we close with an outlook into the future of the emerging field of computer vision for animal behavior, where we argue that the field should move fast beyond the common frame-by-frame processing and treat video as a first-class citizen.

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2311.07285 [pdf, other]

Multi Sentence Description of Complex Manipulation Action Videos

Authors: Fatemeh Ziaeetabar, Reza Safabakhsh, Saeedeh Momtazi, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Automatic video description requires the generation of natural language statements about the actions, events, and objects in the video. An important human trait, when we describe a video, is that we are able to do this with variable levels of detail. Different from this, existing approaches for automatic video descriptions are mostly focused on single sentence generation at a fixed level of detail. Instead, here we address video description of manipulation actions where different levels of detail are required for being able to convey information about the hierarchical structure of these actions relevant also for modern approaches of robot learning. We propose one hybrid statistical and one end-to-end framework to address this problem. The hybrid method needs much less data for training, because it models statistically uncertainties within the video clips, while in the end-to-end method, which is more data-heavy, we are directly connecting the visual encoder to the language decoder without any intermediate (statistical) processing step. Both frameworks use LSTM stacks to allow for different levels of description granularity and videos can be described by simple single-sentences or complex multiple-sentence descriptions. In addition, quantitative results demonstrate that these methods produce more realistic descriptions than other competing approaches.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.00670 [pdf, other]

A Hierarchical Graph-based Approach for Recognition and Description Generation of Bimanual Actions in Videos

Authors: Fatemeh Ziaeetabar, Reza Safabakhsh, Saeedeh Momtazi, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Nuanced understanding and the generation of detailed descriptive content for (bimanual) manipulation actions in videos is important for disciplines such as robotics, human-computer interaction, and video content analysis. This study describes a novel method, integrating graph based modeling with layered hierarchical attention mechanisms, resulting in higher precision and better comprehensiveness of video descriptions. To achieve this, we encode, first, the spatio-temporal inter dependencies between objects and actions with scene graphs and we combine this, in a second step, with a novel 3-level architecture creating a hierarchical attention mechanism using Graph Attention Networks (GATs). The 3-level GAT architecture allows recognizing local, but also global contextual elements. This way several descriptions with different semantic complexity can be generated in parallel for the same video clip, enhancing the discriminative accuracy of action recognition and action description. The performance of our approach is empirically tested using several 2D and 3D datasets. By comparing our method to the state of the art we consistently obtain better performance concerning accuracy, precision, and contextual relevance when evaluating action recognition as well as description generation. In a large set of ablation experiments we also assess the role of the different components of our model. With our multi-level approach the system obtains different semantic description depths, often observed in descriptions made by different people, too. Furthermore, better insight into bimanual hand-object interactions as achieved by our model may portend advancements in the field of robotics, enabling the emulation of intricate human actions with heightened precision.

Submitted 1 October, 2023; originally announced October 2023.

arXiv:2305.09325 [pdf, other]

AdoptODE: Fusion of data and expert knowledge for modeling dynamical systems

Authors: Leon Lettermann, Alejandro Jurado, Timo Betz, Florentin Wörgötter, Sebastian Herzog

Abstract: Building a representative model of a complex system remains a highly challenging problem. While by now there is basic understanding of most physical domains, model design is often hindered by lack of detail, for example concerning model dimensions or its relevant constraints. Here we present a novel model-building approach -- adoptODE -- augmenting basic system descriptions, based on expert knowledge in the form of ordinary differential equations, with continuous adjoint sensitivity analysis related to artificial neural network principles, based on observable data. With this we have created a general tool, that can be applied to any physical system described by ordinary differential equations. AdoptODE allows validating or extending the initial description, for example with different variables and constraints. This way one arrives at a better-optimised, representative low-dimensional model, which can fit existing data and predict novel experimental outcomes. We validate our method on five, quite different problem domains. (1) Kolmogorov model: Lotka Volterra model where we show the application of adoptODE to continuous-time Markov processes and the performance of adoptODE when working with noisy data. (2) Particle model: Interactive N-body, is a demonstration of the scalability of adoptODE and that even interactive system with a high number of elements can be reconstructed with high precision. (3) Excitable media (heart dynamics) by the Bueno-Orovio-Cherry-Fenton model: AdoptODE can reconstruct the parameters of a high-dimensional model and fields for diffusion driven system for chaotic behaviour. (4) Fluid dynamics: Rayleigh-Bénard Convection where a complete unknown field, the temperature, can be extracted from only velocity data. (5) New experimental data of Zebrafish embryogenesis: This is a case where we extend existing models with new variables.

Submitted 31 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2211.13024 [pdf, other]

Comparison of Motion Encoding Frameworks on Human Manipulation Actions

Authors: Lennart Jahn, Florentin Wörgötter, Tomas Kulvicius

Abstract: Movement generation, and especially generalisation to unseen situations, plays an important role in robotics. Different types of movement generation methods exist such as spline based methods, dynamical system based methods, and methods based on Gaussian mixture models (GMMs). Using a large, new dataset on human manipulations, in this paper we provide a highly detailed comparison of five fundamentally different and widely used movement encoding and generation frameworks: dynamic movement primitives (DMPs), time based Gaussian mixture regression (tbGMR), stable estimator of dynamical systems (SEDS), Probabilistic Movement Primitives (ProMP) and Optimal Control Primitives (OCP). We compare these frameworks with respect to their movement encoding efficiency, reconstruction accuracy, and movement generalisation capabilities. The new dataset consists of nine object manipulation actions performed by 12 humans: pick and place, put on top/take down, put inside/take out, hide/uncover, and push/pull with a total of 7,652 movement examples. Our analysis shows that for movement encoding and reconstruction DMPs and OCPs are the most efficient with respect to the number of parameters and reconstruction accuracy, if a sufficient number of kernels is used. In case of movement generalisation to new start- and end-point situations, DMPs, OCPs and task parameterized GMM (TP-GMM, movement generalisation framework based on tbGMR) lead to similar performance, which ProMPs only achieve when using many demonstrations for learning. All models outperform SEDS, which additionally proves to be difficult to fit. Furthermore we observe that TP-GMM and SEDS suffer from problems reaching the end-points of generalizations. These different quantitative results will help selecting the most appropriate models and designing trajectory representations in an improved task-dependent way in future robotic applications.

Submitted 18 March, 2024; v1 submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.08321 [pdf, other]

Simulated Mental Imagery for Robotic Task Planning

Authors: Shijia Li, Tomas Kulvicius, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Traditional AI-planning methods for task planning in robotics require a symbolically encoded domain description. While powerful in well-defined scenarios, as well as human-interpretable, setting this up requires substantial effort. Different from this, most everyday planning tasks are solved by humans intuitively, using mental imagery of the different planning steps. Here we suggest that the same approach can be used for robots, too, in cases which require only limited execution accuracy. In the current study, we propose a novel sub-symbolic method called Simulated Mental Imagery for Planning (SiMIP), which consists of perception, simulated action, success-checking and re-planning performed on 'imagined' images. We show that it is possible to implement mental imagery-based planning in an algorithmically sound way by combining regular convolutional neural networks and generative adversarial networks. With this method, the robot acquires the capability to use the initially existing scene to generate action plans without symbolic domain descriptions, while at the same time plans remain human-interpretable, different from deep reinforcement learning, which is an alternative sub-symbolic approach. We create a dataset from real scenes for a packing problem of having to correctly place different objects into different target slots. This way efficiency and success rate of this algorithm could be quantified.

Submitted 27 July, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

arXiv:2208.00884 [pdf]

Infant movement classification through pressure distribution analysis

Authors: Tomas Kulvicius, Dajie Zhang, Karin Nielsen-Saines, Sven Bölte, Marc Kraft, Christa Einspieler, Luise Poustka, Florentin Wörgötter, Peter B Marschik

Abstract: Aiming at objective early detection of neuromotor disorders such as cerebral palsy, we proposed an innovative non-intrusive approach using a pressure sensing device to classify infant general movements (GMs). Here, we tested the feasibility of using pressure data to differentiate typical GM patterns of the "fidgety period" (i.e., fidgety movements) vs. the "pre-fidgety period" (i.e., writhing movements). Participants (N = 45) were sampled from a typically-developing infant cohort. Multi-modal sensor data, including pressure data from a 32x32-grid pressure sensing mat with 1024 sensors, were prospectively recorded for each infant in seven succeeding laboratory sessions in biweekly intervals from 4-16 weeks of post-term age. For proof-of-concept, 1776 pressure data snippets, each 5s long, from the two targeted age periods were taken for movement classification. Each snippet was pre-annotated based on corresponding synchronised video data by human assessors as either fidgety present (FM+) or absent (FM-). Multiple neural network architectures were tested to distinguish the FM+ vs. FM- classes, including support vector machines (SVM), feed-forward networks (FFNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks. The CNN achieved the highest average classification accuracy (81.4%) for classes FM+ vs. FM-. Comparing the pros and cons of other methods aiming at automated GMA to the pressure sensing approach, we concluded that the pressure sensing approach has great potential for efficient large-scale motion data acquisition and sharing. This will in return enable improvement of the approach that may prove scalable for daily clinical application for evaluating infant neuromotor functions.

Submitted 1 July, 2023; v1 submitted 26 July, 2022; originally announced August 2022.

arXiv:2207.11020 [pdf]

doi 10.1016/j.isci.2023.106348

Open video data sharing in developmental and behavioural science

Authors: Peter B Marschik, Tomas Kulvicius, Sarah Flügge, Claudius Widmann, Karin Nielsen-Saines, Martin Schulte-Rüther, Britta Hüning, Sven Bölte, Luise Poustka, Jeff Sigafoos, Florentin Wörgötter, Christa Einspieler, Dajie Zhang

Abstract: Video recording is a widely used method for documenting infant and child behaviours in research and clinical practice. Video data has rarely been shared due to ethical concerns of confidentiality, although the need of shared large-scaled datasets remains increasing. This demand is even more imperative when data-driven computer-based approaches are involved, such as screening tools to complement clinical assessments. To share data while abiding by privacy protection rules, a critical question arises whether efforts at data de-identification reduce data utility? We addressed this question by showcasing the Prechtl's general movements assessment (GMA), an established and globally practised video-based diagnostic tool in early infancy for detecting neurological deficits, such as cerebral palsy. To date, no shared expert-annotated large data repositories for infant movement analyses exist. Such datasets would massively benefit training and recalibration of human assessors and the development of computer-based approaches. In the current study, sequences from a prospective longitudinal infant cohort with a total of 19451 available general movements video snippets were randomly selected for human clinical reasoning and computer-based analysis. We demonstrated for the first time that pseudonymisation by face-blurring video recordings is a viable approach. The video redaction did not affect classification accuracy for either human assessors or computer vision methods, suggesting an adequate and easy-to-apply solution for sharing movement video data. We call for further explorations into efficient and privacy rule-conforming approaches for deidentifying video data in scientific and clinical fields beyond movement assessments. These approaches shall enable sharing and merging stand-alone video datasets into large data pools to advance science and public health.

Submitted 22 July, 2022; originally announced July 2022.

Journal ref: iScience, 2023

arXiv:2203.01051 [pdf, other]

3D object reconstruction and 6D-pose estimation from 2D shape for robotic grasping of objects

Authors: Marcell Wolnitza, Osman Kaya, Tomas Kulvicius, Florentin Wörgötter, Babette Dellen

Abstract: We propose a method for 3D object reconstruction and 6D-pose estimation from 2D images that uses knowledge about object shape as the primary key. In the proposed pipeline, recognition and labeling of objects in 2D images deliver 2D segment silhouettes that are compared with the 2D silhouettes of projections obtained from various views of a 3D model representing the recognized object class. By computing transformation parameters directly from the 2D images, the number of free parameters required during the registration process is reduced, making the approach feasible. Furthermore, 3D transformations and projective geometry are employed to arrive at a full 3D reconstruction of the object in camera space using a calibrated set up. Inclusion of a second camera allows resolving remaining ambiguities. The method is quantitatively evaluated using synthetic data and tested with real data, and additional results for the well-known Linemod data set are shown. In robot experiments, successful grasping of objects demonstrates its usability in real-world environments, and, where possible, a comparison with other methods is provided. The method is applicable to scenarios where 3D object models, e.g., CAD-models or point clouds, are available and precise pixel-wise segmentation maps of 2D images can be obtained. Different from other methods, the method does not use 3D depth for training, widening the domain of application.

Submitted 2 March, 2022; originally announced March 2022.

arXiv:2201.11104 [pdf, other]

doi 10.1109/TNNLS.2023.3327103

Combining Optimal Path Search With Task-Dependent Learning in a Neural Network

Authors: Tomas Kulvicius, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Finding optimal paths in connected graphs requires determining the smallest total cost for traveling along the graph's edges. This problem can be solved by several classical algorithms where, usually, costs are predefined for all edges. Conventional planning methods can, thus, normally not be used when wanting to change costs in an adaptive way following the requirements of some task. Here we show that one can define a neural network representation of path finding problems by transforming cost values into synaptic weights, which allows for online weight adaptation using network learning mechanisms. When starting with an initial activity value of one, activity propagation in this network will lead to solutions, which are identical to those found by the Bellman-Ford algorithm. The neural network has the same algorithmic complexity as Bellman-Ford and, in addition, we can show that network learning mechanisms (such as Hebbian learning) can adapt the weights in the network augmenting the resulting paths according to some task at hand. We demonstrate this by learning to navigate in an environment with obstacles as well as by learning to follow certain sequences of path nodes. Hence, the here-presented novel algorithm may open up a different regime of applications where path-augmentation (by learning) is directly coupled with path finding in a natural way.

Submitted 2 November, 2023; v1 submitted 26 January, 2022; originally announced January 2022.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2023
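
For readers unfamiliar with the baseline named in the abstract above, the following is a minimal sketch of the classical Bellman-Ford algorithm with fixed edge costs. It is not the neural network formulation proposed in the paper; the function name, the edge-tuple layout, and the toy graph are assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (from_node, to_node, cost); layout assumed for this sketch


def bellman_ford(nodes: Iterable[str], edges: List[Edge], source: str) -> Dict[str, float]:
    """Return the smallest total cost from `source` to every node.

    Unreachable nodes keep a cost of infinity. Raises ValueError if a
    negative-cost cycle is reachable from the source.
    """
    node_list = list(nodes)
    dist = {n: float("inf") for n in node_list}
    dist[source] = 0.0

    # Relax every edge up to |V| - 1 times; stop early once nothing changes.
    for _ in range(len(node_list) - 1):
        updated = False
        for u, v, cost in edges:
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                updated = True
        if not updated:
            break

    # One extra pass: any further improvement means a negative-cost cycle.
    for u, v, cost in edges:
        if dist[u] + cost < dist[v]:
            raise ValueError("graph contains a negative-cost cycle")
    return dist


if __name__ == "__main__":
    toy_edges = [("A", "B", 1.0), ("B", "C", 2.0), ("A", "C", 5.0), ("C", "D", 1.0)]
    print(bellman_ford(["A", "B", "C", "D"], toy_edges, "A"))
```

The worst-case running time is proportional to the number of nodes times the number of edges, which is the complexity the abstract says the neural network shares.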

arXiv:2110.13665 [pdf, other]

doi 10.1109/TCDS.2022.3163022

Bootstrapping Concept Formation in Small Neural Networks

Authors: Minija Tamosiunaite, Tomas Kulvicius, Florentin Wörgötter

Abstract: The question how neural systems (of humans) can perform reasoning is still far from being solved. We posit that the process of forming Concepts is a fundamental step required for this. We argue that, first, Concepts are formed as closed representations, which are then consolidated by relating them to each other. Here we present a model system (agent) with a small neural network that uses realistic learning rules and receives only feedback from the environment in which the agent performs virtual actions. First, the actions of the agent are reflexive. In the process of learning, statistical regularities in the input lead to the formation of neuronal pools representing relations between the entities observed by the agent from its artificial world. This information then influences the behavior of the agent via feedback connections replacing the initial reflex by an action driven by these relational representations. We hypothesize that the neuronal pools representing relational information can be considered as primordial Concepts, which may in a similar way be present in some pre-linguistic animals, too. This system provides formal grounds for further discussions on what could be understood as a Concept and shows that associative learning is enough to develop concept-like structures.

Submitted 28 March, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Journal ref: IEEE Transactions on Cognitive and Developmental Systems, 2022

arXiv:2004.10518 [pdf, other]

Human and Machine Action Prediction Independent of Object Information

Authors: Fatemeh Ziaeetabar, Jennifer Pomp, Stefan Pfeiffer, Nadiya El-Sourani, Ricarda I. Schubotz, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Predicting other people's action is key to successful social interactions, enabling us to adjust our own behavior to the consequence of the others' future actions. Studies on action recognition have focused on the importance of individual visual features of objects involved in an action and its context. Humans, however, recognize actions on unknown objects or even when objects are imagined (pantomime). Other cues must thus compensate the lack of recognizable visual object features. Here, we focus on the role of inter-object relations that change during an action. We designed a virtual reality setup and tested recognition speed for 10 different manipulation actions on 50 subjects. All objects were abstracted by emulated cubes so the actions could not be inferred using object information. Instead, subjects had to rely only on the information that comes from the changes in the spatial relations that occur between those cubes. In spite of these constraints, our results show the subjects were able to predict actions in, on average, less than 64% of the action's duration. We employed a computational model -an enriched Semantic Event Chain (eSEC)- incorporating the information of spatial relations, specifically (a) objects' touching/untouching, (b) static spatial relations between objects and (c) dynamic spatial relations between objects. Trained on the same actions as those observed by subjects, the model successfully predicted actions even better than humans. Information theoretical analysis shows that eSECs optimally use individual cues, whereas humans presumably mostly rely on a mixed-cue strategy, which takes longer until recognition. Providing a better cognitive basis of action recognition may, on one hand improve our understanding of related human pathologies and, on the other hand, also help to build robots for conflict-free human-robot cooperation. Our results open new avenues here.

Submitted 22 April, 2020; originally announced April 2020.

Comments: This paper includes 31 pages, 11 figures and 1 table

arXiv:2004.02607 [pdf, other]

Semantic Image Search for Robotic Applications

Authors: Tomas Kulvicius, Irene Markelic, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Generalization in robotics is one of the most important problems. New generalization approaches use internet databases in order to solve new tasks. Modern search engines can return a large amount of information according to a query within milliseconds. However, not all of the returned information is task relevant, partly due to the problem of polysemes. Here we specifically address the problem of object generalization by using image search. We suggest a bi-modal solution, combining visual and textual information, based on the observation that humans use additional linguistic cues to demarcate intended word meaning. We evaluate the quality of our approach by comparing it to human labelled data and find that, on average, our approach leads to improved results in comparison to Google searches, and that it can treat the problem of polysemes.

Submitted 2 April, 2020; originally announced April 2020.

Journal ref: 22nd International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD 2013), September 11-13, 2013, Portoroz, Slovenia

arXiv:2004.00568 [pdf, other]

One-shot path planning for multi-agent systems using fully convolutional neural network

Authors: Tomas Kulvicius, Sebastian Herzog, Timo Lüddecke, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Path planning plays a crucial role in robot action execution, since a path or a motion trajectory for a particular action has to be defined first before the action can be executed. Most of the current approaches are iterative methods where the trajectory is generated iteratively by predicting the next state based on the current state. Moreover, in the case of multi-agent systems, paths are planned for each agent separately. In contrast to that, we propose a novel method utilising a fully convolutional neural network, which allows generation of complete paths, even for more than one agent, in one shot, i.e., with a single prediction step. We demonstrate that our method is able to successfully generate optimal or close to optimal paths in more than 98% of the cases for single path predictions. Moreover, we show that although the network has never been trained on multi-path planning, it is also able to generate optimal or close to optimal paths in 85.7% and 65.4% of the cases when generating two and three paths, respectively.

Submitted 1 April, 2020; originally announced April 2020.

Journal ref: The 3rd International Symposium on Swarm Behavior and Bioinspired Robotics (SWARM 2019), November 20-22, Okinawa, Japan

arXiv:2004.00540 [pdf, other]

doi 10.1109/TNNLS.2021.3089023

Generation of Paths in a Maze using a Deep Network without Learning

Authors: Tomas Kulvicius, Sebastian Herzog, Minija Tamosiunaite, Florentin Wörgötter

Abstract: Trajectory- or path-planning is a fundamental issue in a wide variety of applications. Here we show that it is possible to solve path planning for multiple start- and end-points highly efficiently with a network that consists only of max pooling layers, for which no network training is needed. Different from competing approaches, very large mazes containing more than half a billion nodes with dense obstacle configuration and several thousand path end-points can this way be solved in very short time on parallel hardware.

Submitted 1 April, 2020; originally announced April 2020.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2022

arXiv:1911.05990 [pdf, other]

Attention on Abstract Visual Reasoning

Authors: Lukas Hahne, Timo Lüddecke, Florentin Wörgötter, David Kappel

Abstract: Attention mechanisms have been boosting the performance of deep learning models on a wide range of applications, ranging from speech understanding to program induction. However, despite experiments from psychology which suggest that attention plays an essential role in visual reasoning, the full potential of attention mechanisms has so far not been explored to solve abstract cognitive tasks on image data. In this work, we propose a hybrid network architecture, grounded on self-attention and relational reasoning. We call this new model Attention Relation Network (ARNe). ARNe combines features from the recently introduced Transformer and the Wild Relation Network (WReN). We test ARNe on the Procedurally Generated Matrices (PGMs) datasets for abstract visual reasoning. ARNe outperforms the WReN model on this task by 11.28 ppt. Relational concepts between objects are efficiently learned, demanding only 35% of the training samples to surpass the reported accuracy of the baseline model. Our proposed hybrid model represents an alternative for learning abstract relations using self-attention and demonstrates that the Transformer network is also well suited for abstract visual reasoning.

Submitted 14 November, 2019; originally announced November 2019.

arXiv:1907.01932 [pdf]

doi 10.1038/s41598-020-60923-5

Action Prediction in Humans and Robots

Authors: Florentin Wörgötter, Fatemeh Ziaeetabar, Stefan Pfeiffer, Osman Kaya, Tomas Kulvicius, Minija Tamosiunaite

Abstract: Efficient action prediction is of central importance for the fluent workflow between humans and equally so for human-robot interaction. To achieve prediction, actions can be encoded by a series of events, where every event corresponds to a change in a (static or dynamic) relation between some of the objects in a scene. Manipulation actions and others can be uniquely encoded this way and, on average, less than 60% of the time series has to pass until an action can be predicted. Using a virtual reality setup and testing ten different manipulation actions, here we show that in most cases humans predict actions at the same event as the algorithm. In addition, we perform an in-depth analysis of the temporal gain resulting from such predictions when chaining actions and show in some robotic experiments that the percentage gain for humans and robots is approximately equal. Thus, if robots use this algorithm, then their prediction moments will be compatible with those of their human interaction partners, which should greatly benefit natural human-robot collaboration.

Submitted 3 July, 2019; originally announced July 2019.

Journal ref: Scientific Reports, 2020

arXiv:1801.08829 [pdf, other]

doi 10.1109/TCDS.2018.2867772

Symbol Emergence in Cognitive Developmental Systems: a Survey

Authors: Tadahiro Taniguchi, Emre Ugur, Matej Hoffmann, Lorenzo Jamone, Takayuki Nagai, Benjamin Rosman, Toshihiko Matsuka, Naoto Iwahashi, Erhan Oztop, Justus Piater, Florentin Wörgötter

Abstract: Humans use signs, e.g., sentences in a spoken language, for communication and thought. Hence, symbol systems like language are crucial for our communication with other agents and adaptation to our real-world environment. The symbol systems we use in our human society adaptively and dynamically change over time. In the context of artificial intelligence (AI) and cognitive systems, the symbol grounding problem has been regarded as one of the central problems related to symbols. However, the symbol grounding problem was originally posed to connect symbolic AI and sensorimotor information and did not consider many interdisciplinary phenomena in human communication and dynamic symbol systems in our society, which semiotics considered. In this paper, we focus on the symbol emergence problem, addressing not only cognitive dynamics but also the dynamics of symbol systems in society, rather than the symbol grounding problem. We first introduce the notion of a symbol in semiotics from the humanities, to leave the very narrow idea of symbols in symbolic AI. Furthermore, over the years, it became more and more clear that symbol emergence has to be regarded as a multifaceted problem. Therefore, secondly, we review the history of the symbol emergence problem in different fields, including both biological and artificial systems, showing their mutual relations. We summarize the discussion and provide an integrative viewpoint and comprehensive overview of symbol emergence in cognitive systems. Additionally, we describe the challenges facing the creation of cognitive systems that can be part of symbol emergence systems.

Submitted 10 July, 2018; v1 submitted 26 January, 2018; originally announced January 2018.

Comments: 23 pages, 6 figures. Submitted to IEEE Transactions on Cognitive and Developmental Systems

Journal ref: IEEE Transactions on Cognitive and Developmental Systems, vol. 11, no. 4, pp. 494-516, 2019

arXiv:1709.08872 [pdf, other]

Learning to Label Affordances from Simulated and Real Data

Authors: Timo Lüddecke, Florentin Wörgötter

Abstract: An autonomous robot should be able to evaluate the affordances that are offered by a given situation. Here we address this problem by designing a system that can densely predict affordances given only a single 2D RGB image. This is achieved with a convolutional neural network (ResNet), which we combine with refinement modules recently proposed for addressing semantic image segmentation. We define a novel cost function, which is able to handle (potentially multiple) affordances of objects and their parts in a pixel-wise manner even in the case of incomplete data. We perform qualitative as well as quantitative evaluations with simulated and real data assessing 15 different affordances. In general, we find that affordances that are sufficiently well represented in the training data are correctly recognized with a substantial fraction of correctly assigned pixels. Furthermore, we show that our model outperforms several baselines. Hence, this method can give clear action guidelines for a robot.

Submitted 26 September, 2017; originally announced September 2017.

arXiv:1706.04265 [pdf, other]

Transfer entropy-based feedback improves performance in artificial neural networks

Authors: Sebastian Herzog, Christian Tetzlaff, Florentin Wörgötter

Abstract: The structure of the majority of modern deep neural networks is characterized by unidirectional feed-forward connectivity across a very large number of layers. By contrast, the architecture of the cortex of vertebrates contains fewer hierarchical levels but many recurrent and feedback connections. Here we show that a small, few-layer artificial neural network that employs feedback will reach top-level performance on a standard benchmark task, otherwise only obtained by large feed-forward structures. To achieve this, we use feed-forward transfer entropy between neurons to structure feedback connectivity. Transfer entropy can here intuitively be understood as a measure for the relevance of certain pathways in the network, which are then amplified by feedback. Feedback may therefore be key for high network performance in small brain-like architectures.

Submitted 22 June, 2017; v1 submitted 13 June, 2017; originally announced June 2017.

arXiv:1610.05693 [pdf, other]

doi 10.1007/s11263-016-0956-8

Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences

Authors: Eren Erdal Aksoy, Adil Orhan, Florentin Woergoetter

Abstract: Understanding continuous human actions is a non-trivial but important problem in computer vision. Although there exists a large corpus of work in the recognition of action sequences, most approaches suffer from problems relating to vast variations in motions, action combinations, and scene contexts. In this paper, we introduce a novel method for semantic segmentation and recognition of long and complex manipulation action tasks, such as "preparing a breakfast" or "making a sandwich". We represent manipulations with our recently introduced "Semantic Event Chain" (SEC) concept, which captures the underlying spatiotemporal structure of an action invariant to motion, velocity, and scene context. Solely based on the spatiotemporal interactions between manipulated objects and hands in the extracted SEC, the framework automatically parses individual manipulation streams performed either sequentially or concurrently. Using event chains, our method further extracts basic primitive elements of each parsed manipulation. Without requiring any prior object knowledge, the proposed framework can also extract object-like scene entities that exhibit the same role in semantically similar manipulations. We conduct extensive experiments on various recent datasets to validate the robustness of the framework.

Submitted 18 October, 2016; originally announced October 2016.

Comments: IJCV preprint manuscript

Journal ref: International Journal of Computer Vision, 2017

arXiv:1606.09277 [pdf, other]

Response and noise correlations to complex natural sounds in the auditory midbrain

Authors: Dominika Lyzwa, Florentin Wörgötter

Abstract: How natural communication sounds are spatially represented across the inferior colliculus, the main center of convergence for auditory information in the midbrain, is not known. The neural representation of the acoustic stimuli results from the interplay of locally differing input and the organization of spectral and temporal neural preferences that change gradually across the nucleus. This raises the question of how similar the neural representation of the communication sounds is across these gradients of neural preferences, and whether it also changes gradually. Multi-unit cluster spike trains were recorded from guinea pigs presented with a spectrotemporally rich set of eleven species-specific communication sounds. Using cross-correlation, we analyzed the response similarity of spiking activity across a broad frequency range for similarly and differently frequency-tuned neurons. Furthermore, we separated the contribution of the stimulus to the correlations to investigate whether similarity is only attributable to the stimulus, or whether interactions exist between the multi-unit clusters that lead to correlations, and whether these follow the same representation as the response similarity. We found that similarity of responses is dependent on the neurons' spatial distance for similarly and differently frequency-tuned neurons, and that similarity decreases gradually with spatial distance. Significant neural correlations exist and contribute to the response similarity. Our findings suggest that for multi-unit clusters in the mammalian inferior colliculus, the gradual response similarity with spatial distance to natural complex sounds is shaped by neural interactions and the gradual organization of neural preferences.

Submitted 29 June, 2016; originally announced June 2016.

Comments: 19 pages, 10 figures

arXiv:1506.03599 [pdf, other]

Distributed Recurrent Neural Forward Models with Synaptic Adaptation for Complex Behaviors of Walking Robots

Authors: Sakyasingha Dasgupta, Dennis Goldschmidt, Florentin Wörgötter, Poramate Manoonpong

Abstract: Walking animals, like stick insects, cockroaches or ants, demonstrate a fascinating range of locomotive abilities and complex behaviors. The locomotive behaviors can consist of a variety of walking patterns along with adaptation that allow the animals to deal with changes in environmental conditions, like uneven terrains, gaps, obstacles etc. Biological study has revealed that such complex behaviors are a result of a combination of biomechanics and neural mechanisms, thus representing the true nature of embodied interactions. While the biomechanics helps maintain flexibility and sustain a variety of movements, the neural mechanisms generate movements while making appropriate predictions crucial for achieving adaptation. Such predictions or planning ahead can be achieved by way of internal models that are grounded in the overall behavior of the animal. Inspired by these findings, we present here an artificial bio-inspired walking system which effectively combines biomechanics (in terms of the body and leg structures) with the underlying neural mechanisms. The neural mechanisms consist of 1) central pattern generator based control for generating basic rhythmic patterns and coordinated movements, 2) distributed (at each leg) recurrent neural network based adaptive forward models with efference copies as internal models for sensory predictions and instantaneous state estimations, and 3) searching and elevation control for adapting the movement of an individual leg to deal with different environmental conditions. Using simulations we show that this bio-inspired approach with adaptive internal models allows the walking robot to perform complex locomotive behaviors as observed in insects, including walking on undulated terrains, crossing large gaps as well as climbing over high obstacles...

Submitted 11 June, 2015; originally announced June 2015.

Comments: 26 pages, 10 figures

arXiv:1407.3269 [pdf, other]

doi 10.1016/j.ins.2014.05.001

Multiple chaotic central pattern generators with learning for legged locomotion and malfunction compensation

Authors: Guanjiao Ren, Weihai Chen, Sakyasingha Dasgupta, Christoph Kolodziejski, Florentin Wörgötter, Poramate Manoonpong

Abstract: An originally chaotic system can be controlled into various periodic dynamics. When it is implemented into a legged robot's locomotion control as a central pattern generator (CPG), sophisticated gait patterns arise so that the robot can perform various walking behaviors. However, such a single chaotic CPG controller has difficulties dealing with leg malfunction. Specifically, in the scenarios presented here, its movement permanently deviates from the desired trajectory. To address this problem, we extend the single chaotic CPG to multiple CPGs with learning. The learning mechanism is based on a simulated annealing algorithm. In a normal situation, the CPGs synchronize and their dynamics are identical. With leg malfunction or disability, the CPGs lose synchronization leading to independent dynamics. In this case, the learning mechanism is applied to automatically adjust the remaining legs' oscillation frequencies so that the robot adapts its locomotion to deal with the malfunction. As a consequence, the trajectory produced by the multiple chaotic CPGs resembles the original trajectory far better than the one produced by only a single CPG. The performance of the system is evaluated first in a physical simulation of a quadruped as well as a hexapod robot and finally in a real six-legged walking machine called AMOSII. The experimental results presented here reveal that using multiple CPGs with learning is an effective approach for adaptive locomotion generation where, for instance, different body parts have to perform independent movements for malfunction compensation.

Submitted 11 July, 2014; originally announced July 2014.

Comments: 48 pages, 16 figures, Information Sciences 2014

ACM Class: I.2.9; I.2.6

arXiv:1105.1386 [pdf, other]

doi 10.1038/nphys1860

Self-organized adaptation of a simple neural circuit enables complex robot behaviour

Authors: Silke Steingrube, Marc Timme, Florentin Woergoetter, Poramate Manoonpong

Abstract: Controlling sensori-motor systems in higher animals or complex robots is a challenging combinatorial problem, because many sensory signals need to be simultaneously coordinated into a broad behavioural spectrum. To rapidly interact with the environment, this control needs to be fast and adaptive. Current robotic solutions operate with limited autonomy and are mostly restricted to few behavioural patterns. Here we introduce chaos control as a new strategy to generate complex behaviour of an autonomous robot. In the presented system, 18 sensors drive 18 motors via a simple neural control circuit, thereby generating 11 basic behavioural patterns (e.g., orienting, taxis, self-protection, various gaits) and their combinations. The control signal quickly and reversibly adapts to new situations and additionally enables learning and synaptic long-term storage of behaviourally useful motor responses. Thus, such neural control provides a powerful yet simple way to self-organize versatile behaviours in autonomous agents with many degrees of freedom.

Submitted 6 May, 2011; originally announced May 2011.

Comments: 16 pages, non-final version, for final see Nature Physics homepage

Journal ref: Nature Phys. 6:224 (2010)

arXiv:cond-mat/0008013 [pdf, ps, other]

doi 10.1103/PhysRevE.62.R1461

Cluster update and recognition

Authors: Christian von Ferber, Florentin Worgotter

Abstract: We present a fast and robust cluster update algorithm that is especially efficient in implementing the task of image segmentation using the method of superparamagnetic clustering. We apply it to a Potts model with spin interactions that are defined by gray-scale differences within the image. Motivated by biological systems, we introduce the concept of neural inhibition to the Potts model realization of the segmentation problem. Including the inhibition term in the Hamiltonian results in enhanced contrast and thereby significantly improves segmentation quality. As a second benefit we can - after equilibration - directly identify the image segments as the clusters formed by the clustering algorithm. To construct a new spin configuration the algorithm performs the standard steps of (1) forming clusters and of (2) updating the spins in a cluster simultaneously. As opposed to standard algorithms, however, we share the interaction energy between the two steps. Thus the update probabilities are not independent of the interaction energies. As a consequence, we observe an acceleration of the relaxation by a factor of 10 compared to the Swendsen and Wang procedure.

Submitted 1 August, 2000; originally announced August 2000.

Comments: 4 pages, 2 figures

Journal ref: Phys.Rev. E62 (2000) R1461-R1464

Showing 1–27 of 27 results for author: Wörgötter, F