Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies

Authors: Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes A. Stork

Abstract: Reinforcement learning policies are typically represented by black-box neural networks, which are non-interpretable and not well-suited for safety-critical domains. To address both of these issues, we propose constrained normalizing flow policies as interpretable and safe-by-construction policy models. We achieve safety for reinforcement learning problems with instantaneous safety constraints, for which we can exploit domain knowledge by analytically constructing a normalizing flow that ensures constraint satisfaction. The normalizing flow corresponds to an interpretable sequence of transformations on action samples, each ensuring alignment with respect to a particular constraint. Our experiments reveal benefits beyond interpretability in an easier learning objective and maintained constraint satisfaction throughout the entire learning process. Our approach leverages constraints over reward engineering while offering enhanced interpretability, safety, and direct means of providing domain knowledge to the agent without relying on complex reward functions.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2310.02360 [pdf, other]

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Authors: Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes Andreas Stork

Abstract: Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition.

Submitted 2 May, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Camera ready version

arXiv:2303.10042 [pdf, other]

ShaRPy: Shape Reconstruction and Hand Pose Estimation from RGB-D with Uncertainty

Authors: Vanessa Wirth, Anna-Maria Liphardt, Birte Coppers, Johanna Bräunig, Simon Heinrich, Sigrid Leyendecker, Arnd Kleyer, Georg Schett, Martin Vossiek, Bernhard Egger, Marc Stamminger

Abstract: Despite their potential, markerless hand tracking technologies are not yet applied in practice to the diagnosis or monitoring of the activity in inflammatory musculoskeletal diseases. One reason is that the focus of most methods lies in the reconstruction of coarse, plausible poses, whereas in the clinical context, accurate, interpretable, and reliable results are required. Therefore, we propose ShaRPy, the first RGB-D Shape Reconstruction and hand Pose tracking system, which provides uncertainty estimates of the computed pose, e.g., when a finger is hidden or its estimate is inconsistent with the observations in the input, to guide clinical decision-making. Besides pose, ShaRPy approximates a personalized hand shape, promoting a more realistic and intuitive understanding of its digital twin. Our method requires only a light-weight setup with a single consumer-level RGB-D camera yet it is able to distinguish similar poses with only small joint angle deviations in a metrically accurate space. This is achieved by combining a data-driven dense correspondence predictor with traditional energy minimization. To bridge the gap between interactive visualization and biomedical simulation we leverage a parametric hand model in which we incorporate biomedical constraints and optimize for both, its pose and hand shape. We evaluate ShaRPy on a keypoint detection benchmark and show qualitative results of hand function assessments for activity monitoring of musculoskeletal diseases.

Submitted 12 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: Accepted at ICCVW (CVAMD) 2023

arXiv:2207.03960 [pdf, other]

Detection of Furigana Text in Images

Authors: Nikolaj Kjøller Bjerregaard, Veronika Cheplygina, Stefan Heinrich

Abstract: Furigana are pronunciation notes used in Japanese writing. Being able to detect these can help improve optical character recognition (OCR) performance or make more accurate digital copies of Japanese written media by correctly displaying furigana. This project focuses on detecting furigana in Japanese books and comics. While there has been research into the detection of Japanese text in general, there are currently no proposed methods for detecting furigana. We construct a new dataset containing Japanese written media and annotations of furigana. We propose an evaluation metric for such data which is similar to the evaluation protocols used in object detection except that it allows groups of objects to be labeled by one annotation. We propose a method for detection of furigana that is based on mathematical morphology and connected component analysis. We evaluate the detections of the dataset and compare different methods for text extraction. We also evaluate different types of images such as books and comics individually and discuss the challenges of each type of image. The proposed method reaches an F1-score of 76% on the dataset. The method performs well on regular books, but less so on comics, and books of irregular format. Finally, we show that the proposed method can improve the performance of OCR by 5% on the manga109 dataset. Source code is available via https://github.com/nikolajkb/FuriganaDetection

Submitted 8 July, 2022; originally announced July 2022.

Comments: This project was originally submitted by NKB in fulfillment of the 30 ECTS MSc thesis at the IT University of Copenhagen

arXiv:2006.13546 [pdf]

doi 10.3389/fnbot.2020.00052

Crossmodal Language Grounding in an Embodied Neurocognitive Model

Authors: Stefan Heinrich, Yuan Yao, Tobias Hinz, Zhiyuan Liu, Thomas Hummel, Matthias Kerzel, Cornelius Weber, Stefan Wermter

Abstract: Human infants are able to acquire natural language seemingly easily at an early age. Their language learning seems to occur simultaneously with learning other cognitive functions as well as with playful interactions with the environment and caregivers. From a neuroscientific perspective, natural language is embodied, grounded in most, if not all, sensory and sensorimotor modalities, and acquired by means of crossmodal integration. However, characterising the underlying mechanisms in the brain is difficult and explaining the grounding of language in crossmodal perception and action remains challenging. In this paper, we present a neurocognitive model for language grounding which reflects bio-inspired mechanisms such as an implicit adaptation of timescales as well as end-to-end multimodal abstraction. It addresses developmental robotic interaction and extends its learning capabilities using larger-scale knowledge-based data. In our scenario, we utilise the humanoid robot NICO in obtaining the EMIL data collection, in which the cognitive robot interacts with objects in a children's playground environment while receiving linguistic labels from a caregiver. The model analysis shows that crossmodally integrated representations are sufficient for acquiring language merely from sensory input through interaction with objects in an environment. The representations self-organise hierarchically and embed temporal and spatial information through composition and decomposition. This model can also provide the basis for further crossmodal integration of perceptually grounded cognitive representations.

Submitted 16 October, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Journal ref: Frontiers in Neurorobotics, vol 14(52), 2020

arXiv:1910.13321 [pdf, other]

doi 10.1109/TPAMI.2020.3021209

Semantic Object Accuracy for Generative Text-to-Image Synthesis

Authors: Tobias Hinz, Stefan Heinrich, Stefan Wermter

Abstract: Generative adversarial networks conditioned on textual image descriptions are capable of generating realistic-looking images. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. Furthermore, quantitatively evaluating these text-to-image models is challenging, as most evaluation metrics only judge image quality but not the conformity between the image and its caption. To address these challenges we introduce a new model that explicitly models individual objects within an image and a new evaluation metric called Semantic Object Accuracy (SOA) that specifically evaluates images given an image caption. The SOA uses a pre-trained object detector to evaluate if a generated image contains objects that are mentioned in the image caption, e.g. whether an image generated from "a car driving down the street" contains a car. We perform a user study comparing several text-to-image models and show that our SOA metric ranks the models the same way as humans, whereas other metrics such as the Inception Score do not. Our evaluation also shows that models which explicitly model objects outperform models which only model global image characteristics.

Submitted 2 June, 2020; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: Added a user study to verify results. Code available at https://github.com/tohinz/semantic-object-accuracy-for-generative-text-to-image-synthesis

Journal ref: TPAMI (Early Access), 2020

arXiv:1901.00686 [pdf, other]

Generating Multiple Objects at Spatially Distinct Locations

Authors: Tobias Hinz, Stefan Heinrich, Stefan Wermter

Abstract: Recent improvements to Generative Adversarial Networks (GANs) have made it possible to generate realistic images in high resolution based on natural language descriptions such as image captions. Furthermore, conditional GANs allow us to control the image generation process through labels or even natural language descriptions. However, fine-grained control of the image layout, i.e. where in the image specific objects should be located, is still difficult to achieve. This is especially true for images that should contain multiple distinct objects at different spatial locations. We introduce a new approach which allows us to control the location of arbitrarily many objects within an image by adding an object pathway to both the generator and the discriminator. Our approach does not need a detailed semantic layout but only bounding boxes and the respective labels of the desired objects are needed. The object pathway focuses solely on the individual objects and is iteratively applied at the locations specified by the bounding boxes. The global pathway focuses on the image background and the general image layout. We perform experiments on the Multi-MNIST, CLEVR, and the more complex MS-COCO data set. Our experiments show that through the use of the object pathway we can control object locations within images and can model complex scenes with multiple objects at various locations. We further show that the object pathway focuses on the individual objects and learns features relevant for these, while the global pathway focuses on global image characteristics and the image background.

Submitted 3 January, 2019; originally announced January 2019.

Comments: Published at ICLR 2019

arXiv:1703.08513 [pdf]

doi 10.1080/09540091.2017.1318357

Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture

Authors: Stefan Heinrich, Stefan Wermter

Abstract: For the complex human brain that enables us to communicate in natural language, we gathered good understandings of principles underlying language acquisition and processing, knowledge about socio-cultural conditions, and insights about activity patterns in the brain. However, we were not yet able to understand the behavioural and mechanistic characteristics for natural language and how mechanisms in the brain allow to acquire and process language. In bridging the insights from behavioural psychology and neuroscience, the goal of this paper is to contribute a computational understanding of appropriate characteristics that favour language acquisition. Accordingly, we provide concepts and refinements in cognitive modelling regarding principles and mechanisms in the brain and propose a neurocognitively plausible model for embodied language acquisition from real world interaction of a humanoid robot with its environment. In particular, the architecture consists of a continuous time recurrent neural network, where parts have different leakage characteristics and thus operate on multiple timescales for every modality and the association of the higher level nodes of all modalities into cell assemblies. The model is capable of learning language production grounded in both, temporal dynamic somatosensation and vision, and features hierarchical concept abstraction, concept decomposition, multi-modal integration, and self-organisation of latent representations.

Submitted 7 February, 2018; v1 submitted 24 March, 2017; originally announced March 2017.

Comments: Received 25 June 2016; Accepted 1 February 2017

Journal ref: Connection Science, vol 30, No 1, pp. 99-133, 2017

arXiv:1311.7182 [pdf, other]

Public Key Infrastructure based on Authentication of Media Attestments

Authors: Stuart Heinrich

Abstract: Many users would prefer the privacy of end-to-end encryption in their online communications if it can be done without significant inconvenience. However, because existing key distribution methods cannot be fully trusted enough for automatic use, key management has remained a user problem. We propose a fundamentally new approach to the key distribution problem by empowering end-users with the capacity to independently verify the authenticity of public keys using an additional media attestment. This permits client software to automatically lookup public keys from a keyserver without trusting the keyserver, because any attempted MITM attacks can be detected by end-users. Thus, our protocol is designed to enable a new breed of messaging clients with true end-to-end encryption built in, without the hassle of requiring users to manually manage the public keys, that is verifiably secure against MITM attacks, and does not require trusting any third parties.

Submitted 27 November, 2013; originally announced November 2013.

arXiv:1309.4426 [pdf, other]

GRED: Graph-Regularized 3D Shape Reconstruction from Highly Anisotropic and Noisy Images

Authors: Christian Widmer, Philipp Drewe, Xinghua Lou, Shefali Umrania, Stephanie Heinrich, Gunnar Rätsch

Abstract: Analysis of microscopy images can provide insight into many biological processes. One particularly challenging problem is cell nuclear segmentation in highly anisotropic and noisy 3D image data. Manually localizing and segmenting each and every cell nucleus is very time consuming, which remains a bottleneck in large scale biological experiments. In this work we present a tool for automated segmentation of cell nuclei from 3D fluorescent microscopic data. Our tool is based on state-of-the-art image processing and machine learning techniques and supports a friendly graphical user interface (GUI). We show that our tool is as accurate as manual annotation but greatly reduces the time for the registration.

Submitted 17 September, 2013; originally announced September 2013.

arXiv:1103.6052 [pdf, other]

Internal Constraints of the Trifocal Tensor

Authors: Stuart B. Heinrich, Wesley E. Snyder

Abstract: The fundamental matrix and trifocal tensor are convenient algebraic representations of the epipolar geometry of two and three view configurations, respectively. The estimation of these entities is central to most reconstruction algorithms, and a solid understanding of their properties and constraints is therefore very important. The fundamental matrix has 1 internal constraint which is well understood, whereas the trifocal tensor has 8 independent algebraic constraints. The internal tensor constraints can be represented in many ways, although there is only one minimal and sufficient set of 8 constraints known. In this paper, we derive a second set of minimal and sufficient constraints that is simpler. We also show how this can be used in a new parameterization of the trifocal tensor. We hope that this increased understanding of the internal constraints may lead to improved algorithms for estimating the trifocal tensor, although the primary contribution is an improved theoretical understanding.

Submitted 30 March, 2011; originally announced March 2011.

arXiv:1103.5808 [pdf, other]

Improved Edge Awareness in Discontinuity Preserving Smoothing

Authors: Stuart B. Heinrich, Wesley E. Snyder

Abstract: Discontinuity preserving smoothing is a fundamentally important procedure that is useful in a wide variety of image processing contexts. It is directly useful for noise reduction, and frequently used as an intermediate step in higher level algorithms. For example, it can be particularly useful in edge detection and segmentation. Three well known algorithms for discontinuity preserving smoothing are nonlinear anisotropic diffusion, bilateral filtering, and mean shift filtering. Although slight differences make them each better suited to different tasks, all are designed to preserve discontinuities while smoothing. However, none of them satisfy this goal perfectly: they each have exception cases in which smoothing may occur across hard edges. The principal contribution of this paper is the identification of a property we call edge awareness that should be satisfied by any discontinuity preserving smoothing algorithm. This constraint can be incorporated into existing algorithms to improve quality, and usually has negligible changes in runtime performance and/or complexity. We present modifications necessary to augment diffusion and mean shift, as well as a new formulation of the bilateral filter that unifies the spatial and range spaces to achieve edge awareness.

Submitted 29 March, 2011; originally announced March 2011.

arXiv:cs/0511068 [pdf]

An Agent-based Manufacturing Management System for Production and Logistics within Cross-Company Regional and National Production Networks

Authors: S. Heinrich, H. Durr, T. Hanel, J. Lassig

Abstract: The goal is the development of a simultaneous, dynamic, technological as well as logistical real-time planning and an organizational control of the production by the production units themselves, working in the production network under the use of Multi-Agent-Technology. The design of the multi-agent-based manufacturing management system, the models of the single agents, algorithms for the agent-based, decentralized dispatching of orders, strategies and data management concepts as well as their integration into the SCM, based on the solution described, will be explained in the following. Keywords: production engineering and management, dynamic manufacturing planning and control, multi-agent systems (MAS), supply-chain-management (SCM), e-manufacturing

Submitted 18 November, 2005; originally announced November 2005.

Showing 1–13 of 13 results for author: Heinrich, S