Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks

Authors: Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork

Abstract: Mathematical solvers use parametrized Optimization Problems (OPs) as inputs to yield optimal decisions. In many real-world settings, some of these parameters are unknown or uncertain. Recent research focuses on predicting the value of these unknown parameters using available contextual features, aiming to decrease decision regret by adopting end-to-end learning approaches. However, these approaches disregard prediction uncertainty and therefore make the mathematical solver susceptible to provide erroneous decisions in case of low-confidence predictions. We propose a novel framework that models prediction uncertainty with Bayesian Neural Networks (BNNs) and propagates this uncertainty into the mathematical solver with a Stochastic Programming technique. The differentiable nature of BNNs and differentiable mathematical solvers allow for two different learning approaches: In the Decoupled learning approach, we update the BNN weights to increase the quality of the predictions' distribution of the OP parameters, while in the Combined learning approach, we update the weights aiming to directly minimize the expected OP's cost function in a stochastic end-to-end fashion. We do an extensive evaluation using synthetic data with various noise properties and a real dataset, showing that decisions regret are generally lower (better) with both proposed methods.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.04923 [pdf, other]

DataSP: A Differential All-to-All Shortest Path Algorithm for Learning Costs and Predicting Paths with Context

Authors: Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork

Abstract: Learning latent costs of transitions on graphs from trajectories demonstrations under various contextual features is challenging but useful for path planning. Yet, existing methods either oversimplify cost assumptions or scale poorly with the number of observed trajectories. This paper introduces DataSP, a differentiable all-to-all shortest path algorithm to facilitate learning latent costs from trajectories. It allows to learn from a large number of trajectories in each learning step without additional computation. Complex latent cost functions from contextual features can be represented in the algorithm through a neural network approximation. We further propose a method to sample paths from DataSP in order to reconstruct/mimic observed paths' distributions. We prove that the inferred distribution follows the maximum entropy principle. We show that DataSP outperforms state-of-the-art differentiable combinatorial solver and classical machine learning approaches in predicting paths on graphs.

Submitted 30 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.01198 [pdf, other]

Towards Interpretable Reinforcement Learning with Constrained Normalizing Flow Policies

Authors: Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes A. Stork

Abstract: Reinforcement learning policies are typically represented by black-box neural networks, which are non-interpretable and not well-suited for safety-critical domains. To address both of these issues, we propose constrained normalizing flow policies as interpretable and safe-by-construction policy models. We achieve safety for reinforcement learning problems with instantaneous safety constraints, for which we can exploit domain knowledge by analytically constructing a normalizing flow that ensures constraint satisfaction. The normalizing flow corresponds to an interpretable sequence of transformations on action samples, each ensuring alignment with respect to a particular constraint. Our experiments reveal benefits beyond interpretability in an easier learning objective and maintained constraint satisfaction throughout the entire learning process. Our approach leverages constraints over reward engineering while offering enhanced interpretability, safety, and direct means of providing domain knowledge to the agent without relying on complex reward functions.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2310.17785 [pdf, other]

Learning Extrinsic Dexterity with Parameterized Manipulation Primitives

Authors: Shih-Min Yang, Martin Magnusson, Johannes A. Stork, Todor Stoyanov

Abstract: Many practically relevant robot grasping problems feature a target object for which all grasps are occluded, e.g., by the environment. Single-shot grasp planning invariably fails in such scenarios. Instead, it is necessary to first manipulate the object into a configuration that affords a grasp. We solve this problem by learning a sequence of actions that utilize the environment to change the object's pose. Concretely, we employ hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives. By learning the low-level manipulation policies, our approach can control the object's state through exploiting interactions between the object, the gripper, and the environment. Designing such a complex behavior analytically would be infeasible under uncontrolled conditions, as an analytic approach requires accurate physical modeling of the interaction and contact dynamics. In contrast, we learn a hierarchical policy model that operates directly on depth perception data, without the need for object detection, pose estimation, or manual design of controllers. We evaluate our approach on picking box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace. Our method transfers to a real robot and is able to successfully complete the object picking task in 98% of experimental trials. Supplementary information and videos can be found at https://shihminyang.github.io/ED-PMP/.

Submitted 9 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

arXiv:2310.07493 [pdf, other]

Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer

Authors: Finn Rietz, Johannes Andreas Stork

Abstract: Discovering all useful solutions for a given task is crucial for transferable RL agents, to account for changes in the task or transition dynamics. This is not considered by classical RL algorithms that are only concerned with finding the optimal policy, given the current task and dynamics. We propose a simple method for discovering all possible solutions of a given task, to obtain an agent that performs well in the transfer setting and adapts quickly to changes in the task or transition dynamics. Our method iteratively learns a set of policies, while each subsequent policy is constrained to yield a solution that is unlikely under all previous policies. Unlike prior methods, our approach does not require learning additional models for novelty detection and avoids balancing task and novelty reward signals, by directly incorporating the constraint into the action selection and optimization steps.

Submitted 11 October, 2023; originally announced October 2023.

Comments: Presented at the third RL-Conform workshop at IROS 2023

arXiv:2310.02360 [pdf, other]

Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Authors: Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes Andreas Stork

Abstract: Reinforcement learning (RL) for complex tasks remains a challenge, primarily due to the difficulties of engineering scalar reward functions and the inherent inefficiency of training models from scratch. Instead, it would be better to specify complex tasks in terms of elementary subtasks and to reuse subtask solutions whenever possible. In this work, we address continuous space lexicographic multi-objective RL problems, consisting of prioritized subtasks, which are notoriously difficult to solve. We show that these can be scalarized with a subtask transformation and then solved incrementally using value decomposition. Exploiting this insight, we propose prioritized soft Q-decomposition (PSQD), a novel algorithm for learning and adapting subtask solutions under lexicographic priorities in continuous state-action spaces. PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step. Its ability to use retained subtask training data for offline learning eliminates the need for new environment interaction during adaptation. We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning. PSQD provides an intuitive framework for tackling complex RL problems, offering insights into the inner workings of the subtask composition.

Submitted 2 May, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Camera ready version

arXiv:2210.08600 [pdf, other]

Heterogeneous Full-body Control of a Mobile Manipulator with Behavior Trees

Authors: Marco Iannotta, David Cáceres Domínguez, Johannes A. Stork, Erik Schaffernicht, Todor Stoyanov

Abstract: Integrating the heterogeneous controllers of a complex mechanical system, such as a mobile manipulator, within the same structure and in a modular way is still challenging. In this work we extend our framework based on Behavior Trees for the control of a redundant mechanical system to the problem of commanding more complex systems that involve multiple low-level controllers. This allows the integrated systems to achieve non-trivial goals that require coordination among the sub-systems.

Submitted 16 October, 2022; originally announced October 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2209.08619

arXiv:2210.02891 [pdf, other]

Transferring Knowledge for Reinforcement Learning in Contact-Rich Manipulation

Authors: Quantao Yang, Johannes A. Stork, Todor Stoyanov

Abstract: In manufacturing, assembly tasks have been a challenge for learning algorithms due to variant dynamics of different environments. Reinforcement learning (RL) is a promising framework to automatically learn these tasks, yet it is still not easy to apply a learned policy or skill, that is the ability of solving a task, to a similar environment even if the deployment conditions are only slightly different. In this paper, we address the challenge of transferring knowledge within a family of similar tasks by leveraging multiple skill priors. We propose to learn prior distribution over the specific skill required to accomplish each task and compose the family of skill priors to guide learning the policy for a new task by comparing the similarity between the target task and the prior ones. Our method learns a latent action space representing the skill embedding from demonstrated trajectories for each prior task. We have evaluated our method on a set of peg-in-hole insertion tasks and demonstrate better generalization to new tasks that have never been encountered during training.

Submitted 19 September, 2022; originally announced October 2022.

arXiv:2209.09536 [pdf, other]

Towards Task-Prioritized Policy Composition

Authors: Finn Rietz, Erik Schaffernicht, Todor Stoyanov, Johannes A. Stork

Abstract: Combining learned policies in a prioritized, ordered manner is desirable because it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null-space of high-priority control actions. Such a method is currently unavailable for Reinforcement Learning. We propose a novel, task-prioritized composition framework for Reinforcement Learning, which involves a novel concept: The indifferent-space of Reinforcement Learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents. Further, our approach can ensure high-priority constraint satisfaction, which makes it promising for learning in safety-critical domains like robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference-space of higher-level policies after initial compound policy construction.

Submitted 20 September, 2022; originally announced September 2022.
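
The null-space control that the abstract above uses as its point of comparison can be illustrated with a small numerical sketch. This is the textbook projector from robotics, not the indifference-space method the paper proposes. The Jacobian `J_high`, the task velocity `dx_high`, and the low-priority command `dq_low` are made-up values chosen only for illustration.

```python
import numpy as np

def nullspace_projector(J):
    """Return N = I - J^+ J, which maps any joint command into the null space of J."""
    J_pinv = np.linalg.pinv(J)
    return np.eye(J.shape[1]) - J_pinv @ J

def compose(J_high, dx_high, dq_low):
    """Track the high-priority task exactly; apply the low-priority command
    only in directions that leave the high-priority task unaffected."""
    N = nullspace_projector(J_high)
    return np.linalg.pinv(J_high) @ dx_high + N @ dq_low

# Hypothetical 3-joint system with a 1-D high-priority task.
J_high = np.array([[1.0, 0.5, 0.0]])
dx_high = np.array([0.2])            # desired high-priority task velocity
dq_low = np.array([0.3, -0.1, 0.4])  # joint velocity requested by the low-priority task

dq = compose(J_high, dx_high, dq_low)
```

Because `J_high` has full row rank, `J_high @ dq` reproduces `dx_high` regardless of what the low-priority command asks for, which is the prioritization property the abstract refers to.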

arXiv:2209.08619 [pdf, other]

doi 10.1109/LRA.2022.3211481

A Stack-of-Tasks Approach Combined with Behavior Trees: a New Framework for Robot Control

Authors: David Cáceres Domínguez, Marco Iannotta, Johannes A. Stork, Erik Schaffernicht, Todor Stoyanov

Abstract: Stack-of-Tasks (SoT) control allows a robot to simultaneously fulfill a number of prioritized goals formulated in terms of (in)equality constraints in error space. Since this approach solves a sequence of Quadratic Programs (QP) at each time-step, without taking into account any temporal state evolution, it is suitable for dealing with local disturbances. However, its limitation lies in the handling of situations that require non-quadratic objectives to achieve a specific goal, as well as situations where countering the control disturbance would require a locally suboptimal action. Recent works address this shortcoming by exploiting Finite State Machines (FSMs) to compose the tasks in such a way that the robot does not get stuck in local minima. Nevertheless, the intrinsic trade-off between reactivity and modularity that characterizes FSMs makes them impractical for defining reactive behaviors in dynamic environments. In this letter, we combine the SoT control strategy with Behavior Trees (BTs), a task switching structure that addresses some of the limitations of the FSMs in terms of reactivity, modularity and re-usability. Experimental results on a Franka Emika Panda 7-DOF manipulator show the robustness of our framework, that allows the robot to benefit from the reactivity of both SoT and BTs.

Submitted 18 September, 2022; originally announced September 2022.

arXiv:2107.13977 [pdf, other]

Underwater Acoustic Networks for Security Risk Assessment in Public Drinking Water Reservoirs

Authors: Jörg Stork, Philip Wenzel, Severin Landwein, Maria-Elena Algorri, Martin Zaefferer, Wolfgang Kusch, Martin Staubach, Thomas Bartz-Beielstein, Hartmut Köhn, Hermann Dejager, Christian Wolf

Abstract: We have built a novel system for the surveillance of drinking water reservoirs using underwater sensor networks. We implement an innovative AI-based approach to detect, classify and localize underwater events. In this paper, we describe the technology and cognitive AI architecture of the system based on one of the sensor networks, the hydrophone network. We discuss the challenges of installing and using the hydrophone network in a water reservoir where traffic, visitors, and variable water conditions create a complex, varying environment. Our AI solution uses an autoencoder for unsupervised learning of latent encodings for classification and anomaly detection, and time delay estimates for sound localization. Finally, we present the results of experiments carried out in a laboratory pool and the water reservoir and discuss the system's potential.

Submitted 29 July, 2021; originally announced July 2021.

arXiv:2105.07960 [pdf, other]

doi 10.1145/3449726.3463171

Behavior-based Neuroevolutionary Training in Reinforcement Learning

Authors: Jörg Stork, Martin Zaefferer, Nils Eisler, Patrick Tichelmann, Thomas Bartz-Beielstein, A. E. Eiben

Abstract: In addition to their undisputed success in solving classical optimization problems, neuroevolutionary and population-based algorithms have become an alternative to standard reinforcement learning methods. However, evolutionary methods often lack the sample efficiency of standard value-based methods that leverage gathered state and value experience. If reinforcement learning for real-world problems with significant resource cost is considered, sample efficiency is essential. The enhancement of evolutionary algorithms with experience exploiting methods is thus desired and promises valuable insights. This work presents a hybrid algorithm that combines topology-changing neuroevolutionary optimization with value-based reinforcement learning. We illustrate how the behavior of policies can be used to create distance and loss functions, which benefit from stored experiences and calculated state values. They allow us to model behavior and perform a directed search in the behavior space by gradient-free evolutionary algorithms and surrogate-based optimization. For this purpose, we consolidate different methods to generate and optimize agent policies, creating a diverse population. We exemplify the performance of our algorithm on standard benchmarks and a purpose-built real-world problem. Our results indicate that combining methods can enhance the sample efficiency and learning speed for evolutionary approaches.

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2005.06195 [pdf, other]

The effect of Target Normalization and Momentum on Dying ReLU

Authors: Isac Arnekvist, J. Frederico Carvalho, Danica Kragic, Johannes A. Stork

Abstract: Optimizing parameters with momentum, normalizing data values, and using rectified linear units (ReLUs) are popular choices in neural network (NN) regression. Although ReLUs are popular, they can collapse to a constant function and "die", effectively removing their contribution from the model. While some mitigations are known, the underlying reasons of ReLUs dying during optimization are currently poorly understood. In this paper, we consider the effects of target normalization and momentum on dying ReLUs. We find empirically that unit variance targets are well motivated and that ReLUs die more easily, when target variance approaches zero. To further investigate this matter, we analyze a discrete-time linear autonomous system, and show theoretically how this relates to a model with a single ReLU and how common properties can result in dying ReLU. We also analyze the gradients of a single-ReLU model to identify saddle points and regions corresponding to dying ReLU and how parameters evolve into these regions when momentum is used. Finally, we show empirically that this problem persist, and is aggravated, for deeper models including residual networks.

Submitted 13 May, 2020; originally announced May 2020.
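
What it means for a ReLU to collapse to a constant function and "die", as the abstract above puts it, can be shown with a minimal sketch. It illustrates the phenomenon only and does not reproduce the paper's analysis of target normalization or momentum. The single-unit model `relu(w * x + b)`, the data, and the parameter values are assumptions picked so that the unit is inactive on every sample.

```python
import numpy as np

def relu_unit_grads(w, b, x, y):
    """Forward pass and mean-squared-error gradients for f(x) = relu(w * x + b)."""
    pre = w * x + b
    out = np.maximum(pre, 0.0)
    active = (pre > 0).astype(float)   # ReLU derivative: 1 where active, 0 elsewhere
    err = out - y
    grad_w = np.mean(2.0 * err * active * x)
    grad_b = np.mean(2.0 * err * active)
    return out, grad_w, grad_b

x = np.linspace(-1.0, 1.0, 11)
y = 0.5 * x + 1.0                      # hypothetical regression targets

# With these parameters the pre-activation is negative on the whole data set.
out, grad_w, grad_b = relu_unit_grads(w=0.5, b=-2.0, x=x, y=y)
```

Here the unit outputs zero for every input and both gradients vanish, so plain gradient descent cannot move the parameters back into the active region even though the fit error is large.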

arXiv:2003.00925 [pdf, other]

doi 10.1007/s00170-020-06094-z

CAAI -- A Cognitive Architecture to Introduce Artificial Intelligence in Cyber-Physical Production Systems

Authors: Andreas Fischbach, Jan Strohschein, Andreas Bunte, Jörg Stork, Heide Faeskorn-Woyke, Natalia Moriz, Thomas Bartz-Beielstein

Abstract: This paper introduces CAAI, a novel cognitive architecture for artificial intelligence in cyber-physical production systems. The goal of the architecture is to reduce the implementation effort for the usage of artificial intelligence algorithms. The core of the CAAI is a cognitive module that processes declarative goals of the user, selects suitable models and algorithms, and creates a configuration for the execution of a processing pipeline on a big data platform. Constant observation and evaluation against performance criteria assess the performance of pipelines for many and varying use cases. Based on these evaluations, the pipelines are automatically adapted if necessary. The modular design with well-defined interfaces enables the reusability and extensibility of pipeline components. A big data platform implements this modular design supported by technologies such as Docker, Kubernetes, and Kafka for virtualization and orchestration of the individual components and their communication. The implementation of the architecture is evaluated using a real-world use case.

Submitted 26 February, 2020; originally announced March 2020.

arXiv:2002.04911 [pdf, other]

Ensemble of Sparse Gaussian Process Experts for Implicit Surface Mapping with Streaming Data

Authors: Johannes A. Stork, Todor Stoyanov

Abstract: Creating maps is an essential task in robotics and provides the basis for effective planning and navigation. In this paper, we learn a compact and continuous implicit surface map of an environment from a stream of range data with known poses. For this, we create and incrementally adjust an ensemble of approximate Gaussian process (GP) experts which are each responsible for a different part of the map. Instead of inserting all arriving data into the GP models, we greedily trade-off between model complexity and prediction error. Our algorithm therefore uses less resources on areas with few geometric features and more where the environment is rich in variety. We evaluate our approach on synthetic and real-world data sets and analyze sensitivity to parameters and measurement noise. The results show that we can learn compact and accurate implicit surface models under different conditions, with a performance comparable to or better than that of exact GP regression with subsampled data.

Submitted 12 February, 2020; originally announced February 2020.

arXiv:1912.07024 [pdf, other]

Multi-Object Rearrangement with Monte Carlo Tree Search: A Case Study on Planar Nonprehensile Sorting

Authors: Haoran Song, Joshua A. Haustein, Weihao Yuan, Kaiyu Hang, Michael Yu Wang, Danica Kragic, Johannes A. Stork

Abstract: In this work, we address a planar non-prehensile sorting task. Here, a robot needs to push many densely packed objects belonging to different classes into a configuration where these classes are clearly separated from each other. To achieve this, we propose to employ Monte Carlo tree search equipped with a task-specific heuristic function. We evaluate the algorithm on various simulated and real-world sorting tasks. We observe that the algorithm is capable to reliably sort large numbers of convex and non-convex objects, as well as convex objects in the presence of immovable obstacles.

Submitted 18 January, 2021; v1 submitted 15 December, 2019; originally announced December 2019.

Comments: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020; Project page at http://haoran-song.github.io/mcts-sorting/

arXiv:1907.09300 [pdf, other]

doi 10.1145/3321707.3321829

Surrogate Models for Enhancing the Efficiency of Neuroevolution in Reinforcement Learning

Authors: Jörg Stork, Martin Zaefferer, Thomas Bartz-Beielstein, A. E. Eiben

Abstract: In the last years, reinforcement learning received a lot of attention. One method to solve reinforcement learning tasks is Neuroevolution, where neural networks are optimized by evolutionary algorithms. A disadvantage of Neuroevolution is that it can require numerous function evaluations, while not fully utilizing the available information from each fitness evaluation. This is especially problematic when fitness evaluations become expensive. To reduce the cost of fitness evaluations, surrogate models can be employed to partially replace the fitness function. The difficulty of surrogate modeling for Neuroevolution is the complex search space and how to compare different networks. To that end, recent studies showed that a kernel based approach, particular with phenotypic distance measures, works well. These kernels compare different networks via their behavior (phenotype) rather than their topology or encoding (genotype). In this work, we discuss the use of surrogate model-based Neuroevolution (SMB-NE) using a phenotypic distance for reinforcement learning. In detail, we investigate a) the potential of SMB-NE with respect to evaluation efficiency and b) how to select adequate input sets for the phenotypic distance measure in a reinforcement learning problem. The results indicate that we are able to considerably increase the evaluation efficiency using dynamic input sets.

Submitted 22 July, 2019; originally announced July 2019.

Comments: This is the authors version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Genetic and Evolutionary Computation Conference (GECCO 2019)

Journal ref: 2019, Genetic and Evolutionary Computation Conference (GECCO 2019), Prague, Czech Republic. ACM, New York, NY, USA

arXiv:1907.07075 [pdf, other]

doi 10.1145/3319619.3326815

Prediction of neural network performance by phenotypic modeling

Authors: Alexander Hagg, Martin Zaefferer, Jörg Stork, Adam Gaier

Abstract: Surrogate models are used to reduce the burden of expensive-to-evaluate objective functions in optimization. By creating models which map genomes to objective values, these models can estimate the performance of unknown inputs, and so be used in place of expensive objective functions. Evolutionary techniques such as genetic programming or neuroevolution commonly alter the structure of the genome itself. A lack of consistency in the genotype is a fatal blow to data-driven modeling techniques: interpolation between points is impossible without a common input space. However, while the dimensionality of genotypes may differ across individuals, in many domains, such as controllers or classifiers, the dimensionality of the input and output remains constant. In this work we leverage this insight to embed differing neural networks into the same input space. To judge the difference between the behavior of two neural networks, we give them both the same input sequence, and examine the difference in output. This difference, the phenotypic distance, can then be used to situate these networks into a common input space, allowing us to produce surrogate models which can predict the performance of neural networks regardless of topology. In a robotic navigation task, we show that models trained using this phenotypic embedding perform as well or better as those trained on the weight values of a fixed topology neural network. We establish such phenotypic surrogate models as a promising and flexible approach which enables surrogate modeling even for representations that undergo structural changes.

Submitted 16 July, 2019; originally announced July 2019.

arXiv:1907.02555 [pdf, other]

Object Placement Planning and Optimization for Robot Manipulators

Authors: Joshua A. Haustein, Kaiyu Hang, Johannes Stork, Danica Kragic

Abstract: We address the problem of motion planning for a robotic manipulator with the task to place a grasped object in a cluttered environment. In this task, we need to locate a collision-free pose for the object that a) facilitates the stable placement of the object, b) is reachable by the robot manipulator and c) optimizes a user-given placement objective. Because of the placement objective, this problem is more challenging than classical motion planning where the target pose is defined from the start. To solve this task, we propose an anytime algorithm that integrates sampling-based motion planning for the robot manipulator with a novel hierarchical search for suitable placement poses. We evaluate our approach on a dual-arm robot for two different placement objectives, and observe its effectiveness even in challenging scenarios.

Submitted 4 July, 2019; originally announced July 2019.

Comments: 8 pages

arXiv:1903.03831 [pdf, other]

Data-Driven Model Predictive Control for Food-Cutting

Authors: Ioanna Mitsioni, Yiannis Karayiannidis, Johannes A. Stork, Danica Kragic

Abstract: Modelling of contact-rich tasks is challenging and cannot be entirely solved using classical control approaches due to the difficulty of constructing an analytic description of the contact dynamics. Additionally, in a manipulation task like food-cutting, purely learning-based methods such as Reinforcement Learning, require either a vast amount of data that is expensive to collect on a real robot, or a highly realistic simulation environment, which is currently not available. This paper presents a data-driven control approach that employs a recurrent neural network to model the dynamics for a Model Predictive Controller. We build upon earlier work limited to torque-controlled robots and redefine it for velocity controlled ones. We incorporate force/torque sensor measurements, reformulate and further extend the control problem formulation. We evaluate the performance on objects used for training, as well as on unknown objects, by means of the cutting rates achieved and demonstrate that the method can efficiently treat different cases with only one dynamic model. Finally we investigate the behavior of the system during force-critical instances of cutting and illustrate its adaptive behavior in difficult cases.

Submitted 26 September, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

arXiv:1902.03419 [pdf, other]

Improving NeuroEvolution Efficiency by Surrogate Model-based Optimization with Phenotypic Distance Kernels

Authors: Jörg Stork, Martin Zaefferer, Thomas Bartz-Beielstein

Abstract: In NeuroEvolution, the topologies of artificial neural networks are optimized with evolutionary algorithms to solve tasks in data regression, data classification, or reinforcement learning. One downside of NeuroEvolution is the large amount of necessary fitness evaluations, which might render it inefficient for tasks with expensive evaluations, such as real-time learning. For these expensive optimization tasks, surrogate model-based optimization is frequently applied as it features a good evaluation efficiency. While a combination of both procedures appears as a valuable solution, the definition of adequate distance measures for the surrogate modeling process is difficult. In this study, we will extend cartesian genetic programming of artificial neural networks by the use of surrogate model-based optimization. We propose different distance measures and test our algorithm on a replicable benchmark task. The results indicate that we can significantly increase the evaluation efficiency and that a phenotypic distance, which is based on the behavior of the associated neural networks, is most promising.

Submitted 9 February, 2019; originally announced February 2019.

Comments: The final authenticated version of this publication will appear in the proceedings of the Applications of Evolutionary Computation - 22nd International Conference EvoApplications 2019 in the LNCS by Springer

arXiv:1901.03557 [pdf, other]

Learning Manipulation States and Actions for Efficient Non-prehensile Rearrangement Planning

Authors: Joshua A. Haustein, Isac Arnekvist, Johannes Stork, Kaiyu Hang, Danica Kragic

Abstract: This paper addresses non-prehensile rearrangement planning problems where a robot is tasked to rearrange objects among obstacles on a planar surface. We present an efficient planning algorithm that is designed to impose few assumptions on the robot's non-prehensile manipulation abilities and is simple to adapt to different robot embodiments. For this, we combine sampling-based motion planning with reinforcement learning and generative modeling. Our algorithm explores the composite configuration space of objects and robot as a search over robot actions, forward simulated in a physics model. This search is guided by a generative model that provides robot states from which an object can be transported towards a desired state, and a learned policy that provides corresponding robot actions. As an efficient generative model, we apply Generative Adversarial Networks. We implement and evaluate our approach for robots endowed with configuration spaces in SE(2). We demonstrate empirically the efficacy of our algorithm design choices and observe more than 2x speedup in planning time on various test scenarios compared to a state-of-the-art approach.

Submitted 11 January, 2019; originally announced January 2019.

arXiv:1810.04438 [pdf, other]

Global Search with Bernoulli Alternation Kernel for Task-oriented Grasping Informed by Simulation

Authors: Rika Antonova, Mia Kokic, Johannes A. Stork, Danica Kragic

Abstract: We develop an approach that benefits from large simulated datasets and takes full advantage of the limited online data that is most relevant. We propose a variant of Bayesian optimization that alternates between using informed and uninformed kernels. With this Bernoulli Alternation Kernel we ensure that discrepancies between simulation and reality do not hinder adapting robot control policies online. The proposed approach is applied to a challenging real-world problem of task-oriented grasping with novel objects. Our further contribution is a neural network architecture and training pipeline that use experience from grasping objects in simulation to learn grasp stability scores. We learn task scores from a labeled dataset with a convolutional network, which is used to construct an informed kernel for our variant of Bayesian optimization. Experiments on an ABB Yumi robot with real sensor data demonstrate success of our approach, despite the challenge of fulfilling task requirements and high uncertainty over physical properties of objects.

Submitted 10 October, 2018; originally announced October 2018.

Comments: To appear in 2nd Conference on Robot Learning (CoRL) 2018

arXiv:1809.04322 [pdf, other]

Reinforcement Learning in Topology-based Representation for Human Body Movement with Whole Arm Manipulation

Authors: Weihao Yuan, Kaiyu Hang, Haoran Song, Danica Kragic, Michael Y. Wang, Johannes A. Stork

Abstract: Moving a human body or a large and bulky object can require the strength of whole arm manipulation (WAM). This type of manipulation places the load on the robot's arms and relies on global properties of the interaction to succeed, rather than local contacts such as grasping or non-prehensile pushing. In this paper, we learn to generate motions that enable WAM for holding and transporting of humans in certain rescue or patient care scenarios. We model the task as a reinforcement learning problem in order to provide a behavior that can directly respond to external perturbation and human motion. For this, we represent global properties of the robot-human interaction with topology-based coordinates that are computed from arm and torso positions. These coordinates also allow transferring the learned policy to other body shapes and sizes. For training and evaluation, we simulate a dynamic sea rescue scenario and show in quantitative experiments that the policy can solve unseen scenarios with differently-shaped humans, floating humans, or with perception noise. Our qualitative experiments show the subsequent transporting after holding is achieved and we demonstrate that the policy can be directly transferred to a real world setting.

Submitted 12 September, 2018; originally announced September 2018.

Comments: Submitted to RA-L with ICRA 2019

arXiv:1809.03548 [pdf, other]

VPE: Variational Policy Embedding for Transfer Reinforcement Learning

Authors: Isac Arnekvist, Danica Kragic, Johannes A. Stork

Abstract: Reinforcement Learning methods are capable of solving complex problems, but resulting policies might perform poorly in environments that are even slightly different. In robotics especially, training and deployment conditions often vary and data collection is expensive, making retraining undesirable. Simulation training allows for feasible training times, but on the other hand suffers from a reality-gap when applied in real-world settings. This raises the need of efficient adaptation of policies acting in new environments. We consider this as a problem of transferring knowledge within a family of similar Markov decision processes. For this purpose we assume that Q-functions are generated by some low-dimensional latent variable. Given such a Q-function, we can find a master policy that can adapt given different values of this latent variable. Our method learns both the generative mapping and an approximate posterior of the latent variables, enabling identification of policies for new tasks by searching only in the latent space, rather than the space of all policies. The low-dimensional space, and master policy found by our method enables policies to quickly adapt to new environments. We demonstrate the method on both a pendulum swing-up task in simulation, and for simulation-to-real transfer on a pushing task.

Submitted 14 September, 2018; v1 submitted 10 September, 2018; originally announced September 2018.

arXiv:1808.08818 [pdf, other]

doi 10.1007/s11047-020-09820-4

A new Taxonomy of Continuous Global Optimization Algorithms

Authors: Jörg Stork, A. E. Eiben, Thomas Bartz-Beielstein

Abstract: Surrogate-based optimization, nature-inspired metaheuristics, and hybrid combinations have become state of the art in algorithm design for solving real-world optimization problems. Still, it is difficult for practitioners to get an overview that explains their advantages in comparison to a large number of available methods in the scope of optimization. Available taxonomies lack the embedding of current approaches in the larger context of this broad field. This article presents a taxonomy of the field, which explores and matches algorithm strategies by extracting similarities and differences in their search strategies. A particular focus lies on algorithms using surrogates, nature-inspired designs, and those created by design optimization. The extracted features of components or operators allow us to create a set of classification indicators to distinguish between a small number of classes. The features allow a deeper understanding of components of the search strategies and further indicate the close connections between the different algorithm designs. We present intuitive analogies to explain the basic principles of the search algorithms, particularly useful for novices in this research field. Furthermore, this taxonomy allows recommendations for the applicability of the corresponding algorithms.

Submitted 6 May, 2020; v1 submitted 27 August, 2018; originally announced August 2018.

Comments: 35 pages total, 28 written pages, 4 figures, 2019 Reworked Version

Journal ref: Natural Computing, 2020, 1-24

arXiv:1807.07839 [pdf, other]

Distance-based Kernels for Surrogate Model-based Neuroevolution

Authors: Jörg Stork, Martin Zaefferer, Thomas Bartz-Beielstein

Abstract: The topology optimization of artificial neural networks can be particularly difficult if the fitness evaluations require expensive experiments or simulations. For that reason, the optimization methods may need to be supported by surrogate models. We propose different distances for a suitable surrogate model, and compare them in a simple numerical test scenario.

Submitted 20 July, 2018; originally announced July 2018.

Comments: 4 pages, 1 figure. This publication was accepted to the Developmental Neural Networks Workshop of the Parallel Problem Solving from Nature 2018 (PPSN XV) conference

arXiv:1807.01019 [pdf, other]

Linear Combination of Distance Measures for Surrogate Models in Genetic Programming

Authors: Martin Zaefferer, Jörg Stork, Oliver Flasch, Thomas Bartz-Beielstein

Abstract: Surrogate models are a well established approach to reduce the number of expensive function evaluations in continuous optimization. In the context of genetic programming, surrogate modeling still poses a challenge, due to the complex genotype-phenotype relationships. We investigate how different genotypic and phenotypic distance measures can be used to learn Kriging models as surrogates. We compare the measures and suggest to use their linear combination in a kernel. We test the resulting model in an optimization framework, using symbolic regression problem instances as a benchmark. Our experiments show that the model provides valuable information. Firstly, the model enables an improved optimization performance compared to a model-free algorithm. Furthermore, the model provides information on the contribution of different distance measures. The data indicates that a phenotypic distance measure is important during the early stages of an optimization run when less data is available. In contrast, genotypic measures, such as the tree edit distance, contribute more during the later stages.

Submitted 3 July, 2018; originally announced July 2018.

Comments: The final authenticated version of this publication will appear in the proceedings of the 15th International Conference on Parallel Problem Solving from Nature 2018 (PPSN XV), published in the LNCS by Springer

arXiv:1803.05752 [pdf, other]

doi 10.1109/ICRA.2018.8462863

Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning

Authors: Weihao Yuan, Johannes A. Stork, Danica Kragic, Michael Y. Wang, Kaiyu Hang

Abstract: Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task which requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling physical properties of the objects, robot, and the environment for explicit planning. In contrast, as explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based on only visual feedback. For this, we model the task with rewards and train a deep Q-network. Our potential field-based heuristic exploration strategy reduces the amount of collisions which lead to suboptimal outcomes and we actively balance the training set to avoid bias towards poor examples. Our training process leads to quicker learning and better performance on the task as compared to uniform exploration and standard experience replay. We demonstrate empirical evidence from simulation that our method leads to a success rate of 85%, show that our system can cope with sudden changes of the environment, and compare our performance with human level performance.

Submitted 15 March, 2018; originally announced March 2018.

Comments: 2018 International Conference on Robotics and Automation

arXiv:1611.06070 [pdf, other]

Rope through Loop Insertion for Robotic Knotting: A Virtual Magnetic Field Formulation

Authors: Alejandro Marzinotto, Johannes A. Stork

Abstract: Inserting an end of a rope through a loop is a common and important action that is required for creating most types of knots. To perform this action, we need to pass the end of the rope through an area that is enclosed by another segment of rope. As for all knotting actions, the robot must for this exercise control over a semi-compliant and flexible body whose complex 3d shape is difficult to perceive and follow. Additionally, the target loop often deforms during the insertion. We address this problem by defining a virtual magnetic field through the loop's interior and use the Biot Savart law to guide the robotic manipulator that holds the end of the rope. This approach directly defines, for any manipulator position, a motion vector that results in a path that passes through the loop. The motion vector is directly derived from the position of the loop and changes as soon as it moves or deforms. In simulation, we test the insertion action against dynamic loop deformation of different intensity. We also combine insertion with grasp and release actions, coordinated by a hybrid control system, to tie knots in simulation and with a NAO robot.

Submitted 18 November, 2016; originally announced November 2016.

Comments: 8 pages

Report number: 978-91-7729-218-0

arXiv:1510.03924 [pdf]

Comparison of different Methods for Univariate Time Series Imputation in R

Authors: Steffen Moritz, Alexis Sardá, Thomas Bartz-Beielstein, Martin Zaefferer, Jörg Stork

Abstract: Missing values in datasets are a well-known problem and there are quite a lot of R packages offering imputation functions. But while imputation in general is well covered within R, it is hard to find functions for imputation of univariate time series. The problem is, most standard imputation techniques can not be applied directly. Most algorithms rely on inter-attribute correlations, while univariate time series imputation needs to employ time dependencies. This paper provides an overview of univariate time series imputation in general and an in-detail insight into the respective implementations within R packages. Furthermore, we experimentally compare the R functions on different time series using four different ratios of missing data. Our results show that either an interpolation with seasonal kalman filter from the zoo package or a linear interpolation on seasonal loess decomposed data from the forecast package were the most effective methods for dealing with missing data in most of the scenarios assessed in this paper.

Submitted 13 October, 2015; originally announced October 2015.

Showing 1–31 of 31 results for author: Stork, J