-
Direct Imitation Learning-based Visual Servoing using the Large Projection Formulation
Authors:
Sayantan Auddy,
Antonio Paolillo,
Justus Piater,
Matteo Saveriano
Abstract:
Today robots must be safe, versatile, and user-friendly to operate in unstructured and human-populated environments. Dynamical system-based imitation learning enables robots to perform complex tasks stably and without explicit programming, greatly simplifying their real-world deployment. To exploit the full potential of these systems it is crucial to implement closed loops that use visual feedback. Vision makes it possible to cope with environmental changes, but is complex to handle due to the high dimension of the image space. This study introduces a dynamical system-based imitation learning approach for direct visual servoing. It leverages off-the-shelf deep learning-based perception backbones to extract robust features from the raw input image, and an imitation learning strategy to execute sophisticated robot motions. The learning blocks are integrated using the large projection task priority formulation. As demonstrated through extensive experimental analysis, the proposed method realizes complex tasks with a robotic manipulator.
Submitted 13 June, 2024;
originally announced June 2024.
-
Unsupervised Learning of Effective Actions in Robotics
Authors:
Marko Zaric,
Jakob Hollenstein,
Justus Piater,
Erwan Renaudo
Abstract:
Learning actions that are relevant to decision-making and can be executed effectively is a key problem in autonomous robotics. Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions. Although successful in solving manipulation tasks, deep learning methods also lack this ability, in addition to their high cost in terms of memory or training data. In this paper, we propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes", each producing different effects in the environment. After an exploration phase, the algorithm automatically builds a representation of the effects and groups motions into action prototypes, where motions more likely to produce an effect are represented more than those that lead to negligible changes. We evaluate our method on a simulated stair-climbing reinforcement learning task, and the preliminary results show that our effect driven discretization outperforms uniformly and randomly sampled discretizations in convergence speed and maximum reward.
Submitted 3 April, 2024;
originally announced April 2024.
-
Continual Domain Randomization
Authors:
Josip Josifovski,
Sayantan Auddy,
Mohammadhossein Malmir,
Justus Piater,
Alois Knoll,
Nicolás Navarro-Guerrero
Abstract:
Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of tunable parameters from the start of the training, from which the parameters are randomized simultaneously to train a robust model for use in the real world. However, the combined randomization of many parameters increases the task difficulty and might result in sub-optimal policies. To address this problem and to provide a more flexible training process, we propose Continual Domain Randomization (CDR) for RL that combines domain randomization with continual learning to enable sequential training in simulation on a subset of randomization parameters at a time. Starting from a model trained in a non-randomized simulation where the task is easier to solve, the model is trained on a sequence of randomizations, and continual learning is employed to remember the effects of previous randomizations. Our robotic reaching and grasping tasks experiments show that the model trained in this fashion learns effectively in simulation and performs robustly on the real robot while matching or outperforming baselines that employ combined randomization or sequential randomization without continual learning. Our code and videos are available at https://continual-dr.github.io/.
Submitted 18 March, 2024;
originally announced March 2024.
-
Effect of Optimizer, Initializer, and Architecture of Hypernetworks on Continual Learning from Demonstration
Authors:
Sayantan Auddy,
Sebastian Bergner,
Justus Piater
Abstract:
In continual learning from demonstration (CLfD), a robot learns a sequence of real-world motion skills continually from human demonstrations. Recently, hypernetworks have been successful in solving this problem. In this paper, we perform an exploratory study of the effects of different optimizers, initializers, and network architectures on the continual learning performance of hypernetworks for CLfD. Our results show that adaptive learning rate optimizers work well, but initializers specially designed for hypernetworks offer no advantages for CLfD. We also show that hypernetworks that are capable of stable trajectory predictions are robust to different network architectures. Our open-source code is available at https://github.com/sebastianbergner/ExploringCLFD.
Submitted 31 December, 2023;
originally announced January 2024.
-
Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints
Authors:
Alejandro Agostini,
Justus Piater
Abstract:
In task and motion planning (TAMP), the ambiguity and underdetermination of abstract descriptions used by task planning methods make it difficult to characterize physical constraints needed to successfully execute a task. The usual approach is to overlook such constraints at task planning level and to implement expensive sub-symbolic geometric reasoning techniques that perform multiple calls on unfeasible actions, plan corrections, and re-planning until a feasible solution is found. We propose an alternative TAMP approach that unifies task and motion planning into a single heuristic search. Our approach is based on an object-centric abstraction of motion constraints that permits leveraging the computational efficiency of off-the-shelf AI heuristic search to yield physically feasible plans. These plans can be directly transformed into object and motion parameters for task execution without the need of intensive sub-symbolic geometric reasoning.
Submitted 29 December, 2023;
originally announced December 2023.
-
Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
Authors:
Jakob Hollenstein,
Georg Martius,
Justus Piater
Abstract:
Proximal Policy Optimization (PPO), a popular on-policy deep reinforcement learning method, employs a stochastic policy for exploration. In this paper, we propose a colored noise-based stochastic policy variant of PPO. Previous research highlighted the importance of temporal correlation in action noise for effective exploration in off-policy reinforcement learning. Building on this, we investigate whether correlated noise can also enhance exploration in on-policy methods like PPO. We discovered that correlated noise for action selection improves learning performance and outperforms the currently popular uncorrelated white noise approach in on-policy methods. Unlike off-policy learning, where pink noise was found to be highly effective, we found that a colored noise, intermediate between white and pink, performed best for on-policy learning in PPO. We examined the impact of varying the amount of data collected for each update by modifying the number of parallel simulation environments for data collection and observed that with a larger number of parallel environments, more strongly correlated noise is beneficial. Due to the significant impact and ease of implementation, we recommend switching to correlated noise as the default noise source in PPO.
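The idea of temporally correlated action noise can be illustrated with a minimal sketch. It assumes that "colored" noise means a power spectrum proportional to 1/f^beta (beta = 0 is white, beta = 1 is pink) and that it is generated by shaping white noise in the frequency domain. The function name `colored_noise` and the value beta = 0.5 are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def colored_noise(beta, n_steps, rng):
    """Sample n_steps values whose power spectrum falls off roughly as 1/f**beta."""
    freqs = np.fft.rfftfreq(n_steps)            # non-negative frequencies, freqs[0] == 0
    scale = np.zeros_like(freqs)                # DC component stays 0 => zero-mean signal
    scale[1:] = freqs[1:] ** (-beta / 2.0)      # amplitude ~ f^(-beta/2) => power ~ f^(-beta)
    # Random complex spectrum (white), shaped by the amplitude profile above.
    white = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    signal = np.fft.irfft(scale * white, n=n_steps)
    return signal / signal.std()                # normalise to unit variance

rng = np.random.default_rng(0)
# Hypothetical "intermediate" colour between white (0.0) and pink (1.0).
noise = colored_noise(beta=0.5, n_steps=1000, rng=rng)
```

In a policy-gradient setting, one such sequence per action dimension could replace the independent Gaussian samples that perturb the policy mean over an episode. How the paper integrates the noise into PPO is not specified in the abstract.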
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Regularity as Intrinsic Reward for Free Play
Authors:
Cansu Sancaktar,
Justus Piater,
Georg Martius
Abstract:
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
Submitted 3 December, 2023;
originally announced December 2023.
-
Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model
Authors:
Sayantan Auddy,
Jakob Hollenstein,
Matteo Saveriano,
Antonio Rodríguez-Sánchez,
Justus Piater
Abstract:
Learning from demonstration (LfD) provides an efficient way to train robots. The learned motions should be convergent and stable, but to be truly effective in the real world, LfD-capable robots should also be able to remember multiple motion skills. Existing stable-LfD approaches lack the capability of multi-skill retention. Although recent work on continual-LfD has shown that hypernetwork-generated neural ordinary differential equation solvers (NODE) can learn multiple LfD tasks sequentially, this approach lacks stability guarantees. We propose an approach for stable continual-LfD in which a hypernetwork generates two networks: a trajectory learning dynamics model, and a trajectory stabilizing Lyapunov function. The introduction of stability generates convergent trajectories, but more importantly it also greatly improves continual learning performance, especially in the size-efficient chunked hypernetworks. With our approach, a single hypernetwork learns stable trajectories of the robot's end-effector position and orientation simultaneously, and does so continually for a sequence of real-world LfD tasks without retraining on past demonstrations. We also propose stochastic hypernetwork regularization with a single randomly sampled regularization term, which reduces the cumulative training time cost for N tasks from O$(N^2)$ to O$(N)$ without any loss in performance on real-world tasks. We empirically evaluate our approach on the popular LASA dataset, on high-dimensional extensions of LASA (including up to 32 dimensions) to assess scalability, and on a novel extended robotic task dataset (RoboTasks9) to assess real-world performance. In trajectory error metrics, stability metrics and continual learning metrics our approach performs favorably, compared to other baselines. Our open-source code and datasets are available at https://github.com/sayantanauddy/clfd-snode.
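A rough count shows where a quadratic-to-linear reduction can come from. This is a sketch under an assumption not stated in the abstract: that learning task $k$ with full regularization uses one regularization term per previously learned task, and that the training cost of a task grows in proportion to the number of terms $T$.

```latex
T_{\text{full}}(N) = \sum_{k=1}^{N} (k-1) = \frac{N(N-1)}{2} = O(N^2),
\qquad
T_{\text{stochastic}}(N) = \sum_{k=2}^{N} 1 = N - 1 = O(N).
```

Under this assumption, $N = 10$ tasks would require 45 regularization terms in total with full regularization, versus 9 when a single term is sampled per task.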
Submitted 9 January, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Constrained Equation Learner Networks for Precision-Preserving Extrapolation of Robotic Skills
Authors:
Hector Perez-Villeda,
Justus Piater,
Matteo Saveriano
Abstract:
In Programming by Demonstration, the robot learns novel skills from human demonstrations. After learning, the robot should be able not only to reproduce the skill, but also to generalize it to shifted domains without collecting new training data. Adaptation to similar domains has been investigated in the literature; however, an open problem is how to adapt learned skills to different conditions that are outside of the data distribution, and, more important, how to preserve the precision of the desired adaptations. This paper presents a novel supervised learning framework called Constrained Equation Learner Networks that addresses the trajectory adaptation problem in Programming by Demonstrations from a constrained regression perspective. While conventional approaches for constrained regression use one kind of basis function, e.g., Gaussian, we exploit Equation Learner Networks to learn a set of analytical expressions and use them as basis functions. These basis functions are learned from demonstration with the objective to minimize deviations from the training data while imposing constraints that represent the desired adaptations, like new initial or final points or maintaining the trajectory within given bounds. Our approach addresses three main difficulties in adapting robotic trajectories: 1) minimizing the distortion of the trajectory for new adaptations; 2) preserving the precision of the adaptations; and 3) dealing with the lack of intuition about the structure of basis functions. We validate our approach both in simulation and in real experiments in a set of robotic tasks that require adaptation due to changes in the environment, and we compare obtained results with two existing approaches. Performed experiments show that Constrained Equation Learner Networks outperform state of the art approaches by increasing generalization and adaptability of robotic skills.
Submitted 4 November, 2023;
originally announced November 2023.
-
Differentiable Forward Kinematics for TensorFlow 2
Authors:
Lukas Mölschl,
Jakob J. Hollenstein,
Justus Piater
Abstract:
Robotic systems are often complex and depend on the integration of a large number of software components. One important component in robotic systems provides the calculation of forward kinematics, which is required by both motion-planning and perception related components. End-to-end learning systems based on deep learning require passing gradients across component boundaries. Typical software implementations of forward kinematics are not differentiable, and thus prevent the construction of gradient-based, end-to-end learning systems. In this paper we present a library compatible with ROS-URDF that computes forward kinematics while simultaneously giving access to the gradients w.r.t. joint configurations and model parameters, allowing gradient-based learning and model identification. Our Python library is based on TensorFlow 2 and is auto-differentiable. It supports calculating a large number of kinematic configurations on the GPU in parallel, yielding a considerable performance improvement compared to sequential CPU-based calculation. The library is available at https://github.com/lumoe/dlkinematics.git
Submitted 10 March, 2023; v1 submitted 24 January, 2023;
originally announced January 2023.
-
Learning and Extrapolation of Robotic Skills using Task-Parameterized Equation Learner Networks
Authors:
Hector Villeda,
Justus Piater,
Matteo Saveriano
Abstract:
Imitation learning approaches achieve good generalization within the range of the training data, but tend to generate unpredictable motions when querying outside this range. We present a novel approach to imitation learning with enhanced extrapolation capabilities that exploits the so-called Equation Learner Network (EQLN). Unlike conventional approaches, EQLNs use supervised learning to fit a set of analytical expressions that allows them to extrapolate beyond the range of the training data. We augment the task demonstrations with a set of task-dependent parameters representing spatial properties of each motion and use them to train the EQLN. At run time, the features are used to query the Task-Parameterized Equation Learner Network (TP-EQLN) and generate the corresponding robot trajectory. The set of features encodes kinematic constraints of the task such as desired height or a final point to reach. We validate the results of our approach on manipulation tasks where it is important to preserve the shape of the motion in the extrapolation domain. Our approach is also compared with existing state-of-the-art approaches, in simulation and in real setups. The experimental results show that TP-EQLN can respect the constraints of the trajectory encoded in the feature parameters, even in the extrapolation domain, while preserving the overall shape of the trajectory provided in the demonstrations.
Submitted 16 December, 2022;
originally announced December 2022.
-
Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization
Authors:
David Peer,
Bart Keulen,
Sebastian Stabinger,
Justus Piater,
Antonio Rodríguez-Sánchez
Abstract:
Training deep neural networks is a very demanding task, especially challenging is how to adapt architectures to improve the performance of trained models. We can find that sometimes, shallow networks generalize better than deep networks, and the addition of more layers results in higher training and test errors. The deep residual learning framework addresses this degradation problem by adding skip connections to several neural network layers. It would at first seem counter-intuitive that such skip connections are needed to train deep networks successfully as the expressivity of a network would grow exponentially with depth. In this paper, we first analyze the flow of information through neural networks. We introduce and evaluate the batch-entropy which quantifies the flow of information through each layer of a neural network. We prove empirically and theoretically that a positive batch-entropy is required for gradient descent-based training approaches to optimize a given loss function successfully. Based on those insights, we introduce batch-entropy regularization to enable gradient descent-based training algorithms to optimize the flow of information through each hidden layer individually. With batch-entropy regularization, gradient descent optimizers can transform untrainable networks into trainable networks. We show empirically that we can therefore train a "vanilla" fully connected network and convolutional neural network -- no skip connections, batch normalization, dropout, or any other architectural tweak -- with 500 layers by simply adding the batch-entropy regularization term to the loss function. The effect of batch-entropy regularization is not only evaluated on vanilla neural networks, but also on residual networks, autoencoders, and also transformer models over a wide range of computer vision as well as natural language processing tasks.
Submitted 1 August, 2022;
originally announced August 2022.
-
Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance
Authors:
Jakob Hollenstein,
Sayantan Auddy,
Matteo Saveriano,
Erwan Renaudo,
Justus Piater
Abstract:
Many Deep Reinforcement Learning (D-RL) algorithms rely on simple forms of exploration such as the additive action noise often used in continuous control domains. Typically, the scaling factor of this action noise is chosen as a hyper-parameter and is kept constant during training. In this paper, we focus on action noise in off-policy deep reinforcement learning for continuous control. We analyze how the learned policy is impacted by the noise type, noise scale, and impact scaling factor reduction schedule. We consider the two most prominent types of action noise, Gaussian and Ornstein-Uhlenbeck noise, and perform a vast experimental campaign by systematically varying the noise type and scale parameter, and by measuring variables of interest like the expected return of the policy and the state-space coverage during exploration. For the latter, we propose a novel state-space coverage measure $\operatorname{X}_{\mathcal{U}\text{rel}}$ that is more robust to estimation artifacts caused by points close to the state-space boundary than previously-proposed measures. Larger noise scales generally increase state-space coverage. However, we found that increasing the space coverage using a larger noise scale is often not beneficial. On the contrary, reducing the noise scale over the training process reduces the variance and generally improves the learning performance. We conclude that the best noise type and scale are environment dependent, and based on our observations derive heuristic rules for guiding the choice of the action noise as a starting point for further optimization.
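The two noise types and a decreasing noise scale can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the discretised Ornstein-Uhlenbeck update, the mean-reversion rate `theta = 0.15`, and the linear schedule are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_noise(n_steps, sigma, rng):
    """Uncorrelated Gaussian action noise with scale sigma."""
    return sigma * rng.standard_normal(n_steps)

def ou_noise(n_steps, sigma, rng, theta=0.15, dt=1.0):
    """Temporally correlated noise from a discretised Ornstein-Uhlenbeck process (mean 0)."""
    x = np.zeros(n_steps)
    for t in range(1, n_steps):
        # Pull back towards 0 at rate theta, then add a scaled Gaussian increment.
        x[t] = x[t - 1] - theta * x[t - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

def linear_scale_schedule(sigma_start, sigma_end, n_steps):
    """Hypothetical schedule: noise scale interpolated linearly over training."""
    return np.linspace(sigma_start, sigma_end, n_steps)

rng = np.random.default_rng(0)
scales = linear_scale_schedule(0.5, 0.05, n_steps=1000)
# Gaussian exploration noise whose scale shrinks as training progresses.
scheduled_noise = scales * rng.standard_normal(1000)
```

Consecutive Ornstein-Uhlenbeck samples are strongly correlated, whereas Gaussian samples are independent, which is the main qualitative difference between the two noise types compared in the abstract.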
Submitted 5 June, 2023; v1 submitted 8 June, 2022;
originally announced June 2022.
-
Continual Learning from Demonstration of Robotics Skills
Authors:
Sayantan Auddy,
Jakob Hollenstein,
Matteo Saveriano,
Antonio Rodríguez-Sánchez,
Justus Piater
Abstract:
Methods for teaching motion skills to robots focus on training for a single skill at a time. Robots capable of learning from demonstration can considerably benefit from the added ability to learn new movement skills without forgetting what was learned in the past. To this end, we propose an approach for continual learning from demonstration using hypernetworks and neural ordinary differential equation solvers. We empirically demonstrate the effectiveness of this approach in remembering long sequences of trajectory learning tasks without the need to store any data from past demonstrations. Our results show that hypernetworks outperform other state-of-the-art continual learning approaches for learning from demonstration. In our experiments, we use the popular LASA benchmark, and two new datasets of kinesthetic demonstrations collected with a real robot that we introduce in this paper called the HelloWorld and RoboTasks datasets. We evaluate our approach on a physical robot and demonstrate its effectiveness in learning real-world robotic tasks involving changing positions as well as orientations. We report both trajectory error metrics and continual learning metrics, and we propose two new continual learning metrics. Our code, along with the newly collected datasets, is available at https://github.com/sayantanauddy/clfd.
Submitted 12 April, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
-
Learning Descriptor of Constrained Task from Demonstration
Authors:
Xiang Zhang,
Matteo Saveriano,
Justus Piater
Abstract:
Constrained objects, such as doors and drawers, are often complex and share a similar structure in the human environment. A robot needs to interact accurately with constrained objects to safely and successfully complete a task. Learning from Demonstration offers an appropriate path to learn the object structure of the demonstration for unknown objects for unknown tasks. There is work that extracts the kinematic model from motion. However, the gap remains when the robot faces a new object with a similar model but different contexts, e.g. size, appearance, etc. In this paper, we propose a framework that integrates all the information needed to learn a constrained motion from a depth camera into a descriptor of the constrained task. The descriptor consists of object information, grasping point model, constrained model, and reference frame model. By associating constrained learning and reference frame with the constrained object, we demonstrate that the robot can learn the book opening model and parameter of the constraints from demonstration and generalize to novel books.
Submitted 17 March, 2021;
originally announced March 2021.
-
DeepSym: Deep Symbol Generation and Rule Learning from Unsupervised Continuous Robot Interaction for Planning
Authors:
Alper Ahmetoglu,
M. Yunus Seker,
Justus Piater,
Erhan Oztop,
Emre Ugur
Abstract:
We propose a novel general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them for non-trivial action planning. Our robot interacts with objects using an initial action repertoire that is assumed to be acquired earlier and observes the effects it can create in the environment. To form action-grounded object, effect, and relational categories, we employ a binary bottleneck layer in a predictive, deep encoder-decoder network that takes the image of the scene and the action applied as input, and generates the resulting effects in the scene in pixel coordinates. After learning, the binary latent vector represents action-driven object categories based on the interaction experience of the robot. To distill the knowledge represented by the neural network into rules useful for symbolic reasoning, a decision tree is trained to reproduce its decoder function. Probabilistic rules are extracted from the decision paths of the tree and are represented in the Probabilistic Planning Domain Definition Language (PPDDL), allowing off-the-shelf planners to operate on the knowledge extracted from the sensorimotor experience of the robot. The deployment of the proposed approach for a simulated robotic manipulator enabled the discovery of discrete representations of object properties such as `rollable' and `insertable'. In turn, the use of these representations as symbols allowed the generation of effective plans for achieving goals, such as building towers of the desired height, demonstrating the effectiveness of the approach for multi-step object manipulation. Finally, we demonstrate that the system is not only restricted to the robotics domain by assessing its applicability to the MNIST 8-puzzle domain in which learned symbols allow for the generation of plans that move the empty tile into any given position.
Submitted 27 September, 2022; v1 submitted 4 December, 2020;
originally announced December 2020.
-
How do Offline Measures for Exploration in Reinforcement Learning behave?
Authors:
Jakob J. Hollenstein,
Sayantan Auddy,
Matteo Saveriano,
Erwan Renaudo,
Justus Piater
Abstract:
Sufficient exploration is paramount for the success of a reinforcement learning agent. Yet, exploration is rarely assessed in an algorithm-independent way. We compare the behavior of three data-based, offline exploration metrics described in the literature on intuitive simple distributions and highlight problems to be aware of when using them. We propose a fourth metric, uniform relative entropy, and implement it using either a k-nearest-neighbor or a nearest-neighbor-ratio estimator, highlighting that the implementation choices have a profound impact on these measures.
Submitted 29 October, 2020;
originally announced October 2020.
-
Improving the Exploration of Deep Reinforcement Learning in Continuous Domains using Planning for Policy Search
Authors:
Jakob J. Hollenstein,
Erwan Renaudo,
Matteo Saveriano,
Justus Piater
Abstract:
Local policy search is performed by most Deep Reinforcement Learning (D-RL) methods, which increases the risk of getting trapped in a local minimum. Furthermore, the availability of a simulation model is not fully exploited in D-RL even in simulation-based training, which potentially decreases efficiency. To better exploit simulation models in policy search, we propose to integrate a kinodynamic planner in the exploration strategy and to learn a control policy in an offline fashion from the generated environment interactions. We call the resulting model-based reinforcement learning method PPS (Planning for Policy Search). We compare PPS with state-of-the-art D-RL methods in typical RL settings including underactuated systems. The comparison shows that PPS, guided by the kinodynamic planner, collects data from a wider region of the state space. This generates training data that helps PPS discover better policies.
Submitted 24 October, 2020;
originally announced October 2020.
-
Reconfigurable Behavior Trees: Towards an Executive Framework Meeting High-level Decision Making and Control Layer Features
Authors:
Pilar de la Cruz,
Justus Piater,
Matteo Saveriano
Abstract:
Behavior Trees constitute a widespread AI tool which has been successfully spun out in robotics. Their advantages include simplicity, modularity, and reusability of code. However, Behavior Trees remain a high-level decision making engine; control features cannot be easily integrated. This paper proposes the Reconfigurable Behavior Trees (RBTs), an extension of the traditional BTs that considers physical constraints from the robotic environment in the decision making process. We endow RBTs with continuous sensory information that permits the online monitoring of the task execution. The resulting stimulus-driven architecture is capable of dynamically handling changes in the executive context while keeping the execution time low. The proposed framework is evaluated on a set of robotic experiments. The results show that RBTs are a promising approach for robotic task representation, monitoring, and execution.
Submitted 31 August, 2020; v1 submitted 21 July, 2020;
originally announced July 2020.
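As an illustration of the traditional Behavior Trees that the abstract above takes as its starting point, the following is a minimal sketch of the classical tick semantics (Sequence and Fallback nodes). It is not the RBT framework from the paper; the class names, the `make_action` helper, and the example leaf names are hypothetical and chosen only for this example.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3


class Action:
    """Leaf node wrapping a callable that returns a Status."""

    def __init__(self, name: str, fn: Callable[[], Status]):
        self.name = name
        self.fn = fn

    def tick(self) -> Status:
        return self.fn()


class Sequence:
    """Ticks children in order; stops at the first child that does not succeed."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; stops at the first child that does not fail."""

    def __init__(self, children: List):
        self.children = children

    def tick(self) -> Status:
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


log: List[str] = []


def make_action(name: str, status: Status) -> Action:
    # Hypothetical helper: a leaf that records its name and returns a fixed status.
    def run() -> Status:
        log.append(name)
        return status
    return Action(name, run)


# Try to grasp if the object is visible; otherwise fall back to searching.
tree = Fallback([
    Sequence([
        make_action("object_visible", Status.FAILURE),
        make_action("grasp", Status.SUCCESS),
    ]),
    make_action("search", Status.RUNNING),
])
result = tree.tick()
```

Because every node exposes the same `tick` interface, subtrees can be composed and reused freely, which is the modularity the abstract refers to.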
-
Evaluating the Progress of Deep Learning for Visual Relational Concepts
Authors:
Sebastian Stabinger,
Peer David,
Justus Piater,
Antonio Rodríguez-Sánchez
Abstract:
Convolutional Neural Networks (CNNs) have become the state-of-the-art method for image classification in the last ten years. Although they achieve superhuman classification accuracy on many popular datasets, they often perform much worse on more abstract image classification tasks. We will show that these difficult tasks are linked to relational concepts from cognitive psychology and that, despite progress over the last few years, such relational reasoning tasks remain difficult for current neural network architectures.
We will review deep learning research that is linked to relational concept learning, even if it was not originally presented from this angle. Reviewing the current literature, we will argue that some form of attention will be an important component of future systems to solve relational tasks.
In addition, we will point out the shortcomings of currently used datasets, and we will recommend steps to make future datasets more relevant for testing systems on relational reasoning.
Submitted 13 September, 2021; v1 submitted 29 January, 2020;
originally announced January 2020.
-
Action Representations in Robotics: A Taxonomy and Systematic Classification
Authors:
Philipp Zech,
Erwan Renaudo,
Simon Haller,
Xiang Zhang,
Justus Piater
Abstract:
Understanding and defining the meaning of "action" is substantial for robotics research. This becomes utterly evident when aiming at equipping autonomous robots with robust manipulation skills for action execution. Unfortunately, to this day we still lack both a clear understanding of the concept of an action and a set of established criteria that ultimately characterize an action. In this survey we thus first review existing ideas and theories on the notion and meaning of action. Subsequently we discuss the role of action in robotics and attempt to give a seminal definition of action in accordance with its use in robotics research. Given this definition we then introduce a taxonomy for categorizing action representations in robotics along various dimensions. Finally, we provide a systematic literature survey on action representations in robotics where we categorize relevant literature along our taxonomy. After discussing the current state of the art we conclude with an outlook towards promising research directions.
Submitted 12 September, 2018;
originally announced September 2018.
-
Learning Movement Assessment Primitives for Force Interaction Skills
Authors:
Xiang Zhang,
Athanasios S. Polydoros,
Justus Piater
Abstract:
We present a novel, reusable and task-agnostic primitive for assessing the outcome of a force-interaction robotic skill, useful, e.g., for applications such as quality control in industrial manufacturing. The proposed method is easily programmed by kinesthetic teaching, and the desired adaptability and reusability are achieved by machine learning models. The primitive records sensory data during both demonstrations and reproductions of a movement. Recordings include the end-effector's Cartesian pose and exerted wrench at each time step. The collected data are then used to train Gaussian Processes which create models of the wrench as a function of the robot's pose. The similarity between the wrench models of the demonstration and the movement's reproduction is derived by measuring their Hellinger distance. This comparison creates features that are fed as inputs to a Naive Bayes classifier which estimates the movement's probability of success. The evaluation is performed on two diverse robotic assembly tasks (snap-fitting and screwing) with a total of 5 use cases, 11 demonstrations, and more than 200 movement executions. The performance metrics prove the proposed method's capability of generalization to different demonstrations and movements.
Submitted 11 May, 2018;
originally announced May 2018.
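To make the similarity measure mentioned in the abstract above concrete, here is a minimal sketch of the Hellinger distance between two univariate Gaussians, using the standard closed form H² = 1 − BC, where BC is the Bhattacharyya coefficient. It assumes that each wrench model can be summarized at a given pose by a mean and a standard deviation; the function name and that simplification are assumptions for illustration, not the authors' implementation.

```python
import math


def hellinger_gaussian(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Hellinger distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Returns a value in [0, 1]: 0 for identical distributions,
    approaching 1 as the two distributions stop overlapping.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    var_sum = sigma1 ** 2 + sigma2 ** 2
    # Bhattacharyya coefficient of the two Gaussians (closed form).
    bc = math.sqrt(2.0 * sigma1 * sigma2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Clamp to guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical predicted force (mean, std) at one pose: demonstration vs. reproduction.
d_similar = hellinger_gaussian(5.0, 1.0, 5.2, 1.1)
d_different = hellinger_gaussian(5.0, 1.0, 12.0, 1.0)
```

A small distance indicates that the reproduction's predicted wrench closely matches the demonstration's at that pose; such distances could then serve as classifier features.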
-
Symbol Emergence in Cognitive Developmental Systems: a Survey
Authors:
Tadahiro Taniguchi,
Emre Ugur,
Matej Hoffmann,
Lorenzo Jamone,
Takayuki Nagai,
Benjamin Rosman,
Toshihiko Matsuka,
Naoto Iwahashi,
Erhan Oztop,
Justus Piater,
Florentin Wörgötter
Abstract:
Humans use signs, e.g., sentences in a spoken language, for communication and thought. Hence, symbol systems like language are crucial for our communication with other agents and adaptation to our real-world environment. The symbol systems we use in our human society adaptively and dynamically change over time. In the context of artificial intelligence (AI) and cognitive systems, the symbol grounding problem has been regarded as one of the central problems related to symbols. However, the symbol grounding problem was originally posed to connect symbolic AI and sensorimotor information and did not consider many interdisciplinary phenomena in human communication and dynamic symbol systems in our society, which semiotics considered. In this paper, we focus on the symbol emergence problem, addressing not only cognitive dynamics but also the dynamics of symbol systems in society, rather than the symbol grounding problem. We first introduce the notion of a symbol in semiotics from the humanities, to move beyond the very narrow idea of symbols in symbolic AI. Furthermore, over the years, it became more and more clear that symbol emergence has to be regarded as a multifaceted problem. Therefore, secondly, we review the history of the symbol emergence problem in different fields, including both biological and artificial systems, showing their mutual relations. We summarize the discussion and provide an integrative viewpoint and comprehensive overview of symbol emergence in cognitive systems. Additionally, we describe the challenges facing the creation of cognitive systems that can be part of symbol emergence systems.
Submitted 10 July, 2018; v1 submitted 26 January, 2018;
originally announced January 2018.
-
A novel Skill-based Programming Paradigm based on Autonomous Playing and Skill-centric Testing
Authors:
Simon Hangl,
Andreas Mennel,
Justus Piater
Abstract:
We introduce a novel paradigm for robot programming with which we aim to make robot programming more accessible for inexperienced users. In order to do so we incorporate two major components in one single framework: autonomous skill acquisition by robotic playing and visual programming. Simple robot program skeletons solving a task for one specific situation, so-called basic behaviours, are provided by the user. The robot then learns how to solve the same task in many different situations by autonomous playing, which reduces the barrier for inexperienced robot programmers. Programmers can use a mix of visual programming and kinesthetic teaching in order to provide these simple program skeletons. The robot program can be implemented interactively by programming parts with visual programming and kinesthetic teaching. We further integrate work on experience-based skill-centric robot software testing which enables the user to continuously test implemented skills without having to deal with the details of specific components.
Submitted 18 September, 2017;
originally announced September 2017.
-
Skill Learning by Autonomous Robotic Playing using Active Learning and Creativity
Authors:
Simon Hangl,
Vedran Dunjko,
Hans J. Briegel,
Justus Piater
Abstract:
We treat the problem of autonomous acquisition of manipulation skills where problem-solving strategies are initially available only for a narrow range of situations. We propose to extend the range of solvable situations by autonomous playing with the object. By applying previously-trained skills and behaviours, the robot learns how to prepare situations for which a successful strategy is already known. The information gathered during autonomous play is additionally used to learn an environment model. This model is exploited for active learning and the creative generation of novel preparatory behaviours. We apply our approach to a wide range of different manipulation tasks, e.g. book grasping, grasping of objects of different sizes by selecting different grasping strategies, placement on shelves, and tower disassembly. We show that the creative behaviour generation mechanism enables the robot to solve previously-unsolvable tasks, e.g. tower disassembly. We use success statistics gained during real-world experiments to simulate the convergence behaviour of our system. Experiments show that active learning improves the learning speed by around 9 percent in the book grasping scenario.
Submitted 26 June, 2017;
originally announced June 2017.
-
Autonomous Skill-centric Testing using Deep Learning
Authors:
Simon Hangl,
Sebastian Stabinger,
Justus Piater
Abstract:
Software testing is an important tool to ensure software quality. This is a hard task in robotics due to dynamic environments and the expensive development and time-consuming execution of test cases. Most testing approaches use model-based and/or simulation-based testing to overcome these problems. We propose model-free skill-centric testing in which a robot autonomously executes skills in the real world and compares the executions to previous experiences. The skills are selected by maximising the expected information gain on the distribution of erroneous software functions. We use deep learning to model the sensor data observed during previous successful skill executions and to detect irregularities. Sensor data is connected to function call profiles such that certain misbehaviour can be related to specific functions. We evaluate our approach in simulation and in experiments with a KUKA LWR 4+ robot by purposefully introducing bugs to the software. We demonstrate that these bugs can be detected with high accuracy and without the need for the implementation of specific tests or task-specific models.
Submitted 13 August, 2017; v1 submitted 2 March, 2017;
originally announced March 2017.
-
Active and Transfer Learning of Grasps by Kernel Adaptive MCMC
Authors:
Philipp Zech,
Hanchen Xiong,
Justus Piater
Abstract:
Human ability of both versatile grasping of given objects and grasping of novel (as of yet unseen) objects is truly remarkable. This probably arises from the experience infants gather by actively playing around with diverse objects. Moreover, knowledge acquired during this process is reused during learning of how to grasp novel objects. We conjecture that this combined process of active and transfer learning boils down to a random search around an object, suitably biased by prior experience, to identify promising grasps. In this paper we present an active learning method for learning of grasps for given objects, and a transfer learning method for learning of grasps for novel objects. Our learning methods apply a kernel adaptive Metropolis-Hastings sampler that learns an approximation of the grasps' probability density of an object while drawing grasp proposals from it. The sampler employs simulated annealing to search for globally-optimal grasps. Our empirical results show promising applicability of our proposed learning schemes.
Submitted 19 November, 2016;
originally announced November 2016.
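The combination of Metropolis-Hastings sampling and simulated annealing described in the abstract above can be sketched on a toy problem. The example below uses a plain random-walk Metropolis sampler with a geometric cooling schedule, not the kernel adaptive sampler of the paper. The one-dimensional `grasp_quality` function, the parameter values, and all names are hypothetical stand-ins chosen only to show the mechanism.

```python
import math
import random


def grasp_quality(x: float) -> float:
    """Hypothetical 1-D quality landscape: local optimum at x = -2
    (quality -0.5) and global optimum at x = 2 (quality 0)."""
    return -min((x + 2.0) ** 2 + 0.5, (x - 2.0) ** 2)


def annealed_metropolis(quality, x0, steps=5000, step_size=1.0,
                        t_start=5.0, t_end=0.05, seed=0):
    """Random-walk Metropolis search with a geometric cooling schedule.

    Returns the best state visited and its quality.
    """
    rng = random.Random(seed)
    x, q = x0, quality(x0)
    best_x, best_q = x, q
    for i in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (i / (steps - 1))
        candidate = x + rng.gauss(0.0, step_size)
        q_new = quality(candidate)
        # Metropolis acceptance for a symmetric proposal; min() avoids overflow.
        accept_prob = math.exp(min(0.0, (q_new - q) / temperature))
        if rng.random() < accept_prob:
            x, q = candidate, q_new
            if q > best_q:
                best_x, best_q = x, q
    return best_x, best_q


# Start in the local optimum and let the annealed chain look for the global one.
best_x, best_q = annealed_metropolis(grasp_quality, x0=-2.0)
```

At high temperature the chain crosses freely between the two optima; as the temperature drops it increasingly rejects downhill moves, which is what biases the search toward the global optimum.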
-
Active and Transfer Learning of Grasps by Sampling from Demonstration
Authors:
Philipp Zech,
Justus Piater
Abstract:
We guess humans start acquiring grasping skills as early as at the infant stage by virtue of two key processes. First, infants attempt to learn grasps for known objects by imitating humans. Secondly, knowledge acquired during this process is reused in learning to grasp novel objects. We argue that these processes of active and transfer learning boil down to a random search of grasps on an object, suitably biased by prior experience. In this paper we introduce active learning of grasps for known objects as well as transfer learning of grasps for novel objects grounded on kernel adaptive, mode-hopping Markov Chain Monte Carlo. Our experiments show promising applicability of our proposed learning methods.
Submitted 19 November, 2016;
originally announced November 2016.
-
Grasp Learning by Sampling from Demonstration
Authors:
Philipp Zech,
Justus Piater
Abstract:
Robotic grasping traditionally relies on object features or shape information for learning new or applying already learned grasps. We argue, however, that such a strong reliance on object geometric information renders grasping and grasp learning a difficult task in cluttered environments with high uncertainty where reasonable object models are not available. In this paper we therefore investigate the application of model-free stochastic optimization for grasp learning. Our proposed learning method requires just a handful of user-demonstrated grasps and an initial prior given by a rough sketch of an object's grasp affordance density, yet no object geometric knowledge except for its pose. Our experiments show promising applicability of our proposed learning method.
Submitted 19 November, 2016;
originally announced November 2016.
-
25 years of CNNs: Can we compare to human abstraction capabilities?
Authors:
Sebastian Stabinger,
Antonio Rodríguez-Sánchez,
Justus Piater
Abstract:
We try to determine the progress made by convolutional neural networks over the past 25 years in classifying images into abstract classes. For this purpose we compare the performance of LeNet to that of GoogLeNet at classifying randomly generated images which are differentiated by an abstract property (e.g., one class contains two objects of the same size, the other class two objects of different sizes). Our results show that there is still work to do in order to solve vision problems humans are able to solve without much difficulty.
Submitted 28 July, 2016;
originally announced July 2016.
-
Learning Abstract Classes using Deep Learning
Authors:
Sebastian Stabinger,
Antonio Rodriguez-Sanchez,
Justus Piater
Abstract:
Humans are generally good at learning abstract concepts about objects and scenes (e.g. spatial orientation, relative sizes, etc.). Over the last years convolutional neural networks have achieved almost human performance in recognizing concrete classes (i.e. specific object categories). This paper tests the performance of a current CNN (GoogLeNet) on the task of differentiating between abstract classes which are trivially differentiable for humans. We trained and tested the CNN on the two abstract classes of horizontal and vertical orientation and determined how well the network is able to transfer the learned classes to other, previously unseen objects.
Submitted 17 June, 2016;
originally announced June 2016.
-
Robotic Playing for Hierarchical Complex Skill Learning
Authors:
Simon Hangl,
Emre Ugur,
Sandor Szedmak,
Justus Piater
Abstract:
In complex manipulation scenarios (e.g. tasks requiring complex interaction of two hands or in-hand manipulation), generalization is a hard problem. Current methods still require a substantial amount of (supervised) training data and/or strong assumptions on both the environment and the task. In this paradigm, controllers solving these tasks tend to be complex. We propose a paradigm of maintaining simpler controllers solving the task in a small number of specific situations. In order to generalize to novel situations, the robot transforms the environment from novel situations into a situation where the solution of the task is already known. Our solution to this problem is to play with objects and use previously trained skills (basis skills). These skills can either be used for estimating or for changing the current state of the environment and are organized in skill hierarchies. The approach is evaluated in complex pick-and-place scenarios that involve complex manipulation. We further show that these skills can be learned by autonomous playing.
Submitted 13 August, 2017; v1 submitted 2 March, 2016;
originally announced March 2016.
-
Proceedings of the 2nd Workshop on Robots in Clutter: Preparing robots for the real world (Berlin, 2013)
Authors:
Michael Zillich,
Maren Bennewitz,
Maria Fox,
Justus Piater,
Dejan Pangercic
Abstract:
This volume represents the proceedings of the 2nd Workshop on Robots in Clutter: Preparing robots for the real world, held June 27, 2013, at the Robotics: Science and Systems conference in Berlin, Germany.
Submitted 15 June, 2013;
originally announced June 2013.
-
ÖAGM/AAPR 2013 - The 37th Annual Workshop of the Austrian Association for Pattern Recognition
Authors:
Justus Piater,
Antonio J. Rodríguez Sánchez
Abstract:
In this editorial, the organizers summarize facts and background about the event.
Submitted 25 May, 2013;
originally announced May 2013.
-
Proceedings of the 37th Annual Workshop of the Austrian Association for Pattern Recognition (ÖAGM/AAPR), 2013
Authors:
Justus Piater,
Antonio Rodríguez-Sánchez
Abstract:
This volume represents the proceedings of the 37th Annual Workshop of the Austrian Association for Pattern Recognition (ÖAGM/AAPR), held May 23-24, 2013, in Innsbruck, Austria.
Submitted 28 May, 2013; v1 submitted 6 April, 2013;
originally announced April 2013.
-
Closed-Loop Learning of Visual Control Policies
Authors:
S. R. Jodogne,
J. H. Piater
Abstract:
In this paper we present a general, flexible framework for learning mappings from images to actions by interacting with the environment. The basic idea is to introduce a feature-based image classifier in front of a reinforcement learning algorithm. The classifier partitions the visual space according to the presence or absence of a few highly informative local descriptors that are incrementally selected in a sequence of attempts to remove perceptual aliasing. We also address the problem of fighting overfitting in such a greedy algorithm. Finally, we show how high-level visual features can be generated when the power of local descriptors is insufficient for completely disambiguating the aliased states. This is done by building a hierarchy of composite features that consist of recursive spatial combinations of visual features. We demonstrate the efficacy of our algorithms by solving three visual navigation tasks and a visual version of the classical Car on the Hill control problem.
Submitted 10 October, 2011;
originally announced October 2011.