-
Sample Efficient Robot Learning with Structured World Models
Authors:
Tuluhan Akbulut,
Max Merlin,
Shane Parr,
Benedict Quartey,
Skye Thompson
Abstract:
Reinforcement learning has been demonstrated as a flexible and effective approach for learning a range of continuous control tasks, such as those used by robots to manipulate objects in their environment. But in robotics particularly, real-world rollouts are costly, and sample efficiency can be a major limiting factor when learning a new skill. In game environments, the use of world models has bee…
▽ More
Reinforcement learning has been demonstrated as a flexible and effective approach for learning a range of continuous control tasks, such as those used by robots to manipulate objects in their environment. But in robotics particularly, real-world rollouts are costly, and sample efficiency can be a major limiting factor when learning a new skill. In game environments, the use of world models has been shown to improve sample efficiency while still achieving good performance, especially when images or other rich observations are provided. In this project, we explore the use of a world model in a deformable robotic manipulation task, evaluating its effect on sample efficiency when learning to fold a cloth in simulation. We compare the use of RGB image observation with a feature space leveraging built-in structure (keypoints representing the cloth configuration), a common approach in robot skill learning, and compare the impact on task performance and learning efficiency with and without the world model. Our experiments showed that the usage of keypoints increased the performance of the best model on the task by 50%, and in general, the use of a learned or constructed reduced feature space improved task performance and sample efficiency. The use of a state transition predictor(MDN-RNN) in our world models did not have a notable effect on task performance.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
An Exploration of Neural Radiance Field Scene Reconstruction: Synthetic, Real-world and Dynamic Scenes
Authors:
Benedict Quartey,
Tuluhan Akbulut,
Wasiwasi Mgonzo,
Zheng Xin Yong
Abstract:
This project presents an exploration into 3D scene reconstruction of synthetic and real-world scenes using Neural Radiance Field (NeRF) approaches. We primarily take advantage of the reduction in training and rendering time of neural graphic primitives multi-resolution hash encoding, to reconstruct static video game scenes and real-world scenes, comparing and observing reconstruction detail and li…
▽ More
This project presents an exploration into 3D scene reconstruction of synthetic and real-world scenes using Neural Radiance Field (NeRF) approaches. We primarily take advantage of the reduction in training and rendering time of neural graphic primitives multi-resolution hash encoding, to reconstruct static video game scenes and real-world scenes, comparing and observing reconstruction detail and limitations. Additionally, we explore dynamic scene reconstruction using Neural Radiance Fields for Dynamic Scenes(D-NeRF). Finally, we extend the implementation of D-NeRF, originally constrained to handle synthetic scenes to also handle real-world dynamic scenes.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Bimanual rope manipulation skill synthesis through context dependent correction policy learning from human demonstration
Authors:
T. Baturhan Akbulut,
G. Tuba C. Girgin,
Arash Mehrabi,
Minoru Asada,
Emre Ugur,
Erhan Oztop
Abstract:
Learning from demonstration (LfD) provides a convenient means to equip robots with dexterous skills when demonstration can be obtained in robot intrinsic coordinates. However, the problem of compounding errors in long and complex skills reduces its wide deployment. Since most such complex skills are composed of smaller movements that are combined, considering the target skill as a sequence of comp…
▽ More
Learning from demonstration (LfD) provides a convenient means to equip robots with dexterous skills when demonstration can be obtained in robot intrinsic coordinates. However, the problem of compounding errors in long and complex skills reduces its wide deployment. Since most such complex skills are composed of smaller movements that are combined, considering the target skill as a sequence of compact motor primitives seems reasonable. Here the problem that needs to be tackled is to ensure that a motor primitive ends in a state that allows the successful execution of the subsequent primitive. In this study, we focus on this problem by proposing to learn an explicit correction policy when the expected transition state between primitives is not achieved. The correction policy is itself learned via behavior cloning by the use of a state-of-the-art movement primitive learning architecture, Conditional Neural Motor Primitives (CNMPs). The learned correction policy is then able to produce diverse movement trajectories in a context dependent way. The advantage of the proposed system over learning the complete task as a single action is shown with a table-top setup in simulation, where an object has to be pushed through a corridor in two steps. Then, the applicability of the proposed method to bi-manual knotting in the real world is shown by equip** an upper-body humanoid robot with the skill of making knots over a bar in 3D space. The experiments show that the robot can perform successful knotting even when the faced correction cases are not part of the human demonstration set.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
Reward Conditioned Neural Movement Primitives for Population Based Variational Policy Optimization
Authors:
M. Tuluhan Akbulut,
Utku Bozdogan,
Ahmet Tekden,
Emre Ugur
Abstract:
The aim of this paper is to study the reward based policy exploration problem in a supervised learning approach and enable robots to form complex movement trajectories in challenging reward settings and search spaces. For this, the experience of the robot, which can be bootstrapped from demonstrated trajectories, is used to train a novel Neural Processes-based deep network that samples from its la…
▽ More
The aim of this paper is to study the reward based policy exploration problem in a supervised learning approach and enable robots to form complex movement trajectories in challenging reward settings and search spaces. For this, the experience of the robot, which can be bootstrapped from demonstrated trajectories, is used to train a novel Neural Processes-based deep network that samples from its latent space and generates the required trajectories given desired rewards. Our framework can generate progressively improved trajectories by sampling them from high reward landscapes, increasing the reward gradually. Variational inference is used to create a stochastic latent space to sample varying trajectories in generating population of trajectories given target rewards. We benefit from Evolutionary Strategies and propose a novel crossover operation, which is applied in the self-organized latent space of the individual policies, allowing blending of the individuals that might address different factors in the reward function. Using a number of tasks that require sequential reaching to multiple points or passing through gaps between objects, we showed that our method provides stable learning progress and significant sample efficiency compared to a number of state-of-the-art robotic reinforcement learning methods. Finally, we show the real-world suitability of our method through real robot execution involving obstacle avoidance.
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
ACNMP: Skill Transfer and Task Extrapolation through Learning from Demonstration and Reinforcement Learning via Representation Sharing
Authors:
M. Tuluhan Akbulut,
Erhan Oztop,
M. Yunus Seker,
Honghu Xue,
Ahmet E. Tekden,
Emre Ugur
Abstract:
To equip robots with dexterous skills, an effective approach is to first transfer the desired skill via Learning from Demonstration (LfD), then let the robot improve it by self-exploration via Reinforcement Learning (RL). In this paper, we propose a novel LfD+RL framework, namely Adaptive Conditional Neural Movement Primitives (ACNMP), that allows efficient policy improvement in novel environments…
▽ More
To equip robots with dexterous skills, an effective approach is to first transfer the desired skill via Learning from Demonstration (LfD), then let the robot improve it by self-exploration via Reinforcement Learning (RL). In this paper, we propose a novel LfD+RL framework, namely Adaptive Conditional Neural Movement Primitives (ACNMP), that allows efficient policy improvement in novel environments and effective skill transfer between different agents. This is achieved through exploiting the latent representation learned by the underlying Conditional Neural Process (CNP) model, and simultaneous training of the model with supervised learning (SL) for acquiring the demonstrated trajectories and via RL for new trajectory discovery. Through simulation experiments, we show that (i) ACNMP enables the system to extrapolate to situations where pure LfD fails; (ii) Simultaneous training of the system through SL and RL preserves the shape of demonstrations while adapting to novel situations due to the shared representations used by both learners; (iii) ACNMP enables order-of-magnitude sample-efficient RL in extrapolation of reaching tasks compared to the existing approaches; (iv) ACNMPs can be used to implement skill transfer between robots having different morphology, with competitive learning speeds and importantly with less number of assumptions compared to the state-of-the-art approaches. Finally, we show the real-world suitability of ACNMPs through real robot experiments that involve obstacle avoidance, pick and place and pouring actions.
△ Less
Submitted 9 November, 2020; v1 submitted 25 March, 2020;
originally announced March 2020.