
CUER: Corrected Uniform Experience Replay for Off-Policy Continuous Deep Reinforcement Learning Algorithms

Authors: Arda Sarp Yenicesu, Furkan B. Mutlu, Suleyman S. Kozat, Ozgur S. Oguz

Abstract: The experience replay mechanism enables agents to reuse their experiences multiple times. In previous studies, the sampling probability of each transition was modified according to its relative significance. Reassigning sampling probabilities to every transition in the replay buffer after each iteration is extremely inefficient. Hence, to improve computational efficiency, experience replay prioritization algorithms reassess the importance of a transition when it is sampled. However, the relative importance of transitions changes dynamically as the agent's policy and value function are iteratively updated. Furthermore, experience replay retains transitions generated by the agent's past policies, which may diverge significantly from its most recent policy. A greater deviation from the most recent policy results in more off-policy updates, which harms the agent's performance. In this paper, we develop a novel algorithm, Corrected Uniform Experience Replay (CUER), which stochastically samples the stored experience while maintaining fairness among all experiences and accounting for the dynamic nature of transition importance, making the sampled state distribution more on-policy. CUER provides promising improvements for off-policy continuous control algorithms in terms of sample efficiency, final performance, and stability of the policy during training.

Submitted 13 June, 2024; originally announced June 2024.
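The abstract notes that prioritization algorithms re-score a transition only when it is sampled, so the priorities of unsampled transitions go stale as the policy changes. The sketch below illustrates that baseline behavior with proportional prioritization in the style of prioritized experience replay, assuming the absolute TD error as the priority. It is not CUER, whose correction is not detailed here; the class name, parameters, and defaults are hypothetical.

```python
import random


class LazyPrioritizedBuffer:
    """Proportional prioritized replay where priorities change only on sampling.

    Hypothetical illustration of the staleness issue described above; not CUER.
    """

    def __init__(self, alpha=0.6, eps=1e-6, seed=0):
        self.alpha = alpha      # how strongly priorities skew the sampling distribution
        self.eps = eps          # keeps every priority strictly positive
        self.transitions = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        p = max(self.priorities, default=1.0)
        self.transitions.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        idx = self.rng.choices(range(len(self.transitions)), weights=weights, k=batch_size)
        return idx, [self.transitions[i] for i in idx]

    def update(self, indices, td_errors):
        # Only the sampled transitions are re-scored; all others keep stale priorities.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

After `update`, every transition that was not in the batch still carries the priority it had when it was last sampled, even though the value function that produced that TD error has since changed.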

arXiv:2403.10436

H-MaP: An Iterative and Hybrid Sequential Manipulation Planner

Authors: Berk Cicek, Cankut Bora Tuncer, Busenaz Kerimgil, Ozgur S. Oguz

Abstract: This study introduces the Hybrid Sequential Manipulation Planner (H-MaP), a novel approach that iteratively performs motion planning using contact points and waypoints for complex sequential manipulation tasks in robotics. Combining optimization-based methods for generalizability and sampling-based methods for robustness, H-MaP enhances manipulation planning through active contact mode switches and enables interactions with auxiliary objects and tools. This framework, validated by a series of diverse physical manipulation tasks and real-robot experiments, offers a scalable and adaptable solution for complex real-world applications in robotic manipulation.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2312.02677

Contact Energy Based Hindsight Experience Prioritization

Authors: Erdi Sayar, Zhenshan Bing, Carlo D'Eramo, Ozgur S. Oguz, Alois Knoll

Abstract: Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms because successful experiences are collected inefficiently. Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by exploiting failed trajectories: they replace the desired goal with one of the achieved states, so that any failed trajectory can contribute to learning. However, HER chooses failed trajectories uniformly, without considering which ones might be most valuable for learning. In this paper, we address this problem and propose a novel approach, Contact Energy Based Prioritization (CEBP), which selects samples from the replay buffer based on the rich information provided by contact, leveraging the touch sensors in the robot's gripper and object displacement. Our prioritization scheme favors sampling of contact-rich experiences, which are arguably the ones providing the most information. We evaluate our proposed approach on various sparse-reward robotic tasks and compare it with state-of-the-art methods. We show that our method surpasses or performs on par with those methods on robot manipulation tasks. Finally, we deploy the trained policy from our method to a real Franka robot for a pick-and-place task. We observe that the robot solves the task successfully. The videos and code are publicly available at: https://erdiphd.github.io/HER_force

Submitted 23 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.
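The HER idea summarized in the abstract, substituting an achieved state for the desired goal so that a failed trajectory yields useful reward signal, can be sketched with the "final" relabeling strategy. This is a minimal sketch assuming a sparse reward of 0 when the achieved goal lies within a distance tolerance of the goal and -1 otherwise; the episode format, reward convention, tolerance, and function names are assumptions, and CEBP's contact-energy prioritization is not reproduced here.

```python
import math


def sparse_reward(achieved, goal, tol=0.05):
    # Assumed convention: 0 on success (within tol of the goal), -1 otherwise.
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def relabel_final(episode, tol=0.05):
    """Hindsight relabeling with the 'final' strategy.

    episode: list of dicts with keys 'obs', 'action', 'achieved', 'goal',
    where 'achieved' is the goal-space state reached after the action
    (hypothetical format). Returns new transitions whose goal is replaced
    by the state achieved at the end of the episode.
    """
    new_goal = episode[-1]["achieved"]
    relabeled = []
    for step in episode:
        relabeled.append({
            "obs": step["obs"],
            "action": step["action"],
            "goal": new_goal,
            # Reward is recomputed against the substituted goal.
            "reward": sparse_reward(step["achieved"], new_goal, tol),
        })
    return relabeled
```

A trajectory that never reaches its original goal earns -1 at every step, but after relabeling its last transition counts as a success, which is what lets uniform (or prioritized) replay extract signal from failures.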

arXiv:2311.13986

FViT-Grasp: Grasping Objects With Using Fast Vision Transformers

Authors: Arda Sarp Yenicesu, Berk Cicek, Ozgur S. Oguz

Abstract: This study addresses the challenge of manipulation, a prominent issue in robotics. We have devised a novel methodology for swiftly and precisely identifying the optimal grasp point for a robot to manipulate an object. Our approach leverages a Fast Vision Transformer (FViT), a type of neural network designed to process visual data and predict the most suitable grasp location. Our method demonstrates state-of-the-art performance in terms of speed while maintaining a high level of accuracy, and it holds promise for deployment in real-time robotic grasping applications. We believe that this study provides a baseline for future research in vision-based robotic grasp applications. Its high speed and accuracy bring researchers closer to real-life applications.

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2309.07620

Neural Field Representations of Articulated Objects for Robotic Manipulation Planning

Authors: Phillip Grote, Joaquim Ortiz-Haro, Marc Toussaint, Ozgur S. Oguz

Abstract: Traditional approaches for manipulation planning rely on an explicit geometric model of the environment to formulate a given task as an optimization problem. However, inferring an accurate model from raw sensor input is a hard problem in itself, in particular for articulated objects (e.g., closets, drawers). In this paper, we propose a Neural Field Representation (NFR) of articulated objects that enables manipulation planning directly from images. Specifically, after taking a few pictures of a new articulated object, we can forward simulate its possible movements, and, therefore, use this neural model directly for planning with trajectory optimization. Additionally, this representation can be used for shape reconstruction, semantic segmentation and image rendering, which provides a strong supervision signal during training and generalization. We show that our model, which was trained only on synthetic images, is able to extract a meaningful representation for unseen objects of the same class, both in simulation and with real images. Furthermore, we demonstrate that the representation enables robotic manipulation of an articulated object in the real world directly from images.

Submitted 14 September, 2023; originally announced September 2023.

arXiv:2306.17053

Spatial Reasoning via Deep Vision Models for Robotic Sequential Manipulation

Authors: Hongyou Zhou, Ingmar Schubert, Marc Toussaint, Ozgur S. Oguz

Abstract: In this paper, we propose using deep neural architectures (i.e., vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems. This formulation enables predicting the subset of objects that are relevant for completing a task. Such problems are often addressed by task and motion planning (TAMP) formulations combining symbolic reasoning and continuous motion planning. In essence, the action-object relationships are resolved as discrete, symbolic decisions that are then used to solve for the manipulation motions (e.g., via nonlinear trajectory optimization). However, solving long-horizon tasks requires considering all possible action-object combinations, which limits the scalability of TAMP approaches. To overcome this combinatorial complexity, we introduce a visual perception module integrated with a TAMP solver. Given a task and an initial image of the scene, the learned model outputs the relevance of each object to accomplishing the task. By incorporating the model's predictions into a TAMP formulation as a heuristic, the size of the search space is significantly reduced. Results show that our framework finds feasible solutions more efficiently than a state-of-the-art TAMP solver.

Submitted 1 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 8 pages, 8 figures, IROS 2023

Report number: 1707
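The combinatorial argument in this abstract can be made concrete with a rough count. Assuming each step of a plan skeleton picks one of A action types and one of N objects, a horizon-H skeleton has (A·N)^H candidates; if a learned relevance model keeps only the top k objects, the count drops to (A·k)^H. This is a back-of-the-envelope illustration with hypothetical relevance scores and function names, not the paper's TAMP formulation or its model.

```python
def skeleton_count(num_actions, num_objects, horizon):
    # Every step independently chooses one action type and one object.
    return (num_actions * num_objects) ** horizon


def top_k_relevant(scores, k):
    # Keep the k objects with the highest predicted relevance (hypothetical scores).
    return sorted(scores, key=scores.get, reverse=True)[:k]


scores = {"red_box": 0.9, "blue_box": 0.1, "stick": 0.8, "cup": 0.05, "plate": 0.2}
kept = top_k_relevant(scores, 2)
full = skeleton_count(num_actions=2, num_objects=len(scores), horizon=4)
reduced = skeleton_count(num_actions=2, num_objects=len(kept), horizon=4)
print(kept, full, reduced)  # ['red_box', 'stick'] 10000 256
```

Because the count is exponential in the horizon, even a modest cut in the number of candidate objects shrinks the skeleton space by orders of magnitude, which is why a relevance heuristic pays off most on long-horizon tasks.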

arXiv:2111.07908

Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics

Authors: Ingmar Schubert, Danny Driess, Ozgur S. Oguz, Marc Toussaint

Abstract: Applications of Reinforcement Learning (RL) in robotics are often limited by high data demand. On the other hand, approximate models are readily available in many robotics scenarios, making model-based approaches like planning a data-efficient alternative. Still, the performance of these methods suffers if the model is imprecise or wrong. In this sense, the respective strengths and weaknesses of RL and model-based planners are complementary. In the present work, we investigate how both approaches can be integrated into one framework that combines their strengths. We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans. In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.

Submitted 15 November, 2021; originally announced November 2021.

Journal ref: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia

arXiv:2110.03420

RHH-LGP: Receding Horizon And Heuristics-Based Logic-Geometric Programming For Task And Motion Planning

Authors: Cornelius V. Braun, Joaquim Ortiz-Haro, Marc Toussaint, Ozgur S. Oguz

Abstract: Sequential decision-making and motion planning for robotic manipulation induce combinatorial complexity. For long-horizon tasks, especially when the environment comprises many objects that can be interacted with, planning efficiency becomes even more important. To plan such long-horizon tasks, we present the RHH-LGP algorithm for combined task and motion planning (TAMP). First, we propose a TAMP approach (based on Logic-Geometric Programming) that effectively uses geometry-based heuristics for solving long-horizon manipulation tasks. The efficiency of this planner is then further improved by a receding horizon formulation, resulting in RHH-LGP. We demonstrate the robustness and effectiveness of our approach on a diverse range of long-horizon tasks that require reasoning about interactions with a large number of objects. Using our framework, we can solve tasks that require multiple robots, including a mobile robot and snake-like walking robots, to form novel heterogeneous kinematic structures autonomously. By combining geometry-based heuristics with iterative planning, our approach brings an order-of-magnitude reduction of planning time in all investigated problems.

Submitted 6 March, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: for source code, see https://github.com/cornelius-braun/rhh-lgp

ACM Class: I.2.9

arXiv:2109.05077

Data Generation Method for Learning a Low-dimensional Safe Region in Safe Reinforcement Learning

Authors: Zhehua Zhou, Ozgur S. Oguz, Yi Ren, Marion Leibold, Martin Buss

Abstract: Safe reinforcement learning aims to learn a control policy while ensuring that neither the system nor the environment gets damaged during the learning process. For implementing safe reinforcement learning on highly nonlinear and high-dimensional dynamical systems, one possible approach is to find a low-dimensional safe region via data-driven feature extraction methods, which provides safety estimates to the learning algorithm. As the reliability of the learned safety estimates is data-dependent, we investigate in this work how different training data affect the safe reinforcement learning approach. By balancing the learning performance against the risk of being unsafe, we propose a data generation method that combines two sampling methods to generate representative training data. The performance of the method is demonstrated with a three-link inverted pendulum example.

Submitted 10 September, 2021; originally announced September 2021.

arXiv:2107.06661

Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks

Authors: Ingmar Schubert, Ozgur S. Oguz, Marc Toussaint

Abstract: In high-dimensional state spaces, the usefulness of Reinforcement Learning (RL) is limited by the problem of exploration. This issue has previously been addressed using potential-based reward shaping (PB-RS). In the present work, we introduce Final-Volume-Preserving Reward Shaping (FV-RS). FV-RS relaxes the strict optimality guarantees of PB-RS to a guarantee of preserved long-term behavior. Being less restrictive, FV-RS allows for reward shaping functions that are even better suited for improving the sample efficiency of RL algorithms. In particular, we consider settings in which the agent has access to an approximate plan. Here, we use examples of simulated robotic manipulation tasks to demonstrate that plan-based FV-RS can indeed significantly improve the sample efficiency of RL over plan-based PB-RS.

Submitted 14 July, 2021; originally announced July 2021.

Comments: Published as a conference paper at ICLR 2021

Journal ref: ICLR 2021 - 9th International Conference on Learning Representations
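As background for the PB-RS baseline named in this abstract: potential-based reward shaping adds F(s, s') = γΦ(s') − Φ(s) to the environment reward for some potential function Φ. The sketch below assumes a hypothetical distance-to-goal potential on a 1-D state and shows the telescoping property behind PB-RS's strict optimality guarantee: the discounted sum of shaping terms along a trajectory depends only on its first and last states. FV-RS's relaxed shaping is not implemented here; all names and values are illustrative.

```python
def potential(state, goal=10.0):
    # Hypothetical potential: negative distance to the goal.
    return -abs(goal - state)


def shaping_reward(s, s_next, gamma=0.99):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    return gamma * potential(s_next) - potential(s)


def discounted_shaping_return(states, gamma=0.99):
    # Sum over t of gamma^t * F(s_t, s_{t+1}) along one trajectory.
    return sum(
        gamma ** t * shaping_reward(states[t], states[t + 1], gamma)
        for t in range(len(states) - 1)
    )


states = [0.0, 2.0, 5.0, 9.0]
gamma = 0.99
lhs = discounted_shaping_return(states, gamma)
# Telescoping: the sum collapses to gamma^T * Phi(s_T) - Phi(s_0).
rhs = gamma ** (len(states) - 1) * potential(states[-1]) - potential(states[0])
print(abs(lhs - rhs) < 1e-9)  # True
```

Because the shaping contribution collapses to endpoint terms, it cannot change which policy is optimal; the abstract's point is that FV-RS gives up this strict guarantee in exchange for more freedom in choosing shaping functions.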

arXiv:2106.02489

doi 10.1109/TRO.2022.3198020

Long-Horizon Multi-Robot Rearrangement Planning for Construction Assembly

Authors: Valentin Noah Hartmann, Andreas Orthey, Danny Driess, Ozgur S. Oguz, Marc Toussaint

Abstract: Robotic assembly planning enables architects to explicitly account for the assembly process during the design phase, and enables efficient building methods that profit from the robots' different capabilities. Previous work has addressed planning of robot assembly sequences and identifying the feasibility of architectural designs. This paper extends previous work by enabling planning with large, heterogeneous teams of robots. We present a planning system that enables parallelization of complex task and motion planning problems by iteratively solving smaller subproblems. Combining optimization methods that solve for manipulation constraints with a sampling-based bi-directional space-time path planner enables us to plan cooperative multi-robot manipulation with unknown arrival times. Thus, our solver can complete subproblems and tasks with differing timescales and synchronize them effectively. We demonstrate the approach on multiple case studies to show our algorithm's robustness over long planning horizons and its scalability to many objects and agents. Finally, we also demonstrate the execution of the computed plans on two robot arms to showcase their feasibility in the real world.

Submitted 7 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: 13 pages, 16 Figures, 2 Tables, 3 Algorithms

Journal ref: IEEE Transactions on Robotics (Volume: 39, Issue: 1, February 2023)

arXiv:2101.12075

doi 10.1145/3430036.3430050

Visualization of Nonlinear Programming for Robot Motion Planning

Authors: David Hägele, Moataz Abdelaal, Ozgur S. Oguz, Marc Toussaint, Daniel Weiskopf

Abstract: Nonlinear programming targets nonlinear optimization with constraints, which is a generic yet complex methodology involving humans for problem modeling and algorithms for problem solving. We address the particularly hard challenge of supporting domain experts in handling, understanding, and troubleshooting high-dimensional optimization with a large number of constraints. Leveraging visual analytics, our system supports users in exploring the computation process of nonlinear constraint optimization. It was designed for robot motion planning problems and developed in tight collaboration with domain experts in nonlinear programming and robotics. We report on the experiences from this design study, illustrate the usefulness for relevant example cases, and discuss the extension to visual analytics for nonlinear programming in general.

Submitted 28 January, 2021; originally announced January 2021.

Comments: 8 pages, 6 figures

ACM Class: H.5.2; G.1.6

Journal ref: Proceedings of the 13th International Symposium on Visual Information Communication and Interaction (2020), Article No. 10, Pages 1-8

arXiv:2011.04828

Learning Efficient Constraint Graph Sampling for Robotic Sequential Manipulation

Authors: Joaquim Ortiz-Haro, Valentin N. Hartmann, Ozgur S. Oguz, Marc Toussaint

Abstract: Efficient sampling from constraint manifolds, and thereby generating a diverse set of solutions for feasibility problems, is a fundamental challenge. We consider the case where a problem is factored, that is, the underlying nonlinear program is decomposed into differentiable equality and inequality constraints, each of which depends only on some variables. Such problems are at the core of efficient and robust sequential robot manipulation planning. Naive sequential conditional sampling of individual variables, as well as fully joint sampling of all variables at once (e.g., leveraging optimization methods), can be highly inefficient and non-robust. We propose a novel framework to learn how to break the overall problem into smaller sequential sampling problems. Specifically, we leverage Monte-Carlo Tree Search to learn assignment orders for the variable-subsets, in order to minimize the computation time to generate feasible full samples. This strategy allows us to efficiently compute a set of diverse valid robot configurations for mode-switches within sequential manipulation tasks, which are waypoints for subsequent trajectory optimization or sampling-based motion planning algorithms. We show that the learning method quickly converges to the best sampling strategy for a given problem, and outperforms user-defined orderings or fully joint optimization, while providing a higher sample diversity.

Submitted 29 March, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

arXiv:2010.09555

doi 10.1109/TNNLS.2021.3106818

Learning a Low-dimensional Representation of a Safe Region for Safe Reinforcement Learning on Dynamical Systems

Authors: Zhehua Zhou, Ozgur S. Oguz, Marion Leibold, Martin Buss

Abstract: For safely applying reinforcement learning algorithms on high-dimensional nonlinear dynamical systems, a simplified system model is used to formulate a safe reinforcement learning framework. Based on the simplified system model, a low-dimensional representation of the safe region is identified and used to provide safety estimates for learning algorithms. However, finding a satisfactory simplified system model for complex dynamical systems usually requires considerable effort. To overcome this limitation, we propose in this work a general data-driven approach that is able to efficiently learn a low-dimensional representation of the safe region. Through an online adaptation method, the low-dimensional representation is updated using feedback data so that more accurate safety estimates are obtained. The performance of the proposed approach for identifying the low-dimensional representation of the safe region is demonstrated with a quadcopter example. The results show that, compared to previous work, a more reliable and representative low-dimensional representation of the safe region is derived, which then extends the applicability of the safe reinforcement learning framework.

Submitted 8 September, 2021; v1 submitted 19 October, 2020; originally announced October 2020.

arXiv:2003.07754

doi 10.1109/IROS45743.2020.9341502

Robust Task and Motion Planning for Long-Horizon Architectural Construction Planning

Authors: Valentin N. Hartmann, Ozgur S. Oguz, Danny Driess, Marc Toussaint, Achim Menges

Abstract: Integrating robotic systems into architectural and construction processes is of core interest for increasing the efficiency of the building industry. Automated planning for such systems enables design analysis tools and facilitates faster design iteration cycles for designers and engineers. However, generic task-and-motion planning (TAMP) for long-horizon construction processes is beyond the capabilities of current approaches. In this paper, we develop a multi-agent TAMP framework for long-horizon problems such as constructing a full-scale building. To this end, we extend the Logic-Geometric Programming framework with sampling-based motion planning, a limited-horizon approach, and a task-specific structural stability optimization, which together allow an effective decomposition of the task. We show that our framework is capable of autonomously constructing, from start to end, a large pavilion built from several hundred geometrically unique building elements.

Submitted 17 March, 2020; originally announced March 2020.

Showing 1–15 of 15 results for author: Oguz, O S