Search | arXiv e-print repository

Deriving Rewards for Reinforcement Learning from Symbolic Behaviour Descriptions of Bipedal Walking

Authors: Daniel Harnack, Christoph Lüth, Lukas Gross, Shivesh Kumar, Frank Kirchner

Abstract: Generating physical movement behaviours from their symbolic description is a long-standing challenge in artificial intelligence (AI) and robotics, requiring insights into numerical optimization methods as well as into formalizations from symbolic AI and reasoning. In this paper, a novel approach to finding a reward function from a symbolic description is proposed. The intended system behaviour is modelled as a hybrid automaton, which reduces the system state space to allow more efficient reinforcement learning. The approach is applied to bipedal walking by modelling the walking robot as a hybrid automaton over state-space orthants, and is used with the compass walker to derive a reward that incentivizes following the hybrid automaton cycle. As a result, training times of reinforcement learning controllers are reduced while final walking speed is increased. The approach can serve as a blueprint for how to generate reward functions from symbolic AI and reasoning.

Submitted 16 December, 2023; originally announced December 2023.

Comments: To appear in 62nd IEEE Conference on Decision and Control (CDC). For supplemental material, see https://dfki-ric-underactuated-lab.github.io/orthant_rewards_biped_rl/

ACM Class: I.2.9; I.2.8; I.2.6
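The abstract describes rewarding a walker for following a hybrid-automaton cycle over state-space orthants. A minimal sketch of that idea, assuming a hypothetical 2-D state and a hypothetical four-orthant cycle; the state variables, cycle, and reward values used in the paper are not given in this listing, so this is an illustration, not the authors' reward.

```python
from typing import List, Sequence, Tuple

# One sign (+1 or -1) per state dimension identifies an orthant.
Orthant = Tuple[int, ...]

# Hypothetical cycle over a 2-D state with placeholder components:
# the desired behaviour is assumed to visit these orthants in this order, repeatedly.
CYCLE: List[Orthant] = [(1, 1), (-1, 1), (-1, -1), (1, -1)]


def orthant(state: Sequence[float]) -> Orthant:
    """Return the orthant containing `state`; a zero component counts as +1."""
    return tuple(1 if x >= 0.0 else -1 for x in state)


def cycle_reward(prev_state: Sequence[float], state: Sequence[float],
                 cycle: Sequence[Orthant]) -> float:
    """Hypothetical per-step shaping term:
    +1 for moving to the next orthant of the cycle,
     0 for staying in the current orthant,
    -1 for skipping ahead, moving backwards, or leaving the cycle."""
    prev_o, cur_o = orthant(prev_state), orthant(state)
    if prev_o not in cycle or cur_o not in cycle:
        return -1.0
    if cur_o == prev_o:
        return 0.0
    next_o = cycle[(list(cycle).index(prev_o) + 1) % len(cycle)]
    return 1.0 if cur_o == next_o else -1.0
```

For example, `cycle_reward([0.2, 0.1], [-0.1, 0.3], CYCLE)` returns `1.0`, because the state moved from orthant `(1, 1)` to the next orthant `(-1, 1)`. Reducing the continuous state to a handful of sign patterns is what keeps the automaton small; in an RL loop such a term would typically be added to the environment's existing reward.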

arXiv:2307.16676

doi 10.1109/IROS55552.2023.10342187

End-to-End Reinforcement Learning for Torque Based Variable Height Hopping

Authors: Raghav Soni, Daniel Harnack, Hauke Isermann, Sotaro Fushimi, Shivesh Kumar, Frank Kirchner

Abstract: Legged locomotion is arguably the most suitable and versatile mode of locomotion for natural or unstructured terrain. Intensive research into dynamic walking and running controllers has recently yielded great advances, both in the optimal control and reinforcement learning (RL) literature. Hopping is a challenging dynamic task involving a flight phase and has the potential to increase the traversability of legged robots. Model-based control for hopping typically relies on accurate detection of different jump phases, such as lift-off or touch-down, and on using different controllers for each phase. In this paper, we present an end-to-end RL-based torque controller that learns to implicitly detect the relevant jump phases, removing the need to provide manual heuristics for state detection. We also extend a method for simulation-to-reality transfer of the learned controller to contact-rich dynamic tasks, resulting in successful deployment on the robot after training without parameter tuning.

Submitted 18 December, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: Update publication info. Cite as: R. Soni, D. Harnack, H. Isermann, S. Fushimi, S. Kumar and F. Kirchner, "End-to-End Reinforcement Learning for Torque Based Variable Height Hopping," 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 7531-7538, doi: 10.1109/IROS55552.2023.10342187

Journal ref: End-to-End Reinforcement Learning for Torque Based Variable Height Hopping, 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 2023, pp. 7531-7538

arXiv:2305.08373

doi 10.1109/LRA.2023.3269296

AcroMonk: A Minimalist Underactuated Brachiating Robot

Authors: Mahdi Javadi, Daniel Harnack, Paula Stocco, Shivesh Kumar, Shubham Vyas, Daniel Pizzutilo, Frank Kirchner

Abstract: Brachiation is a dynamic, coordinated swinging maneuver of body and arms used by monkeys and apes to move between branches. As a unique underactuated mode of locomotion, it is interesting to study from a robotics perspective since it can broaden the deployment scenarios for humanoids and animaloids. While several brachiating robots of varying complexity have been proposed in the past, this paper presents the simplest possible prototype of a brachiation robot, using only a single actuator and unactuated grippers. The novel passive gripper design allows it to snap onto and release from monkey bars, while guaranteeing well-defined start and end poses of the swing. The brachiation behavior is realized in three different ways, using trajectory optimization via direct collocation and stabilization by a model-based time-varying linear quadratic regulator (TVLQR) or model-free proportional-derivative (PD) control, as well as by a reinforcement learning (RL)-based control policy. The three control schemes are compared in terms of robustness to disturbances, mass uncertainty, and energy consumption. The system design and controllers have been open-sourced. Due to its minimal and open design, the system can serve as a canonical underactuated platform for education and research.

Submitted 15 May, 2023; originally announced May 2023.

Comments: The open-source implementation is available at https://github.com/dfki-ric-underactuated-lab/acromonk and a video demonstration of the experiments can be accessed at https://youtu.be/FIcDNtJo9Jc

Journal ref: IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3637-3644, 2023
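The AcroMonk abstract mentions stabilizing a trajectory obtained by direct collocation with model-free proportional-derivative (PD) control. A minimal sketch of a PD tracking law for the single actuated joint, assuming the reference position and velocity are read from the precomputed trajectory at each control step; the gains and torque limit are hypothetical and are not the values used on the robot.

```python
from dataclasses import dataclass


@dataclass
class PDController:
    """PD tracking law for one actuated joint:
    tau = kp * (q_des - q) + kd * (qd_des - qd), clipped to +/- tau_max."""
    kp: float       # proportional gain (hypothetical, user-chosen)
    kd: float       # derivative gain (hypothetical, user-chosen)
    tau_max: float  # actuator torque limit (hypothetical)

    def torque(self, q_des: float, qd_des: float, q: float, qd: float) -> float:
        # Feedback on position and velocity error relative to the reference.
        tau = self.kp * (q_des - q) + self.kd * (qd_des - qd)
        # Saturate to the actuator's torque range.
        return max(-self.tau_max, min(self.tau_max, tau))
```

For example, `PDController(kp=10.0, kd=0.5, tau_max=3.0).torque(0.1, 0.0, 0.0, 0.2)` gives approximately `0.9`, while a large position error saturates at `3.0`. Unlike TVLQR, this law needs no dynamics model, which is what "model-free" refers to here; the price is that the gains must be tuned by hand.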

arXiv:2207.09845

doi 10.1007/s00521-022-07949-0

Quantifying the Effect of Feedback Frequency in Interactive Reinforcement Learning for Robotic Tasks

Authors: Daniel Harnack, Julie Pivin-Bachler, Nicolás Navarro-Guerrero

Abstract: Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persisting problem is that data efficiency can be very low. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, there is an abundance of different strategies, which are, however, primarily tested on discrete grid-world and small-scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which time the feedback is most beneficial. To resolve these discrepancies, we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather, feedback frequency should be changed as the agent's proficiency in the task increases.

Submitted 15 March, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: Neural Computing and Applications (2022). Special Issue on Human-aligned Reinforcement Learning for Autonomous Agents and Robots
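The experiments above involve inverse kinematics learning for manipulator arms of different complexity. A minimal sketch of the forward mapping such a learner must invert, assuming a planar arm with revolute joints and a distance-to-target error; the arms, reward, and feedback mechanism used in the paper are not described in this listing, so `ik_error` is a hypothetical helper for illustration only.

```python
import math
from typing import Sequence, Tuple


def forward_kinematics(joint_angles: Sequence[float],
                       link_lengths: Sequence[float]) -> Tuple[float, float]:
    """End-effector (x, y) of a planar serial arm with revolute joints."""
    if len(joint_angles) != len(link_lengths):
        raise ValueError("need one link length per joint")
    x = y = 0.0
    theta = 0.0
    for q, length in zip(joint_angles, link_lengths):
        theta += q  # joint angles are relative, so the absolute link angle accumulates
        x += length * math.cos(theta)
        y += length * math.sin(theta)
    return x, y


def ik_error(joint_angles: Sequence[float], link_lengths: Sequence[float],
             target: Tuple[float, float]) -> float:
    """Hypothetical task error: Euclidean distance from end effector to target.
    Its negative is one possible reward for an RL agent solving the IK task."""
    x, y = forward_kinematics(joint_angles, link_lengths)
    return math.hypot(x - target[0], y - target[1])
```

Adding links makes the arm redundant (many joint configurations reach the same point), which is one concrete way the task complexity grows with the number of joints.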

arXiv:2112.03164

Feature Disentanglement of Robot Trajectories

Authors: Matias Valdenegro-Toro, Daniel Harnack, Hendrik Wöhrle

Abstract: Modeling trajectories generated by robot joints is complex and required for high-level activities such as trajectory generation, clustering, and classification. Disentangled representation learning promises advances in unsupervised learning, but it has not been evaluated on robot-generated trajectories. In this paper we evaluate three disentangling VAEs ($\beta$-VAE, Decorr VAE, and a new $\beta$-Decorr VAE) on a dataset of 1M robot trajectories generated from a 3-DoF robot arm. We find that the decorrelation-based formulations perform best in terms of disentangling metrics, trajectory quality, and correlation with ground-truth latent features. We expect that these results will increase the use of unsupervised learning in robot control.

Submitted 6 December, 2021; originally announced December 2021.

Comments: 5 pages, 3 figures, 1 table, with supplementary
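For reference, the standard $\beta$-VAE objective adds a $\beta$-weighted KL divergence between the encoder's diagonal Gaussian and a standard normal prior to the reconstruction error; $\beta > 1$ pushes towards more factorized latents. A minimal per-sample sketch, assuming squared error as the reconstruction term; the Decorr VAE and $\beta$-Decorr VAE variants are not defined in this listing and are not sketched here.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def beta_vae_loss(x: Sequence[float], x_rec: Sequence[float],
                  mu: Sequence[float], logvar: Sequence[float],
                  beta: float) -> float:
    """Reconstruction error plus beta-weighted KL term for a single sample.
    Squared error as the reconstruction term is an assumption of this sketch."""
    rec = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return rec + beta * gaussian_kl(mu, logvar)
```

With `beta=1.0` this reduces to the ordinary VAE loss (up to the constant factors of a Gaussian likelihood). In a trajectory model, `x` would be a flattened joint trajectory and `x_rec` its decoder output.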

arXiv:2007.14928

A Development Cycle for Automated Self-Exploration of Robot Behaviors

Authors: Thomas M. Roehr, Daniel Harnack, Hendrik Wöhrle, Felix Wiebe, Moritz Schilling, Oscar Lima, Malte Langosz, Shivesh Kumar, Sirko Straube, Frank Kirchner

Abstract: In this paper we introduce Q-Rock, a development cycle for the automated self-exploration and qualification of robot behaviors. With Q-Rock, we suggest a novel, integrative approach to automate robot development processes. Q-Rock combines several machine learning and reasoning techniques to deal with the increasing complexity in the design of robotic systems. The Q-Rock development cycle consists of three complementary processes: (1) automated exploration of capabilities that a given robotic hardware provides, (2) classification and semantic annotation of these capabilities to generate more complex behaviors, and (3) mapping between application requirements and available behaviors. These processes are based on a graph-based representation of a robot's structure, including hardware and software components. A central, scalable knowledge base enables collaboration of robot designers including mechanical, electrical and systems engineers, software developers and machine learning experts. In this paper we formalize Q-Rock's integrative development cycle and highlight its benefits with a proof-of-concept implementation and a use case demonstration.

Submitted 20 March, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

Comments: 30 pages, 16 figures, 4 tables

ACM Class: I.2.9

arXiv:2005.12064

Combinatorics of a Discrete Trajectory Space for Robot Motion Planning

Authors: Felix Wiebe, Shivesh Kumar, Daniel Harnack, Malte Langosz, Hendrik Wöhrle, Frank Kirchner

Abstract: Motion planning is a difficult problem in robot control. The complexity of the problem is directly related to the dimension of the robot's configuration space. While in many theoretical calculations and practical applications the configuration space is modeled as a continuous space, we present a discrete robot model based on the fundamental hardware specifications of a robot. Using lattice path methods, we provide estimates for the complexity of motion planning by counting the number of possible trajectories in a discrete robot configuration space.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 8 pages, 3 figures, to be published in the proceedings of 2nd IMA Conference on Mathematics of Robotics 2021
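To illustrate the lattice-path counting idea, consider a deliberately simplified model (an assumption, not necessarily the paper's): each of $d$ joints must travel a fixed number $k_i$ of discrete increments, exactly one joint moves one increment per time step, and no joint reverses. Trajectories are then monotone lattice paths, counted by the multinomial coefficient $(k_1 + \dots + k_d)! / (k_1! \cdots k_d!)$. The sketch below computes that count in closed form and cross-checks it by recursion.

```python
import math
from functools import lru_cache
from typing import Tuple


def count_monotone_paths(steps: Tuple[int, ...]) -> int:
    """Closed form: multinomial coefficient (k_1 + ... + k_d)! / (k_1! ... k_d!)."""
    total = math.factorial(sum(steps))
    for k in steps:
        total //= math.factorial(k)  # each partial quotient is an exact integer
    return total


@lru_cache(maxsize=None)
def count_by_recursion(steps: Tuple[int, ...]) -> int:
    """Same count by recursion: the last move was made by some joint i with k_i > 0."""
    if all(k == 0 for k in steps):
        return 1
    return sum(
        count_by_recursion(steps[:i] + (k - 1,) + steps[i + 1:])
        for i, k in enumerate(steps)
        if k > 0
    )
```

Even in this restricted model the count grows quickly: `count_monotone_paths((4, 4, 4))` returns `34650` for three joints moving only four increments each. Allowing simultaneous moves, waiting, or reversals would increase it further.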

Showing 1–7 of 7 results for author: Harnack, D