Search | arXiv e-print repository

Constrained Reinforcement Learning with Smoothed Log Barrier Function

Authors: Baohe Zhang, Yuan Zhang, Lilli Frison, Thomas Brox, Joschka Bödecker

Abstract: Reinforcement Learning (RL) has been widely applied to many control tasks and substantially improved the performances compared to conventional control methods in many domains where the reward function is well defined. However, for many real-world problems, it is often more convenient to formulate optimization problems in terms of rewards and constraints simultaneously. Optimizing such constrained… ▽ More Reinforcement Learning (RL) has been widely applied to many control tasks and substantially improved the performances compared to conventional control methods in many domains where the reward function is well defined. However, for many real-world problems, it is often more convenient to formulate optimization problems in terms of rewards and constraints simultaneously. Optimizing such constrained problems via reward sha** can be difficult as it requires tedious manual tuning of reward functions with several interacting terms. Recent formulations which include constraints mostly require a pre-training phase, which often needs human expertise to collect data or assumes having a sub-optimal policy readily available. We propose a new constrained RL method called CSAC-LB (Constrained Soft Actor-Critic with Log Barrier Function), which achieves competitive performance without any pre-training by applying a linear smoothed log barrier function to an additional safety critic. It implements an adaptive penalty for policy learning and alleviates the numerical issues that are known to complicate the application of the log barrier function method. As a result, we show that with CSAC-LB, we achieve state-of-the-art performance on several constrained control tasks with different levels of difficulty and evaluate our methods in a locomotion task on a real quadruped robot platform. △ Less

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2310.07029 [pdf, other]

Brain Age Revisited: Investigating the State vs. Trait Hypotheses of EEG-derived Brain-Age Dynamics with Deep Learning

Authors: Lukas AW Gemein, Robin T Schirrmeister, Joschka Boedecker, Tonio Ball

Abstract: The brain's biological age has been considered as a promising candidate for a neurologically significant biomarker. However, recent results based on longitudinal magnetic resonance imaging data have raised questions on its interpretation. A central question is whether an increased biological age of the brain is indicative of brain pathology and if changes in brain age correlate with diagnosed path… ▽ More The brain's biological age has been considered as a promising candidate for a neurologically significant biomarker. However, recent results based on longitudinal magnetic resonance imaging data have raised questions on its interpretation. A central question is whether an increased biological age of the brain is indicative of brain pathology and if changes in brain age correlate with diagnosed pathology (state hypothesis). Alternatively, could the discrepancy in brain age be a stable characteristic unique to each individual (trait hypothesis)? To address this question, we present a comprehensive study on brain aging based on clinical EEG, which is complementary to previous MRI-based investigations. We apply a state-of-the-art Temporal Convolutional Network (TCN) to the task of age regression. We train on recordings of the Temple University Hospital EEG Corpus (TUEG) explicitly labeled as non-pathological and evaluate on recordings of subjects with non-pathological as well as pathological recordings, both with examinations at a single point in time and repeated examinations over time. Therefore, we created four novel subsets of TUEG that include subjects with multiple recordings: I) all labeled non-pathological; II) all labeled pathological; III) at least one recording labeled non-pathological followed by at least one recording labeled pathological; IV) similar to III) but with opposing transition (first pathological then non-pathological). The results show that our TCN reaches state-of-the-art performance in age decoding with a mean absolute error of 6.6 years. Our extensive analyses demonstrate that the model significantly underestimates the age of non-pathological and pathological subjects (-1 and -5 years, paired t-test, p <= 0.18 and p <= 0.0066). Furthermore, the brain age gap biomarker is not indicative of pathological EEG. △ Less

Submitted 22 September, 2023; originally announced October 2023.

arXiv:2212.01607 [pdf, other]

A Hierarchical Approach for Strategic Motion Planning in Autonomous Racing

Authors: Rudolf Reiter, Jasper Hoffmann, Joschka Boedecker, Moritz Diehl

Abstract: We present an approach for safe trajectory planning, where a strategic task related to autonomous racing is learned sample-efficient within a simulation environment. A high-level policy, represented as a neural network, outputs a reward specification that is used within the cost function of a parametric nonlinear model predictive controller (NMPC). By including constraints and vehicle kinematics… ▽ More We present an approach for safe trajectory planning, where a strategic task related to autonomous racing is learned sample-efficient within a simulation environment. A high-level policy, represented as a neural network, outputs a reward specification that is used within the cost function of a parametric nonlinear model predictive controller (NMPC). By including constraints and vehicle kinematics in the NLP, we are able to guarantee safe and feasible trajectories related to the used model. Compared to classical reinforcement learning (RL), our approach restricts the exploration to safe trajectories, starts with a good prior performance and yields full trajectories that can be passed to a tracking lowest-level controller. We do not address the lowest-level controller in this work and assume perfect tracking of feasible trajectories. We show the superior performance of our algorithm on simulated racing tasks that include high-level decision making. The vehicle learns to efficiently overtake slower vehicles and to avoid getting overtaken by blocking faster vehicles. △ Less

Submitted 3 December, 2022; originally announced December 2022.

arXiv:2207.02016 [pdf, other]

Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set Regularization

Authors: Yuan Zhang, Jianhong Wang, Joschka Boedecker

Abstract: Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations, which excessively restricts its application for real-world robotics. Prior work claimed that adding regularization to the value function is equivalent to learning a robust policy with uncertain transitions. Although the regularization-robustness transformation is appealing for its… ▽ More Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations, which excessively restricts its application for real-world robotics. Prior work claimed that adding regularization to the value function is equivalent to learning a robust policy with uncertain transitions. Although the regularization-robustness transformation is appealing for its simplicity and efficiency, it is still lacking in continuous control tasks. In this paper, we propose a new regularizer named $\textbf{U}$ncertainty $\textbf{S}$et $\textbf{R}$egularizer (USR), by formulating the uncertainty set on the parameter space of the transition function. In particular, USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach to generate them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark, demonstrating improvements in the robust performance for perturbed testing environments. △ Less

Submitted 5 December, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted at CoRL 2023

arXiv:2106.04306 [pdf, other]

Residual Feedback Learning for Contact-Rich Manipulation Tasks with Uncertainty

Authors: Alireza Ranjbar, Ngo Anh Vien, Hanna Ziesche, Joschka Boedecker, Gerhard Neumann

Abstract: While classic control theory offers state of the art solutions in many problem scenarios, it is often desired to improve beyond the structure of such solutions and surpass their limitations. To this end, residual policy learning (RPL) offers a formulation to improve existing controllers with reinforcement learning (RL) by learning an additive "residual" to the output of a given controller. However… ▽ More While classic control theory offers state of the art solutions in many problem scenarios, it is often desired to improve beyond the structure of such solutions and surpass their limitations. To this end, residual policy learning (RPL) offers a formulation to improve existing controllers with reinforcement learning (RL) by learning an additive "residual" to the output of a given controller. However, the applicability of such an approach highly depends on the structure of the controller. Often, internal feedback signals of the controller limit an RL algorithm to adequately change the policy and, hence, learn the task. We propose a new formulation that addresses these limitations by also modifying the feedback signals to the controller with an RL policy and show superior performance of our approach on a contact-rich peg-insertion task under position and orientation uncertainty. In addition, we use a recent Cartesian impedance control architecture as the control framework which can be available to us as a black-box while assuming no knowledge about its input/output structure, and show the difficulties of standard RPL. Furthermore, we introduce an adaptive curriculum for the given task to gradually increase the task difficulty in terms of position and orientation uncertainty. A video showing the results can be found at https://youtu.be/SAZm_Krze7U . △ Less

Submitted 6 August, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

arXiv:2002.05115 [pdf, other]

doi 10.1016/j.neuroimage.2020.117021

Machine-Learning-Based Diagnostics of EEG Pathology

Authors: Lukas Alexander Wilhelm Gemein, Robin Tibor Schirrmeister, Patryk Chrabąszcz, Daniel Wilson, Joschka Boedecker, Andreas Schulze-Bonhage, Frank Hutter, Tonio Ball

Abstract: Machine learning (ML) methods have the potential to automate clinical EEG analysis. They can be categorized into feature-based (with handcrafted features), and end-to-end approaches (with learned features). Previous studies on EEG pathology decoding have typically analyzed a limited number of features, decoders, or both. For a I) more elaborate feature-based EEG analysis, and II) in-depth comparis… ▽ More Machine learning (ML) methods have the potential to automate clinical EEG analysis. They can be categorized into feature-based (with handcrafted features), and end-to-end approaches (with learned features). Previous studies on EEG pathology decoding have typically analyzed a limited number of features, decoders, or both. For a I) more elaborate feature-based EEG analysis, and II) in-depth comparisons of both approaches, here we first develop a comprehensive feature-based framework, and then compare this framework to state-of-the-art end-to-end methods. To this aim, we apply the proposed feature-based framework and deep neural networks including an EEG-optimized temporal convolutional network (TCN) to the task of pathological versus non-pathological EEG classification. For a robust comparison, we chose the Temple University Hospital (TUH) Abnormal EEG Corpus (v2.0.0), which contains approximately 3000 EEG recordings. The results demonstrate that the proposed feature-based decoding framework can achieve accuracies on the same level as state-of-the-art deep neural networks. We find accuracies across both approaches in an astonishingly narrow range from 81--86\%. Moreover, visualizations and analyses indicated that both approaches used similar aspects of the data, e.g., delta and theta band power at temporal electrode locations. We argue that the accuracies of current binary EEG pathology decoders could saturate near 90\% due to the imperfect inter-rater agreement of the clinical labels, and that such decoders are already clinically useful, such as in areas where clinical EEG experts are rare. We make the proposed feature-based framework available open source and thus offer a new tool for EEG machine learning research. △ Less

Submitted 11 February, 2020; originally announced February 2020.

Journal ref: NeuroImage, Volume 220, 15 October 2020, 117021

arXiv:1906.12189 [pdf, other]

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Authors: Torsten Koller, Felix Berkenkamp, Matteo Turchetta, Joschka Boedecker, Andreas Krause

Abstract: Reinforcement learning has been successfully used to solve difficult tasks in complex unknown environments. However, these methods typically do not provide any safety guarantees during the learning process. This is particularly problematic, since reinforcement learning agent actively explore their environment. This prevents their use in safety-critical, real-world applications. In this paper, we p… ▽ More Reinforcement learning has been successfully used to solve difficult tasks in complex unknown environments. However, these methods typically do not provide any safety guarantees during the learning process. This is particularly problematic, since reinforcement learning agent actively explore their environment. This prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that provides high-probability safety guarantees throughout the learning process. Based on a reliable statistical model, we construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we allow for input-dependent uncertainties. Based on these reliable predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. We evaluate the resulting algorithm to safely explore the dynamics of an inverted pendulum and to solve a reinforcement learning task on a cart-pole system with safety constraints. △ Less

Submitted 27 June, 2019; originally announced June 2019.

Comments: 14 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1803.08287

arXiv:1612.07139 [pdf, other]

A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation

Authors: Lei Tai, **gwei Zhang, Ming Liu, Joschka Boedecker, Wolfram Burgard

Abstract: Deep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement lea… ▽ More Deep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we begin from traditional reinforcement learning algorithms, showing how they are extended to the deep context and effective mechanisms that could be added on top of the DRL algorithms. We then introduce representative works that utilize DRL to solve navigation and manipulation tasks in robotics. We continue our discussion on methods addressing the challenge of the reality gap for transferring DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for conducting DRL research. For imitation leaning, we go through its three main categories, behavior cloning, inverse reinforcement learning and generative adversarial imitation learning, by introducing their formulations and their corresponding robotics applications. Finally, we discuss the open challenges and research frontiers. △ Less

Submitted 8 April, 2018; v1 submitted 21 December, 2016; originally announced December 2016.

Comments: 19 pages, 1 figures

Showing 1–8 of 8 results for author: Bödecker, J