
Dynamic Update-to-Data Ratio: Minimizing World Model Overfitting

Authors: Nicolai Dorka, Tim Welschehold, Wolfram Burgard

Abstract: Early stopping based on the validation set performance is a popular approach to find the right balance between under- and overfitting in the context of supervised learning. However, in reinforcement learning, even for supervised sub-problems such as world model learning, early stopping is not applicable as the dataset is continually evolving. As a solution, we propose a new general method that dynamically adjusts the update to data (UTD) ratio during training based on under- and overfitting detection on a small subset of the continuously collected experience not used for training. We apply our method to DreamerV2, a state-of-the-art model-based reinforcement learning algorithm, and evaluate it on the DeepMind Control Suite and the Atari 100k benchmark. The results demonstrate that one can better balance under- and overestimation by adjusting the UTD ratio with our approach compared to the default setting in DreamerV2 and that it is competitive with an extensive hyperparameter search which is not feasible for many applications. Our method eliminates the need to set the UTD hyperparameter by hand and even leads to a higher robustness with regard to other learning-related hyperparameters further reducing the amount of necessary tuning.

Submitted 17 March, 2023; originally announced March 2023.

Comments: ICLR 2023

arXiv:2009.08169

Holistic Filter Pruning for Efficient Deep Neural Networks

Authors: Lukas Enderich, Fabian Timm, Wolfram Burgard

Abstract: Deep neural networks (DNNs) are usually over-parameterized to increase the likelihood of getting adequate initial weights by random initialization. Consequently, trained DNNs have many redundancies which can be pruned from the model to reduce complexity and improve the ability to generalize. Structural sparsity, as achieved by filter pruning, directly reduces the tensor sizes of weights and activations and is thus particularly effective for reducing complexity. We propose "Holistic Filter Pruning" (HFP), a novel approach for common DNN training that is easy to implement and enables to specify accurate pruning rates for the number of both parameters and multiplications. After each forward pass, the current model complexity is calculated and compared to the desired target size. By gradient descent, a global solution can be found that allocates the pruning budget over the individual layers such that the desired target size is fulfilled. In various experiments, we give insights into the training and achieve state-of-the-art performance on CIFAR-10 and ImageNet (HFP prunes 60% of the multiplications of ResNet-50 on ImageNet with no significant loss in the accuracy). We believe our simple and powerful pruning approach to constitute a valuable contribution for users of DNNs in low-cost applications.

Submitted 17 September, 2020; originally announced September 2020.

Comments: preprint, accepted at WACV2021

arXiv:2007.02701

Scaling Imitation Learning in Minecraft

Authors: Artemij Amiranashvili, Nicolai Dorka, Wolfram Burgard, Vladlen Koltun, Thomas Brox

Abstract: Imitation learning is a powerful family of techniques for learning sensorimotor coordination in immersive environments. We apply imitation learning to attain state-of-the-art performance on hard exploration problems in the Minecraft environment. We report experiments that highlight the influence of network architecture, loss function, and data augmentation. An early version of our approach reached second place in the MineRL competition at NeurIPS 2019. Here we report stronger results that can be used as a starting point for future competition entries and related research. Our code is available at https://github.com/amiranas/minerl_imitation_learning.

Submitted 6 July, 2020; originally announced July 2020.

arXiv:2003.04046

doi 10.1109/IROS45743.2020.9340784

Efficiency and Equity are Both Essential: A Generalized Traffic Signal Controller with Deep Reinforcement Learning

Authors: Shengchao Yan, Jingwei Zhang, Daniel Büscher, Wolfram Burgard

Abstract: Traffic signal controllers play an essential role in today's traffic system. However, the majority of them currently is not sufficiently flexible or adaptive to generate optimal traffic schedules. In this paper we present an approach to learning policies for signal controllers using deep reinforcement learning aiming for optimized traffic flow. Our method uses a novel formulation of the reward function that simultaneously considers efficiency and equity. We furthermore present a general approach to find the bound for the proposed equity factor and we introduce the adaptive discounting approach that greatly stabilizes learning and helps to maintain a high flexibility of green light duration. The experimental evaluations on both simulated and real-world data demonstrate that our proposed algorithm achieves state-of-the-art performance (previously held by traditional non-learning methods) on a wide range of traffic situations.

Submitted 27 December, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: Published as a conference paper at IROS 2020

arXiv:1909.01039

Learning User Preferences for Trajectories from Brain Signals

Authors: Henrich Kolkhorst, Wolfram Burgard, Michael Tangermann

Abstract: Robot motions in the presence of humans should not only be feasible and safe, but also conform to human preferences. This, however, requires user feedback on the robot's behavior. In this work, we propose a novel approach to leverage the user's brain signals as a feedback modality in order to decode the judgment of robot trajectories and rank them according to the user's preferences. We show that brain signals measured using electroencephalography during observation of a robotic arm's trajectory as well as in response to preference statements are informative regarding the user's preference. Furthermore, we demonstrate that user feedback from brain signals can be used to reliably infer pairwise trajectory preferences as well as to retrieve the preferred observed trajectories of the user with a performance comparable to explicit behavioral feedback.

Submitted 20 December, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: The International Symposium on Robotics Research (ISRR), Hanoi, Vietnam, October 2019; reformatted to two-column layout

arXiv:1903.07400

Scheduled Intrinsic Drive: A Hierarchical Take on Intrinsically Motivated Exploration

Authors: Jingwei Zhang, Niklas Wetzel, Nicolai Dorka, Joschka Boedecker, Wolfram Burgard

Abstract: Exploration in sparse reward reinforcement learning remains an open challenge. Many state-of-the-art methods use intrinsic motivation to complement the sparse extrinsic reward signal, giving the agent more opportunities to receive feedback during exploration. Commonly these signals are added as bonus rewards, which results in a mixture policy that neither conducts exploration nor task fulfillment resolutely. In this paper, we instead learn separate intrinsic and extrinsic task policies and schedule between these different drives to accelerate exploration and stabilize learning. Moreover, we introduce a new type of intrinsic reward denoted as successor feature control (SFC), which is general and not task-specific. It takes into account statistics over complete trajectories and thus differs from previous methods that only use local information to evaluate intrinsic motivation. We evaluate our proposed scheduled intrinsic drive (SID) agent using three different environments with pure visual inputs: VizDoom, DeepMind Lab and DeepMind Control Suite. The results show a substantially improved exploration efficiency with SFC and the hierarchical usage of the intrinsic drives. A video of our experimental results can be found at https://youtu.be/b0MbY3lUlEI.

Submitted 21 June, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

Comments: A video of our experimental results can be found at https://youtu.be/b0MbY3lUlEI

arXiv:1805.01667

Intracranial Error Detection via Deep Learning

Authors: Martin Völker, Jiří Hammer, Robin T. Schirrmeister, Joos Behncke, Lukas D. J. Fiederer, Andreas Schulze-Bonhage, Petr Marusič, Wolfram Burgard, Tonio Ball

Abstract: Deep learning techniques have revolutionized the field of machine learning and were recently successfully applied to various classification problems in noninvasive electroencephalography (EEG). However, these methods were so far only rarely evaluated for use in intracranial EEG. We employed convolutional neural networks (CNNs) to classify and characterize the error-related brain response as measured in 24 intracranial EEG recordings. Decoding accuracies of CNNs were significantly higher than those of a regularized linear discriminant analysis. Using time-resolved deep decoding, it was possible to classify errors in various regions in the human brain, and further to decode errors over 200 ms before the actual erroneous button press, e.g., in the precentral gyrus. Moreover, deeper networks performed better than shallower networks in distinguishing correct from error trials in all-channel decoding. In single recordings, up to 100 % decoding accuracy was achieved. Visualization of the networks' learned features indicated that multivariate decoding on an ensemble of channels yields related, albeit non-redundant information compared to single-channel decoding. In summary, here we show the usefulness of deep learning for both intracranial error decoding and mapping of the spatio-temporal structure of the human error processing network.

Submitted 2 November, 2018; v1 submitted 4 May, 2018; originally announced May 2018.

Comments: 8 pages, 6 figures. Accepted at the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC2018)

ACM Class: I.2.6; I.2.8; I.5.0; J.2; J.3

arXiv:1604.03912

Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

Authors: Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard

Abstract: Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent's behavior originates in its policy and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system's dynamics are accessible, or the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer learning task shows improvements regarding the sample efficiency as well as the accuracy of the estimated reward functions and transition models.

Submitted 13 April, 2016; originally announced April 2016.

Comments: accepted to appear in AISTATS 2016

Showing 1–8 of 8 results for author: Burgard, W