Showing 1–31 of 31 results for author: Boehmer, W

  1. arXiv:2406.08069  [pdf, other]

    cs.LG cs.AI

    Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning

    Authors: Max Weltevrede, Felix Kaubek, Matthijs T. J. Spaan, Wendelin Böhmer

    Abstract: One of the remaining challenges in reinforcement learning is to develop agents that can generalise to novel scenarios they might encounter once deployed. This challenge is often framed in a multi-task setting where agents train on a fixed set of tasks and have to generalise to new tasks. Recent work has shown that in this setting increased exploration during training can be leveraged to increase t…

    Submitted 12 June, 2024; originally announced June 2024.

  2. arXiv:2406.01423  [pdf, other]

    cs.LG cs.AI

    Value Improved Actor Critic Algorithms

    Authors: Yaniv Oren, Moritz A. Zanger, Pascal R. van der Vaart, Matthijs T. J. Spaan, Wendelin Böhmer

    Abstract: Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the popular value-based algorithm family employs improvement operators in the value update, to iteratively improve the value function directly. In this wo…

    Submitted 3 June, 2024; originally announced June 2024.

  3. arXiv:2405.01984  [pdf, other]

    math.OC cs.AI

    A Penalty-Based Guardrail Algorithm for Non-Decreasing Optimization with Inequality Constraints

    Authors: Ksenija Stepanovic, Wendelin Böhmer, Mathijs de Weerdt

    Abstract: Traditional mathematical programming solvers require long computational times to solve constrained minimization problems of complex and large-scale physical systems. Therefore, these problems are often transformed into unconstrained ones, and solved with computationally efficient optimization approaches based on first-order information, such as the gradient descent method. However, for unconstrain…

    Submitted 3 May, 2024; originally announced May 2024.

  4. arXiv:2402.01361  [pdf, other]

    cs.LG

    To the Max: Reinventing Reward in Reinforcement Learning

    Authors: Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt

    Abstract: In reinforcement learning (RL), different rewards can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good reward function is hence an extremely important yet challenging problem. In this paper, we explore an alternative approach to using re…

    Submitted 2 February, 2024; originally announced February 2024.

  5. arXiv:2312.02665  [pdf, other]

    cs.AI cs.LG

    Lights out: training RL agents robust to temporary blindness

    Authors: N. Ordonez, M. Tromp, P. M. Julbe, W. Böhmer

    Abstract: Agents trained with DQN rely on an observation at each timestep to decide what action to take next. However, in real world applications observations can change or be missing entirely. Examples of this could be a light bulb breaking down, or the wallpaper in a certain room changing. While these situations change the actual observation, the underlying optimal policy does not change. Because of this…

    Submitted 5 December, 2023; originally announced December 2023.

  6. arXiv:2310.12816  [pdf, other]

    cs.RO cs.MA

    Multi-Robot Local Motion Planning Using Dynamic Optimization Fabrics

    Authors: Saray Bakker, Luzia Knoedler, Max Spahn, Wendelin Böhmer, Javier Alonso-Mora

    Abstract: In this paper, we address the problem of real-time motion planning for multiple robotic manipulators that operate in close proximity. We build upon the concept of dynamic fabrics and extend them to multi-robot systems, referred to as Multi-Robot Dynamic Fabrics (MRDF). This geometric method enables a very high planning frequency for high-dimensional systems at the expense of being reactive and pro…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 6 pages + 1 page references, 2 tables, 4 figures; preprint version of a paper accepted to the IEEE International Symposium on Multi-Robot & Multi-Agent Systems, Boston, 2023

  7. arXiv:2307.16304  [pdf, other]

    cs.LG math.OC

    You Shall Pass: Dealing with the Zero-Gradient Problem in Predict and Optimize for Convex Optimization

    Authors: Grigorii Veviurko, Wendelin Böhmer, Mathijs de Weerdt

    Abstract: Predict and optimize is an increasingly popular decision-making paradigm that employs machine learning to predict unknown parameters of optimization problems. Instead of minimizing the prediction error of the parameters, it trains predictive models using task performance as a loss function. The key challenge to train such models is the computation of the Jacobian of the solution of the optimizatio…

    Submitted 2 February, 2024; v1 submitted 30 July, 2023; originally announced July 2023.

  8. arXiv:2306.07124  [pdf, other]

    cs.LG cs.AI stat.ML

    Diverse Projection Ensembles for Distributional Reinforcement Learning

    Authors: Moritz A. Zanger, Wendelin Böhmer, Matthijs T. J. Spaan

    Abstract: In contrast to classical reinforcement learning, distributional reinforcement learning algorithms aim to learn the distribution of returns rather than their expected value. Since the nature of the return distribution is generally unknown a priori or arbitrarily complex, a common approach finds approximations within a set of representable, parametric distributions. Typically, this involves a projec…

    Submitted 12 June, 2023; originally announced June 2023.

    Comments: 21 pages, 7 figures, submitted to NeurIPS 2023

  9. arXiv:2306.05727  [pdf, other]

    cs.LG

    The Role of Diverse Replay for Generalisation in Reinforcement Learning

    Authors: Max Weltevrede, Matthijs T. J. Spaan, Wendelin Böhmer

    Abstract: In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. In this paper, we investigate the impact of these components in the context of generalisation in multi-task RL. We investigate the hypothesis that collect…

    Submitted 31 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 15 pages, 8 figures

  10. arXiv:2212.03068  [pdf, other]

    cs.RO cs.AI

    Active Classification of Moving Targets with Learned Control Policies

    Authors: Álvaro Serra-Gómez, Eduardo Montijano, Wendelin Böhmer, Javier Alonso-Mora

    Abstract: In this paper, we consider the problem where a drone has to collect semantic information to classify multiple moving targets. In particular, we address the challenge of computing control inputs that move the drone to informative viewpoints, position and orientation, when the information is extracted using a "black-box" classifier, e.g., a deep learning neural network. These algorithms typically la…

    Submitted 27 September, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: 8 pages, 6 figures, Accepted in IEEE RA-L

    MSC Class: 68T40

  11. arXiv:2210.13455  [pdf, other]

    cs.LG cs.AI

    E-MCTS: Deep Exploration in Model-Based Reinforcement Learning by Planning with Epistemic Uncertainty

    Authors: Yaniv Oren, Matthijs T. J. Spaan, Wendelin Böhmer

    Abstract: One of the most well-studied and highly performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS). Key challenges of MCTS-based MBRL methods remain dedicated deep exploration and reliability in the face of the unknown, and both challenges can be alleviated through principled epistemic uncertainty estimation in the predictions of MCTS. We pre…

    Submitted 30 August, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Comments: Submitted to NeurIPS 2023, accepted to EWRL 2023

  12. arXiv:2010.02974  [pdf, other]

    cs.LG cs.AI cs.MA

    UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

    Authors: Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson

    Abstract: VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can prevent them from solving tasks that require significant coordination between agents at a given timestep. We show that this…

    Submitted 10 June, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Published at ICML 2021

  13. arXiv:2010.01856  [pdf, other]

    cs.LG stat.ML

    My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

    Authors: Vitaly Kurin, Maximilian Igl, Tim Rocktäschel, Wendelin Boehmer, Shimon Whiteson

    Abstract: Multitask Reinforcement Learning is a promising way to obtain models with better performance, generalisation, data efficiency, and robustness. Most existing work is limited to compatible settings, where the state and action space dimensions are the same across tasks. Graph Neural Networks (GNN) are one way to address incompatible environments, because they can process graphs of arbitrary size. The…

    Submitted 14 April, 2021; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: ICLR 2021 Camera-Ready Version

  14. arXiv:2006.05826  [pdf, other]

    cs.LG cs.AI stat.ML

    Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning

    Authors: Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson

    Abstract: Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Due to the transience of this non-stationarity, it is often not explicitly addressed in deep RL and a single neural network is continually updated. However, we find evidence that neural networks exh…

    Submitted 22 September, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

  15. arXiv:2006.04222  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

    Authors: Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

    Abstract: Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities. Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?…

    Submitted 11 June, 2021; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: ICML 2021 Camera Ready

  16. arXiv:2005.09220  [pdf, other]

    cs.LG cs.AI stat.ML

    Privileged Information Dropout in Reinforcement Learning

    Authors: Pierre-Alexandre Kamienny, Kai Arulkumaran, Feryal Behbahani, Wendelin Boehmer, Shimon Whiteson

    Abstract: Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting the inputs of agents. In this work, we investigate Privileged Information Dropout (PI-Dropout) for achieving the latt…

    Submitted 19 May, 2020; originally announced May 2020.

  17. arXiv:2003.06709  [pdf, other]

    cs.LG cs.AI stat.ML

    FACMAC: Factored Multi-Agent Centralised Policy Gradients

    Authors: Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson

    Abstract: We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic policy gradients to learn policies. However, FACMAC learns a centralised but factored critic, which combines per-agent utilit…

    Submitted 7 May, 2021; v1 submitted 14 March, 2020; originally announced March 2020.

  18. arXiv:2002.12174  [pdf, other]

    cs.LG cs.AI stat.ML

    Optimistic Exploration even with a Pessimistic Initialisation

    Authors: Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson

    Abstract: Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use optimistic initialisation despite taking inspiration from these provably efficient tabular algorithms. In particular, in scenarios with only positive rewards, Q-va…

    Submitted 26 February, 2020; originally announced February 2020.

    Comments: Published as a conference paper at ICLR 2020

  19. Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

    Authors: Dongge Han, Wendelin Boehmer, Michael Wooldridge, Alex Rogers

    Abstract: In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. Therefore, a key issue in multi-agent systems research is that of predicting the behaviours of others, and responding promptly to changes in such behaviours. One obvious possibility is for each agent to broadcast their current intention, for example, the currently executed option in a hierarc…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: PRICAI 2019

  20. arXiv:1910.00091  [pdf, other]

    cs.LG cs.AI cs.MA

    Deep Coordination Graphs

    Authors: Wendelin Böhmer, Vitaly Kurin, Shimon Whiteson

    Abstract: This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning. DCG strikes a flexible trade-off between representational capacity and generalization by factoring the joint value function of all agents according to a coordination graph into payoffs between pairs of agents. The value can be maximized by local message passing along the graph, which allow…

    Submitted 23 June, 2020; v1 submitted 27 September, 2019; originally announced October 2019.

    Comments: Accepted at ICML 2020

  21. arXiv:1906.02138  [pdf, other]

    cs.AI

    Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning

    Authors: Wendelin Böhmer, Tabish Rashid, Shimon Whiteson

    Abstract: This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning. We discuss the challenges in applying intrinsic reward to multiple collaborative agents and demonstrate how unreliable reward can prevent decentralized agents from learning the optimal policy. We address this problem with a novel framework, Independent Centrally-assisted Q-learning (ICQL…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted to the 2nd Exploration in Reinforcement Learning Workshop at the International Conference on Machine Learning 2019

  22. arXiv:1905.01072  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Residual Reinforcement Learning

    Authors: Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

    Abstract: We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMind Control Suite benchmark. Moreover, we find the residual algorithm an effective approach to the distribution mismatch…

    Submitted 23 January, 2020; v1 submitted 3 May, 2019; originally announced May 2019.

    Comments: AAMAS 2020

  23. arXiv:1904.01033  [pdf, other]

    cs.LG stat.ML

    Multitask Soft Option Learning

    Authors: Maximilian Igl, Andrew Gambardella, Jinke He, Nantas Nardelli, N. Siddharth, Wendelin Böhmer, Shimon Whiteson

    Abstract: We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This "soft" version of options avoids several instabilities during training in a multitask setting, and provides a natural way to learn both intra-option policie…

    Submitted 21 June, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

    Comments: Published at UAI 2020

  24. arXiv:1903.11329  [pdf, other]

    cs.LG cs.AI stat.ML

    Generalized Off-Policy Actor-Critic

    Authors: Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

    Abstract: We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which can be misleading about the performance of the target policy when deployed, our new objective better predicts such performance. We prove the Generalized Off-Po…

    Submitted 28 October, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

    Comments: NeurIPS 2019

  25. arXiv:1810.11702  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Multi-Agent Common Knowledge Reinforcement Learning

    Authors: Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

    Abstract: Cooperative multi-agent reinforcement learning often requires decentralised policies, which severely limit the agents' ability to coordinate their behaviour. In this paper, we show that common knowledge between agents allows for complex decentralised coordination. Common knowledge arises naturally in a large number of decentralised cooperative multi-agent tasks, for example, when agents can recons…

    Submitted 11 January, 2020; v1 submitted 27 October, 2018; originally announced October 2018.

    Comments: Advances in Neural Information Processing Systems, 9924-9935

  26. arXiv:1612.07548  [pdf, ps, other]

    cs.AI cs.LG stat.ML

    Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

    Authors: Wendelin Böhmer, Rong Guo, Klaus Obermayer

    Abstract: This paper investigates a type of instability that is linked to the greedy policy improvement in approximated reinforcement learning. We show empirically that non-deterministic policy improvement can stabilize methods like LSPI by controlling the improvements' stochasticity. Additionally we show that a suitable representation of the value function also stabilizes the solution to some degree. The p…

    Submitted 22 December, 2016; originally announced December 2016.

    Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain

  27. arXiv:1504.04456  [pdf, ps, other]

    nucl-th

    Theoretical neutron-capture cross sections for r-process nucleosynthesis in the $^{48}$Ca region

    Authors: T. Rauscher, W. Böhmer, K. -L. Kratz, W. Balogh, H. Oberhummer

    Abstract: We calculate neutron capture cross sections for r-process nucleosynthesis in the $^{48}$Ca region, namely for the isotopes $^{40-44}$S, $^{46-50}$Ar, $^{56-66}$Ti, $^{62-68}$Cr, and $^{72-76}$Fe. While previously only cross sections resulting from the compound nucleus reaction mechanism (Hauser-Feshbach) have been considered, we recalculate not only that contribution to the cross section but also…

    Submitted 17 April, 2015; originally announced April 2015.

    Comments: 6 pages; talk at ENAM95, appeared in the proceedings, uploaded here to allow easy access. in Proc. Int. Conf. on Exotic Nuclei and Atomic Masses "ENAM 95", eds. M. de Saint Simon and O. Sorlin (Editions Frontières, Gif-sur-Yvette 1995), p. 683

  28. arXiv:1412.6286  [pdf, ps, other]

    cs.LG stat.ML

    Regression with Linear Factored Functions

    Authors: Wendelin Böhmer, Klaus Obermayer

    Abstract: Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression-algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow to analytically solve certain integrals and to calculate point-wise p…

    Submitted 30 March, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

    Comments: Under review as conference paper at ECML/PKDD 2015

  29. arXiv:1205.0986  [pdf, ps, other]

    cs.AI cs.NE

    Robot Navigation using Reinforcement Learning and Slow Feature Analysis

    Authors: Wendelin Böhmer

    Abstract: The application of reinforcement learning algorithms onto real life problems always bears the challenge of filtering the environmental state out of raw sensor readings. While most approaches use heuristics, biology suggests that there must exist an unsupervised method to construct such filters automatically. Besides the extraction of environmental states, the filters have to represent them in a fa…

    Submitted 4 May, 2012; originally announced May 2012.

    Comments: Diploma Thesis

  30. arXiv:astro-ph/0012217  [pdf, ps, other]

    astro-ph

    On the origin of the Ca-Ti-Cr isotopic anomalies in the inclusion EK-1-4-1 of the Allende Meteorite

    Authors: K. -L. Kratz, W. Boehmer, C. Freiburghaus, P. Moeller, B. Pfeiffer, T. Rauscher, F. -K. Thielemann

    Abstract: In the framework of our investigation to explain the nucleosynthesis origin of the correlated Ca-Ti-Cr isotopic anomalies in the Ca-Al-rich "FUN" inclusion EK-1-4-1 of the Allende meteorite, the nuclear-physics basis in the neutron-rich N=28 region has been updated by including recent experimental data on beta-decay properties and microscopic predictions of neutron-capture cross sections. Char…

    Submitted 11 December, 2000; originally announced December 2000.

    Comments: 15 pages, 3 figures. Proc. Torino-Melbourne-Pasadena "Wasserburg" Workshop, U. Torino, 21-22 June 2000. For publication in Mem.S.A.It

  31. Decay of neutron-rich Mn nuclides and deformation of heavy Fe isotopes

    Authors: M. Hannawald, T. Kautzsch, A. Woehr, W. B. Walters, K. -L. Kratz, V. N. Fedoseyev, V. L. Mishin, W. Boehmer, B. Pfeiffer, V. Sebastian, Y. Jading, U. Koester, J. Lettry, H. L. Ravn, the ISOLDE Collaboration

    Abstract: The use of chemically selective laser ionization combined with beta-delayed neutron counting at CERN/ISOLDE has permitted identification and half-life measurements for 623-ms Mn-61 up through 14-ms Mn-69. The measured half-lives are found to be significantly longer near N=40 than the values calculated with a QRPA shell model using ground-state deformations from the FRDM and ETFSI models. Gamma-r…

    Submitted 21 December, 1998; originally announced December 1998.

    Comments: Latex-file with 4 figures, 5 pages, Phys. Rev. Lett., in print

    Journal ref: Phys.Rev.Lett. 82 (1999) 1391-1394