Showing 1–38 of 38 results for author: Roijers, D M

  1. arXiv:2406.06184  [pdf, other]

    cs.AI cs.LG

    Deep Multi-Objective Reinforcement Learning for Utility-Based Infrastructural Maintenance Optimization

    Authors: Jesse van Remmerden, Maurice Kenter, Diederik M. Roijers, Charalampos Andriotis, Yingqian Zhang, Zaharah Bukhsh

    Abstract: In this paper, we introduce Multi-Objective Deep Centralized Multi-Agent Actor-Critic (MO-DCMAC), a multi-objective reinforcement learning (MORL) method for infrastructural maintenance optimization, an area traditionally dominated by single-objective reinforcement learning (RL) approaches. Previous single-objective RL methods combine multiple objectives, such as probability of collapse and cost,…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2402.07182  [pdf, other]

    cs.LG

    Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning

    Authors: Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

    Abstract: A significant challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), a principled algorithm that decomposes the task of finding the Pareto front into a sequence of single-objective problems for which various solution methods exist. This enable…

    Submitted 11 February, 2024; originally announced February 2024.

  3. arXiv:2402.02665  [pdf, ps, other]

    cs.LG

    Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

    Authors: Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers

    Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perfor…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted for the Blue Sky Track at AAMAS'24

  4. arXiv:2311.11288  [pdf, other]

    cs.AI

    What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization

    Authors: Zuzanna Osika, Jazmin Zatarain Salazar, Diederik M. Roijers, Frans A. Oliehoek, Pradeep K. Murukannaiah

    Abstract: We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fields. We provide an overview of the advances on this topic, including methods for visualization, mining the solution set,…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: IJCAI 2023 Conference Paper, Survey Track

  5. arXiv:2305.05560  [pdf, other]

    cs.AI

    Distributional Multi-Objective Decision Making

    Authors: Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik M. Roijers

    Abstract: For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly.…

    Submitted 18 July, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at IJCAI 2023

  6. arXiv:2303.03284  [pdf, other]

    cs.LG cs.AI

    The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

    Authors: Raphael Avalos, Florent Delgrange, Ann Nowé, Guillermo A. Pérez, Diederik M. Roijers

    Abstract: Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. As such the agent needs to reason taking into account the past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Maintaining a probability distribution that mode…

    Submitted 26 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  7. Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization

    Authors: Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, Bruno C. da Silva

    Abstract: Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences over (possibly conflicting) reward functions. Such algorithms often learn a set of policies (each optimized for a particular agent preference) that can later be used to solve problems with novel preferences. We introduce a novel algorithm that uses Generalized Po…

    Submitted 23 March, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted to AAMAS 2023

  8. arXiv:2301.05755  [pdf, other]

    cs.GT

    Bridging the Gap Between Single and Multi Objective Games

    Authors: Willem Röpke, Carla Groenland, Roxana Rădulescu, Ann Nowé, Diederik M. Roijers

    Abstract: A classic model to study strategic decision making in multi-agent systems is the normal-form game. This model can be generalised to allow for an infinite number of pure strategies leading to continuous games. Multi-objective normal-form games are another generalisation that model settings where players receive separate payoffs in more than one objective. We bridge the gap between the two models by…

    Submitted 1 March, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

    Comments: Accepted to AAMAS 2023

  9. arXiv:2211.13032  [pdf, other]

    cs.AI cs.LG

    Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning

    Authors: Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion

    Abstract: In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from a single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. Making decisions using just the expected future returns -- known in r…

    Submitted 6 December, 2022; v1 submitted 23 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2102.00966

  10. arXiv:2211.04108  [pdf, other]

    cs.CV

    Determining Accessible Sidewalk Width by Extracting Obstacle Information from Point Clouds

    Authors: Cláudia Fonseca Pinhão, Chris Eijgenstein, Iva Gornishka, Shayla Jansen, Diederik M. Roijers, Daan Bloembergen

    Abstract: Obstacles on the sidewalk often block the path, limiting passage and resulting in frustration and wasted time, especially for citizens and visitors who use assistive devices (wheelchairs, walkers, strollers, canes, etc). To enable equal participation and use of the city, all citizens should be able to perform and complete their daily activities in a similar amount of time and effort. Therefore, we…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 4 pages, 9 figures. Presented at the workshop on "The Future of Urban Accessibility" at ACM ASSETS'22. Code for this paper is available at https://github.com/Amsterdam-AI-Team/Urban_PointCloud_Sidewalk_Width

    ACM Class: I.4.6; I.4.8

  11. arXiv:2207.00368  [pdf, other]

    cs.AI cs.LG

    Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models

    Authors: Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley, Patrick Mannion

    Abstract: Many real-world problems contain multiple objectives and agents, where a trade-off exists between objectives. Key to solving such problems is to exploit sparse dependency structures that exist between agents. For example, in wind farm control a trade-off exists between maximising power and minimising stress on the systems components. Dependencies between turbines arise due to the wake effect. We m…

    Submitted 1 July, 2022; originally announced July 2022.

  12. arXiv:2204.05027  [pdf, ps, other]

    cs.LG cs.AI q-bio.PE

    Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning

    Authors: Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Rădulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin

    Abstract: Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies w.r.t. a single objective, such as the pathogen's a…

    Submitted 11 April, 2022; originally announced April 2022.

  13. arXiv:2112.15422  [pdf, other]

    cs.AI

    Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

    Authors: Peter Vamplew, Benjamin J. Smith, Johan Kallstrom, Gabriel Ramos, Roxana Radulescu, Diederik M. Roijers, Conor F. Hayes, Fredrik Heintz, Patrick Mannion, Pieter J. K. Libin, Richard Dazeley, Cameron Foale

    Abstract: The recent paper "Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and co…

    Submitted 24 November, 2021; originally announced December 2021.

  14. arXiv:2112.12458  [pdf, other]

    cs.LG cs.AI

    Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning

    Authors: Raphaël Avalos, Mathieu Reymond, Ann Nowé, Diederik M. Roijers

    Abstract: Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn for each agent a decentr…

    Submitted 26 October, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: https://openreview.net/forum?id=adpKzWQunW

    Journal ref: Transactions on Machine Learning Research - October 2023

  15. arXiv:2112.06500  [pdf, other]

    cs.GT cs.MA

    On Nash Equilibria in Normal-Form Games With Vectorial Payoffs

    Authors: Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

    Abstract: We provide an in-depth study of Nash equilibria in multi-objective normal form games (MONFGs), i.e., normal form games with vectorial payoffs. Taking a utility-based approach, we assume that each player's utility can be modelled with a utility function that maps a vector to a scalar utility. In the case of a mixed strategy, it is meaningful to apply such a scalarisation both before calculating the…

    Submitted 16 July, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  16. arXiv:2111.09191  [pdf, other]

    cs.GT cs.LG cs.MA

    Preference Communication in Multi-Objective Normal-Form Games

    Authors: Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

    Abstract: We consider preference communication in two-player multi-objective normal-form games. In such games, the payoffs resulting from joint actions are vector-valued. Taking a utility-based approach, we assume there exists a utility function for each player which maps vectors to scalar utilities and consider agents that aim to maximise the utility of expected payoff vectors. As agents typically do not k…

    Submitted 10 June, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  17. Expected Scalarised Returns Dominance: A New Solution Concept for Multi-Objective Decision Making

    Authors: Conor F. Hayes, Timothy Verstraeten, Diederik M. Roijers, Enda Howley, Patrick Mannion

    Abstract: In many real-world scenarios, the utility of a user is derived from the single execution of a policy. In this case, to apply multi-objective reinforcement learning, the expected utility of the returns must be optimised. Various scenarios exist where a user's preferences over objectives (also known as the utility function) are unknown or difficult to specify. In such scenarios, a set of optimal pol…

    Submitted 1 July, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

  18. A Practical Guide to Multi-Objective Reinforcement Learning and Planning

    Authors: Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers

    Abstract: Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying pr…

    Submitted 17 March, 2021; originally announced March 2021.

    Journal ref: Auton Agent Multi-Agent Syst 36, 26 (2022)

  19. arXiv:2102.00966  [pdf, other]

    cs.LG cs.AI

    Risk Aware and Multi-Objective Decision Making with Distributional Monte Carlo Tree Search

    Authors: Conor F. Hayes, Mathieu Reymond, Diederik M. Roijers, Enda Howley, Patrick Mannion

    Abstract: In many risk-aware and multi-objective reinforcement learning settings, the utility of the user is derived from the single execution of a policy. In these settings, making decisions based on the average future returns is not suitable. For example, in a medical setting a patient may only have one opportunity to treat their illness. When making a decision, just the expected return -- known in reinfo…

    Submitted 2 February, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 8 pages, 4 figures

  20. arXiv:2101.07844  [pdf, other]

    cs.LG cs.AI eess.SY

    Scalable Optimization for Wind Farm Control using Coordination Graphs

    Authors: Timothy Verstraeten, Pieter-Jan Daems, Eugenio Bargiacchi, Diederik M. Roijers, Pieter J. K. Libin, Jan Helsen

    Abstract: Wind farms are a crucial driver toward the generation of ecological and renewable energy. Due to their rapid increase in capacity, contemporary wind farms need to adhere to strict constraints on power output to ensure stability of the electricity grid. Specifically, a wind farm controller is required to match the farm's power production with a power demand imposed by the grid operator. This is a n…

    Submitted 19 January, 2021; originally announced January 2021.

  21. arXiv:2011.07290  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

    Authors: Roxana Rădulescu, Timothy Verstraeten, Yijie Zhang, Patrick Mannion, Diederik M. Roijers, Ann Nowé

    Abstract: Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: Under review since 14 November 2020

  22. arXiv:2005.04166  [pdf, other]

    cs.NE cs.AI

    Time Efficiency in Optimization with a Bayesian-Evolutionary Algorithm

    Authors: Gongjin Lan, Jakub M. Tomczak, Diederik M. Roijers, A. E. Eiben

    Abstract: Not all generate-and-test search algorithms are created equal. Bayesian Optimization (BO) invests a lot of computation time to generate the candidate solution that best balances the predicted value and the uncertainty given all previous data, taking increasingly more time as the number of evaluations performed grows. Evolutionary Algorithms (EA) on the other hand rely on search heuristics that typ…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: 13 pages, 10 Figures

  23. arXiv:2001.08177  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    A utility-based analysis of equilibria in multi-objective normal form games

    Authors: Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé

    Abstract: In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analysed on the basis of the utility that these compromises have for the users of a system, where an agent's utility function maps their payoff vectors to scalar utility values. This util…

    Submitted 17 January, 2020; originally announced January 2020.

    Comments: Under review since 16 January 2020

  24. arXiv:2001.07804  [pdf]

    cs.NE cs.AI

    Learning Directed Locomotion in Modular Robots with Evolvable Morphologies

    Authors: Gongjin Lan, Matteo De Carlo, Fuda van Diggelen, Jakub M. Tomczak, Diederik M. Roijers, A. E. Eiben

    Abstract: We generalize the well-studied problem of gait learning in modular robots in two dimensions. Firstly, we address locomotion in a given target direction that goes beyond learning a typical undirected gait. Secondly, rather than studying one fixed robot morphology we consider a test suite of different modular robots. This study is based on our interest in evolutionary robot systems where both morpho…

    Submitted 21 January, 2020; originally announced January 2020.

    Comments: 30 pages, 14 figures

  25. arXiv:2001.07527  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping

    Authors: Eugenio Bargiacchi, Timothy Verstraeten, Diederik M. Roijers, Ann Nowé

    Abstract: We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping, for efficient learning in multi-agent Markov decision processes. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our approach only requires knowledge about the structure of the problem in the form of a dynamic decisio…

    Submitted 15 January, 2020; originally announced January 2020.

  26. arXiv:1911.10120  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures

    Authors: Timothy Verstraeten, Eugenio Bargiacchi, Pieter JK Libin, Jan Helsen, Diederik M Roijers, Ann Nowé

    Abstract: Multi-agent coordination is prevalent in many real-world applications. However, such coordination is challenging due to its combinatorial nature. An important observation in this regard is that agents in the real world often only directly affect a limited set of neighbouring agents. Leveraging such loose couplings among agents is key to making coordination in multi-agent systems feasible. In this…

    Submitted 7 February, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

    Journal ref: Sci Rep 10, 6728 (2020)

  27. Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey

    Authors: Roxana Rădulescu, Patrick Mannion, Diederik M. Roijers, Ann Nowé

    Abstract: The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We argue that, in MOMAS, such compromises should…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: Under review since 15 May 2019

  28. arXiv:1903.04193  [pdf, other]

    cs.LG cs.AI

    Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

    Authors: Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé

    Abstract: Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete act…

    Submitted 12 June, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: Accepted at the European Conference on Machine Learning 2019 (ECML)

  29. arXiv:1902.02556  [pdf, other]

    cs.AI

    The Actor-Advisor: Policy Gradient With Off-Policy Advice

    Authors: Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé

    Abstract: Actor-critic algorithms learn an explicit policy (actor), and an accompanying value function (critic). The actor performs actions in the environment, while the critic evaluates the actor's current policy. However, despite their stability and promising convergence properties, current actor-critic algorithms do not outperform critic-only ones in practice. We believe that the fact that the critic lea…

    Submitted 7 February, 2019; originally announced February 2019.

  30. arXiv:1809.07803  [pdf, other]

    cs.LG cs.AI stat.ML

    Dynamic Weights in Multi-Objective Deep Reinforcement Learning

    Authors: Axel Abels, Diederik M. Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher

    Abstract: Many real-world decision problems are characterized by multiple conflicting objectives which must be balanced based on their relative importance. In the dynamic weights setting the relative importance changes over time and specialized algorithms that deal with such change, such as a tabular Reinforcement Learning (RL) algorithm by Natarajan and Tadepalli (2005), are required. However, this earlier…

    Submitted 13 May, 2019; v1 submitted 20 September, 2018; originally announced September 2018.

    ACM Class: I.2.6

  31. arXiv:1808.04096  [pdf, other]

    cs.LG cs.AI stat.ML

    Directed Policy Gradient for Safe Reinforcement Learning with Human Advice

    Authors: Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé

    Abstract: Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be them co-workers, users or clients. It is desirable that these agents adjust to people's preferences, learn faster thanks to their help, and act safely around them. We argue that most current approaches that learn from human feedback are unsafe: rewarding or punishing the agent a-posteriori cannot im…

    Submitted 13 August, 2018; originally announced August 2018.

    Comments: Accepted at the European Workshop on Reinforcement Learning 2018 (EWRL14)

  32. arXiv:1802.07606  [pdf, other]

    cs.LG cs.AI stat.ML

    Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

    Authors: Luisa M Zintgraf, Diederik M Roijers, Sjoerd Linders, Catholijn M Jonker, Ann Nowé

    Abstract: In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e, determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap.…

    Submitted 21 February, 2018; originally announced February 2018.

    Comments: AAMAS 2018, Source code at https://github.com/lmzintgraf/gp_pref_elicit

  33. arXiv:1711.06299  [pdf, ps, other]

    cs.LG cs.AI q-bio.PE

    Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies

    Authors: Pieter Libin, Timothy Verstraeten, Diederik M. Roijers, Jelena Grujic, Kristof Theys, Philippe Lemey, Ann Nowé

    Abstract: Pandemic influenza has the epidemic potential to kill millions of people. While various preventive measures exist (i.a., vaccination and school closures), deciding on strategies that lead to their most effective and efficient use remains challenging. To this end, individual-based epidemiological models are essential to assist decision makers in determining the best strategy to curb epidemic spread…

    Submitted 15 June, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

  34. arXiv:1708.06551  [pdf, other]

    cs.AI cs.LG

    Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

    Authors: Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé

    Abstract: Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the…

    Submitted 12 September, 2017; v1 submitted 22 August, 2017; originally announced August 2017.

  35. arXiv:1610.02707  [pdf, other]

    cs.AI

    Multi-Objective Deep Reinforcement Learning

    Authors: Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson

    Abstract: We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the first…

    Submitted 9 October, 2016; originally announced October 2016.

  36. arXiv:1606.06888  [pdf, ps, other]

    cs.AI cs.GT

    Structure in the Value Function of Two-Player Zero-Sum Games of Incomplete Information

    Authors: Auke J. Wiggers, Frans A. Oliehoek, Diederik M. Roijers

    Abstract: Zero-sum stochastic games provide a rich model for competitive decision making. However, under general forms of state uncertainty as considered in the Partially Observable Stochastic Game (POSG), such decision making problems are still not very well understood. This paper makes a contribution to the theory of zero-sum POSGs by characterizing structure in their value function. In particular, we int…

    Submitted 22 June, 2016; originally announced June 2016.

  37. arXiv:1511.09047  [pdf, other]

    cs.AI cs.MA

    Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

    Authors: Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T. J. Spaan, Mathijs M. de Weerdt

    Abstract: In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can…

    Submitted 11 February, 2016; v1 submitted 29 November, 2015; originally announced November 2015.

    Comments: This article is an extended version of the paper that was published under the same title in the Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI16), held in Phoenix, Arizona USA on February 12-17, 2016

  38. A Survey of Multi-Objective Sequential Decision-Making

    Authors: Diederik Marijn Roijers, Peter Vamplew, Shimon Whiteson, Richard Dazeley

    Abstract: Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, l…

    Submitted 3 February, 2014; originally announced February 2014.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 48, pages 67-113, 2013