Showing 1–48 of 48 results for author: Nowé, A

  1. arXiv:2404.03596  [pdf, other]

    cs.LG cs.AI cs.MA

    Laser Learning Environment: A new environment for coordination-critical multi-agent tasks

    Authors: Yannick Molinghen, Raphaël Avalos, Mark Van Achter, Ann Nowé, Tom Lenaerts

    Abstract: We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment in which coordination is central. In LLE, agents depend on each other to make progress (interdependence), must jointly take specific sequences of actions to succeed (perfect coordination), and accomplishing those joint actions does not yield any intermediate reward (zero-incentive dyna…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Pre-print, 21 pages

  2. arXiv:2403.08829  [pdf, other]

    cs.HC cs.LG cs.SI

    Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News

    Authors: Axel Abels, Elias Fernandez Domingos, Ann Nowé, Tom Lenaerts

    Abstract: Individual and social biases undermine the effectiveness of human advisers by inducing judgment errors which can disadvantage protected groups. In this paper, we study the influence these biases can have in the pervasive problem of fake news by evaluating human participants' capacity to identify false headlines. By focusing on headlines involving sensitive characteristics, we gather a comprehensiv…

    Submitted 11 March, 2024; originally announced March 2024.

  3. arXiv:2402.13785  [pdf, other]

    cs.AI

    Synthesis of Hierarchical Controllers Based on Deep Reinforcement Learning Policies

    Authors: Florent Delgrange, Guy Avni, Anna Lukina, Christian Schilling, Ann Nowé, Guillermo A. Pérez

    Abstract: We propose a novel approach to the problem of controller design for environments modeled as Markov decision processes (MDPs). Specifically, we consider a hierarchical MDP a graph with each vertex populated by an MDP called a "room". We first apply deep reinforcement learning (DRL) to obtain low-level policies for each room, scaling to large rooms of unknown structure. We then apply reactive synthe…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 19 pages main text, 17 pages Appendix (excluding references)

  4. arXiv:2402.07182  [pdf, other]

    cs.LG

    Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning

    Authors: Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

    Abstract: A significant challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), a principled algorithm that decomposes the task of finding the Pareto front into a sequence of single-objective problems for which various solution methods exist. This enable…

    Submitted 11 February, 2024; originally announced February 2024.

  5. arXiv:2306.10134  [pdf, other]

    cs.LG cs.AI cs.MA

    Dynamic Size Message Scheduling for Multi-Agent Communication under Limited Bandwidth

    Authors: Qingshuang Sun, Denis Steckelmacher, Yuan Yao, Ann Nowé, Raphaël Avalos

    Abstract: Communication plays a vital role in multi-agent systems, fostering collaboration and coordination. However, in real-world scenarios where communication is bandwidth-limited, existing multi-agent reinforcement learning (MARL) algorithms often provide agents with a binary choice: either transmitting a fixed number of bytes or no information at all. This limitation hinders the ability to effectively…

    Submitted 16 June, 2023; originally announced June 2023.

  6. arXiv:2305.05560  [pdf, other]

    cs.AI

    Distributional Multi-Objective Decision Making

    Authors: Willem Röpke, Conor F. Hayes, Patrick Mannion, Enda Howley, Ann Nowé, Diederik M. Roijers

    Abstract: For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion relating return distributions of policies directly.…

    Submitted 18 July, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Accepted at IJCAI 2023

  7. Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making

    Authors: Axel Abels, Tom Lenaerts, Vito Trianni, Ann Nowé

    Abstract: Experts advising decision-makers are likely to display expertise which varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. We provide here new algorithms that explicit…

    Submitted 4 May, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

    Comments: Proceedings of the 40th International Conference on Machine Learning (2023)

  8. arXiv:2304.08897  [pdf, other]

    eess.SY cs.AI cs.LG math.OC

    An adaptive safety layer with hard constraints for safe reinforcement learning in multi-energy management systems

    Authors: Glenn Ceusters, Muhammad Andy Putratama, Rüdiger Franke, Ann Nowé, Maarten Messagie

    Abstract: Safe reinforcement learning (RL) with hard constraint guarantees is a promising optimal control direction for multi-energy management systems. It only requires the environment-specific constraint functions itself a priori and not a complete model. The project-specific upfront and ongoing engineering efforts are therefore still reduced, better representations of the underlying system dynamics can s…

    Submitted 6 November, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: post-print

  9. arXiv:2303.12558  [pdf, other]

    cs.LG cs.AI

    Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

    Authors: Florent Delgrange, Ann Nowé, Guillermo A. Pérez

    Abstract: Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any R…

    Submitted 21 April, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

    Comments: ICLR 2023, 10 pages main text, 14 pages appendix (excluding references)

  10. arXiv:2303.03284  [pdf, other]

    cs.LG cs.AI

    The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

    Authors: Raphael Avalos, Florent Delgrange, Ann Nowé, Guillermo A. Pérez, Diederik M. Roijers

    Abstract: Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. As such the agent needs to reason taking into account the past observations and actions. However, simply remembering the full history is generally intractable due to the exponential growth in the history space. Maintaining a probability distribution that mode…

    Submitted 26 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

  11. arXiv:2301.12822  [pdf, other]

    cs.LG cs.AI

    Evaluating COVID-19 vaccine allocation policies using Bayesian $m$-top exploration

    Authors: Alexandra Cimpean, Timothy Verstraeten, Lander Willem, Niel Hens, Ann Nowé, Pieter Libin

    Abstract: Individual-based epidemiological models support the study of fine-grained preventive measures, such as tailored vaccine allocation policies, in silico. As individual-based models are computationally intensive, it is pivotal to identify optimal strategies within a reasonable computational budget. Moreover, due to the high societal impact associated with the implementation of preventive strategies,…

    Submitted 30 January, 2023; originally announced January 2023.

  12. arXiv:2301.12820  [pdf, other]

    cs.AI

    Transferring Multiple Policies to Hotstart Reinforcement Learning in an Air Compressor Management Problem

    Authors: Hélène Plisnier, Denis Steckelmacher, Jeroen Willems, Bruno Depraetere, Ann Nowé

    Abstract: Many instances of similar or almost-identical industrial machines or tools are often deployed at once, or in quick succession. For instance, a particular model of air compressor may be installed at hundreds of customers. Because these tools perform distinct but highly similar tasks, it is interesting to be able to quickly produce a high-quality controller for machine $N+1$ given the controllers al…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Preliminary version, experimental details still to be made more precise

  13. Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization

    Authors: Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, Bruno C. da Silva

    Abstract: Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences over (possibly conflicting) reward functions. Such algorithms often learn a set of policies (each optimized for a particular agent preference) that can later be used to solve problems with novel preferences. We introduce a novel algorithm that uses Generalized Po…

    Submitted 23 March, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted to AAMAS 2023

  14. arXiv:2301.05755  [pdf, other]

    cs.GT

    Bridging the Gap Between Single and Multi Objective Games

    Authors: Willem Röpke, Carla Groenland, Roxana Rădulescu, Ann Nowé, Diederik M. Roijers

    Abstract: A classic model to study strategic decision making in multi-agent systems is the normal-form game. This model can be generalised to allow for an infinite number of pure strategies leading to continuous games. Multi-objective normal-form games are another generalisation that model settings where players receive separate payoffs in more than one objective. We bridge the gap between the two models by…

    Submitted 1 March, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

    Comments: Accepted to AAMAS 2023

  15. arXiv:2207.03830  [pdf, other]

    eess.SY cs.AI cs.LG math.OC

    Safe reinforcement learning for multi-energy management systems with known constraint functions

    Authors: Glenn Ceusters, Luis Ramirez Camargo, Rüdiger Franke, Ann Nowé, Maarten Messagie

    Abstract: Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori - reducing the upfront and ongoing project-specific engineering effort and is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees - resulting in various potent…

    Submitted 1 September, 2022; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 26 pages, 14 figures

  16. arXiv:2204.05036  [pdf, other]

    cs.LG cs.AI

    Pareto Conditioned Networks

    Authors: Mathieu Reymond, Eugenio Bargiacchi, Ann Nowé

    Abstract: In multi-objective optimization, learning all the policies that reach Pareto-efficient solutions is an expensive process. The set of optimal policies can grow exponentially with the number of objectives, and recovering all solutions requires an exhaustive exploration of the entire state space. We propose Pareto Conditioned Networks (PCN), a method that uses a single neural network to encompass all…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) 2022

  17. arXiv:2204.05027  [pdf, ps, other]

    cs.LG cs.AI q-bio.PE

    Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning

    Authors: Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Rădulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin

    Abstract: Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies w.r.t. a single objective, such as the pathogen's a…

    Submitted 11 April, 2022; originally announced April 2022.

  18. arXiv:2112.12458  [pdf, other]

    cs.LG cs.AI

    Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning

    Authors: Raphaël Avalos, Mathieu Reymond, Ann Nowé, Diederik M. Roijers

    Abstract: Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn for each agent a decentr…

    Submitted 26 October, 2023; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: https://openreview.net/forum?id=adpKzWQunW

    Journal ref: Transactions on Machine Learning Research - October 2023

  19. arXiv:2112.09655  [pdf, other]

    cs.LG cs.AI

    Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report)

    Authors: Florent Delgrange, Ann Nowé, Guillermo A. Pérez

    Abstract: We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniqu…

    Submitted 14 June, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: AAAI 2022, technical report including supplementary material (10 pages main text, 14 pages appendix)

  20. arXiv:2112.06500  [pdf, other]

    cs.GT cs.MA

    On Nash Equilibria in Normal-Form Games With Vectorial Payoffs

    Authors: Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

    Abstract: We provide an in-depth study of Nash equilibria in multi-objective normal form games (MONFGs), i.e., normal form games with vectorial payoffs. Taking a utility-based approach, we assume that each player's utility can be modelled with a utility function that maps a vector to a scalar utility. In the case of a mixed strategy, it is meaningful to apply such a scalarisation both before calculating the…

    Submitted 16 July, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  21. arXiv:2111.09191  [pdf, other]

    cs.GT cs.LG cs.MA

    Preference Communication in Multi-Objective Normal-Form Games

    Authors: Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

    Abstract: We consider preference communication in two-player multi-objective normal-form games. In such games, the payoffs resulting from joint actions are vector-valued. Taking a utility-based approach, we assume there exists a utility function for each player which maps vectors to scalar utilities and consider agents that aim to maximise the utility of expected payoff vectors. As agents typically do not k…

    Submitted 10 June, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

  22. arXiv:2106.13539  [pdf, other]

    cs.AI cs.LG

    Dealing with Expert Bias in Collective Decision-Making

    Authors: Axel Abels, Tom Lenaerts, Vito Trianni, Ann Nowé

    Abstract: Quite some real-world problems can be formulated as decision-making problems wherein one must repeatedly make an appropriate choice from a set of alternatives. Multiple expert judgements, whether human or artificial, can help in taking correct decisions, especially when exploration of alternative solutions is costly. As expert opinions might deviate, the problem of finding the right alternative ca…

    Submitted 29 August, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

  23. Synthesising Reinforcement Learning Policies through Set-Valued Inductive Rule Learning

    Authors: Youri Coppens, Denis Steckelmacher, Catholijn M. Jonker, Ann Nowé

    Abstract: Today's advanced Reinforcement Learning algorithms produce black-box policies, that are often difficult to interpret and trust for a person. We introduce a policy distilling algorithm, building on the CN2 rule mining algorithm, that distills the policy into a rule-based decision system. At the core of our approach is the fact that an RL process does not just learn a policy, a mapping from states t…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: 17 pages, 4 figures. The final authenticated publication is available online at https://doi.org/10.1007/978-3-030-73959-1_15

    Journal ref: Trustworthy AI - Integrating Learning, Optimization and Reasoning (2021), Lecture Notes in Computer Science, vol. 12641, pp. 163-179

  24. arXiv:2104.09785  [pdf, other]

    eess.SY cs.AI cs.LG math.OC

    Model-predictive control and reinforcement learning in multi-energy system case studies

    Authors: Glenn Ceusters, Román Cantú Rodríguez, Alberte Bouso García, Rüdiger Franke, Geert Deconinck, Lieve Helsen, Ann Nowé, Maarten Messagie, Luis Ramirez Camargo

    Abstract: Model-predictive-control (MPC) offers an optimal control technique to establish and ensure that the total operation cost of multi-energy systems remains at a minimum while fulfilling all system constraints. However, this method presumes an adequate model of the underlying system dynamics, which is prone to modelling errors and is not necessarily adaptive. This has an associated initial and ongoing…

    Submitted 9 September, 2021; v1 submitted 20 April, 2021; originally announced April 2021.

    Comments: 43 pages, 29 figures

  25. A Practical Guide to Multi-Objective Reinforcement Learning and Planning

    Authors: Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers

    Abstract: Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying pr…

    Submitted 17 March, 2021; originally announced March 2021.

    Journal ref: Auton Agent Multi-Agent Syst 36, 26 (2022)

  26. arXiv:2011.07290  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

    Authors: Roxana Rădulescu, Timothy Verstraeten, Yijie Zhang, Patrick Mannion, Diederik M. Roijers, Ann Nowé

    Abstract: Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent…

    Submitted 14 November, 2020; originally announced November 2020.

    Comments: Under review since 14 November 2020

  27. arXiv:2003.13676  [pdf, other]

    cs.LG cs.AI cs.MA

    Deep reinforcement learning for large-scale epidemic control

    Authors: Pieter Libin, Arno Moonens, Timothy Verstraeten, Fabian Perez-Sanjines, Niel Hens, Philippe Lemey, Ann Nowé

    Abstract: Epidemics of infectious diseases are an important threat to public health and global economies. Yet, the development of prevention strategies remains a challenging process, as epidemics are non-linear and complex processes. For this reason, we investigate a deep reinforcement learning approach to automatically learn prevention strategies in the context of pandemic influenza. Firstly, we construct…

    Submitted 30 March, 2020; originally announced March 2020.

  28. arXiv:2001.09502  [pdf, other]

    cs.LG stat.ML

    An interpretable semi-supervised classifier using two different strategies for amended self-labeling

    Authors: Isel Grau, Dipankar Sengupta, Maria M. Garcia Lorenzo, Ann Nowe

    Abstract: In the context of some machine learning applications, obtaining data instances is a relatively easy process but labeling them could become quite expensive or tedious. Such scenarios lead to datasets with few labeled instances and a larger number of unlabeled ones. Semi-supervised classification techniques combine labeled and unlabeled data during the learning phase in order to increase the classif…

    Submitted 20 July, 2020; v1 submitted 26 January, 2020; originally announced January 2020.

    Comments: Accepted at Special Session on Advances on Explainable Artificial Intelligence, IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2020), IEEE World Congress on Computational Intelligence (WCCI 2020)

  29. arXiv:2001.08177  [pdf, other]

    cs.GT cs.AI cs.LG cs.MA

    A utility-based analysis of equilibria in multi-objective normal form games

    Authors: Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé

    Abstract: In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analysed on the basis of the utility that these compromises have for the users of a system, where an agent's utility function maps their payoff vectors to scalar utility values. This util…

    Submitted 17 January, 2020; originally announced January 2020.

    Comments: Under review since 16 January 2020

  30. arXiv:2001.07527  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping

    Authors: Eugenio Bargiacchi, Timothy Verstraeten, Diederik M. Roijers, Ann Nowé

    Abstract: We present a new model-based reinforcement learning algorithm, Cooperative Prioritized Sweeping, for efficient learning in multi-agent Markov decision processes. The algorithm allows for sample-efficient learning on large problems by exploiting a factorization to approximate the value function. Our approach only requires knowledge about the structure of the problem in the form of a dynamic decisio…

    Submitted 15 January, 2020; originally announced January 2020.

  31. arXiv:1911.10121  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Fleet Control using Coregionalized Gaussian Process Policy Iteration

    Authors: Timothy Verstraeten, Pieter JK Libin, Ann Nowé

    Abstract: In many settings, as for example wind farms, multiple machines are instantiated to perform the same task, which is called a fleet. The recent advances with respect to the Internet of Things allow control devices and/or machines to connect through cloud-based architectures in order to share information about their status and environment. Such an infrastructure allows seamless data sharing between f…

    Submitted 22 November, 2019; originally announced November 2019.

  32. arXiv:1911.10120  [pdf, other]

    cs.LG cs.AI cs.MA stat.ML

    Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures

    Authors: Timothy Verstraeten, Eugenio Bargiacchi, Pieter JK Libin, Jan Helsen, Diederik M Roijers, Ann Nowé

    Abstract: Multi-agent coordination is prevalent in many real-world applications. However, such coordination is challenging due to its combinatorial nature. An important observation in this regard is that agents in the real world often only directly affect a limited set of neighbouring agents. Leveraging such loose couplings among agents is key to making coordination in multi-agent systems feasible. In this…

    Submitted 7 February, 2020; v1 submitted 22 November, 2019; originally announced November 2019.

    Journal ref: Sci Rep 10, 6728 (2020)

  33. IPC-Net: 3D point-cloud segmentation using deep inter-point convolutional layers

    Authors: Felipe Gomez Marulanda, Pieter Libin, Timothy Verstraeten, Ann Nowé

    Abstract: Over the last decade, the demand for better segmentation and classification algorithms in 3D spaces has significantly grown due to the popularity of new 3D sensor technologies and advancements in the field of robotics. Point-clouds are one of the most popular representations to store a digital description of 3D shapes. However, point-clouds are stored in irregular and unordered structures, which l…

    Submitted 30 September, 2019; originally announced September 2019.

    Journal ref: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)

  34. Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey

    Authors: Roxana Rădulescu, Patrick Mannion, Diederik M. Roijers, Ann Nowé

    Abstract: The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We argue that, in MOMAS, such compromises should…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: Under review since 15 May 2019

  35. arXiv:1907.07958  [pdf, other]

    cs.AI cs.RO

    Transfer Learning Across Simulated Robots With Different Sensors

    Authors: Hélène Plisnier, Denis Steckelmacher, Diederik Roijers, Ann Nowé

    Abstract: For a robot to learn a good policy, it often requires expensive equipment (such as sophisticated sensors) and a prepared training environment conducive to learning. However, it is seldom possible to perfectly equip robots for economic reasons, nor to guarantee ideal learning conditions, when deployed in real-life environments. A solution would be to prepare the robot in the lab environment, when a…

    Submitted 18 July, 2019; originally announced July 2019.

  36. arXiv:1903.04193  [pdf, other]

    cs.LG cs.AI

    Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

    Authors: Denis Steckelmacher, Hélène Plisnier, Diederik M. Roijers, Ann Nowé

    Abstract: Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete act…

    Submitted 12 June, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

    Comments: Accepted at the European Conference on Machine Learning 2019 (ECML)

  37. arXiv:1902.02556  [pdf, other]

    cs.AI

    The Actor-Advisor: Policy Gradient With Off-Policy Advice

    Authors: Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé

    Abstract: Actor-critic algorithms learn an explicit policy (actor), and an accompanying value function (critic). The actor performs actions in the environment, while the critic evaluates the actor's current policy. However, despite their stability and promising convergence properties, current actor-critic algorithms do not outperform critic-only ones in practice. We believe that the fact that the critic lea…

    Submitted 7 February, 2019; originally announced February 2019.

  38. arXiv:1809.07803  [pdf, other]

    cs.LG cs.AI stat.ML

    Dynamic Weights in Multi-Objective Deep Reinforcement Learning

    Authors: Axel Abels, Diederik M. Roijers, Tom Lenaerts, Ann Nowé, Denis Steckelmacher

    Abstract: Many real-world decision problems are characterized by multiple conflicting objectives which must be balanced based on their relative importance. In the dynamic weights setting the relative importance changes over time and specialized algorithms that deal with such change, such as a tabular Reinforcement Learning (RL) algorithm by Natarajan and Tadepalli (2005), are required. However, this earlier…

    Submitted 13 May, 2019; v1 submitted 20 September, 2018; originally announced September 2018.

    ACM Class: I.2.6

  39. arXiv:1808.04096  [pdf, other]

    cs.LG cs.AI stat.ML

    Directed Policy Gradient for Safe Reinforcement Learning with Human Advice

    Authors: Hélène Plisnier, Denis Steckelmacher, Tim Brys, Diederik M. Roijers, Ann Nowé

    Abstract: Many currently deployed Reinforcement Learning agents work in an environment shared with humans, be them co-workers, users or clients. It is desirable that these agents adjust to people's preferences, learn faster thanks to their help, and act safely around them. We argue that most current approaches that learn from human feedback are unsafe: rewarding or punishing the agent a-posteriori cannot im…

    Submitted 13 August, 2018; originally announced August 2018.

    Comments: Accepted at the European Workshop on Reinforcement Learning 2018 (EWRL14)

  40. arXiv:1802.07606  [pdf, other]

    cs.LG cs.AI stat.ML

    Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

    Authors: Luisa M Zintgraf, Diederik M Roijers, Sjoerd Linders, Catholijn M Jonker, Ann Nowé

    Abstract: In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e, determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap.…

    Submitted 21 February, 2018; originally announced February 2018.

    Comments: AAMAS 2018, Source code at https://github.com/lmzintgraf/gp_pref_elicit

  41. arXiv:1711.06299  [pdf, ps, other]

    cs.LG cs.AI q-bio.PE

    Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies

    Authors: Pieter Libin, Timothy Verstraeten, Diederik M. Roijers, Jelena Grujic, Kristof Theys, Philippe Lemey, Ann Nowé

    Abstract: Pandemic influenza has the epidemic potential to kill millions of people. While various preventive measures exist (i.a., vaccination and school closures), deciding on strategies that lead to their most effective and efficient use remains challenging. To this end, individual-based epidemiological models are essential to assist decision makers in determining the best strategy to curb epidemic spread…

    Submitted 15 June, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

  42. arXiv:1711.03817  [pdf, other]

    cs.AI

    Learning with Options that Terminate Off-Policy

    Authors: Anna Harutyunyan, Peter Vrancx, Pierre-Luc Bacon, Doina Precup, Ann Nowe

    Abstract: A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option set for the task is not ideal, and cannot express the primitive optimal…

    Submitted 2 December, 2017; v1 submitted 10 November, 2017; originally announced November 2017.

    Comments: AAAI 2018

  43. arXiv:1708.06551  [pdf, other]

    cs.AI cs.LG

    Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

    Authors: Denis Steckelmacher, Diederik M. Roijers, Anna Harutyunyan, Peter Vrancx, Hélène Plisnier, Ann Nowé

    Abstract: Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the…

    Submitted 12 September, 2017; v1 submitted 22 August, 2017; originally announced August 2017.

  44. arXiv:1702.08736  [pdf, other]

    cs.MA cs.AI

    Analysing Congestion Problems in Multi-agent Reinforcement Learning

    Authors: Roxana Rădulescu, Peter Vrancx, Ann Nowé

    Abstract: Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problem…

    Submitted 30 March, 2017; v1 submitted 28 February, 2017; originally announced February 2017.

    Comments: Adaptive Learning Agents (ALA) Workshop at AAMAS 2017

    MSC Class: 68T05 ACM Class: I.2.11

  45. Solving stable matching problems using answer set programming

    Authors: Sofie De Clercq, Steven Schockaert, Martine De Cock, Ann Nowé

    Abstract: Since the introduction of the stable marriage problem (SMP) by Gale and Shapley (1962), several variants and extensions have been investigated. While this variety is useful to widen the application potential, each variant requires a new algorithm for finding the stable matchings. To address this issue, we propose an encoding of the SMP using answer set programming (ASP), which can straightforwardl…

    Submitted 16 December, 2015; originally announced December 2015.

    Comments: Under consideration in Theory and Practice of Logic Programming (TPLP). arXiv admin note: substantial text overlap with arXiv:1302.7251

    Journal ref: Theory and Practice of Logic Programming 16 (2016) 247-268

  46. arXiv:1502.03248  [pdf, other]

    cs.AI

    Off-Policy Reward Shaping with Ensembles

    Authors: Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe

    Abstract: Potential-based reward shaping (PBRS) is an effective and popular technique to speed up reinforcement learning by leveraging domain knowledge. While PBRS is proven to always preserve optimal policies, its effect on learning speed is determined by the quality of its potential function, which, in turn, depends on both the underlying heuristic and the scale. Knowing which heuristic will prove effecti…

    Submitted 23 March, 2015; v1 submitted 11 February, 2015; originally announced February 2015.

    Comments: To be presented at ALA-15. Short version to appear at AAMAS-15

  47. arXiv:1405.5358  [pdf, other]

    cs.AI cs.LG

    Off-Policy Shaping Ensembles in Reinforcement Learning

    Authors: Anna Harutyunyan, Tim Brys, Peter Vrancx, Ann Nowe

    Abstract: Recent advances in gradient temporal-difference methods make it possible to learn multiple value functions off-policy in parallel without sacrificing convergence guarantees or computational efficiency. This opens up new possibilities for sound ensemble techniques in reinforcement learning. In this work we propose learning an ensemble of policies related through potential-based shaping rewards. The ensemble…

    Submitted 21 May, 2014; originally announced May 2014.

    Comments: Full version of the paper to appear in Proc. ECAI 2014

  48. arXiv:1302.7251  [pdf, ps, other]

    cs.AI cs.LO

    Modeling Stable Matching Problems with Answer Set Programming

    Authors: Sofie De Clercq, Steven Schockaert, Martine De Cock, Ann Nowé

    Abstract: The Stable Marriage Problem (SMP) is a well-known matching problem first introduced and solved by Gale and Shapley (1962). Several variants and extensions to this problem have since been investigated to cover a wider set of applications. Each time a new variant is considered, however, a new algorithm needs to be developed and implemented. As an alternative, in this paper we propose an encoding of…

    Submitted 2 May, 2013; v1 submitted 28 February, 2013; originally announced February 2013.

    Comments: 26 pages

    MSC Class: 68N17