Search | arXiv e-print repository

Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning

Authors: Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

Abstract: A significant challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), a principled algorithm that decomposes the task of finding the Pareto front into a sequence of single-objective problems for which various solution methods exist. This enable… ▽ More A significant challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies that attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), a principled algorithm that decomposes the task of finding the Pareto front into a sequence of single-objective problems for which various solution methods exist. This enables us to establish convergence guarantees while providing an upper bound on the distance to undiscovered Pareto optimal solutions at each step. Empirical evaluations demonstrate that IPRO matches or outperforms methods that require additional domain knowledge. By leveraging problem-specific single-objective solvers, our approach also holds promise for applications beyond multi-objective reinforcement learning, such as in pathfinding and optimisation. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.02665 [pdf, ps, other]

Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning

Authors: Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos, Roxana Rădulescu, Willem Röpke, Diederik M. Roijers

Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perfor… ▽ More Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: Accepted for the Blue Sky Track at AAMAS'24

arXiv:2401.12646 [pdf, other]

Emergent Cooperation under Uncertain Incentive Alignment

Authors: Nicole Orzan, Erman Acar, Davide Grossi, Roxana Rădulescu

Abstract: Understanding the emergence of cooperation in systems of computational agents is crucial for the development of effective cooperative AI. Interaction among individuals in real-world settings are often sparse and occur within a broad spectrum of incentives, which often are only partially known. In this work, we explore how cooperation can arise among reinforcement learning agents in scenarios chara… ▽ More Understanding the emergence of cooperation in systems of computational agents is crucial for the development of effective cooperative AI. Interaction among individuals in real-world settings are often sparse and occur within a broad spectrum of incentives, which often are only partially known. In this work, we explore how cooperation can arise among reinforcement learning agents in scenarios characterised by infrequent encounters, and where agents face uncertainty about the alignment of their incentives with those of others. To do so, we train the agents under a wide spectrum of environments ranging from fully competitive, to fully cooperative, to mixed-motives. Under this type of uncertainty we study the effects of mechanisms, such as reputation and intrinsic rewards, that have been proposed in the literature to foster cooperation in mixed-motives environments. Our findings show that uncertainty substantially lowers the agents' ability to engage in cooperative behaviour, when that would be the best course of action. In this scenario, the use of effective reputation mechanisms and intrinsic rewards boosts the agents' capability to act nearly-optimally in cooperative environments, while greatly enhancing cooperation in mixed-motive environments as well. △ Less

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2306.05898 [pdf, other]

You Can Tell a Cybercriminal by the Company they Keep: A Framework to Infer the Relevance of Underground Communities to the Threat Landscape

Authors: Michele Campobasso, Radu Rădulescu, Sylvan Brons, Luca Allodi

Abstract: The criminal underground is populated with forum marketplaces where, allegedly, cybercriminals share and trade knowledge, skills, and cybercrime products. However, it is still unclear whether all marketplaces matter the same in the overall threat landscape. To effectively support trade and avoid degenerating into scams-for-scammers places, underground markets must address fundamental economic prob… ▽ More The criminal underground is populated with forum marketplaces where, allegedly, cybercriminals share and trade knowledge, skills, and cybercrime products. However, it is still unclear whether all marketplaces matter the same in the overall threat landscape. To effectively support trade and avoid degenerating into scams-for-scammers places, underground markets must address fundamental economic problems (such as moral hazard, adverse selection) that enable the exchange of actual technology and cybercrime products (as opposed to repackaged malware or years-old password databases). From the relevant literature and manual investigation, we identify several mechanisms that marketplaces implement to mitigate these problems, and we condense them into a market evaluation framework based on the Business Model Canvas. We use this framework to evaluate which mechanisms `successful' marketplaces have in place, and whether these differ from those employed by `unsuccessful' marketplaces. We test the framework on 23 underground forum markets by searching 836 aliases of indicted cybercriminals to identify `successful' marketplaces. We find evidence that marketplaces whose administrators are impartial in trade, verify their sellers, and have the right economic incentives to keep the market functional are more likely to be credible sources of threat. △ Less

Submitted 13 June, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: The 22nd Workshop on the Economics of Information Security (WEIS'23), July 05--08, 2023, Geneva, Switzerland

arXiv:2301.05755 [pdf, other]

Bridging the Gap Between Single and Multi Objective Games

Authors: Willem Röpke, Carla Groenland, Roxana Rădulescu, Ann Nowé, Diederik M. Roijers

Abstract: A classic model to study strategic decision making in multi-agent systems is the normal-form game. This model can be generalised to allow for an infinite number of pure strategies leading to continuous games. Multi-objective normal-form games are another generalisation that model settings where players receive separate payoffs in more than one objective. We bridge the gap between the two models by… ▽ More A classic model to study strategic decision making in multi-agent systems is the normal-form game. This model can be generalised to allow for an infinite number of pure strategies leading to continuous games. Multi-objective normal-form games are another generalisation that model settings where players receive separate payoffs in more than one objective. We bridge the gap between the two models by providing a theoretical guarantee that a game from one setting can always be transformed to a game in the other. We extend the theoretical results to include guaranteed equivalence of Nash equilibria. The map** makes it possible to apply algorithms from one field to the other. We demonstrate this by introducing a fictitious play algorithm for multi-objective games and subsequently applying it to two well-known continuous games. We believe the equivalence relation will lend itself to new insights by translating the theoretical guarantees from one formalism to another. Moreover, it may lead to new computational approaches for continuous games when a problem is more naturally solved in the succinct format of multi-objective games. △ Less

Submitted 1 March, 2023; v1 submitted 13 January, 2023; originally announced January 2023.

Comments: Accepted to AAMAS 2023

arXiv:2204.05027 [pdf, ps, other]

Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning

Authors: Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Rădulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin

Abstract: Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies w.r.t. a single objective, such as the pathogen's a… ▽ More Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies w.r.t. a single objective, such as the pathogen's attack rate. However, as the mitigation of epidemics involves distinct, and possibly conflicting criteria (i.a., prevalence, mortality, morbidity, cost), a multi-objective approach is warranted to learn balanced policies. To lift this decision-making process to real-world epidemic models, we apply deep multi-objective reinforcement learning and build upon a state-of-the-art algorithm, Pareto Conditioned Networks (PCN), to learn a set of solutions that approximates the Pareto front of the decision problem. We consider the first wave of the Belgian COVID-19 epidemic, which was mitigated by a lockdown, and study different deconfinement strategies, aiming to minimize both COVID-19 cases (i.e., infections and hospitalizations) and the societal burden that is induced by the applied mitigation measures. We contribute a multi-objective Markov decision process that encapsulates the stochastic compartment model that was used to inform policy makers during the COVID-19 epidemic. As these social mitigation measures are implemented in a continuous action space that modulates the contact matrix of the age-structured epidemic model, we extend PCN to this setting. We evaluate the solution returned by PCN, and observe that it correctly learns to reduce the social burden whenever the hospitalization rates are sufficiently low. In this work, we thus show that multi-objective reinforcement learning is attainable in complex epidemiological models and provides essential insights to balance complex mitigation policies. △ Less

Submitted 11 April, 2022; originally announced April 2022.

arXiv:2112.15422 [pdf, other]

Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)

Authors: Peter Vamplew, Benjamin J. Smith, Johan Kallstrom, Gabriel Ramos, Roxana Radulescu, Diederik M. Roijers, Conor F. Hayes, Fredrik Heintz, Patrick Mannion, Pieter J. K. Libin, Richard Dazeley, Cameron Foale

Abstract: The recent paper `"Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and co… ▽ More The recent paper `"Reward is Enough" by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, it is still undesirable to use this approach for the development of artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour. △ Less

Submitted 24 November, 2021; originally announced December 2021.

arXiv:2112.06500 [pdf, other]

On Nash Equilibria in Normal-Form Games With Vectorial Payoffs

Authors: Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

Abstract: We provide an in-depth study of Nash equilibria in multi-objective normal form games (MONFGs), i.e., normal form games with vectorial payoffs. Taking a utility-based approach, we assume that each player's utility can be modelled with a utility function that maps a vector to a scalar utility. In the case of a mixed strategy, it is meaningful to apply such a scalarisation both before calculating the… ▽ More We provide an in-depth study of Nash equilibria in multi-objective normal form games (MONFGs), i.e., normal form games with vectorial payoffs. Taking a utility-based approach, we assume that each player's utility can be modelled with a utility function that maps a vector to a scalar utility. In the case of a mixed strategy, it is meaningful to apply such a scalarisation both before calculating the expectation of the payoff vector as well as after. This distinction leads to two optimisation criteria. With the first criterion, players aim to optimise the expected value of their utility function applied to the payoff vectors obtained in the game. With the second criterion, players aim to optimise the utility of expected payoff vectors given a joint strategy. Under this latter criterion, it was shown that Nash equilibria need not exist. Our first contribution is to provide a sufficient condition under which Nash equilibria are guaranteed to exist. Secondly, we show that when Nash equilibria do exist under both criteria, no equilibrium needs to be shared between the two criteria, and even the number of equilibria can differ. Thirdly, we contribute a study of pure strategy Nash equilibria under both criteria. We show that when assuming quasiconvex utility functions for players, the sets of pure strategy Nash equilibria under both optimisation criteria are equivalent. This result is further extended to games in which players adhere to different optimisation criteria. Finally, given these theoretical results, we construct an algorithm to compute all pure strategy Nash equilibria in MONFGs where players have a quasiconvex utility function. △ Less

Submitted 16 July, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2111.09191 [pdf, other]

Preference Communication in Multi-Objective Normal-Form Games

Authors: Willem Röpke, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

Abstract: We consider preference communication in two-player multi-objective normal-form games. In such games, the payoffs resulting from joint actions are vector-valued. Taking a utility-based approach, we assume there exists a utility function for each player which maps vectors to scalar utilities and consider agents that aim to maximise the utility of expected payoff vectors. As agents typically do not k… ▽ More We consider preference communication in two-player multi-objective normal-form games. In such games, the payoffs resulting from joint actions are vector-valued. Taking a utility-based approach, we assume there exists a utility function for each player which maps vectors to scalar utilities and consider agents that aim to maximise the utility of expected payoff vectors. As agents typically do not know their opponent's utility function or strategy, they must learn policies to interact with each other. Inspired by Stackelberg games, we introduce four novel preference communication protocols to aid agents in arriving at adequate solutions. Each protocol describes a specific approach for one agent to communicate preferences over their actions and how another agent responds. Additionally, to study when communication emerges, we introduce a communication protocol where agents must learn when to communicate. These protocols are subsequently evaluated on a set of five benchmark games against baseline agents that do not communicate. We find that preference communication can alter the learning process and lead to the emergence of cyclic policies which had not been previously observed in this setting. We further observe that the resulting policies can heavily depend on the characteristics of the game that is played. Lastly, we find that communication naturally emerges in both cooperative and self-interested settings. △ Less

Submitted 10 June, 2022; v1 submitted 17 November, 2021; originally announced November 2021.

arXiv:2103.09568 [pdf, other]

doi 10.1007/s10458-022-09552-y

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Authors: Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers

Abstract: Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying pr… ▽ More Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. △ Less

Submitted 17 March, 2021; originally announced March 2021.

Journal ref: Auton Agent Multi-Agent Syst 36, 26 (2022)

arXiv:2011.07290 [pdf, other]

Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

Authors: Roxana Rădulescu, Timothy Verstraeten, Yijie Zhang, Patrick Mannion, Diederik M. Roijers, Ann Nowé

Abstract: Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent… ▽ More Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When there are no Nash equilibria, opponent learning awareness and modelling allows agents to still converge to meaningful solutions that approximate equilibria. △ Less

Submitted 14 November, 2020; originally announced November 2020.

Comments: Under review since 14 November 2020

arXiv:2001.08177 [pdf, other]

doi 10.1017/S0269888920000351

A utility-based analysis of equilibria in multi-objective normal form games

Authors: Roxana Rădulescu, Patrick Mannion, Yijie Zhang, Diederik M. Roijers, Ann Nowé

Abstract: In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analysed on the basis of the utility that these compromises have for the users of a system, where an agent's utility function maps their payoff vectors to scalar utility values. This util… ▽ More In multi-objective multi-agent systems (MOMAS), agents explicitly consider the possible tradeoffs between conflicting objective functions. We argue that compromises between competing objectives in MOMAS should be analysed on the basis of the utility that these compromises have for the users of a system, where an agent's utility function maps their payoff vectors to scalar utility values. This utility-based approach naturally leads to two different optimisation criteria for agents in a MOMAS: expected scalarised returns (ESR) and scalarised expected returns (SER). In this article, we explore the differences between these two criteria using the framework of multi-objective normal form games (MONFGs). We demonstrate that the choice of optimisation criterion (ESR or SER) can radically alter the set of equilibria in a MONFG when non-linear utility functions are used. △ Less

Submitted 17 January, 2020; originally announced January 2020.

Comments: Under review since 16 January 2020

arXiv:1909.02964 [pdf, other]

doi 10.1007/s10458-019-09433-x

Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey

Authors: Roxana Rădulescu, Patrick Mannion, Diederik M. Roijers, Ann Nowé

Abstract: The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We argue that, in MOMAS, such compromises should… ▽ More The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective multi-agent systems (MOMAS) explicitly consider the possible trade-offs between conflicting objective functions. We argue that, in MOMAS, such compromises should be analysed on the basis of the utility that these compromises have for the users of a system. As is standard in multi-objective optimisation, we model the user utility using utility functions that map value or return vectors to scalar values. This approach naturally leads to two different optimisation criteria: expected scalarised returns (ESR) and scalarised expected returns (SER). We develop a new taxonomy which classifies multi-objective multi-agent decision making settings, on the basis of the reward structures, and which and how utility functions are applied. This allows us to offer a structured view of the field, to clearly delineate the current state-of-the-art in multi-objective multi-agent decision making approaches and to identify promising directions for future research. Starting from the execution phase, in which the selected policies are applied and the utility for the users is attained, we analyse which solution concepts apply to the different settings in our taxonomy. Furthermore, we define and discuss these solution concepts under both ESR and SER optimisation criteria. We conclude with a summary of our main findings and a discussion of many promising future research directions in multi-objective multi-agent systems. △ Less

Submitted 6 September, 2019; originally announced September 2019.

Comments: Under review since 15 May 2019

arXiv:1702.08736 [pdf, other]

Analysing Congestion Problems in Multi-agent Reinforcement Learning

Authors: Roxana Rădulescu, Peter Vrancx, Ann Nowé

Abstract: Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problem… ▽ More Congestion problems are omnipresent in today's complex networks and represent a challenge in many research domains. In the context of Multi-agent Reinforcement Learning (MARL), approaches like difference rewards and resource abstraction have shown promising results in tackling such problems. Resource abstraction was shown to be an ideal candidate for solving large-scale resource allocation problems in a fully decentralized manner. However, its performance and applicability strongly depends on some, until now, undocumented assumptions. Two of the main congestion benchmark problems considered in the literature are: the Beach Problem Domain and the Traffic Lane Domain. In both settings the highest system utility is achieved when overcrowding one resource and kee** the rest at optimum capacity. We analyse how abstract grou** can promote this behaviour and how feasible it is to apply this approach in a real-world domain (i.e., what assumptions need to be satisfied and what knowledge is necessary). We introduce a new test problem, the Road Network Domain (RND), where the resources are no longer independent, but rather part of a network (e.g., road network), thus choosing one path will also impact the load on other paths having common road segments. We demonstrate the application of state-of-the-art MARL methods for this new congestion model and analyse their performance. RND allows us to highlight an important limitation of resource abstraction and show that the difference rewards approach manages to better capture and inform the agents about the dynamics of the environment. △ Less

Submitted 30 March, 2017; v1 submitted 28 February, 2017; originally announced February 2017.

Comments: Adaptive Learning Agents (ALA) Workshop at AAMAS 2017

MSC Class: 68T05 ACM Class: I.2.11

Showing 1–14 of 14 results for author: Rădulescu, R