-
On the Stability of Learning in Network Games with Many Players
Authors:
Aamal Hussain,
Dan Leonte,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players.
To make progress towards addressing this chall…
▽ More
Multi-agent learning algorithms have been shown to display complex, unstable behaviours in a wide array of games. In fact, previous works indicate that convergent behaviours are less likely to occur as the total number of agents increases. This seemingly prohibits convergence to stable strategies, such as Nash Equilibria, in games with many players.
To make progress towards addressing this challenge we study the Q-Learning Dynamics, a classical model for exploration and exploitation in multi-agent learning. In particular, we study the behaviour of Q-Learning on games where interactions between agents are constrained by a network. We determine a number of sufficient conditions, depending on the game and network structure, which guarantee that agent strategies converge to a unique stable strategy, called the Quantal Response Equilibrium (QRE). Crucially, these sufficient conditions are independent of the total number of agents, allowing for provable convergence in arbitrarily large games.
Next, we compare the learned QRE to the underlying NE of the game, by showing that any QRE is an $ε$-approximate Nash Equilibrium. We first provide tight bounds on $ε$ and show how these bounds lead naturally to a centralised scheme for choosing exploration rates, which enables independent learners to learn stable approximate Nash Equilibrium strategies. We validate the method through experiments and demonstrate its effectiveness even in the presence of numerous agents and actions. Through these results, we show that independent learning dynamics may converge to approximate Nash Equilibria, even in the presence of many agents.
△ Less
Submitted 23 March, 2024;
originally announced March 2024.
-
The Reasons that Agents Act: Intention and Instrumental Goals
Authors:
Francis Rhys Ward,
Matt MacDermott,
Francesco Belardinelli,
Francesca Toni,
Tom Everitt
Abstract:
Intention is an important and challenging concept in AI. It is important because it underlies many other concepts we care about, such as agency, manipulation, legal responsibility, and blame. However, ascribing intent to AI systems is contentious, and there is no universally accepted theory of intention applicable to AI agents. We operationalise the intention with which an agent acts, relating to…
▽ More
Intention is an important and challenging concept in AI. It is important because it underlies many other concepts we care about, such as agency, manipulation, legal responsibility, and blame. However, ascribing intent to AI systems is contentious, and there is no universally accepted theory of intention applicable to AI agents. We operationalise the intention with which an agent acts, relating to the reasons it chooses its decision. We introduce a formal definition of intention in structural causal influence models, grounded in the philosophy literature on intent and applicable to real-world machine learning systems. Through a number of examples and results, we show that our definition captures the intuitive notion of intent and satisfies desiderata set-out by past work. In addition, we show how our definition relates to past concepts, including actual causality, and the notion of instrumental goals, which is a core idea in the literature on safe AI agents. Finally, we demonstrate how our definition can be used to infer the intentions of reinforcement learning agents and language models from their behaviour.
△ Less
Submitted 15 February, 2024; v1 submitted 11 February, 2024;
originally announced February 2024.
-
Leveraging Approximate Model-based Shielding for Probabilistic Safety Guarantees in Continuous Environments
Authors:
Alexander W. Goodall,
Francesco Belardinelli
Abstract:
Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In…
▽ More
Shielding is a popular technique for achieving safe reinforcement learning (RL). However, classical shielding approaches come with quite restrictive assumptions making them difficult to deploy in complex environments, particularly those with continuous state or action spaces. In this paper we extend the more versatile approximate model-based shielding (AMBS) framework to the continuous setting. In particular we use Safety Gym as our test-bed, allowing for a more direct comparison of AMBS with popular constrained RL algorithms. We also provide strong probabilistic safety guarantees for the continuous setting. In addition, we propose two novel penalty techniques that directly modify the policy gradient, which empirically provide more stable convergence in our experiments.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos
Authors:
Aamal Hussain,
Francesco Belardinelli
Abstract:
The behaviour of multi-agent learning in competitive network games is often studied within the context of zero-sum games, in which convergence guarantees may be obtained. However, outside of this class the behaviour of learning is known to display complex behaviours and convergence cannot be always guaranteed. Nonetheless, in order to develop a complete picture of the behaviour of multi-agent lear…
▽ More
The behaviour of multi-agent learning in competitive network games is often studied within the context of zero-sum games, in which convergence guarantees may be obtained. However, outside of this class the behaviour of learning is known to display complex behaviours and convergence cannot be always guaranteed. Nonetheless, in order to develop a complete picture of the behaviour of multi-agent learning in competitive settings, the zero-sum assumption must be lifted. Motivated by this we study the Q-Learning dynamics, a popular model of exploration and exploitation in multi-agent learning, in competitive network games. We determine how the degree of competition, exploration rate and network connectivity impact the convergence of Q-Learning. To study generic competitive games, we parameterise network games in terms of correlations between agent payoffs and study the average behaviour of the Q-Learning dynamics across all games drawn from a choice of this parameter. This statistical approach establishes choices of parameters for which Q-Learning dynamics converge to a stable fixed point. Differently to previous works, we find that the stability of Q-Learning is explicitly dependent only on the network connectivity rather than the total number of agents. Our experiments validate these findings and show that, under certain network structures, the total number of agents can be increased without increasing the likelihood of unstable or chaotic behaviours.
△ Less
Submitted 19 December, 2023;
originally announced December 2023.
-
Honesty Is the Best Policy: Defining and Mitigating AI Deception
Authors:
Francis Rhys Ward,
Francesco Belardinelli,
Francesca Toni,
Tom Everitt
Abstract:
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no o…
▽ More
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals (for instance, in our experiments with language models, the goal of being evaluated as truthful). There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a formal definition of deception in structural causal games, grounded in the philosophy literature, and applicable to real-world machine learning systems. Several examples and results illustrate that our formal definition aligns with the philosophical and commonsense meaning of deception. Our main technical result is to provide graphical criteria for deception. We show, experimentally, that these results can be used to mitigate deception in reinforcement learning agents and language models.
△ Less
Submitted 3 December, 2023;
originally announced December 2023.
-
3vLTL: A Tool to Generate Automata for Three-valued LTL
Authors:
Francesco Belardinelli,
Angelo Ferrando,
Vadim Malvone
Abstract:
Multi-valued logics have a long tradition in the literature on system verification, including run-time verification. However, comparatively fewer model-checking tools have been developed for multi-valued specification languages. We present 3vLTL, a tool to generate Buchi automata from formulas in Linear-time Temporal Logic (LTL) interpreted on a three-valued semantics. Given an LTL formula, a set…
▽ More
Multi-valued logics have a long tradition in the literature on system verification, including run-time verification. However, comparatively fewer model-checking tools have been developed for multi-valued specification languages. We present 3vLTL, a tool to generate Buchi automata from formulas in Linear-time Temporal Logic (LTL) interpreted on a three-valued semantics. Given an LTL formula, a set of atomic propositions as the alphabet for the automaton, and a truth value, our procedure generates a Buchi automaton that accepts all the words that assign the chosen truth value to the LTL formula. Given the particular type of the output of the tool, it can also be seamlessly processed by third-party libraries in a natural way. That is, the Buchi automaton can then be used in the context of formal verification to check whether an LTL formula is true, false, or undefined on a given model.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
Strategic Abilities of Forgetful Agents in Stochastic Environments
Authors:
Francesco Belardinelli,
Wojciech Jamroga,
Munyque Mittelmann,
Aniello Murano
Abstract:
In this paper, we investigate the probabilistic variants of the strategy logics ATL and ATL* under imperfect information. Specifically, we present novel decidability and complexity results when the model transitions are stochastic and agents play uniform strategies. That is, the semantics of the logics are based on multi-agent, stochastic transition systems with imperfect information, which combin…
▽ More
In this paper, we investigate the probabilistic variants of the strategy logics ATL and ATL* under imperfect information. Specifically, we present novel decidability and complexity results when the model transitions are stochastic and agents play uniform strategies. That is, the semantics of the logics are based on multi-agent, stochastic transition systems with imperfect information, which combine two sources of uncertainty, namely, the partial observability agents have on the environment, and the likelihood of transitions to occur from a system state. Since the model checking problem is undecidable in general in this setting, we restrict our attention to agents with memoryless (positional) strategies. The resulting setting captures the situation in which agents have qualitative uncertainty of the local state and quantitative uncertainty about the occurrence of future events. We illustrate the usefulness of this setting with meaningful examples.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
Scalable Verification of Strategy Logic through Three-valued Abstraction
Authors:
Francesco Belardinelli,
Angelo Ferrando,
Wojciech Jamroga,
Vadim Malvone,
Aniello Murano
Abstract:
The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. On this logic several fragments have been defined to tackle this issue but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of…
▽ More
The model checking problem for multi-agent systems against Strategy Logic specifications is known to be non-elementary. On this logic several fragments have been defined to tackle this issue but at the expense of expressiveness. In this paper, we propose a three-valued semantics for Strategy Logic upon which we define an abstraction method. We show that the latter semantics is an approximation of the classic two-valued one for Strategy Logic. Furthermore, we extend MCMAS, an open-source model checker for multi-agent specifications, to incorporate our abstraction method and present some promising experimental results.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
Approximate Model-Based Shielding for Safe Reinforcement Learning
Authors:
Alexander W. Goodall,
Francesco Belardinelli
Abstract:
Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real-world is not easy as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled lo…
▽ More
Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real-world is not easy as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
△ Less
Submitted 27 July, 2023;
originally announced August 2023.
-
Beyond Strict Competition: Approximate Convergence of Multi Agent Q-Learning Dynamics
Authors:
Aamal Hussain,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum…
▽ More
The behaviour of multi-agent learning in competitive settings is often considered under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond this, learning dynamics can often display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption.
Motivated by this, we study a smooth variant of Q-Learning, a popular reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine this dynamic in games which are `close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the `distance' to the zero-sum game, as well as the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the `nearest' network zero-sum game can be found efficiently. As our experiments show, these guarantees are independent of whether the dynamics ultimately reach an equilibrium, or remain non-convergent.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Stability of Multi-Agent Learning: Convergence in Network Games with Many Players
Authors:
Aamal Hussain,
Dan Leonte,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increase. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the…
▽ More
The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increase. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
Characterising Decision Theories with Mechanised Causal Graphs
Authors:
Matt MacDermott,
Tom Everitt,
Francesco Belardinelli
Abstract:
How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, w…
▽ More
How should my own decisions affect my beliefs about the outcomes I expect to achieve? If taking a certain action makes me view myself as a certain type of person, it might affect how I think others view me, and how I view others who are similar to me. This can influence my expected utility calculations and change which action I perceive to be best. Whether and how it should is subject to debate, with contenders for how to think about it including evidential decision theory, causal decision theory, and functional decision theory. In this paper, we show that mechanised causal models can be used to characterise and differentiate the most important decision theories, and generate a taxonomy of different decision theories.
△ Less
Submitted 20 July, 2023;
originally announced July 2023.
-
Approximate Shielding of Atari Agents for Safe Exploration
Authors:
Alexander W. Goodall,
Francesco Belardinelli
Abstract:
Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulat…
▽ More
Balancing exploration and conservatism in the constrained setting is an important problem if we are to use reinforcement learning for meaningful tasks in the real world. In this paper, we propose a principled algorithm for safe exploration based on the concept of shielding. Previous approaches to shielding assume access to a safety-relevant abstraction of the environment or a high-fidelity simulator. Instead, our work is based on latent shielding - another approach that leverages world models to verify policy roll-outs in the latent space of a learned dynamics model. Our novel algorithm builds on this previous work, using safety critics and other additional features to improve the stability and farsightedness of the algorithm. We demonstrate the effectiveness of our approach by running experiments on a small set of Atari games with state dependent safety labels. We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations, and in some cases improves the speed of convergence and quality of the final agent.
△ Less
Submitted 21 April, 2023;
originally announced April 2023.
-
Asymptotic Convergence and Performance of Multi-Agent Q-Learning Dynamics
Authors:
Aamal Abbas Hussain,
Francesco Belardinelli,
Georgios Piliouras
Abstract:
Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted.
To make progress in resolving this problem, we study the dynamics of smooth Q…
▽ More
Achieving convergence of multiple learning agents in general $N$-player games is imperative for the development of safe and reliable machine learning (ML) algorithms and their application to autonomous systems. Yet it is known that, outside the bounds of simple two-player games, convergence cannot be taken for granted.
To make progress in resolving this problem, we study the dynamics of smooth Q-Learning, a popular reinforcement learning algorithm which quantifies the tendency for learning agents to explore their state space or exploit their payoffs. We show a sufficient condition on the rate of exploration such that the Q-Learning dynamics is guaranteed to converge to a unique equilibrium in any game. We connect this result to games for which Q-Learning is known to converge with arbitrary exploration rates, including weighted Potential games and weighted zero sum polymatrix games.
Finally, we examine the performance of the Q-Learning dynamic as measured by the Time Averaged Social Welfare, and comparing this with the Social Welfare achieved by the equilibrium. We provide a sufficient condition whereby the Q-Learning dynamic will outperform the equilibrium even if the dynamics do not converge.
△ Less
Submitted 23 January, 2023;
originally announced January 2023.
-
Argumentative Reward Learning: Reasoning About Human Preferences
Authors:
Francis Rhys Ward,
Francesco Belardinelli,
Francesca Toni
Abstract:
We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback. Our method improves prior work by generalising human preferences, reducing the burden on the user and increasing the robustness of the reward model. We demonstrate this with a number of experiments.
We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference-based argumentation with existing approaches to reinforcement learning from human feedback. Our method improves prior work by generalising human preferences, reducing the burden on the user and increasing the robustness of the reward model. We demonstrate this with a number of experiments.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
Program Semantics and a Verification Technique for Knowledge-Based Multi-Agent Systems
Authors:
Francesco Belardinelli,
Ioana Boureanu,
Vadim Malvone,
Solofomampionona Fortunat Rajaona
Abstract:
We give a relational and a weakest precondition semantics for "knowledge-based programs", i.e., programs that restrict observability of variables so as to richly express changes in the knowledge of agents who can or cannot observe said variables. Based on these knowledge-based programs, we define a program-epistemic logic to model complex epistemic properties of the execution of multi-agent system…
▽ More
We give a relational and a weakest precondition semantics for "knowledge-based programs", i.e., programs that restrict observability of variables so as to richly express changes in the knowledge of agents who can or cannot observe said variables. Based on these knowledge-based programs, we define a program-epistemic logic to model complex epistemic properties of the execution of multi-agent systems. We translate the validity of program-epistemic logic formulae into first-order validity, using our weakest precondition semantics and an ingenious book-kee** of variable assignment. We implement our translation in Haskell in a general way (i.e., independently of the programs in the logical statements), and test this novel verification method for our new program-epistemic logic on a series of well-established examples.
△ Less
Submitted 6 July, 2022; v1 submitted 28 June, 2022;
originally announced June 2022.
-
A Sahlqvist-style Correspondence Theorem for Linear-time Temporal Logic
Authors:
Rui Li,
Francesco Belardinelli
Abstract:
The language of modal logic is capable of expressing first-order conditions on Kripke frames. The classic result by Henrik Sahlqvist identifies a significant class of modal formulas for which first-order conditions -- or Sahlqvist correspondents -- can be find in an effective, algorithmic way. Recent works have successfully extended this classic result to more complex modal languages. In this pape…
▽ More
The language of modal logic is capable of expressing first-order conditions on Kripke frames. The classic result by Henrik Sahlqvist identifies a significant class of modal formulas for which first-order conditions -- or Sahlqvist correspondents -- can be find in an effective, algorithmic way. Recent works have successfully extended this classic result to more complex modal languages. In this paper, we pursue a similar line and develop a Sahlqvist-style correspondence theorem for Linear-time Temporal Logic (LTL), which is one of the most widely used formal languages for temporal specification. LTL extends the syntax of basic modal logic with dedicated temporal operators Next X and Until U . As a result, the complexity of the class of formulas that have first-order correspondents also increases accordingly. In this paper, we identify a significant class of LTL Sahlqvist formulas built by using modal operators F , G, X, and U . The main result of this paper is to prove the correspondence of LTL Sahlqvist formulas to frame conditions that are definable in first-order language.
△ Less
Submitted 13 June, 2022;
originally announced June 2022.
-
Model Checking Strategic Abilities in Information-sharing Systems
Authors:
Francesco Belardinelli,
Ioana Boureanu,
Catalin Dima,
Vadim Malvone
Abstract:
We introduce a subclass of concurrent game structures (CGS) with imperfect information in which agents are endowed with private data-sharing capabilities. Importantly, our CGSs are such that it is still decidable to model-check these CGSs against a relevant fragment of ATL. These systems can be thought as a generalisation of architectures allowing information forks, in the sense that, in the initi…
▽ More
We introduce a subclass of concurrent game structures (CGS) with imperfect information in which agents are endowed with private data-sharing capabilities. Importantly, our CGSs are such that it is still decidable to model-check these CGSs against a relevant fragment of ATL. These systems can be thought as a generalisation of architectures allowing information forks, in the sense that, in the initial states of the system, we allow information forks from agents outside a given set A to agents inside this A. For this reason, together with the fact that the communication in our models underpins a specialised form of broadcast, we call our formalism A-cast systems. To underline, the fragment of ATL for which we show the model-checking problem to be decidable over A-cast is a large and significant one; it expresses coalitions over agents in any subset of the set A. Indeed, as we show, our systems and this ATL fragments can encode security problems that are notoriously hard to express faithfully: terrorist-fraud attacks in identity schemes.
△ Less
Submitted 19 April, 2022;
originally announced April 2022.
-
Bisimulations for Verifying Strategic Abilities with an Application to the ThreeBallot Voting Protocol
Authors:
Francesco Belardinelli,
Rodica Condurache,
Catalin Dima,
Wojciech Jamroga,
Michal Knapik
Abstract:
We propose a notion of alternating bisimulation for strategic abilities under imperfect information. The bisimulation preserves formulas of ATL$^*$ for both the {\em objective} and {\em subjective} variants of the state-based semantics with imperfect information, which are commonly used in the modeling and verification of multi-agent systems. Furthermore, we apply the theoretical result to the ver…
▽ More
We propose a notion of alternating bisimulation for strategic abilities under imperfect information. The bisimulation preserves formulas of ATL$^*$ for both the {\em objective} and {\em subjective} variants of the state-based semantics with imperfect information, which are commonly used in the modeling and verification of multi-agent systems. Furthermore, we apply the theoretical result to the verification of coercion-resistance in the ThreeBallot voting system, a voting protocol that does not use cryptography. In particular, we show that natural simplifications of an initial model of the protocol are in fact bisimulations of the original model, and therefore satisfy the same ATL$^*$ properties, including coercion-resistance. These simplifications allow the model-checking tool MCMAS to terminate on models with a larger number of voters and candidates, compared with the initial model.
△ Less
Submitted 18 October, 2023; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Reasoning about Human-Friendly Strategies in Repeated Keyword Auctions
Authors:
Francesco Belardinelli,
Wojtek Jamroga,
Vadim Malvone,
Munyque Mittelmann,
Aniello Murano,
Laurent Perrussel
Abstract:
In online advertising, search engines sell ad placements for keywords continuously through auctions. This problem can be seen as an infinitely repeated game since the auction is executed whenever a user performs a query with the keyword. As advertisers may frequently change their bids, the game will have a large set of equilibria with potentially complex strategies. In this paper, we propose the u…
▽ More
In online advertising, search engines sell ad placements for keywords continuously through auctions. This problem can be seen as an infinitely repeated game since the auction is executed whenever a user performs a query with the keyword. As advertisers may frequently change their bids, the game will have a large set of equilibria with potentially complex strategies. In this paper, we propose the use of natural strategies for reasoning in such setting as they are processable by artificial agents with limited memory and/or computational power as well as understandable by human users. To reach this goal, we introduce a quantitative version of Strategy Logic with natural strategies in the setting of imperfect information. In a first step, we show how to model strategies for repeated keyword auctions and take advantage of the model for proving properties evaluating this game. In a second step, we study the logic in relation to the distinguishing power, expressivity, and model-checking complexity for strategies with and without recall.
△ Less
Submitted 24 January, 2022;
originally announced January 2022.
-
Do Androids Dream of Electric Fences? Safety-Aware Reinforcement Learning with Latent Shielding
Authors:
Chloe He,
Borja G. Leon,
Francesco Belardinelli
Abstract:
The growing trend of fledgling reinforcement learning systems making their way into real-world applications has been accompanied by growing concerns for their safety and robustness. In recent years, a variety of approaches have been put forward to address the challenges of safety-aware reinforcement learning; however, these methods often either require a handcrafted model of the environment to b…
▽ More
The growing trend of fledgling reinforcement learning systems making their way into real-world applications has been accompanied by growing concerns for their safety and robustness. In recent years, a variety of approaches have been put forward to address the challenges of safety-aware reinforcement learning; however, these methods often either require a handcrafted model of the environment to be provided beforehand, or that the environment is relatively simple and low-dimensional. We present a novel approach to safety-aware deep reinforcement learning in high-dimensional environments called latent shielding. Latent shielding leverages internal representations of the environment learnt by model-based agents to "imagine" future trajectories and avoid those deemed unsafe. We experimentally demonstrate that this approach leads to improved adherence to formally-defined safety specifications.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications
Authors:
Borja G. León,
Murray Shanahan,
Francesco Belardinelli
Abstract:
We address the problem of building agents whose goal is to learn to execute out-of distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the agent's neural architecture is a key feature when DRL agents are learning to solve OOD tasks in TL. Yet, the studies on this topic are still in their infancy…
▽ More
We address the problem of building agents whose goal is to learn to execute out-of distribution (OOD) multi-task instructions expressed in temporal logic (TL) by using deep reinforcement learning (DRL). Recent works provided evidence that the agent's neural architecture is a key feature when DRL agents are learning to solve OOD tasks in TL. Yet, the studies on this topic are still in their infancy. In this work, we propose a new deep learning configuration with inductive biases that lead agents to generate latent representations of their current goal, yielding a stronger generalization performance. We use these latent-goal networks within a neuro-symbolic framework that executes multi-task formally-defined instructions and contrast the performance of the proposed neural networks against employing different state-of-the-art (SOTA) architectures when generalizing to unseen instructions in OOD environments.
△ Less
Submitted 24 February, 2022; v1 submitted 18 October, 2021;
originally announced October 2021.
-
Aggregating Bipolar Opinions (With Appendix)
Authors:
Stefan Lauren,
Francesco Belardinelli,
Francesca Toni
Abstract:
We introduce a novel method to aggregate Bipolar Argumentation (BA) Frameworks expressing opinions by different parties in debates. We use Bipolar Assumption-based Argumentation (ABA) as an all-encompassing formalism for BA under different semantics. By leveraging on recent results on judgement aggregation in Social Choice Theory, we prove several preservation results, both positive and negative,…
▽ More
We introduce a novel method to aggregate Bipolar Argumentation (BA) Frameworks expressing opinions by different parties in debates. We use Bipolar Assumption-based Argumentation (ABA) as an all-encompassing formalism for BA under different semantics. By leveraging on recent results on judgement aggregation in Social Choice Theory, we prove several preservation results, both positive and negative, for relevant properties of Bipolar ABA.
△ Less
Submitted 4 February, 2021;
originally announced February 2021.
-
An Abstraction-based Method to Check Multi-Agent Deep Reinforcement-Learning Behaviors
Authors:
Pierre El Mqirmi,
Francesco Belardinelli,
Borja G. León
Abstract:
Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents, and therefore it is generally not adapted to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints both in training and testing. The ap…
▽ More
Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviours of the learning agents, and therefore it is generally not adapted to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally-specified safety constraints both in training and testing. The approach we propose expresses the constraints to verify in Probabilistic Computation Tree Logic (PCTL) and builds an abstract representation of the system to reduce the complexity of the verification step. This abstract model allows for model checking techniques to identify a set of abstract policies that meet the safety constraints expressed in PCTL. Then, the agents' behaviours are restricted according to these safe abstract policies. We provide formal guarantees that by using this method, the actions of the agents always meet the safety constraints, and provide a procedure to generate an abstract model automatically. We empirically evaluate and show the effectiveness of our method in a multi-agent environment.
△ Less
Submitted 1 April, 2021; v1 submitted 2 February, 2021;
originally announced February 2021.
-
A Hennessy-Milner Theorem for ATL with Imperfect Information
Authors:
Francesco Belardinelli,
Catalin Dima,
Vadim Malvone,
Ferucio Tiplea
Abstract:
We show that a history-based variant of alternating bisimulation with imperfect information allows it to be related to a variant of Alternating-time Temporal Logic (ATL) with imperfect information by a full Hennessy-Milner theorem. The variant of ATL we consider has a common knowledge semantics, which requires that the uniform strategy available for a coalition to accomplish some goal must be comm…
▽ More
We show that a history-based variant of alternating bisimulation with imperfect information allows it to be related to a variant of Alternating-time Temporal Logic (ATL) with imperfect information by a full Hennessy-Milner theorem. The variant of ATL we consider has a common knowledge semantics, which requires that the uniform strategy available for a coalition to accomplish some goal must be common knowledge inside the coalition, while other semantic variants of ATL with imperfect information do not accommodate a Hennessy-Milner theorem. We also show that the existence of a history-based alternating bisimulation between two finite Concurrent Game Structures with imperfect information (iCGS) is undecidable.
△ Less
Submitted 26 June, 2020;
originally announced June 2020.
-
Systematic Generalisation through Task Temporal Logic and Deep Reinforcement Learning
Authors:
Borja G. León,
Murray Shanahan,
Francesco Belardinelli
Abstract:
This work introduces a neuro-symbolic agent that combines deep reinforcement learning (DRL) with temporal logic (TL) to achieve systematic zero-shot, i.e., never-seen-before, generalisation of formally specified instructions. In particular, we present a neuro-symbolic framework where a symbolic module transforms TL specifications into a form that helps the training of a DRL agent targeting general…
▽ More
This work introduces a neuro-symbolic agent that combines deep reinforcement learning (DRL) with temporal logic (TL) to achieve systematic zero-shot, i.e., never-seen-before, generalisation of formally specified instructions. In particular, we present a neuro-symbolic framework where a symbolic module transforms TL specifications into a form that helps the training of a DRL agent targeting generalisation, while a neural module learns systematically to solve the given tasks. We study the emergence of systematic learning in different settings and find that the architecture of the convolutional layers is key when generalising to new instructions. We also provide evidence that systematic learning can emerge with abstract operators such as negation when learning from a few training examples, which previous research have struggled with.
△ Less
Submitted 13 September, 2021; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Extended Markov Games to Learn Multiple Tasks in Multi-Agent Reinforcement Learning
Authors:
Borja G. León,
Francesco Belardinelli
Abstract:
The combination of Formal Methods with Reinforcement Learning (RL) has recently attracted interest as a way for single-agent RL to learn multiple-task specifications. In this paper we extend this convergence to multi-agent settings and formally define Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications. To…
▽ More
The combination of Formal Methods with Reinforcement Learning (RL) has recently attracted interest as a way for single-agent RL to learn multiple-task specifications. In this paper we extend this convergence to multi-agent settings and formally define Extended Markov Games as a general mathematical model that allows multiple RL agents to concurrently learn various non-Markovian specifications. To introduce this new model we provide formal definitions and proofs as well as empirical tests of RL algorithms running on this framework. Specifically, we use our model to train two different logic-based multi-agent RL algorithms to solve diverse settings of non-Markovian co-safe LTL specifications.
△ Less
Submitted 14 February, 2020;
originally announced February 2020.
-
Formal Verification of Debates in Argumentation Theory
Authors:
Ria Jha,
Francesco Belardinelli,
Francesca Toni
Abstract:
Humans engage in informal debates on a daily basis. By expressing their opinions and ideas in an argumentative fashion, they are able to gain a deeper understanding of a given problem and in some cases, find the best possible course of actions towards resolving it. In this paper, we develop a methodology to verify debates formalised as abstract argumentation frameworks. We first present a translat…
▽ More
Humans engage in informal debates on a daily basis. By expressing their opinions and ideas in an argumentative fashion, they are able to gain a deeper understanding of a given problem and in some cases, find the best possible course of actions towards resolving it. In this paper, we develop a methodology to verify debates formalised as abstract argumentation frameworks. We first present a translation from debates to transition systems. Such transition systems can model debates and represent their evolution over time using a finite set of states. We then formalise relevant debate properties using temporal and strategy logics. These formalisations, along with a debate transition system, allow us to verify whether a given debate satisfies certain properties. The verification process can be automated using model checkers. Therefore, we also measure their performance when verifying debates, and use the results to discuss the feasibility of model checking debates.
△ Less
Submitted 12 December, 2019;
originally announced December 2019.
-
Social Choice Methods for Database Aggregation
Authors:
Francesco Belardinelli,
Umberto Grandi
Abstract:
Knowledge can be represented compactly in multiple ways, from a set of propositional formulas, to a Kripke model, to a database. In this paper we study the aggregation of information coming from multiple sources, each source submitting a database modelled as a first-order relational structure. In the presence of integrity constraints, we identify classes of aggregators that respect them in the agg…
▽ More
Knowledge can be represented compactly in multiple ways, from a set of propositional formulas, to a Kripke model, to a database. In this paper we study the aggregation of information coming from multiple sources, each source submitting a database modelled as a first-order relational structure. In the presence of integrity constraints, we identify classes of aggregators that respect them in the aggregated database, provided these are satisfied in all individual databases. We also characterise languages for first-order queries on which the answer to a query on the aggregated database coincides with the aggregation of the answers to the query obtained on each individual database. This contribution is meant to be a first step on the application of techniques from social choice theory to knowledge representation in databases.
△ Less
Submitted 21 July, 2019;
originally announced July 2019.
-
Model Checking ATL* on vCGS
Authors:
Francesco Belardinelli,
Catalin Dima,
Ioana Boureanu,
Vadim Malvone
Abstract:
We prove that the model checking ATL* on concurrent game structures with propositional control for atom-visibility (vCGS) is undecidable. To do so, we reduce this problem to model checking ATL* on iCGS.
We prove that the model checking ATL* on concurrent game structures with propositional control for atom-visibility (vCGS) is undecidable. To do so, we reduce this problem to model checking ATL* on iCGS.
△ Less
Submitted 11 March, 2019;
originally announced March 2019.
-
Database Aggregation
Authors:
Francesco Belardinelli,
Umberto Grandi
Abstract:
Knowledge can be represented compactly in a multitude ways, from a set of propositional formulas, to a Kripke model, to a database. In this paper we study the aggregation of information coming from multiple sources, each source submitting a database modelled as a first-order relational structure. In the presence of an integrity constraint, we identify classes of aggregators that respect it in the…
▽ More
Knowledge can be represented compactly in a multitude ways, from a set of propositional formulas, to a Kripke model, to a database. In this paper we study the aggregation of information coming from multiple sources, each source submitting a database modelled as a first-order relational structure. In the presence of an integrity constraint, we identify classes of aggregators that respect it in the aggregated database, provided all individual databases satisfy it. We also characterise languages for first-order queries on which the answer to queries on the aggregated database coincides with the aggregation of the answers to the query obtained on each individual database. This contribution is meant to be a first step on the application of techniques from rational choice theory to knowledge representation in databases.
△ Less
Submitted 23 February, 2018;
originally announced February 2018.
-
Relaxing Exclusive Control in Boolean Games
Authors:
Francesco Belardinelli,
Umberto Grandi,
Andreas Herzig,
Dominique Longin,
Emiliano Lorini,
Arianna Novaro,
Laurent Perrussel
Abstract:
In the typical framework for boolean games (BG) each player can change the truth value of some propositional atoms, while attempting to make her goal true. In standard BG goals are propositional formulas, whereas in iterated BG goals are formulas of Linear Temporal Logic. Both notions of BG are characterised by the fact that agents have exclusive control over their set of atoms, meaning that no tw…
▽ More
In the typical framework for boolean games (BG) each player can change the truth value of some propositional atoms, while attempting to make her goal true. In standard BG goals are propositional formulas, whereas in iterated BG goals are formulas of Linear Temporal Logic. Both notions of BG are characterised by the fact that agents have exclusive control over their set of atoms, meaning that no two agents can control the same atom. In the present contribution we drop the exclusivity assumption and explore structures where an atom can be controlled by multiple agents. We introduce Concurrent Game Structures with Shared Propositional Control (CGS-SPC) and show that they ac- count for several classes of repeated games, including iterated boolean games, influence games, and aggregation games. Our main result shows that, as far as verification is concerned, CGS-SPC can be reduced to concurrent game structures with exclusive control. This result provides a polynomial reduction for the model checking problem of specifications in Alternating-time Temporal Logic on CGS-SPC.
△ Less
Submitted 27 July, 2017;
originally announced July 2017.
-
A Logic for Global and Local Announcements
Authors:
Francesco Belardinelli,
Hans van Ditmarsch,
Wiebe van der Hoek
Abstract:
In this paper we introduce {\em global and local announcement logic} (GLAL), a dynamic epistemic logic with two distinct announcement operators -- $[φ]^+_A$ and $[φ]^-_A$ indexed to a subset $A$ of the set $Ag$ of all agents -- for global and local announcements respectively. The boundary case $[φ]^+_{Ag}$ corresponds to the public announcement of $φ$, as known from the literature. Unlike standard…
▽ More
In this paper we introduce {\em global and local announcement logic} (GLAL), a dynamic epistemic logic with two distinct announcement operators -- $[φ]^+_A$ and $[φ]^-_A$ indexed to a subset $A$ of the set $Ag$ of all agents -- for global and local announcements respectively. The boundary case $[φ]^+_{Ag}$ corresponds to the public announcement of $φ$, as known from the literature. Unlike standard public announcements, which are {\em model transformers}, the global and local announcements are {\em pointed model transformers}. In particular, the update induced by the announcement may be different in different states of the model. Therefore, the resulting computations are trees of models, rather than the typical sequences. A consequence of our semantics is that modally bisimilar states may be distinguished in our logic. Then, we provide a stronger notion of bisimilarity and we show that it preserves modal equivalence in GLAL. Additionally, we show that GLAL is strictly more expressive than public announcement logic with common knowledge. We prove a wide range of validities for GLAL involving the interaction between dynamics and knowledge, and show that the satisfiability problem for GLAL is decidable. We illustrate the formal machinery by means of detailed epistemic scenarios.
△ Less
Submitted 27 July, 2017;
originally announced July 2017.
-
Reasoning about Knowledge and Strategies: Epistemic Strategy Logic
Authors:
Francesco Belardinelli
Abstract:
In this paper we introduce Epistemic Strategy Logic (ESL), an extension of Strategy Logic with modal operators for individual knowledge. This enhanced framework allows us to represent explicitly and to reason about the knowledge agents have of their own and other agents' strategies. We provide a semantics to ESL in terms of epistemic concurrent game models, and consider the corresponding model che…
▽ More
In this paper we introduce Epistemic Strategy Logic (ESL), an extension of Strategy Logic with modal operators for individual knowledge. This enhanced framework allows us to represent explicitly and to reason about the knowledge agents have of their own and other agents' strategies. We provide a semantics to ESL in terms of epistemic concurrent game models, and consider the corresponding model checking problem. We show that the complexity of model checking ESL is not worse than (non-epistemic) Strategy Logic
△ Less
Submitted 3 April, 2014;
originally announced April 2014.
-
Interactions between Knowledge and Time in a First-Order Logic for Multi-Agent Systems: Completeness Results
Authors:
Francesco Belardinelli,
Alessio Lomuscio
Abstract:
We investigate a class of first-order temporal-epistemic logics for reasoning about multi-agent systems. We encode typical properties of systems including perfect recall, synchronicity, no learning, and having a unique initial state in terms of variants of quantified interpreted systems, a first-order extension of interpreted systems. We identify several monodic fragments of first-order temporal-e…
▽ More
We investigate a class of first-order temporal-epistemic logics for reasoning about multi-agent systems. We encode typical properties of systems including perfect recall, synchronicity, no learning, and having a unique initial state in terms of variants of quantified interpreted systems, a first-order extension of interpreted systems. We identify several monodic fragments of first-order temporal-epistemic logic and show their completeness with respect to their corresponding classes of quantified interpreted systems.
△ Less
Submitted 22 January, 2014;
originally announced January 2014.
-
Verification of Agent-Based Artifact Systems
Authors:
Francesco Belardinelli,
Alessio Lomuscio,
Fabio Patrizi
Abstract:
Artifact systems are a novel paradigm for specifying and implementing business processes described in terms of interacting modules called artifacts. Artifacts consist of data and lifecycles, accounting respectively for the relational structure of the artifacts' states and their possible evolutions over time. In this paper we put forward artifact-centric multi-agent systems, a novel formalisation o…
▽ More
Artifact systems are a novel paradigm for specifying and implementing business processes described in terms of interacting modules called artifacts. Artifacts consist of data and lifecycles, accounting respectively for the relational structure of the artifacts' states and their possible evolutions over time. In this paper we put forward artifact-centric multi-agent systems, a novel formalisation of artifact systems in the context of multi-agent systems operating on them. Differently from the usual process-based models of services, the semantics we give explicitly accounts for the data structures on which artifact systems are defined. We study the model checking problem for artifact-centric multi-agent systems against specifications written in a quantified version of temporal-epistemic logic expressing the knowledge of the agents in the exchange. We begin by noting that the problem is undecidable in general. We then identify two noteworthy restrictions, one syntactical and one semantical, that enable us to find bisimilar finite abstractions and therefore reduce the model checking problem to the instance on finite models. Under these assumptions we show that the model checking problem for these systems is EXPSPACE-complete. We then introduce artifact-centric programs, compact and declarative representations of the programs governing both the artifact system and the agents. We show that, while these in principle generate infinite-state systems, under natural conditions their verification problem can be solved on finite abstractions that can be effectively computed from the programs. Finally we exemplify the theoretical results of the paper through a mainstream procurement scenario from the artifact systems literature.
△ Less
Submitted 22 January, 2013; v1 submitted 12 January, 2013;
originally announced January 2013.
-
Automated Verification of Quantum Protocols using MCMAS
Authors:
F. Belardinelli,
P. Gonzalez,
A. Lomuscio
Abstract:
We present a methodology for the automated verification of quantum protocols using MCMAS, a symbolic model checker for multi-agent systems The method is based on the logical framework developed by D'Hondt and Panangaden for investigating epistemic and temporal properties, built on the model for Distributed Measurement-based Quantum Computation (DMC), an extension of the Measurement Calculus to dis…
▽ More
We present a methodology for the automated verification of quantum protocols using MCMAS, a symbolic model checker for multi-agent systems The method is based on the logical framework developed by D'Hondt and Panangaden for investigating epistemic and temporal properties, built on the model for Distributed Measurement-based Quantum Computation (DMC), an extension of the Measurement Calculus to distributed quantum systems. We describe the translation map from DMC to interpreted systems, the typical formalism for reasoning about time and knowledge in multi-agent systems. Then, we introduce dmc2ispl, a compiler into the input language of the MCMAS model checker. We demonstrate the technique by verifying the Quantum Teleportation Protocol, and discuss the performance of the tool.
△ Less
Submitted 3 July, 2012;
originally announced July 2012.