Search | arXiv e-print repository

Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints

Authors: Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing against a best-in-hindsight policy that satisfies constraints on average. In this paper, we show that this negative result can be eased in CMDPs with non-stationary rewards and constraints, by providing algorithms whose performance degrades smoothly as non-stationarity increases. Specifically, we propose algorithms attaining $\tilde{\mathcal{O}}(\sqrt{T} + C)$ regret and positive constraint violation under bandit feedback, where $C$ is a corruption value measuring the non-stationarity of the environment. This can be $\Theta(T)$ in the worst case, consistent with the impossibility result for adversarial CMDPs. First, we design an algorithm with the desired guarantees when $C$ is known. Then, in the case $C$ is unknown, we show how to obtain the same results by embedding such an algorithm in a general meta-procedure. This is of independent interest, as it can be applied to any non-stationary constrained online learning setting.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.06977 [pdf, ps, other]

The Sample Complexity of Stackelberg Games

Authors: Francesco Bacchiocchi, Matteo Bollini, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: Stackelberg games (SGs) constitute the most fundamental and acclaimed models of strategic interactions involving some form of commitment. Moreover, they form the basis of more elaborate models of this kind, such as, e.g., Bayesian persuasion and principal-agent problems. Addressing learning tasks in SGs and related models is crucial to operationalize them in practice, where model parameters are usually unknown. In this paper, we revise the sample complexity of learning an optimal strategy to commit to in SGs. We provide a novel algorithm that (i) does not require any of the limiting assumptions made by state-of-the-art approaches and (ii) deals with a trade-off between sample complexity and termination probability arising when the representation of the leader's strategies has finite precision. Such a trade-off has been completely neglected by existing algorithms and, if not properly managed, it may result in them using exponentially many samples. Our algorithm requires novel techniques, which also pave the way to addressing learning problems in other models with commitment ubiquitous in the real world.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2403.03672 [pdf, ps, other]

Learning Adversarial MDPs with Stochastic Hard Constraints

Authors: Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study online learning problems in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints. We consider two different scenarios. In the first one, we address general CMDPs, where we design an algorithm that attains sublinear regret and cumulative positive constraint violation. In the second scenario, under the mild assumption that a policy strictly satisfying the constraints exists and is known to the learner, we design an algorithm that achieves sublinear regret while ensuring that the constraints are satisfied at every episode with high probability. To the best of our knowledge, our work is the first to study CMDPs involving both adversarial losses and hard constraints. Indeed, previous works either focus on much weaker soft constraints -- allowing for positive violations to cancel out negative ones -- or are restricted to stochastic losses. Thus, our algorithms can deal with general non-stationary environments subject to requirements much stricter than those manageable with state-of-the-art algorithms. This enables their adoption in a much wider range of real-world applications, ranging from autonomous driving to online advertising and recommender systems.

Submitted 20 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.13156 [pdf, ps, other]

Regret-Minimizing Contracts: Agency Under Uncertainty

Authors: Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi

Abstract: We study the fundamental problem of designing contracts in principal-agent problems under uncertainty. Previous works mostly addressed Bayesian settings in which the principal's uncertainty is modeled as a probability distribution over the agent's types. In this paper, we study a setting in which the principal has no distributional information about the agent's type. In particular, in our setting, the principal only knows some uncertainty set defining the agent's possible action costs. Thus, the principal takes a robust (adversarial) approach by trying to design contracts which minimize the (additive) regret: the maximum difference between what the principal could have obtained had they known the agent's costs and what they actually get under the selected contract.
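The regret objective described in this abstract can be written out as follows. This is only a sketch in assumed notation, not taken from the paper: $\mathcal{C}$ denotes the uncertainty set of agent cost vectors, and $u_P(p; c)$ denotes the principal's utility under contract $p$ when the agent has costs $c$ and best-responds.

$$
\mathrm{Reg}(p) \;=\; \max_{c \in \mathcal{C}} \Big[ \max_{p'} u_P(p'; c) \;-\; u_P(p; c) \Big],
\qquad
p^\star \in \arg\min_{p} \mathrm{Reg}(p).
$$

The inner maximum over $p'$ is what the principal could have obtained had they known $c$; the outer maximum takes the worst case over the uncertainty set.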

Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.03077 [pdf, ps, other]

Markov Persuasion Processes: Learning to Persuade from Scratch

Authors: Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: In Bayesian persuasion, an informed sender strategically discloses information to a receiver so as to persuade them to undertake desirable actions. Recently, growing attention has been devoted to settings in which sender and receivers interact sequentially. In particular, Markov persuasion processes (MPPs) have been introduced to capture sequential scenarios where a sender faces a stream of myopic receivers in a Markovian environment. The MPPs studied so far in the literature suffer from issues that prevent them from being fully operational in practice, e.g., they assume that the sender knows receivers' rewards. We fix such issues by addressing MPPs where the sender has no knowledge about the environment. We design a learning algorithm for the sender, working with partial feedback. We prove that its regret with respect to an optimal information-disclosure policy grows sublinearly in the number of episodes, as is the case for the loss in persuasiveness cumulated while learning. Moreover, we provide a lower bound for our setting matching the guarantees of our algorithm.

Submitted 6 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2309.09801 [pdf, ps, other]

Learning Optimal Contracts: How to Exploit Small Action Spaces

Authors: Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme -- called a contract -- in order to induce an agent to take a costly, unobservable action leading to favorable outcomes. We consider a generalization of the classical (single-round) version of the problem in which the principal interacts with the agent by committing to contracts over multiple rounds. The principal has no information about the agent, and they have to learn an optimal contract by only observing the outcome realized at each round. We focus on settings in which the size of the agent's action space is small. We design an algorithm that learns an approximately optimal contract with high probability in a number of rounds polynomial in the size of the outcome space, when the number of actions is constant. Our algorithm solves an open problem by Zhu et al. [2022]. Moreover, it can also be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting in which the principal aims at maximizing their cumulative utility, thus considerably improving previously known regret bounds.

Submitted 7 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2306.12221 [pdf, other]

Persuading Farsighted Receivers in MDPs: the Power of Honesty

Authors: Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Mirco Mutti

Abstract: Bayesian persuasion studies the problem faced by an informed sender who strategically discloses information to influence the behavior of an uninformed receiver. Recently, growing attention has been devoted to settings where the sender and the receiver interact sequentially, in which the receiver's decision-making problem is usually modeled as a Markov decision process (MDP). However, previous works focused on computing optimal information-revelation policies (a.k.a. signaling schemes) under the restrictive assumption that the receiver acts myopically, selecting actions to maximize the one-step utility and disregarding future rewards. This is justified by the fact that, when the receiver is farsighted and thus considers future rewards, finding an optimal Markovian signaling scheme is NP-hard. In this paper, we show that Markovian signaling schemes do not constitute the "right" class of policies. Indeed, unlike in most MDP settings, we prove that Markovian signaling schemes are not optimal, and general history-dependent signaling schemes should be considered. Moreover, we also show that history-dependent signaling schemes circumvent the negative complexity results affecting Markovian signaling schemes. Formally, we design an algorithm that computes an optimal and ε-persuasive history-dependent signaling scheme in time polynomial in 1/ε and in the instance size. The crucial challenge is that general history-dependent signaling schemes cannot be represented in polynomial space. Nevertheless, we introduce a convenient subclass of history-dependent signaling schemes, called promise-form, which are as powerful as general history-dependent ones and efficiently representable. Intuitively, promise-form signaling schemes compactly encode histories in the form of honest promises on the receiver's future rewards.

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2304.14326 [pdf, ps, other]

A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints

Authors: Jacopo Germano, Francesco Emanuele Stradi, Gianmarco Genalti, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study online learning in episodic constrained Markov decision processes (CMDPs), where the goal of the learner is to collect as much reward as possible over the episodes, while guaranteeing that some long-term constraints are satisfied during the learning process. Rewards and constraints can be selected either stochastically or adversarially, and the transition function is not known to the learner. While online learning in classical unconstrained MDPs has received considerable attention over the last years, the setting of CMDPs is still largely unexplored. This is surprising, since in real-world applications, such as, e.g., autonomous driving, automated bidding, and recommender systems, there are usually additional constraints and specifications that an agent has to obey during the learning process. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints. Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underlying process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2303.01296 [pdf, ps, other]

Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion

Authors: Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti, Francesco Trovò

Abstract: Bayesian persuasion studies how an informed sender should influence beliefs of rational receivers who take decisions through Bayesian updating of a common prior. We focus on the online Bayesian persuasion framework, in which the sender repeatedly faces one or more receivers with unknown and adversarially selected types. First, we show how to obtain a tight $\tilde O(T^{1/2})$ regret bound in the case in which the sender faces a single receiver and has partial feedback, improving over the best previously known bound of $\tilde O(T^{4/5})$. Then, we provide the first no-regret guarantees for the multi-receiver setting under partial feedback. Finally, we show how to design no-regret algorithms with polynomial per-iteration running time by exploiting type reporting, thereby circumventing known intractability results on online Bayesian persuasion. We provide efficient algorithms guaranteeing an $O(T^{1/2})$ regret upper bound both in the single- and multi-receiver scenario when type reporting is allowed.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2301.13790 [pdf, ps, other]

Selling Information while Being an Interested Party

Authors: Matteo Castiglioni, Francesco Bacchiocchi, Alberto Marchesi, Giulia Romano, Nicola Gatti

Abstract: We study the algorithmic problem faced by an information holder (seller) who wants to optimally sell such information to a budget-constrained decision maker (buyer) that has to undertake some action. Differently from previous works, we consider the case in which the seller is an interested party, as the action chosen by the buyer does not only influence their utility, but also the seller's. This happens in many real-world settings, where the way in which businesses use acquired information may positively or negatively affect the seller, due to the presence of externalities on the information market. The utilities of both the seller and the buyer depend on a random state of nature, which is revealed to the seller, but is unknown to the buyer. Thus, the seller's goal is to (partially) sell their information about the state of nature to the buyer, so as to concurrently maximize revenue and induce the buyer to take a desirable action. We study settings in which the buyer's budget and utilities are determined by a random buyer's type that is unknown to the seller. In such settings, an optimal protocol for the seller must propose to the buyer a menu of information-revelation policies to choose from, with the latter acquiring one of them by paying its corresponding price. Moreover, since in our model the seller is an interested party, an optimal protocol must also prescribe the seller to pay back the buyer contingently on their action. First, we show that the problem of computing a seller-optimal protocol can be solved in polynomial time. Next, we switch the attention to the case in which a seller's protocol employs a single information-revelation policy, rather than proposing a menu, deriving both positive and negative results.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.13654 [pdf, ps, other]

Multi-Agent Contract Design: How to Commission Multiple Agents with Individual Outcome

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study hidden-action principal-agent problems with multiple agents. These are problems in which a principal commits to an outcome-dependent payment scheme in order to incentivize some agents to take costly, unobservable actions that lead to favorable outcomes. Previous works on multi-agent problems study models where the principal observes a single outcome determined by the actions of all the agents. Such models considerably limit the contracting power of the principal, since payments can only depend on the joint result of all the agents' actions, and there is no way of paying each agent for their individual result. In this paper, we consider a model in which each agent determines their own individual outcome as an effect of their action only, the principal observes all the individual outcomes separately, and they perceive a reward that jointly depends on all these outcomes. This considerably enhances the principal's contracting capabilities, by allowing them to pay each agent on the basis of their individual result. We analyze the computational complexity of finding principal-optimal contracts, revolving around two newly-introduced properties of principal's rewards, which we call IR-supermodularity and DR-submodularity. Intuitively, the former captures settings with increasing returns, where the rewards grow faster as the agents' effort increases, while the latter models the case of diminishing returns, in which rewards grow slower instead. These two properties naturally model two common real-world phenomena, namely diseconomies and economies of scale. In this paper, we first address basic instances in which the principal knows everything about the agents, and, then, more general Bayesian instances where each agent has their own private type determining their features, such as action costs and how actions stochastically determine individual outcomes.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.13600 [pdf, ps, other]

Constrained Phi-Equilibria

Authors: Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Francesco Trovò, Nicola Gatti

Abstract: The computational study of equilibria involving constraints on players' strategies has been largely neglected. However, in real-world applications, players are usually subject to constraints ruling out the feasibility of some of their strategies, such as, e.g., safety requirements and budget caps. Computational studies on constrained versions of the Nash equilibrium have led to some results under very stringent assumptions, while finding constrained versions of the correlated equilibrium (CE) is still unexplored. In this paper, we introduce and computationally characterize constrained Phi-equilibria -- a more general notion than constrained CEs -- in normal-form games. We show that computing such equilibria is in general computationally intractable, and also that the set of the equilibria may not be convex, providing a sharp divide with unconstrained CEs. Nevertheless, we provide a polynomial-time algorithm for computing a constrained (approximate) Phi-equilibrium maximizing a given linear function, when either the number of constraints or that of players' actions is fixed. Moreover, in the special case in which a player's constraints do not depend on other players' strategies, we show that an exact, function-maximizing equilibrium can be computed in polynomial time, while one (approximate) equilibrium can be found with an efficient decentralized no-regret learning algorithm.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2209.07454 [pdf, ps, other]

A Unifying Framework for Online Optimization with Long-Term Constraints

Authors: Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Giulia Romano, Nicola Gatti

Abstract: We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints. The goal of the decision maker is to maximize their total reward, while at the same time achieving small cumulative constraint violation across the $T$ rounds. We present the first best-of-both-worlds-type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a $\rho/(1+\rho)$ fraction of the optimal reward and sublinear regret, where $\rho$ is a feasibility parameter related to the existence of strictly feasible solutions. Our framework employs traditional regret minimizers as black-box components. Therefore, by instantiating it with an appropriate choice of regret minimizers, it can handle the full-feedback as well as the bandit-feedback setting. Moreover, it allows the decision maker to seamlessly handle scenarios with non-convex rewards and constraints. We show how our framework can be applied in the context of budget-management mechanisms for repeated auctions in order to guarantee long-term constraints that are not packing (e.g., ROI constraints).

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.03927 [pdf, other]

Sequential Information Design: Learning to Persuade in the Dark

Authors: Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovo

Abstract: We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver. We consider settings where the receiver faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know the probabilities of random events, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of sender's persuasive information structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result: no learning algorithm can be persuasive. Thus, we relax persuasiveness requirements by focusing on algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting -- where the sender observes the realizations of all random events -- we provide an algorithm with $\tilde{O}(\sqrt{T})$ regret for both the sender and the receiver. Instead, in the bandit-feedback setting -- where the sender only observes the realizations of random events actually occurring in the SDM problem -- we design an algorithm that, given an $\alpha \in [1/2, 1]$ as input, ensures $\tilde{O}(T^{\alpha})$ and $\tilde{O}(T^{\max\{\alpha, 1-\frac{\alpha}{2}\}})$ regrets, for the sender and the receiver respectively. This result is complemented by a lower bound showing that such a regret trade-off is essentially tight.

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2208.08238 [pdf, other]

Last-iterate Convergence to Trembling-hand Perfect Equilibria

Authors: Martino Bernasconi, Alberto Marchesi, Francesco Trovò

Abstract: Designing efficient algorithms to find Nash equilibrium (NE) refinements in sequential games is of paramount importance in practice. Indeed, it is well known that the NE has several weaknesses, since it may prescribe to play sub-optimal actions in those parts of the game that are never reached at the equilibrium. NE refinements, such as the extensive-form perfect equilibrium (EFPE), amend such weaknesses by accounting for the possibility of players' mistakes. This is crucial in real-world applications, where boundedly rational players are usually involved, and it also turns out to be useful in boosting the performance of superhuman agents for recreational games like Poker. Nevertheless, only a few works addressed the problem of computing NE refinements. Most of them propose algorithms finding exact NE refinements by means of linear programming, and, thus, these do not have the potential of scaling up to real-world-size games. On the other hand, existing iterative algorithms that exploit the tree structure of sequential games only provide convergence guarantees to approximate refinements. In this paper, we provide the first efficient last-iterate algorithm that provably converges to an EFPE in two-player zero-sum sequential games with imperfect information. Our algorithm works by tracking a sequence of equilibria of suitably-defined, regularized-perturbed games. In order to do that, it uses a procedure that is tailored to converge last-iterate to the equilibria of such games. Crucially, the updates required by such a procedure can be performed efficiently by visiting the game tree, thus making our algorithm potentially more scalable than its linear-programming-based competitors. Finally, we evaluate our algorithm on a standard testbed of games, showing that it produces strategies which are much more robust to players' mistakes than those of state-of-the-art NE-computation algorithms.

Submitted 17 August, 2022; originally announced August 2022.

arXiv:2204.13772 [pdf, other]

The Power of Media Agencies in Ad Auctions: Improving Utility through Coordinated Bidding

Authors: Giulia Romano, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: The increasing competition in digital advertising induced a proliferation of media agencies playing the role of intermediaries between advertisers and platforms selling ad slots. When a group of competing advertisers is managed by a common agency, many forms of collusion, such as bid rigging, can be implemented by coordinating bidding strategies, dramatically increasing advertisers' value. We study the problem of finding bids and monetary transfers maximizing the utility of a group of colluders, under GSP and VCG mechanisms. First, we introduce an abstract bid optimization problem -- called weighted utility problem (WUP) -- which is useful in proving our results. We show that the utilities of bidding strategies are related to the length of paths in a directed acyclic weighted graph, whose structure and weights depend on the mechanism under study. This allows us to solve WUP in polynomial time by finding a shortest path of the graph. Next, we switch to our original problem, focusing on two settings that differ in the incentives they allow for. Incentive constraints ensure that colluders do not leave the agency, and they can be enforced by implementing monetary transfers between the agency and the advertisers. In particular, we study the arbitrary transfers setting, where any kind of monetary transfer to and from the advertisers is allowed, and the more realistic limited liability setting, in which no advertiser can be paid by the agency. In the former, we cast the problem as a WUP instance and solve it by our graph-based algorithm, while, in the latter, we formulate it as a linear program with exponentially many variables, efficiently solvable by applying the ellipsoid algorithm to its dual. This requires solving a suitable separation problem in polynomial time, which can be done by reducing it to a WUP instance.

Submitted 28 April, 2022; originally announced April 2022.

arXiv:2202.10966 [pdf, ps, other]

Designing Menus of Contracts Efficiently: The Power of Randomization

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study hidden-action principal-agent problems in which a principal commits to an outcome-dependent payment scheme (called contract) so as to incentivize the agent to take a costly, unobservable action leading to favorable outcomes. In particular, we focus on Bayesian settings where the agent has private information. This is collectively encoded by the agent's type, which is unknown to the principal, but randomly drawn according to a finitely-supported, commonly-known probability distribution. In Bayesian principal-agent problems, the principal may be better off by committing to a menu of contracts specifying a contract for each agent's type, rather than committing to a single contract. This induces a two-stage process that resembles interactions studied in classical mechanism design: after the principal has committed to a menu, the agent first reports a type to the principal, and, then, the latter puts in place the contract in the menu that corresponds to the reported type. Thus, the principal's computational problem boils down to designing a menu of contracts that incentivizes the agent to report their true type and maximizes expected utility. Previous works showed that computing an optimal menu of contracts is APX-hard. Crucially, previous works focus on menus of deterministic contracts. Surprisingly, we show that, if one considers menus of randomized contracts defined as probability distributions over payment vectors, then an "almost-optimal" menu can be computed in polynomial time. Indeed, the problem of computing a principal-optimal menu of randomized contracts may not admit a maximum, but only a supremum. Nevertheless, we show how to design a polynomial-time algorithm that guarantees the principal an expected utility arbitrarily close to the supremum. Besides this main result, we also close several gaps in the analysis of menus of deterministic contracts.

Submitted 17 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

arXiv:2202.00605 [pdf, ps, other]

Bayesian Persuasion Meets Mechanism Design: Going Beyond Intractability with Type Reporting

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: Bayesian persuasion studies how an informed sender should partially disclose information so as to influence the behavior of self-interested receivers. In recent years, growing attention has been devoted to relaxing the assumption that the sender perfectly knows the receivers' payoffs. The first crucial step towards such an achievement is to study settings where each receiver's payoffs depend on their unknown type, which is randomly determined by a known finite-supported probability distribution. This begets considerable computational challenges, as computing a sender-optimal signaling scheme is inapproximable up to within any constant factor. In this work, we circumvent this issue by leveraging ideas from mechanism design. In particular, we introduce a type reporting step in which the receiver is asked to report their type to the sender, after the latter has committed to a menu defining a signaling scheme for each possible receiver's type. We prove that, with a single receiver, the addition of this type reporting stage makes the sender's computational problem tractable. Then, we extend our framework to settings with multiple receivers, focusing on the case of no inter-agent externalities and binary actions. We show that it is possible to find a sender-optimal solution in polynomial time by means of the ellipsoid method, given access to a suitable polynomial-time separation oracle. This can be implemented for supermodular and anonymous sender's utility functions. As for the case of submodular sender's utility functions, we first approximately cast the sender's problem into a linearly-constrained mathematical program whose objective function is the multi-linear extension of the sender's utility. Then, we show how to find in polynomial time an approximate solution to the program by means of a continuous greedy algorithm. This provides a $(1 - 1/e)$-approximation to the problem.

Submitted 1 September, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

arXiv:2201.12275 [pdf, other]

Efficiency of Ad Auctions with Price Displaying

Authors: Matteo Castiglioni, Diodato Ferraioli, Nicola Gatti, Alberto Marchesi, Giulia Romano

Abstract: Most economic reports forecast that almost half of the worldwide market value unlocked by AI over the next decade (up to 6 trillion USD per year) will be in marketing and sales. In particular, AI will enable the optimization of more and more intricate economic settings, in which multiple different activities need to be jointly automated. This is the case of, e.g., Google Hotel Ads and Tripadvisor, where auctions are used to display ads of similar products or services together with their prices. As in classical ad auctions, the ads are ranked depending on the advertisers' bids, whereas, differently from classical settings, ads are displayed together with their prices, so as to provide a direct comparison among them. This dramatically affects users' behavior, as well as the properties of ad auctions. We show that, in such settings, social welfare maximization can be achieved by means of a direct-revelation mechanism that jointly optimizes, in polynomial time, the ads allocation and the advertisers' prices to be displayed with them. However, in practice it is unlikely that advertisers allow the mechanism to choose prices on their behalf. Indeed, in commonly-adopted mechanisms, ads allocation and price optimization are decoupled, so that the advertisers optimize prices and bids, while the mechanism does so for the allocation, once prices and bids are given. We investigate how this decoupling affects the efficiency of mechanisms. In particular, we study the Price of Anarchy (PoA) and the Price of Stability (PoS) of indirect-revelation mechanisms with both VCG and GSP payments, showing that the PoS for the revenue may be unbounded even with two slots, and the PoA for the social welfare may be as large as the number of slots. Nevertheless, we show that, under some assumptions, simple modifications to the indirect-revelation mechanism with VCG payments achieve a PoS of 1 for the revenue.

Submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.12183 [pdf, other]

Signaling in Posted Price Auctions

Authors: Matteo Castiglioni, Giulia Romano, Alberto Marchesi, Nicola Gatti

Abstract: We study single-item single-unit Bayesian posted price auctions, where buyers arrive sequentially and their valuations for the item being sold depend on a random, unknown state of nature. The seller has complete knowledge of the actual state and can send signals to the buyers so as to disclose information about it. For instance, the state of nature may reflect the condition and/or some particular features of the item, which are known to the seller only. The problem faced by the seller is how to partially disclose information about the state so as to maximize revenue. Unlike classical signaling problems, in this setting, the seller must also correlate the signals being sent to the buyers with some price proposals for them. This introduces additional challenges compared to standard settings. We consider two cases: the one where the seller can only send signals publicly visible to all buyers, and the case in which the seller can privately send a different signal to each buyer. As a first step, we prove that, in both settings, the problem of maximizing the seller's revenue does not admit an FPTAS unless P=NP, even for basic instances with a single buyer. As a result, in the rest of the paper, we focus on designing PTASs. In order to do so, we first introduce a unifying framework encompassing both public and private signaling, whose core result is a decomposition lemma that allows focusing on a finite set of possible buyers' posteriors. This forms the basis on which our PTASs are developed. In particular, in the public signaling setting, our PTAS employs some ad hoc techniques based on linear programming, while our PTAS for the private setting relies on the ellipsoid method to solve an exponentially-sized LP in polynomial time. In the latter case, we need a custom approximate separation oracle, which we implement with a dynamic programming approach.

Submitted 29 March, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.09728 [pdf, other]

Public Signaling in Bayesian Ad Auctions

Authors: Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Giulia Romano, Nicola Gatti

Abstract: We study signaling in Bayesian ad auctions, in which bidders' valuations depend on a random, unknown state of nature. The auction mechanism has complete knowledge of the actual state of nature, and it can send signals to bidders so as to disclose information about the state and increase revenue. For instance, a state may collectively encode some features of the user that are known to the mechanism only, since the latter has access to data sources inaccessible to the bidders. We study the problem of computing how the mechanism should send signals to bidders in order to maximize revenue. While this problem has already been addressed in the easier setting of second-price auctions, to the best of our knowledge, our work is the first to explore ad auctions with more than one slot. In this paper, we focus on public signaling and VCG mechanisms, under which bidders truthfully report their valuations. We start with a negative result, showing that, in general, the problem does not admit a PTAS unless P = NP, even when bidders' valuations are known to the mechanism. The rest of the paper is devoted to settings in which such a negative result can be circumvented. First, we prove that, with known valuations, the problem can indeed be solved in polynomial time when either the number of states d or the number of slots m is fixed. Moreover, in the same setting, we provide an FPTAS for the case in which bidders are single-minded, but d and m can be arbitrary. Then, we switch to the random valuations setting, in which these are randomly drawn according to some probability distribution. In this case, we show that the problem admits an FPTAS, a PTAS, and a QPTAS, when, respectively, d is fixed, m is fixed, and bidders' valuations are bounded away from zero.

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2106.06480 [pdf, ps, other]

Multi-Receiver Online Bayesian Persuasion

Authors: Matteo Castiglioni, Alberto Marchesi, Andrea Celli, Nicola Gatti

Abstract: Bayesian persuasion studies how an informed sender should partially disclose information to influence the behavior of a self-interested receiver. Classical models make the stringent assumption that the sender knows the receiver's utility. This can be relaxed by considering an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type. We study, for the first time, an online Bayesian persuasion setting with multiple receivers. We focus on the case with no externalities and binary actions, as customary in offline models. Our goal is to design no-regret algorithms for the sender with polynomial per-iteration running time. First, we prove a negative result: for any $0 < \alpha \leq 1$, there is no polynomial-time no-$\alpha$-regret algorithm when the sender's utility function is supermodular or anonymous. Then, we focus on the case of submodular sender's utility functions and we show that, in this case, it is possible to design a polynomial-time no-$(1 - \frac{1}{e})$-regret algorithm. To do so, we introduce a general online gradient descent scheme to handle online learning problems with a finite number of possible loss functions. This requires the existence of an approximate projection oracle. We show that, in our setting, there exists one such projection oracle which can be implemented in polynomial time.

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2106.00319 [pdf, ps, other]

Bayesian Agency: Linear versus Tractable Contracts

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme (a.k.a. contract) so as to induce an agent to take a costly, unobservable action. We relax the assumption that the principal perfectly knows the agent by considering a Bayesian setting where the agent's type is unknown and randomly selected according to a given probability distribution, which is known to the principal. Each agent's type is characterized by her own action costs and action-outcome distributions. In the literature on non-Bayesian principal-agent problems, considerable attention has been devoted to linear contracts, which are simple, pure-commission payment schemes that still provide nice approximation guarantees with respect to principal-optimal (possibly non-linear) contracts. While in non-Bayesian settings an optimal contract can be computed efficiently, this is no longer the case for our Bayesian principal-agent problems. This further motivates our focus on linear contracts, which can be optimized efficiently given their single-parameter nature. Our goal is to analyze the properties of linear contracts in Bayesian settings, in terms of approximation guarantees with respect to optimal contracts and general tractable contracts (i.e., efficiently-computable ones). First, we study the approximation guarantees of linear contracts with respect to optimal ones, showing that the former suffer from a multiplicative loss linear in the number of agent's types. Nevertheless, we prove that linear contracts can still provide a constant multiplicative approximation $\rho$ of the optimal principal's expected utility, though at the expense of an exponentially-small additive loss $2^{-\Omega(\rho)}$. Then, we switch to tractable contracts, showing that, surprisingly, linear contracts perform well among them.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2104.01520 [pdf, ps, other]

Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Authors: Gabriele Farina, Andrea Celli, Alberto Marchesi, Nicola Gatti

Abstract: The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and presence of private information in the game, correlation in extensive-form games possesses significantly different properties than its counterpart in normal-form games, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device must take into account the evolution of beliefs of each player as they make observations throughout the game. Due to that significant added complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play is proven to be a $O(T^{-0.5})$-approximate EFCE with high probability after T game repetitions, and an EFCE almost surely in the limit.

Submitted 27 May, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: Extended version of our NeurIPS 2020 paper. Compared to the conference version, this preprint gives finer, in-high-probability regret bounds. We also better connected our work to the phi-regret minimization framework

arXiv:2012.06528 [pdf, ps, other]

Trembling-Hand Perfection and Correlation in Sequential Games

Authors: Alberto Marchesi, Nicola Gatti

Abstract: We initiate the study of trembling-hand perfection in sequential (i.e., extensive-form) games with correlation. We introduce the extensive-form perfect correlated equilibrium (EFPCE) as a refinement of the classical extensive-form correlated equilibrium (EFCE) that amends its weaknesses off the equilibrium path. This is achieved by accounting for the possibility that players may make mistakes while following recommendations independently at each information set of the game. After providing an axiomatic definition of EFPCE, we show that one always exists since any perfect (Nash) equilibrium constitutes an EFPCE, and that it is a refinement of EFCE, as any EFPCE is also an EFCE. Then, we prove that, surprisingly, computing an EFPCE is not harder than finding an EFCE, since the problem can be solved in polynomial time for general n-player extensive-form games (also with chance). This is achieved by formulating the problem as that of finding a limit solution (as $\epsilon \rightarrow 0$) to a suitably defined trembling LP parametrized by $\epsilon$, featuring exponentially many variables and polynomially many constraints. To this end, we show how a recently developed polynomial-time algorithm for trembling LPs can be adapted to deal with problems having an exponential number of variables. This calls for the solution of a sequence of (non-trembling) LPs with exponentially many variables and polynomially many constraints, which is possible in polynomial time by applying an ellipsoid against hope approach.

Submitted 11 December, 2020; originally announced December 2020.

arXiv:2012.05774 [pdf, other]

Online Posted Pricing with Unknown Time-Discounted Valuations

Authors: Giulia Romano, Gianluca Tartaglia, Alberto Marchesi, Nicola Gatti

Abstract: We study the problem of designing posted-price mechanisms in order to sell a single unit of a single item within a finite period of time. Motivated by real-world problems, such as long-term rental of rooms and apartments, we assume that customers arrive online according to a Poisson process, and their valuations are drawn from an unknown distribution and discounted over time. We evaluate our mechanisms in terms of competitive ratio, measuring the worst-case ratio between their revenue and that of an optimal mechanism that knows the distribution of valuations. First, we focus on the identical valuation setting, where all the customers value the item at the same amount. In this setting, we provide a mechanism M_c that achieves the best possible competitive ratio, discussing its dependency on the parameters in the case of linear discount. Then, we switch to the random valuation setting. We show that, if we restrict the attention to distributions of valuations with a monotone hazard rate, then the competitive ratio of M_c is lower bounded by a strictly positive constant that does not depend on the distribution. Moreover, we provide another mechanism, called M_pc, which is defined by a piecewise constant pricing strategy and achieves performance comparable to that obtained with M_c. This mechanism is useful when the seller cannot change the posted price too often. Finally, we empirically evaluate the performance of our mechanisms in a number of experimental settings.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2004.00603 [pdf, other]

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Authors: Andrea Celli, Alberto Marchesi, Gabriele Farina, Nicola Gatti

Abstract: The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

Submitted 2 September, 2022; v1 submitted 1 April, 2020; originally announced April 2020.

arXiv:2002.05190 [pdf, ps, other]

Signaling in Bayesian Network Congestion Games: the Subtle Power of Symmetry

Authors: Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti

Abstract: Network congestion games are a well-understood model of multi-agent strategic interactions. Despite their ubiquitous applications, it is not clear whether it is possible to design information structures to ameliorate the overall experience of the network users. We focus on Bayesian games with atomic players, where network vagaries are modeled via a (random) state of nature which determines the costs incurred by the players. A third-party entity (the sender) can observe the realized state of the network and exploit this additional information to send a signal to each player. A natural question is the following: is it possible for an informed sender to reduce the overall social cost via the strategic provision of information to players who update their beliefs rationally? The paper focuses on the problem of computing optimal ex ante persuasive signaling schemes, showing that symmetry is a crucial property for its solution. Indeed, we show that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions. Moreover, the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.

Submitted 12 February, 2020; originally announced February 2020.

arXiv:1911.07755 [pdf, other]

Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces

Authors: Alberto Marchesi, Francesco Trovò, Nicola Gatti

Abstract: We tackle the problem of learning equilibria in simulation-based games. In such games, the players' utility functions cannot be described analytically, as they are given through a black-box simulator that can be queried to obtain noisy estimates of the utilities. This is the case in many real-world games in which a complete description of the elements involved is not available upfront, such as complex military settings and online auctions. In these situations, one usually needs to run costly simulation processes to get an accurate estimate of the game outcome. As a result, solving these games begets the challenge of designing learning algorithms that can find (approximate) equilibria with high confidence, using as few simulator queries as possible. Moreover, since running the simulator during the game is infeasible, the algorithms must first perform a pure exploration learning phase and, then, use the (approximate) equilibrium learned this way to play the game. In this work, we focus on two-player zero-sum games with infinite strategy spaces. Drawing from the best arm identification literature, we design two algorithms with theoretical guarantees to learn maximin strategies in these games. The first one works in the fixed-confidence setting, guaranteeing the desired confidence level while minimizing the number of queries. Instead, the second algorithm fits the fixed-budget setting, maximizing the confidence without exceeding the given maximum number of queries. First, we formally prove $\delta$-PAC theoretical guarantees for our algorithms under some regularity assumptions, which are encoded by letting the utility functions be drawn from a Gaussian process. Then, we experimentally evaluate our techniques on a testbed made of randomly generated games and instances representing simple real-world security settings.

Submitted 25 February, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

arXiv:1910.06228 [pdf, other]

Learning to Correlate in Multi-Player General-Sum Sequential Games

Authors: Andrea Celli, Alberto Marchesi, Tommaso Bianchi, Nicola Gatti

Abstract: In the context of multi-player, general-sum games, there is an increasing interest in solution concepts modeling some form of communication among players, since they can lead to socially better outcomes with respect to Nash equilibria, and may be reached through learning dynamics in a decentralized fashion. In this paper, we focus on coarse correlated equilibria (CCEs) in sequential games. First, we complete the picture on the complexity of finding social-welfare-maximizing CCEs by showing that the problem is not in Poly-APX unless P = NP. Furthermore, simple arguments show that CFR, which works with behavioral strategies, may not converge to a CCE. However, we devise a simple variant (CFR-S) which provably converges to the set of CCEs, but may be empirically inefficient. Thus, we design a variant of the CFR algorithm (called CFR-Jr) which approaches the set of CCEs with a regret bound sub-linear in the size of the game, and is shown to be dramatically faster than CFR-S and the state-of-the-art algorithms to compute CCEs.

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1905.13108 [pdf, other]

Leadership in Congestion Games: Multiple User Classes and Non-Singleton Actions (Extended Version)

Authors: Alberto Marchesi, Matteo Castiglioni, Nicola Gatti

Abstract: We study the problem of finding Stackelberg equilibria in games with a massive number of players. So far, the only known game instances in which the problem is solved in polynomial time are some particular congestion games. However, a complete characterization of hard and easy instances is still lacking. In this paper, we extend the state of the art along two main directions. First, we focus on games where players' actions are made of multiple resources, and we prove that the problem is NP-hard and not in Poly-APX unless P = NP, even in the basic case in which players are symmetric, their actions are made of only two resources, and the cost functions are monotonic. Second, we focus on games with singleton actions where the players are partitioned into classes, depending on which actions they have available. In this case, we provide a dynamic programming algorithm that finds an equilibrium in polynomial time, when the number of classes is fixed and the leader plays pure strategies. Moreover, we prove that, if we allow for leader's mixed strategies, then the problem becomes NP-hard even with only four classes and monotonic costs. Finally, for both settings, we provide mixed-integer linear programming formulations, and we experimentally evaluate their scalability on both random game instances and worst-case instances based on our hardness reductions.

Submitted 30 May, 2019; originally announced May 2019.

arXiv:1905.13106 [pdf, other]

Be a Leader or Become a Follower: The Strategy to Commit to with Multiple Leaders (Extended Version)

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study the problem of computing correlated strategies to commit to in games with multiple leaders and followers. To the best of our knowledge, this problem is widely unexplored so far, as the majority of the works in the literature focus on games with a single leader and one or more followers. The fundamental ingredient of our model is that a leader can decide whether to participate in the commitment or to defect from it by taking on the role of follower. This introduces a preliminary stage where, before the underlying game is played, the leaders make their decisions to reach an agreement on the correlated strategy to commit to. We distinguish three solution concepts on the basis of the constraints that they enforce on the agreement reached by the leaders. Then, we provide a comprehensive study of the properties of our solution concepts, in terms of existence, relation with other solution concepts, and computational complexity.

Submitted 30 May, 2019; originally announced May 2019.

arXiv:1811.03871 [pdf, other]

Quasi-Perfect Stackelberg Equilibrium

Authors: Alberto Marchesi, Gabriele Farina, Christian Kroer, Nicola Gatti, Tuomas Sandholm

Abstract: Equilibrium refinements are important in extensive-form (i.e., tree-form) games, where they amend weaknesses of the Nash equilibrium concept by requiring sequential rationality and other beneficial properties. One of the most attractive refinement concepts is quasi-perfect equilibrium. While quasi-perfection has been studied in extensive-form games, it is poorly understood in Stackelberg settings---that is, settings where a leader can commit to a strategy---which are important for modeling, for example, security games. In this paper, we introduce the axiomatic definition of quasi-perfect Stackelberg equilibrium. We develop a broad class of game perturbation schemes that lead to them in the limit. Our class of perturbation schemes strictly generalizes prior perturbation schemes introduced for the computation of (non-Stackelberg) quasi-perfect equilibria. Based on our perturbation schemes, we develop a branch-and-bound algorithm for computing a quasi-perfect Stackelberg equilibrium. It leverages a perturbed variant of the linear program for computing a Stackelberg extensive-form correlated equilibrium. Experiments show that our algorithm can be used to find an approximate quasi-perfect Stackelberg equilibrium in games with thousands of nodes.

Submitted 9 November, 2018; originally announced November 2018.

arXiv:1808.10209 [pdf, other]

Leadership in Singleton Congestion Games: What is Hard and What is Easy

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Stefano Coniglio

Abstract: We study the problem of computing Stackelberg equilibria in Stackelberg games whose underlying structure is a congestion game, focusing on the case where each player can choose a single resource (a.k.a. singleton congestion games) and one of them acts as leader. In particular, we address the cases where the players either have the same action spaces (i.e., the set of resources they can choose is the same for all of them) or different ones, and where their costs are either monotonic functions of the resource congestion or not. We show that, in the case where the players have different action spaces, the cost the leader incurs in a Stackelberg equilibrium cannot be approximated in polynomial time up to within any polynomial factor in the size of the game unless P = NP, independently of the cost functions being monotonic or not. We show that a similar result also holds when the players have nonmonotonic cost functions, even if their action spaces are the same. Differently, we prove that the case with identical action spaces and monotonic cost functions is easy, and propose a polynomial-time algorithm for it. We also improve an algorithm for the computation of a socially optimal equilibrium in singleton congestion games with the same action spaces without leadership, and extend it to the computation of a Stackelberg equilibrium for the case where the leader is restricted to pure strategies. For the cases in which the problem of finding an equilibrium is hard, we show how, in the optimistic setting where the followers break ties in favor of the leader, the problem can be formulated via mixed-integer linear programming techniques, which computational experiments show to scale quite well.

Submitted 30 August, 2018; originally announced August 2018.

arXiv:1808.01438 [pdf, other]

Computing a Pessimistic Leader-Follower Equilibrium with Multiple Followers: the Mixed-Pure Case

Authors: Stefano Coniglio, Nicola Gatti, Alberto Marchesi

Abstract: The search problem of computing a \textit{leader-follower equilibrium} has been widely investigated in the scientific literature in, almost exclusively, the single-follower setting. Although the \textit{optimistic} and \textit{pessimistic} versions of the problem are solved with different methodologies, both cases allow for efficient, polynomial-time algorithms based on linear programming. The situation is different with multiple followers, where results are only sporadic and depend strictly on the nature of the followers' game. In this paper, we investigate the setting of a normal-form game with a single leader and multiple followers who, after observing the leader's commitment, play a Nash equilibrium. The corresponding search problem, both in the optimistic and pessimistic versions, is known to be not in Poly-$\textsf{APX}$ unless $\textsf{P}=\textsf{NP}$ and exact algorithms are known only for the optimistic case. We focus on the case where the followers play in pure strategies under the assumption of pessimism. After casting this search problem as a \textit{pessimistic bilevel programming problem}, we show that, with two followers, the problem is $\textsf{NP}$-hard and, with three or more followers, it is not in Poly-$\textsf{APX}$ unless $\textsf{P}=\textsf{NP}$. We propose a single-level mathematical programming reformulation which calls for the maximisation of a nonconcave quadratic function over an unbounded nonconvex feasible region defined by linear and quadratic constraints. Since, due to admitting a supremum but not a maximum, only a restricted version of this formulation can be solved to optimality with state-of-the-art methods, we propose an exact \textit{ad hoc} algorithm, which we also embed within a branch-and-bound scheme, capable of computing the supremum of the problem.

Submitted 4 August, 2018; originally announced August 2018.

arXiv:1807.11914 [pdf, other]

Computing the Strategy to Commit to in Polymatrix Games (Extended Version)

Authors: Giuseppe De Nittis, Alberto Marchesi, Nicola Gatti

Abstract: Leadership games provide a powerful paradigm to model many real-world settings. Most literature focuses on games with a single follower who acts optimistically, breaking ties in favour of the leader. Unfortunately, for real-world applications, this is unlikely. In this paper, we look for efficiently solvable games with multiple followers who play either optimistically or pessimistically, i.e., breaking ties in favour of or against the leader. We study the computational complexity of finding or approximating an optimistic or pessimistic leader-follower equilibrium in specific classes of succinct games---polymatrix-like---which are equivalent to 2-player Bayesian games with uncertainty over the follower, with interdependent or independent types. Furthermore, we provide an exact algorithm to find a pessimistic equilibrium for those game classes. Finally, we show that in general polymatrix games the computation is harder even when players are forced to play pure strategies.

Submitted 31 July, 2018; originally announced July 2018.

Showing 1–36 of 36 results for author: Marchesi, A