Search | arXiv e-print repository

arXiv:2407.05793 [pdf, other]

A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints

Authors: Francesco Emanuele Stradi, Filippo Cipriani, Lorenzo Ciampiconi, Marco Leonardi, Alessandro Rozza, Nicola Gatti

Abstract: We address the challenging problem of dynamically pricing complementary items that are sequentially displayed to customers. An illustrative example is the online sale of flight tickets, where customers navigate through multiple web pages. Initially, they view the ticket cost, followed by ancillary expenses such as insurance and additional luggage fees. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. Our scenario also involves a sales constraint, which specifies a minimum number of items to sell, and uncertainty regarding customer demand curves. To tackle this problem, we originally formulate it as a Markov Decision Process with constraints. Leveraging online learning tools, we design a primal-dual online optimization algorithm. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, covering various configurations from stationary to non-stationary, and compare its performance in terms of constraints violation and regret against well-known baselines optimizing each state singularly.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2405.14372 [pdf, ps, other]

Learning Constrained Markov Decision Processes With Non-stationary Rewards and Constraints

Authors: Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: In constrained Markov decision processes (CMDPs) with adversarial rewards and constraints, a well-known impossibility result prevents any algorithm from attaining both sublinear regret and sublinear constraint violation, when competing against a best-in-hindsight policy that satisfies constraints on average. In this paper, we show that this negative result can be eased in CMDPs with non-stationary rewards and constraints, by providing algorithms whose performances smoothly degrade as non-stationarity increases. Specifically, we propose algorithms attaining $\tilde{\mathcal{O}} (\sqrt{T} + C)$ regret and positive constraint violation under bandit feedback, where $C$ is a corruption value measuring the environment non-stationarity. This can be $Θ(T)$ in the worst case, coherently with the impossibility result for adversarial CMDPs. First, we design an algorithm with the desired guarantees when $C$ is known. Then, in the case $C$ is unknown, we show how to obtain the same results by embedding such an algorithm in a general meta-procedure. This is of independent interest, as it can be applied to any non-stationary constrained online learning setting.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.06977 [pdf, ps, other]

The Sample Complexity of Stackelberg Games

Authors: Francesco Bacchiocchi, Matteo Bollini, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: Stackelberg games (SGs) constitute the most fundamental and acclaimed models of strategic interactions involving some form of commitment. Moreover, they form the basis of more elaborate models of this kind, such as, e.g., Bayesian persuasion and principal-agent problems. Addressing learning tasks in SGs and related models is crucial to operationalize them in practice, where model parameters are usually unknown. In this paper, we revise the sample complexity of learning an optimal strategy to commit to in SGs. We provide a novel algorithm that (i) does not require any of the limiting assumptions made by state-of-the-art approaches and (ii) deals with a trade-off between sample complexity and termination probability arising when leader's strategies representation has finite precision. Such a trade-off has been completely neglected by existing algorithms and, if not properly managed, it may result in them using exponentially-many samples. Our algorithm requires novel techniques, which also pave the way to addressing learning problems in other models with commitment ubiquitous in the real world.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2403.03672 [pdf, ps, other]

Learning Adversarial MDPs with Stochastic Hard Constraints

Authors: Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study online learning problems in constrained Markov decision processes (CMDPs) with adversarial losses and stochastic hard constraints. We consider two different scenarios. In the first one, we address general CMDPs, where we design an algorithm that attains sublinear regret and cumulative positive constraints violation. In the second scenario, under the mild assumption that a policy strictly satisfying the constraints exists and is known to the learner, we design an algorithm that achieves sublinear regret while ensuring that the constraints are satisfied at every episode with high probability. To the best of our knowledge, our work is the first to study CMDPs involving both adversarial losses and hard constraints. Indeed, previous works either focus on much weaker soft constraints--allowing for positive violation to cancel out negative ones--or are restricted to stochastic losses. Thus, our algorithms can deal with general non-stationary environments subject to requirements much stricter than those manageable with state-of-the-art algorithms. This enables their adoption in a much wider range of real-world applications, ranging from autonomous driving to online advertising and recommender systems.

Submitted 20 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.
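
The abstract above distinguishes soft constraints, where positive and negative violations cancel out, from the stricter positive violation. A minimal sketch of the difference follows. The function names, the per-episode `costs`, and the "cost must stay below `threshold`" convention are illustrative assumptions, not the paper's notation.

```python
def soft_violation(costs, threshold):
    """Cumulative signed violation: episodes below the threshold offset episodes above it."""
    return sum(c - threshold for c in costs)


def positive_violation(costs, threshold):
    """Cumulative positive violation: only the excess over the threshold is counted."""
    return sum(max(c - threshold, 0.0) for c in costs)


# Hypothetical per-episode constraint costs that alternate above and below the threshold.
costs = [0.75, 0.25, 0.75, 0.25]
threshold = 0.5

print(soft_violation(costs, threshold))      # signed excesses cancel
print(positive_violation(costs, threshold))  # excesses accumulate
```

On this sequence the soft measure reports no violation at all, while the positive measure still charges the two episodes that exceeded the threshold. This is why a guarantee on positive violation is the stronger of the two.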

arXiv:2402.13824 [pdf, ps, other]

Multi-Agent Contract Design beyond Binary Actions

Authors: Federico Cacciamani, Martino Bernasconi, Matteo Castiglioni, Nicola Gatti

Abstract: We study hidden-action principal-agent problems with multiple agents. Unlike previous work, we consider a general setting in which each agent has an arbitrary number of actions, and the joint action induces outcomes according to an arbitrary distribution. We study two classes of mechanisms: a class of deterministic mechanisms that is the natural extension of single-agent contracts, in which the agents play a Nash equilibrium of the game induced by the contract, and a class of randomized mechanisms that is inspired by single-agent randomized contracts and correlated equilibria.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.03077 [pdf, ps, other]

Markov Persuasion Processes: Learning to Persuade from Scratch

Authors: Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: In Bayesian persuasion, an informed sender strategically discloses information to a receiver so as to persuade them to undertake desirable actions. Recently, growing attention has been devoted to settings in which sender and receivers interact sequentially. In particular, Markov persuasion processes (MPPs) have been introduced to capture sequential scenarios where a sender faces a stream of myopic receivers in a Markovian environment. The MPPs studied so far in the literature suffer from issues that prevent them from being fully operational in practice, e.g., they assume that the sender knows receivers' rewards. We fix such issues by addressing MPPs where the sender has no knowledge about the environment. We design a learning algorithm for the sender, working with partial feedback. We prove that its regret with respect to an optimal information-disclosure policy grows sublinearly in the number of episodes, as is the case for the loss in persuasiveness cumulated while learning. Moreover, we provide a lower bound for our setting matching the guarantees of our algorithm.

Submitted 6 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2310.02975 [pdf, ps, other]

$(ε, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits

Authors: Gianmarco Genalti, Lupo Marsigli, Nicola Gatti, Alberto Maria Metelli

Abstract: Heavy-tailed distributions naturally arise in several settings, from finance to telecommunications. While regret minimization under subgaussian or bounded rewards has been widely studied, learning with heavy-tailed distributions only gained popularity over the last decade. In this paper, we consider the setting in which the reward distributions have finite absolute raw moments of maximum order $1+ε$, uniformly bounded by a constant $u<+\infty$, for some $ε\in (0,1]$. In this setting, we study the regret minimization problem when $ε$ and $u$ are unknown to the learner and it has to adapt. First, we show that adaptation comes at a cost and derive two negative results proving that the same regret guarantees of the non-adaptive case cannot be achieved with no further assumptions. Then, we devise and analyze a fully data-driven trimmed mean estimator and propose a novel adaptive regret minimization algorithm, AdaR-UCB, that leverages such an estimator. Finally, we show that AdaR-UCB is the first algorithm that, under a known distributional assumption, enjoys regret guarantees nearly matching those of the non-adaptive heavy-tailed case.

Submitted 12 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.
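
For context on the non-adaptive case the abstract refers to, here is a minimal sketch of a standard truncated (trimmed) mean that needs both $ε$ and $u$ as inputs. It is not the data-driven estimator used by AdaR-UCB, whose details the abstract does not give. The function name, the confidence parameter `delta`, and the sample values are illustrative assumptions.

```python
import math


def truncated_mean(samples, epsilon, u, delta):
    """Non-adaptive truncated mean for heavy-tailed samples.

    The t-th sample is zeroed out when its magnitude exceeds
    (u * t / log(1 / delta)) ** (1 / (1 + epsilon)), so both the moment
    order (1 + epsilon) and the moment bound u must be known in advance.
    """
    n = len(samples)
    log_term = math.log(1.0 / delta)
    total = 0.0
    for t, x in enumerate(samples, start=1):
        threshold = (u * t / log_term) ** (1.0 / (1.0 + epsilon))
        if abs(x) <= threshold:
            total += x
    return total / n


# Illustrative numbers only: three ordinary samples and one extreme outlier.
samples = [1.0, 1.0, 1.0, 1000.0]
print(truncated_mean(samples, epsilon=1.0, u=4.0, delta=0.05))
```

The outlier is discarded because it exceeds the threshold at its index, so the estimate stays near the bulk of the data instead of being dragged toward the plain average. The dependence of the threshold on `epsilon` and `u` is exactly what an adaptive method has to do without.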

arXiv:2309.09801 [pdf, ps, other]

Learning Optimal Contracts: How to Exploit Small Action Spaces

Authors: Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme -- called contract -- in order to induce an agent to take a costly, unobservable action leading to favorable outcomes. We consider a generalization of the classical (single-round) version of the problem in which the principal interacts with the agent by committing to contracts over multiple rounds. The principal has no information about the agent, and they have to learn an optimal contract by only observing the outcome realized at each round. We focus on settings in which the size of the agent's action space is small. We design an algorithm that learns an approximately-optimal contract with high probability in a number of rounds polynomial in the size of the outcome space, when the number of actions is constant. Our algorithm solves an open problem by Zhu et al. [2022]. Moreover, it can also be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting in which the principal aims at maximizing their cumulative utility, thus considerably improving previously-known regret bounds.

Submitted 7 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

arXiv:2308.16017 [pdf, ps, other]

Hidden-Role Games: Equilibrium Concepts and Computation

Authors: Luca Carminati, Brian Hu Zhang, Gabriele Farina, Nicola Gatti, Tuomas Sandholm

Abstract: In this paper, we study the class of games known as hidden-role games in which players are assigned privately to teams and are faced with the challenge of recognizing and cooperating with teammates. This model includes both popular recreational games such as the Mafia/Werewolf family and The Resistance (Avalon) and many real-world settings, such as distributed systems where nodes need to work together to accomplish a goal in the face of possible corruptions. There has been little to no formal mathematical grounding of such settings in the literature, and it was previously not even clear what the right solution concepts (notions of equilibria) should be. A suitable notion of equilibrium should take into account the communication channels available to the players (e.g., can they communicate? Can they communicate in private?). Defining such suitable notions turns out to be a nontrivial task with several surprising consequences. In this paper, we provide the first rigorous definition of equilibrium for hidden-role games, which overcomes serious limitations of other solution concepts not designed for hidden-role games. We then show that in certain cases, including the above recreational games, optimal equilibria can be computed efficiently. In most other cases, we show that computing an optimal equilibrium is at least NP-hard or coNP-hard. Lastly, we experimentally validate our approach by computing exact equilibria for complete 5- and 6-player Avalon instances whose size in terms of number of information sets is larger than $10^{56}$.

Submitted 17 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2307.06210 [pdf, ps, other]

Online Information Acquisition: Hiring Multiple Agents

Authors: Federico Cacciamani, Matteo Castiglioni, Nicola Gatti

Abstract: We investigate the mechanism design problem faced by a principal who hires \emph{multiple} agents to gather and report costly information. Then, the principal exploits the information to make an informed decision. We model this problem as a game, where the principal announces a mechanism consisting in action recommendations and a payment function, a.k.a. scoring rule. Then, each agent chooses an effort level and receives partial information about an underlying state of nature based on the effort. Finally, the agents report the information (possibly non-truthfully), the principal takes a decision based on this information, and the agents are paid according to the scoring rule. While previous work focuses on single-agent problems, we consider multi-agent settings. This poses the challenge of coordinating the agents' efforts and aggregating correlated information. Indeed, we show that optimal mechanisms must correlate agents' efforts, which introduces externalities among the agents, and hence complex incentive compatibility constraints and equilibrium selection problems. First, we design a polynomial-time algorithm to find an optimal incentive compatible mechanism. Then, we study an online problem, where the principal repeatedly interacts with a group of unknown agents. We design a no-regret algorithm that provides $\widetilde{\mathcal{O}}(T^{2/3})$ regret with respect to an optimal mechanism, matching the state-of-the-art bound for single-agent settings.

Submitted 12 July, 2023; originally announced July 2023.

arXiv:2306.05221 [pdf, other]

Steering No-Regret Learners to a Desired Equilibrium

Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

Abstract: A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures several problems of interest, among them equilibrium selection and information design (persuasion). If the mediator's budget is unbounded, steering is trivial because the mediator can simply pay the players to play desirable actions. We study two bounds on the mediator's payments: a total budget and a per-round budget. If the mediator's total budget does not grow with $T$, we show that steering is impossible. However, we show that it is enough for the total budget to grow sublinearly with $T$, that is, for the average payment to vanish. When players' full strategies are observed at each round, we show that constant per-round budgets permit steering. In the more challenging setting where only trajectories through the game tree are observable, we show that steering is impossible with constant per-round budgets in general extensive-form games, but possible in normal-form games or if the per-round budget may itself depend on $T$. We also show how our results can be generalized to the case when the equilibrium is being computed online while steering is happening. We supplement our theoretical positive results with experiments highlighting the efficacy of steering in large games.

Submitted 17 February, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.05216 [pdf, ps, other]

Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Authors: Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

Abstract: We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation allows us to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria, not only in empirical averages, but also in iterates. We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning.

Submitted 23 May, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2304.14326 [pdf, ps, other]

A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints

Authors: Jacopo Germano, Francesco Emanuele Stradi, Gianmarco Genalti, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study online learning in episodic constrained Markov decision processes (CMDPs), where the goal of the learner is to collect as much reward as possible over the episodes, while guaranteeing that some long-term constraints are satisfied during the learning process. Rewards and constraints can be selected either stochastically or adversarially, and the transition function is not known to the learner. While online learning in classical unconstrained MDPs has received considerable attention over the last years, the setting of CMDPs is still largely unexplored. This is surprising, since in real-world applications, such as, e.g., autonomous driving, automated bidding, and recommender systems, there are usually additional constraints and specifications that an agent has to obey during the learning process. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints. Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underlying process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2303.01296 [pdf, ps, other]

Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion

Authors: Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti, Francesco Trovò

Abstract: Bayesian persuasion studies how an informed sender should influence beliefs of rational receivers who take decisions through Bayesian updating of a common prior. We focus on the online Bayesian persuasion framework, in which the sender repeatedly faces one or more receivers with unknown and adversarially selected types. First, we show how to obtain a tight $\tilde O(T^{1/2})$ regret bound in the case in which the sender faces a single receiver and has partial feedback, improving over the best previously known bound of $\tilde O(T^{4/5})$. Then, we provide the first no-regret guarantees for the multi-receiver setting under partial feedback. Finally, we show how to design no-regret algorithms with polynomial per-iteration running time by exploiting type reporting, thereby circumventing known intractability results on online Bayesian persuasion. We provide efficient algorithms guaranteeing an $O(T^{1/2})$ regret upper bound both in the single- and multi-receiver scenario when type reporting is allowed.

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.02873 [pdf, ps, other]

Online Mechanism Design for Information Acquisition

Authors: Federico Cacciamani, Matteo Castiglioni, Nicola Gatti

Abstract: We study the problem of designing mechanisms for \emph{information acquisition} scenarios. This setting models strategic interactions between an uninformed \emph{receiver} and a set of informed \emph{senders}. In our model the senders receive information about the underlying state of nature and communicate their observation (either truthfully or not) to the receiver, which, based on this information, selects an action. Our goal is to design mechanisms maximizing the receiver's utility while incentivizing the senders to report truthfully their information. First, we provide an algorithm that efficiently computes an optimal \emph{incentive compatible} (IC) mechanism. Then, we focus on the \emph{online} problem in which the receiver sequentially interacts in an unknown game, with the objective of minimizing the \emph{cumulative regret} w.r.t. the optimal IC mechanism, and the \emph{cumulative violation} of the incentive compatibility constraints. We investigate two different online scenarios, \emph{i.e.,} the \emph{full} and \emph{bandit feedback} settings. For the full feedback problem, we propose an algorithm that guarantees $\tilde{\mathcal O}(\sqrt T)$ regret and violation, while for the bandit feedback setting we present an algorithm that attains $\tilde{\mathcal O}(T^α)$ regret and $\tilde{\mathcal O}(T^{1-α/2})$ violation for any $α\in[1/2, 1]$. Finally, we complement our results providing a tight lower bound.

Submitted 12 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.
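
The bandit-feedback trade-off stated in the abstract above can be made concrete by evaluating the two exponents at a few values of $α$. The short worked computation below uses only the rates quoted in the abstract; the choice of sample points is illustrative.

```latex
\[
\begin{array}{c|c|c}
\alpha & \text{regret } \tilde{\mathcal O}(T^{\alpha})
       & \text{violation } \tilde{\mathcal O}(T^{1-\alpha/2}) \\ \hline
1/2 & T^{1/2} & T^{3/4} \\
2/3 & T^{2/3} & T^{2/3} \\
1   & T       & T^{1/2}
\end{array}
\qquad
\alpha = 1 - \tfrac{\alpha}{2} \iff \alpha = \tfrac{2}{3}.
\]
```

Lowering $α$ improves the regret rate at the cost of a worse violation rate, and the two rates coincide at $α = 2/3$.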

arXiv:2301.13790 [pdf, ps, other]

Selling Information while Being an Interested Party

Authors: Matteo Castiglioni, Francesco Bacchiocchi, Alberto Marchesi, Giulia Romano, Nicola Gatti

Abstract: We study the algorithmic problem faced by an information holder (seller) who wants to optimally sell such information to a budget-constrained decision maker (buyer) that has to undertake some action. Differently from previous work, we consider the case in which the seller is an interested party, as the action chosen by the buyer does not only influence their utility, but also the seller's. This happens in many real-world settings, where the way in which businesses use acquired information may positively or negatively affect the seller, due to the presence of externalities on the information market. The utilities of both the seller and the buyer depend on a random state of nature, which is revealed to the seller, but it is unknown to the buyer. Thus, the seller's goal is to (partially) sell their information about the state of nature to the buyer, so as to concurrently maximize revenue and induce the buyer to take a desirable action. We study settings in which buyer's budget and utilities are determined by a random buyer's type that is unknown to the seller. In such settings, an optimal protocol for the seller must propose to the buyer a menu of information-revelation policies to choose from, with the latter acquiring one of them by paying its corresponding price. Moreover, since in our model the seller is an interested party, an optimal protocol must also prescribe the seller to pay back the buyer contingently on their action. First, we show that the problem of computing a seller-optimal protocol can be solved in polynomial time. Next, we switch the attention to the case in which a seller's protocol employs a single information-revelation policy, rather than proposing a menu, deriving both positive and negative results.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.13654 [pdf, ps, other]

Multi-Agent Contract Design: How to Commission Multiple Agents with Individual Outcome

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study hidden-action principal-agent problems with multiple agents. These are problems in which a principal commits to an outcome-dependent payment scheme in order to incentivize some agents to take costly, unobservable actions that lead to favorable outcomes. Previous works on multi-agent problems study models where the principal observes a single outcome determined by the actions of all the agents. Such models considerably limit the contracting power of the principal, since payments can only depend on the joint result of all the agents' actions, and there is no way of paying each agent for their individual result. In this paper, we consider a model in which each agent determines their own individual outcome as an effect of their action only, the principal observes all the individual outcomes separately, and they perceive a reward that jointly depends on all these outcomes. This considerably enhances the principal's contracting capabilities, by allowing them to pay each agent on the basis of their individual result. We analyze the computational complexity of finding principal-optimal contracts, revolving around two newly-introduced properties of principal's rewards, which we call IR-supermodularity and DR-submodularity. Intuitively, the former captures settings with increasing returns, where the rewards grow faster as the agents' effort increases, while the latter models the case of diminishing returns, in which rewards grow slower instead. These two properties naturally model two common real-world phenomena, namely diseconomies and economies of scale. In this paper, we first address basic instances in which the principal knows everything about the agents, and, then, more general Bayesian instances where each agent has their own private type determining their features, such as action costs and how actions stochastically determine individual outcomes.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2301.13600 [pdf, ps, other]

Constrained Phi-Equilibria

Authors: Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Francesco Trovò, Nicola Gatti

Abstract: The computational study of equilibria involving constraints on players' strategies has been largely neglected. However, in real-world applications, players are usually subject to constraints ruling out the feasibility of some of their strategies, such as, e.g., safety requirements and budget caps. Computational studies on constrained versions of the Nash equilibrium have led to some results under very stringent assumptions, while finding constrained versions of the correlated equilibrium (CE) is still unexplored. In this paper, we introduce and computationally characterize constrained Phi-equilibria -- a more general notion than constrained CEs -- in normal-form games. We show that computing such equilibria is in general computationally intractable, and also that the set of the equilibria may not be convex, providing a sharp divide with unconstrained CEs. Nevertheless, we provide a polynomial-time algorithm for computing a constrained (approximate) Phi-equilibrium maximizing a given linear function, when either the number of constraints or that of players' actions is fixed. Moreover, in the special case in which a player's constraints do not depend on other players' strategies, we show that an exact, function-maximizing equilibrium can be computed in polynomial time, while one (approximate) equilibrium can be found with an efficient decentralized no-regret learning algorithm.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2212.06251 [pdf, other]

Autoregressive Bandits

Authors: Francesco Bacchiocchi, Gianmarco Genalti, Davide Maran, Marco Mussi, Marcello Restelli, Nicola Gatti, Alberto Maria Metelli

Abstract: Autoregressive processes naturally arise in a large variety of real-world scenarios, including stock markets, sales forecasting, weather prediction, advertising, and pricing. When facing a sequential decision-making problem in such a context, the temporal dependence between consecutive observations should be properly accounted for guaranteeing convergence to the optimal policy. In this work, we propose a novel online learning setting, namely, Autoregressive Bandits (ARBs), in which the observed reward is governed by an autoregressive process of order $k$, whose parameters depend on the chosen action. We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed. Then, we devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}} \left( \frac{(k+1)^{3/2}\sqrt{nT}}{(1-Γ)^2}\right)$, where $T$ is the optimization horizon, $n$ is the number of actions, and $Γ< 1$ is a stability index of the process. Finally, we empirically validate our algorithm, illustrating its advantages w.r.t. bandit baselines and its robustness to misspecification of key parameters.

Submitted 19 February, 2024; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: Accepted to AISTATS 2024

arXiv:2211.09612 [pdf, other]

Dynamic Pricing with Volume Discounts in Online Settings

Authors: Marco Mussi, Gianmarco Genalti, Alessandro Nuara, Francesco Trovò, Marcello Restelli, Nicola Gatti

Abstract: According to the main international reports, more pervasive industrial and business-process automation, thanks to machine learning and advanced analytic tools, will unlock more than 14 trillion USD worldwide annually by 2030. In the specific case of pricing problems, which constitute the class of problems we investigate in this paper, the estimated unlocked value will be about 0.5 trillion USD per year. In particular, this paper focuses on pricing in e-commerce when the objective function is profit maximization and only transaction data are available. This setting is one of the most common in real-world applications. Our work aims to find a pricing strategy that allows defining optimal prices at different volume thresholds to serve different classes of users. Furthermore, we face the major challenge, common in real-world settings, of dealing with limited data available. We design a two-phase online learning algorithm, namely PVD-B, capable of exploiting the data incrementally in an online fashion. The algorithm first estimates the demand curve and retrieves the optimal average price, and subsequently it offers discounts to differentiate the prices for each volume threshold. We ran a real-world 4-month-long A/B testing experiment in collaboration with an Italian e-commerce company, in which our algorithm PVD-B (the A configuration) has been compared with human pricing specialists (the B configuration). At the end of the experiment, our algorithm produced a total turnover of about 300 KEuros, outperforming the B configuration performance by about 55%. The Italian company we collaborated with has adopted our algorithm for more than 1,200 products since January 2022.

Submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted to IAAI 2023

arXiv:2209.07454 [pdf, ps, other]

A Unifying Framework for Online Optimization with Long-Term Constraints

Authors: Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Giulia Romano, Nicola Gatti

Abstract: We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints. The goal of the decision maker is to maximize their total reward, while at the same time achieving small cumulative constraints violation across the $T$ rounds. We present the first best-of-both-world type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown stochastic model, and in the case in which they are selected at each round by an adversary. Our algorithm is the first to provide guarantees in the adversarial setting with respect to the optimal fixed strategy that satisfies the long-term constraints. In particular, it guarantees a $ρ/(1+ρ)$ fraction of the optimal reward and sublinear regret, where $ρ$ is a feasibility parameter related to the existence of strictly feasible solutions. Our framework employs traditional regret minimizers as black-box components. Therefore, by instantiating it with an appropriate choice of regret minimizers it can handle the full-feedback as well as the bandit-feedback setting. Moreover, it allows the decision maker to seamlessly handle scenarios with non-convex rewards and constraints. We show how our framework can be applied in the context of budget-management mechanisms for repeated auctions in order to guarantee long-term constraints that are not packing (e.g., ROI constraints).

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.03927 [pdf, other]

Sequential Information Design: Learning to Persuade in the Dark

Authors: Martino Bernasconi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti, Francesco Trovò

Abstract: We study a repeated information design problem faced by an informed sender who tries to influence the behavior of a self-interested receiver. We consider settings where the receiver faces a sequential decision making (SDM) problem. At each round, the sender observes the realizations of random events in the SDM problem. This begets the challenge of how to incrementally disclose such information to the receiver to persuade them to follow (desirable) action recommendations. We study the case in which the sender does not know random events probabilities, and, thus, they have to gradually learn them while persuading the receiver. We start by providing a non-trivial polytopal approximation of the set of sender's persuasive information structures. This is crucial to design efficient learning algorithms. Next, we prove a negative result: no learning algorithm can be persuasive. Thus, we relax persuasiveness requirements by focusing on algorithms that guarantee that the receiver's regret in following recommendations grows sub-linearly. In the full-feedback setting -- where the sender observes all random events realizations --, we provide an algorithm with $\tilde{O}(\sqrt{T})$ regret for both the sender and the receiver. Instead, in the bandit-feedback setting -- where the sender only observes the realizations of random events actually occurring in the SDM problem --, we design an algorithm that, given an $α \in [1/2, 1]$ as input, ensures $\tilde{O}(T^{α})$ and $\tilde{O}(T^{\max \{ α, 1-\frac{α}{2} \}})$ regrets, for the sender and the receiver respectively. This result is complemented by a lower bound showing that such a regrets trade-off is essentially tight.

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2206.09161 [pdf, other]

A Marriage between Adversarial Team Games and 2-player Games: Enabling Abstractions, No-regret Learning, and Subgame Solving

Authors: Luca Carminati, Federico Cacciamani, Marco Ciccone, Nicola Gatti

Abstract: \emph{Ex ante} correlation is becoming the mainstream approach for \emph{sequential adversarial team games}, where a team of players faces another team in a zero-sum game. It is known that team members' asymmetric information makes both equilibrium computation \textsf{APX}-hard and team's strategies not directly representable on the game tree. This latter issue prevents the adoption of successful tools for huge 2-player zero-sum games such as, \emph{e.g.}, abstractions, no-regret learning, and subgame solving. This work shows that we can recover from this weakness by bridging the gap between sequential adversarial team games and 2-player games. In particular, we propose a new, suitable game representation that we call \emph{team-public-information}, in which a team is represented as a single coordinator who only knows information common to the whole team and prescribes to each member an action for any possible private state. The resulting representation is highly \emph{explainable}, being a 2-player tree in which the team's strategies are behavioral with a direct interpretation and more expressive than the original extensive form when designing abstractions. Furthermore, we prove payoff equivalence of our representation, and we provide techniques that, starting directly from the extensive form, generate dramatically more compact representations without information loss. Finally, we experimentally evaluate our techniques when applied to a standard testbed, comparing their performance with the current state of the art.

Submitted 18 June, 2022; originally announced June 2022.

Comments: 20 pages; Accepted for publication at ICML 2022

arXiv:2206.00586 [pdf, other]

Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts

Authors: Giulia Romano, Andrea Agostini, Francesco Trovò, Nicola Gatti, Marcello Restelli

Abstract: There is a rising interest in industrial online applications where data becomes available sequentially. Inspired by the recommendation of playlists to users where their preferences can be collected during the listening of the entire playlist, we study a novel bandit setting, namely Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB), in which the stochastic reward associated with the pull of an arm is partitioned over a finite number of consecutive rounds following the pull. This setting, unexplored so far to the best of our knowledge, is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull instead of being fully disclosed in a single, potentially delayed round. We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW, which exploit the partial information disclosed by the reward collected over time. We show that our algorithms provide better asymptotical regret upper bounds than delayed-feedback bandit algorithms when a property characterizing a broad set of reward structures of practical interest, namely alpha-smoothness, holds. We also empirically evaluate their performance across a wide range of settings, both synthetically generated and from a real-world media recommendation problem.

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2204.13772 [pdf, other]

The Power of Media Agencies in Ad Auctions: Improving Utility through Coordinated Bidding

Authors: Giulia Romano, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: The increasing competition in digital advertising induced a proliferation of media agencies playing the role of intermediaries between advertisers and platforms selling ad slots. When a group of competing advertisers is managed by a common agency, many forms of collusion, such as bid rigging, can be implemented by coordinating bidding strategies, dramatically increasing advertisers' value. We study the problem of finding bids and monetary transfers maximizing the utility of a group of colluders, under GSP and VCG mechanisms. First, we introduce an abstract bid optimization problem, called weighted utility problem (WUP), which is useful in proving our results. We show that the utilities of bidding strategies are related to the length of paths in a directed acyclic weighted graph, whose structure and weights depend on the mechanism under study. This allows us to solve WUP in polynomial time by finding a shortest path of the graph. Next, we switch to our original problem, focusing on two settings that differ for the incentives they allow for. Incentive constraints ensure that colluders do not leave the agency, and they can be enforced by implementing monetary transfers between the agency and the advertisers. In particular, we study the arbitrary transfers setting, where any kind of monetary transfer to and from the advertisers is allowed, and the more realistic limited liability setting, in which no advertiser can be paid by the agency. In the former, we cast the problem as a WUP instance and solve it by our graph-based algorithm, while, in the latter, we formulate it as a linear program with exponentially-many variables efficiently solvable by applying the ellipsoid algorithm to its dual. This requires solving a suitable separation problem in polynomial time, which can be done by reducing it to a WUP instance.

Submitted 28 April, 2022; originally announced April 2022.

arXiv:2202.10966 [pdf, ps, other]

Designing Menus of Contracts Efficiently: The Power of Randomization

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study hidden-action principal-agent problems in which a principal commits to an outcome-dependent payment scheme (called contract) so as to incentivize the agent to take a costly, unobservable action leading to favorable outcomes. In particular, we focus on Bayesian settings where the agent has private information. This is collectively encoded by the agent's type, which is unknown to the principal, but randomly drawn according to a finitely-supported, commonly-known probability distribution. In Bayesian principal-agent problems, the principal may be better off by committing to a menu of contracts specifying a contract for each agent's type, rather than committing to a single contract. This induces a two-stage process that resembles interactions studied in classical mechanism design: after the principal has committed to a menu, the agent first reports a type to the principal, and, then, the latter puts in place the contract in the menu that corresponds to the reported type. Thus, the principal's computational problem boils down to designing a menu of contracts that incentivizes the agent to report their true type and maximizes expected utility. Previous works showed that computing an optimal menu of contracts is APX-hard. Crucially, previous works focus on menus of deterministic contracts. Surprisingly, we show that, if one considers menus of randomized contracts defined as probability distributions over payment vectors, then an "almost-optimal" menu can be computed in polynomial time. Indeed, the problem of computing a principal-optimal menu of randomized contracts may not admit a maximum, but only a supremum. Nevertheless, we show how to design a polynomial-time algorithm that guarantees the principal with an expected utility arbitrarily close to the supremum. Besides this main result, we also close several gaps in the analysis of menus of deterministic contracts.

Submitted 17 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

arXiv:2202.00605 [pdf, ps, other]

Bayesian Persuasion Meets Mechanism Design: Going Beyond Intractability with Type Reporting

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: Bayesian persuasion studies how an informed sender should partially disclose information so as to influence the behavior of self-interested receivers. In recent years, growing attention has been devoted to relaxing the assumption that the sender perfectly knows receiver's payoffs. The first crucial step towards such an achievement is to study settings where each receiver's payoffs depend on their unknown type, which is randomly determined by a known finite-supported probability distribution. This begets considerable computational challenges, as computing a sender-optimal signaling scheme is inapproximable up to within any constant factor. In this work, we circumvent this issue by leveraging ideas from mechanism design. In particular, we introduce a type reporting step in which the receiver is asked to report their type to the sender, after the latter has committed to a menu defining a signaling scheme for each possible receiver's type. We prove that, with a single receiver, the addition of this type reporting stage makes the sender's computational problem tractable. Then, we extend our framework to settings with multiple receivers, focusing on the case of no inter-agent externalities and binary actions. We show that it is possible to find a sender-optimal solution in polynomial-time by means of the ellipsoid method, given access to a suitable polynomial-time separation oracle. This can be implemented for supermodular and anonymous sender's utility functions. As for the case of submodular sender's utility functions, we first approximately cast the sender's problem into a linearly-constrained mathematical program whose objective function is the multi-linear extension of the sender's utility. Then, we show how to find in polynomial-time an approximate solution to the program by means of a continuous greedy algorithm. This provides a (1 - 1/e)-approximation to the problem.

Submitted 1 September, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

arXiv:2201.12275 [pdf, other]

Efficiency of Ad Auctions with Price Displaying

Authors: Matteo Castiglioni, Diodato Ferraioli, Nicola Gatti, Alberto Marchesi, Giulia Romano

Abstract: Most of the economic reports forecast that almost half of the worldwide market value unlocked by AI over the next decade (up to 6 trillion USD per year) will be in marketing and sales. In particular, AI will enable the optimization of more and more intricate economic settings, in which multiple different activities need to be jointly automated. This is the case of, e.g., Google Hotel Ads and Tripadvisor, where auctions are used to display ads of similar products or services together with their prices. As in classical ad auctions, the ads are ranked depending on the advertisers' bids, whereas, differently from classical settings, ads are displayed together with their prices, so as to provide a direct comparison among them. This dramatically affects users' behavior, as well as the properties of ad auctions. We show that, in such settings, social welfare maximization can be achieved by means of a direct-revelation mechanism that jointly optimizes, in polynomial time, the ads allocation and the advertisers' prices to be displayed with them. However, in practice it is unlikely that advertisers allow the mechanism to choose prices on their behalf. Indeed, in commonly-adopted mechanisms, ads allocation and price optimization are decoupled, so that the advertisers optimize prices and bids, while the mechanism does so for the allocation, once prices and bids are given. We investigate how this decoupling affects the efficiency of mechanisms. In particular, we study the Price of Anarchy (PoA) and the Price of Stability (PoS) of indirect-revelation mechanisms with both VCG and GSP payments, showing that the PoS for the revenue may be unbounded even with two slots, and the PoA for the social welfare may be as large as the number of slots. Nevertheless, we show that, under some assumptions, simple modifications to the indirect-revelation mechanism with VCG payments achieve a PoS of 1 for the revenue.

Submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.12183 [pdf, other]

Signaling in Posted Price Auctions

Authors: Matteo Castiglioni, Giulia Romano, Alberto Marchesi, Nicola Gatti

Abstract: We study single-item single-unit Bayesian posted price auctions, where buyers arrive sequentially and their valuations for the item being sold depend on a random, unknown state of nature. The seller has complete knowledge of the actual state and can send signals to the buyers so as to disclose information about it. For instance, the state of nature may reflect the condition and/or some particular features of the item, which are known to the seller only. The problem faced by the seller is about how to partially disclose information about the state so as to maximize revenue. Unlike classical signaling problems, in this setting, the seller must also correlate the signals being sent to the buyers with some price proposals for them. This introduces additional challenges compared to standard settings. We consider two cases: the one where the seller can only send signals publicly visible to all buyers, and the case in which the seller can privately send a different signal to each buyer. As a first step, we prove that, in both settings, the problem of maximizing the seller's revenue does not admit an FPTAS unless P=NP, even for basic instances with a single buyer. As a result, in the rest of the paper, we focus on designing PTASs. In order to do so, we first introduce a unifying framework encompassing both public and private signaling, whose core result is a decomposition lemma that allows focusing on a finite set of possible buyers' posteriors. This forms the basis on which our PTASs are developed. In particular, in the public signaling setting, our PTAS employs some ad hoc techniques based on linear programming, while our PTAS for the private setting relies on the ellipsoid method to solve an exponentially-sized LP in polynomial time. In the latter case, we need a custom approximate separation oracle, which we implement with a dynamic programming approach.

Submitted 29 March, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.10377 [pdf, other]

Public Information Representation for Adversarial Team Games

Authors: Luca Carminati, Federico Cacciamani, Marco Ciccone, Nicola Gatti

Abstract: The peculiarity of adversarial team games resides in the asymmetric information available to the team members during the play, which makes the equilibrium computation problem hard even with zero-sum payoffs. The algorithms available in the literature work with implicit representations of the strategy space and mainly resort to Linear Programming and column generation techniques to enlarge incrementally the strategy space. Such representations prevent the adoption of standard tools such as abstraction generation, game solving, and subgame solving, which have proved crucial when solving huge, real-world two-player zero-sum games. Differently from these works, we answer the question of whether there is any suitable game representation enabling the adoption of those tools. In particular, our algorithms convert a sequential team game with adversaries to a classical two-player zero-sum game. In this converted game, the team is transformed into a single coordinator player who only knows information common to the whole team and prescribes to the players an action for any possible private state. Interestingly, we show that our game is more expressive than the original extensive-form game as any state/action abstraction of the extensive-form game can be captured by our representation, while the reverse does not hold. Due to the NP-hard nature of the problem, the resulting Public Team game may be exponentially larger than the original one. To limit this explosion, we provide three algorithms, each returning an information-lossless abstraction that dramatically reduces the size of the tree. These abstractions can be produced without generating the original game tree. Finally, we show the effectiveness of the proposed approach by presenting experimental results on Kuhn and Leduc Poker games, obtained by applying state-of-the-art algorithms for two-player zero-sum games on the converted games.

Submitted 25 January, 2022; originally announced January 2022.

Comments: 19 pages, 7 figures, Best Paper Award in Cooperative AI Workshop at NeurIPS 2021

arXiv:2201.09728 [pdf, other]

Public Signaling in Bayesian Ad Auctions

Authors: Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Giulia Romano, Nicola Gatti

Abstract: We study signaling in Bayesian ad auctions, in which bidders' valuations depend on a random, unknown state of nature. The auction mechanism has complete knowledge of the actual state of nature, and it can send signals to bidders so as to disclose information about the state and increase revenue. For instance, a state may collectively encode some features of the user that are known to the mechanism only, since the latter has access to data sources inaccessible to the bidders. We study the problem of computing how the mechanism should send signals to bidders in order to maximize revenue. While this problem has already been addressed in the easier setting of second-price auctions, to the best of our knowledge, our work is the first to explore ad auctions with more than one slot. In this paper, we focus on public signaling and VCG mechanisms, under which bidders truthfully report their valuations. We start with a negative result, showing that, in general, the problem does not admit a PTAS unless P = NP, even when bidders' valuations are known to the mechanism. The rest of the paper is devoted to settings in which this negative result can be circumvented. First, we prove that, with known valuations, the problem can indeed be solved in polynomial time when either the number of states d or the number of slots m is fixed. Moreover, in the same setting, we provide an FPTAS for the case in which bidders are single minded, but d and m can be arbitrary. Then, we switch to the random valuations setting, in which these are randomly drawn according to some probability distribution. In this case, we show that the problem admits an FPTAS, a PTAS, and a QPTAS, when, respectively, d is fixed, m is fixed, and bidders' valuations are bounded away from zero.

Submitted 24 January, 2022; originally announced January 2022.

arXiv:2201.07139 [pdf, ps, other]

Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty

Authors: Matteo Castiglioni, Alessandro Nuara, Giulia Romano, Giorgio Spadaro, Francesco Trovò, Nicola Gatti

Abstract: In online marketing, the advertisers' goal is usually a tradeoff between achieving high volumes and high profitability. The companies' business units customarily address this tradeoff by maximizing the volumes while guaranteeing a lower bound to the Return On Investment (ROI). This paper investigates combinatorial bandit algorithms for the bid optimization of advertising campaigns subject to uncertain budget and ROI constraints. We study the nature of both the optimization and learning problems. In particular, when focusing on the optimization problem without uncertainty, we show that it is inapproximable within any factor unless P=NP, and we provide a pseudo-polynomial-time algorithm that achieves an optimal solution. When considering uncertainty, we prove that no online learning algorithm can violate the (ROI or budget) constraints during the learning process a sublinear number of times while guaranteeing a sublinear pseudo-regret. Thus, we provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraints violations. We also design its safe version, namely GCB_{safe}, guaranteeing w.h.p. a constant upper bound on the number of constraints violations at the cost of a linear pseudo-regret. More interestingly, we provide an algorithm, namely GCB_{safe}(ψ,φ), guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances ψ and φ in the satisfaction of the ROI and budget constraints, respectively. This algorithm actually mitigates the risks due to the constraints violations without precluding the convergence to the optimal solution. Finally, we experimentally compare our algorithms in terms of pseudo-regret/constraint-violation tradeoff in settings generated from real-world data, showing the importance of adopting safety constraints in practice and the effectiveness of our algorithms.

Submitted 18 January, 2022; originally announced January 2022.

arXiv:2106.06480 [pdf, ps, other]

Multi-Receiver Online Bayesian Persuasion

Authors: Matteo Castiglioni, Alberto Marchesi, Andrea Celli, Nicola Gatti

Abstract: Bayesian persuasion studies how an informed sender should partially disclose information to influence the behavior of a self-interested receiver. Classical models make the stringent assumption that the sender knows the receiver's utility. This can be relaxed by considering an online learning framework in which the sender repeatedly faces a receiver of an unknown, adversarially selected type. We study, for the first time, an online Bayesian persuasion setting with multiple receivers. We focus on the case with no externalities and binary actions, as customary in offline models. Our goal is to design no-regret algorithms for the sender with polynomial per-iteration running time. First, we prove a negative result: for any $0 < α \leq 1$, there is no polynomial-time no-$α$-regret algorithm when the sender's utility function is supermodular or anonymous. Then, we focus on the case of submodular sender's utility functions and we show that, in this case, it is possible to design a polynomial-time no-$(1 - \frac{1}{e})$-regret algorithm. To do so, we introduce a general online gradient descent scheme to handle online learning problems with a finite number of possible loss functions. This requires the existence of an approximate projection oracle. We show that, in our setting, there exists one such projection oracle which can be implemented in polynomial time.

Submitted 11 June, 2021; originally announced June 2021.

arXiv:2106.00319 [pdf, ps, other]

Bayesian Agency: Linear versus Tractable Contracts

Authors: Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

Abstract: We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme (a.k.a. contract) so as to induce an agent to take a costly, unobservable action. We relax the assumption that the principal perfectly knows the agent by considering a Bayesian setting where the agent's type is unknown and randomly selected according to a given probability distribution, which is known to the principal. Each agent's type is characterized by her own action costs and action-outcome distributions. In the literature on non-Bayesian principal-agent problems, considerable attention has been devoted to linear contracts, which are simple, pure-commission payment schemes that still provide nice approximation guarantees with respect to principal-optimal (possibly non-linear) contracts. While in non-Bayesian settings an optimal contract can be computed efficiently, this is no longer the case for our Bayesian principal-agent problems. This further motivates our focus on linear contracts, which can be optimized efficiently given their single-parameter nature. Our goal is to analyze the properties of linear contracts in Bayesian settings, in terms of approximation guarantees with respect to optimal contracts and general tractable contracts (i.e., efficiently-computable ones). First, we study the approximation guarantees of linear contracts with respect to optimal ones, showing that the former suffer from a multiplicative loss linear in the number of agent's types. Nevertheless, we prove that linear contracts can still provide a constant multiplicative approximation $ρ$ of the optimal principal's expected utility, though at the expense of an exponentially-small additive loss $2^{-Ω(ρ)}$. Then, we switch to tractable contracts, showing that, surprisingly, linear contracts perform well among them.

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2104.01520 [pdf, ps, other]

Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Authors: Gabriele Farina, Andrea Celli, Alberto Marchesi, Nicola Gatti

Abstract: The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and presence of private information in the game, correlation in extensive-form games possesses significantly different properties than its counterpart in normal-form games, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device must take into account the evolution of beliefs of each player as they make observations throughout the game. Due to that significant added complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play is proven to be a O(T^-0.5)-approximate EFCE with high probability after T game repetitions, and an EFCE almost surely in the limit.

Submitted 27 May, 2021; v1 submitted 3 April, 2021; originally announced April 2021.

Comments: Extended version of our NeurIPS 2020 paper. Compared to the conference version, this preprint gives finer, in-high-probability regret bounds. We also better connected our work to the phi-regret minimization framework

arXiv:2102.05026 [pdf, other]

Multi-Agent Coordination in Adversarial Environments through Signal Mediated Strategies

Authors: Federico Cacciamani, Andrea Celli, Marco Ciccone, Nicola Gatti

Abstract: Many real-world scenarios involve teams of agents that have to coordinate their actions to reach a shared goal. We focus on the setting in which a team of agents faces an opponent in a zero-sum, imperfect-information game. Team members can coordinate their strategies before the beginning of the game, but are unable to communicate during the playing phase of the game. This is the case, for example, in Bridge, collusion in poker, and collusion in bidding. In this setting, model-free RL methods are oftentimes unable to capture coordination because agents' policies are executed in a decentralized fashion. Our first contribution is a game-theoretic centralized training regimen to effectively perform trajectory sampling so as to foster team coordination. When team members can observe each other's actions, we show that this approach provably yields equilibrium strategies. Then, we introduce a signaling-based framework to represent team coordinated strategies given a buffer of past experiences. Each team member's policy is parametrized as a neural network whose output is conditioned on a suitable exogenous signal, drawn from a learned probability distribution. By combining these two elements, we empirically show convergence to coordinated equilibria in cases where previous state-of-the-art multi-agent RL algorithms did not.

Submitted 9 February, 2021; originally announced February 2021.

Comments: Accepted at AAMAS 2021 (full paper)

arXiv:2012.06528 [pdf, ps, other]

Trembling-Hand Perfection and Correlation in Sequential Games

Authors: Alberto Marchesi, Nicola Gatti

Abstract: We initiate the study of trembling-hand perfection in sequential (i.e., extensive-form) games with correlation. We introduce the extensive-form perfect correlated equilibrium (EFPCE) as a refinement of the classical extensive-form correlated equilibrium (EFCE) that amends its weaknesses off the equilibrium path. This is achieved by accounting for the possibility that players may make mistakes while following recommendations independently at each information set of the game. After providing an axiomatic definition of EFPCE, we show that one always exists since any perfect (Nash) equilibrium constitutes an EFPCE, and that it is a refinement of EFCE, as any EFPCE is also an EFCE. Then, we prove that, surprisingly, computing an EFPCE is not harder than finding an EFCE, since the problem can be solved in polynomial time for general n-player extensive-form games (also with chance). This is achieved by formulating the problem as that of finding a limit solution (as $ε \rightarrow 0$) to a suitably defined trembling LP parametrized by $ε$, featuring exponentially many variables and polynomially many constraints. To this end, we show how a recently developed polynomial-time algorithm for trembling LPs can be adapted to deal with problems having an exponential number of variables. This calls for the solution of a sequence of (non-trembling) LPs with exponentially many variables and polynomially many constraints, which is possible in polynomial time by applying an ellipsoid against hope approach.

Submitted 11 December, 2020; originally announced December 2020.

arXiv:2012.05774 [pdf, other]

Online Posted Pricing with Unknown Time-Discounted Valuations

Authors: Giulia Romano, Gianluca Tartaglia, Alberto Marchesi, Nicola Gatti

Abstract: We study the problem of designing posted-price mechanisms in order to sell a single unit of a single item within a finite period of time. Motivated by real-world problems, such as long-term rental of rooms and apartments, we assume that customers arrive online according to a Poisson process, and their valuations are drawn from an unknown distribution and discounted over time. We evaluate our mechanisms in terms of competitive ratio, measuring the worst-case ratio between their revenue and that of an optimal mechanism that knows the distribution of valuations. First, we focus on the identical valuation setting, where all the customers value the item for the same amount. In this setting, we provide a mechanism M_c that achieves the best possible competitive ratio, discussing its dependency on the parameters in the case of linear discount. Then, we switch to the random valuation setting. We show that, if we restrict the attention to distributions of valuations with a monotone hazard rate, then the competitive ratio of M_c is lower bounded by a strictly positive constant that does not depend on the distribution. Moreover, we provide another mechanism, called M_pc, which is defined by a piecewise constant pricing strategy and reaches performances comparable to those obtained with M_c. This mechanism is useful when the seller cannot change the posted price too often. Finally, we empirically evaluate the performances of our mechanisms in a number of experimental settings.

Submitted 10 December, 2020; originally announced December 2020.

arXiv:2012.05002 [pdf, other]

Persuading Voters in District-based Elections

Authors: Matteo Castiglioni, Nicola Gatti

Abstract: We focus on the scenario in which an agent can exploit his information advantage to manipulate the outcome of an election. In particular, we study district-based elections with two candidates, in which the winner of the election is the candidate that wins in the majority of the districts. District-based elections are adopted worldwide (e.g., UK and USA) and are a natural extension of widely studied voting mechanisms (e.g., k-voting and plurality voting). We resort to the Bayesian persuasion framework, where the manipulator (sender) strategically discloses information to the voters (receivers) that update their beliefs rationally. We study both private signaling, in which the sender can use a private communication channel per receiver, and public signaling, in which the sender can use a single communication channel for all the receivers. Furthermore, for the first time, we introduce semi-public signaling in which the sender can use a single communication channel per district. We show that there is a sharp distinction between private and (semi-)public signaling. In particular, optimal private signaling schemes can provide an arbitrarily better probability of victory than (semi-)public ones and can be computed efficiently, while optimal (semi-)public signaling schemes cannot be approximated to within any factor in polynomial time unless P=NP. However, we show that reasonable relaxations allow the design of multi-criteria PTASs for optimal (semi-)public signaling schemes. In doing so, we introduce a novel property, namely comparative stability, and we design a bi-criteria PTAS for public signaling in general Bayesian persuasion problems beyond elections when the sender's utility function is state-dependent.

Submitted 10 December, 2020; v1 submitted 9 December, 2020; originally announced December 2020.

arXiv:2009.10061 [pdf, ps, other]

Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games

Authors: Gabriele Farina, Andrea Celli, Nicola Gatti, Tuomas Sandholm

Abstract: We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2006.15977 [pdf, other]

A privacy-preserving tests optimization algorithm for epidemics containment

Authors: Alessandro Nuara, Francesco Trovò, Nicola Gatti

Abstract: The SARS-CoV-2 outbreak changed the everyday life of almost all people around the world. Currently, we are facing the problem of containing the spread of the virus both by using the more effective forced lockdown, which has the drawback of slowing down the economy of the involved countries, and by identifying and isolating the positive individuals, which, instead, is a hard task in general due to the lack of information. For this specific disease, the identification of the infected is particularly challenging since there exist categories, namely the asymptomatics, who are positive and potentially contagious, but do not show any of the symptoms of SARS-CoV-2. Until a vaccine has been developed and distributed, we need to design ways of selecting those individuals who are most likely infected, given the limited number of tests that are available each day. In this paper, we make use of available data collected by the so-called contact tracing apps to develop an algorithm, namely PPTO, that identifies those individuals who are most likely positive and, therefore, should be tested. While in the past these analyses have been conducted by centralized algorithms, requiring that all the app users' data be gathered in a single database, our protocol is able to work on a device level, by exploiting the communication of anonymized information to other devices.

Submitted 22 March, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: added figures, fixed typos, added table of notation

arXiv:2004.00603 [pdf, other]

No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

Authors: Andrea Celli, Alberto Marchesi, Gabriele Farina, Nicola Gatti

Abstract: The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in $n$-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play is close to an EFCE. Then, we give an efficient no-trigger-regret algorithm. Our algorithm decomposes trigger regret into local subproblems at each decision point for the player, and constructs a global strategy of the player from the local solutions at each decision point.

Submitted 2 September, 2022; v1 submitted 1 April, 2020; originally announced April 2020.

arXiv:2003.01452 [pdf, other]

Online Joint Bid/Daily Budget Optimization of Internet Advertising Campaigns

Authors: Alessandro Nuara, Francesco Trovò, Nicola Gatti, Marcello Restelli

Abstract: Pay-per-click advertising includes various formats (e.g., search, contextual, social) with a total investment of more than 200 billion USD per year worldwide. An advertiser is given a daily budget to allocate over several, even thousands, campaigns, which mainly differ in the ad, target, or channel. Furthermore, publishers choose the ads to display and how to allocate them employing auctioning mechanisms, in which every day the advertisers set for each campaign a bid corresponding to the maximum amount of money per click they are willing to pay and the fraction of the daily budget to invest. In this paper, we study the problem of automating the online joint bid/daily budget optimization of pay-per-click advertising campaigns over multiple channels. We formulate our problem as a combinatorial semi-bandit problem, which requires solving a special case of the Multiple-Choice Knapsack problem every day. Furthermore, for every campaign, we capture the dependency of the number of clicks on the bid and daily budget by Gaussian Processes, thus requiring mild assumptions on the regularity of these functions. We design four algorithms and show that they suffer from a regret that is upper bounded with high probability as $O(\sqrt{T})$, where T is the time horizon of the learning process. We experimentally evaluate our algorithms with synthetic settings generated from real data from Yahoo!, and we present the results of the adoption of our algorithms in a real-world application with an average daily spend of 1,000 Euros for more than one year.

Submitted 3 March, 2020; originally announced March 2020.

arXiv:2002.05190 [pdf, ps, other]

Signaling in Bayesian Network Congestion Games: the Subtle Power of Symmetry

Authors: Matteo Castiglioni, Andrea Celli, Alberto Marchesi, Nicola Gatti

Abstract: Network congestion games are a well-understood model of multi-agent strategic interactions. Despite their ubiquitous applications, it is not clear whether it is possible to design information structures to ameliorate the overall experience of the network users. We focus on Bayesian games with atomic players, where network vagaries are modeled via a (random) state of nature which determines the costs incurred by the players. A third-party entity, the sender, can observe the realized state of the network and exploit this additional information to send a signal to each player. A natural question is the following: is it possible for an informed sender to reduce the overall social cost via the strategic provision of information to players who update their beliefs rationally? The paper focuses on the problem of computing optimal ex ante persuasive signaling schemes, showing that symmetry is a crucial property for its solution. Indeed, we show that an optimal ex ante persuasive signaling scheme can be computed in polynomial time when players are symmetric and have affine cost functions. Moreover, the problem becomes NP-hard when players are asymmetric, even in non-Bayesian settings.

Submitted 12 February, 2020; originally announced February 2020.

arXiv:2002.05156 [pdf, ps, other]

Public Bayesian Persuasion: Being Almost Optimal and Almost Persuasive

Authors: Matteo Castiglioni, Andrea Celli, Nicola Gatti

Abstract: Persuasion studies how an informed principal may influence the behavior of agents by the strategic provision of payoff-relevant information. We focus on the fundamental multi-receiver model by Arieli and Babichenko (2019), in which there are no inter-agent externalities. Unlike prior works on this problem, we study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions. We fully characterize the computational complexity of computing a bi-criteria approximation of an optimal public signaling scheme. In particular, we show, in a voting setting of independent interest, that solving this problem requires at least a quasi-polynomial number of steps even in settings with a binary action space, assuming the Exponential Time Hypothesis. In doing so, we prove that a relaxed version of the Maximum Feasible Subsystem of Linear Inequalities problem requires at least quasi-polynomial time to be solved. Finally, we close the gap by providing a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.

Submitted 31 March, 2020; v1 submitted 12 February, 2020; originally announced February 2020.

arXiv:1912.07712 [pdf, other]

Coordination in Adversarial Sequential Team Games via Multi-Agent Deep Reinforcement Learning

Authors: Andrea Celli, Marco Ciccone, Raffaele Bongo, Nicola Gatti

Abstract: Many real-world applications involve teams of agents that have to coordinate their actions to reach a common goal against potential adversaries. This paper focuses on zero-sum games where a team of players faces an opponent, as is the case, for example, in Bridge, collusion in poker, and collusion in bidding. The possibility for the team members to communicate before gameplay---that is, coordinate their strategies ex ante---makes the use of behavioral strategies unsatisfactory. We introduce Soft Team Actor-Critic (STAC) as a solution to the team's coordination problem that does not require any prior domain knowledge. STAC allows team members to effectively exploit ex ante communication via exogenous signals that are shared among the team. STAC reaches near-optimal coordinated strategies both in perfectly observable and partially observable games, where previous deep RL algorithms fail to reach optimal coordinated behaviors.

Submitted 16 December, 2019; originally announced December 2019.

Comments: Preliminary version

arXiv:1911.07755 [pdf, other]

Learning Probably Approximately Correct Maximin Strategies in Simulation-Based Games with Infinite Strategy Spaces

Authors: Alberto Marchesi, Francesco Trovò, Nicola Gatti

Abstract: We tackle the problem of learning equilibria in simulation-based games. In such games, the players' utility functions cannot be described analytically, as they are given through a black-box simulator that can be queried to obtain noisy estimates of the utilities. This is the case in many real-world games in which a complete description of the elements involved is not available upfront, such as complex military settings and online auctions. In these situations, one usually needs to run costly simulation processes to get an accurate estimate of the game outcome. As a result, solving these games begets the challenge of designing learning algorithms that can find (approximate) equilibria with high confidence, using as few simulator queries as possible. Moreover, since running the simulator during the game is unfeasible, the algorithms must first perform a pure exploration learning phase and, then, use the (approximate) equilibrium learned this way to play the game. In this work, we focus on two-player zero-sum games with infinite strategy spaces. Drawing from the best arm identification literature, we design two algorithms with theoretical guarantees to learn maximin strategies in these games. The first one works in the fixed-confidence setting, guaranteeing the desired confidence level while minimizing the number of queries. Instead, the second algorithm fits the fixed-budget setting, maximizing the confidence without exceeding the given maximum number of queries.
First, we formally prove δ-PAC theoretical guarantees for our algorithms under some regularity assumptions, which are encoded by letting the utility functions be drawn from a Gaussian process. Then, we experimentally evaluate our techniques on a testbed made of randomly generated games and instances representing simple real-world security settings.

Submitted 25 February, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

arXiv:1911.06198 [pdf, other]

Election Manipulation on Social Networks: Seeding, Edge Removal, Edge Addition

Authors: Matteo Castiglioni, Nicola Gatti, Giulia Landriani, Diodato Ferraioli

Abstract: We focus on the election manipulation problem through social influence, where a manipulator exploits a social network to make her most preferred candidate win an election. Influence is due to information in favor of and/or against one or multiple candidates, sent by seeds and spreading through the network according to the independent cascade model. We provide a comprehensive study of the election control problem, investigating two forms of manipulations: seeding to buy influencers given a social network, and removing or adding edges in the social network given the seeds and the information sent. In particular, we study a wide range of cases distinguishing for the number of candidates or the kind of information spread over the network. Our main result is positive for democracy, and it shows that the election manipulation problem is not affordable in the worst-case except for trivial classes of instances, even when one accepts to approximate the margin of victory. In the case of seeding, we also show that the manipulation is hard even if the graph is a line and that a large class of algorithms, including most of the approaches recently adopted for social-influence problems, fail to compute a bounded approximation even on elementary networks, as undirected graphs with every node having a degree at most two or directed trees. In the case of edge removal or addition, our hardness results also apply to the basic case of social influence maximization/minimization.
In contrast, the hardness of election manipulation holds even when the manipulator has an unlimited budget, being allowed to remove or add an arbitrary number of edges.

Submitted 27 February, 2020; v1 submitted 14 November, 2019; originally announced November 2019.

Comments: arXiv admin note: text overlap with arXiv:1902.03779

arXiv:1910.06228 [pdf, other]

Learning to Correlate in Multi-Player General-Sum Sequential Games

Authors: Andrea Celli, Alberto Marchesi, Tommaso Bianchi, Nicola Gatti

Abstract: In the context of multi-player, general-sum games, there is an increasing interest in solution concepts modeling some form of communication among players, since they can lead to socially better outcomes with respect to Nash equilibria, and may be reached through learning dynamics in a decentralized fashion. In this paper, we focus on coarse correlated equilibria (CCEs) in sequential games. First, we complete the picture on the complexity of finding social-welfare-maximizing CCEs by showing that the problem is not in Poly-APX unless P = NP. Furthermore, simple arguments show that CFR - working with behavioral strategies - may not converge to a CCE. However, we devise a simple variant (CFR-S) which provably converges to the set of CCEs, but may be empirically inefficient. Thus, we design a variant of the CFR algorithm (called CFR-Jr) which approaches the set of CCEs with a regret bound sub-linear in the size of the game, and is shown to be dramatically faster than CFR-S and the state-of-the-art algorithms to compute CCEs.

Submitted 14 October, 2019; originally announced October 2019.

arXiv:1908.10620 [pdf, other]

Persuading Voters: It's Easy to Whisper, It's Hard to Speak Loud

Authors: Matteo Castiglioni, Andrea Celli, Nicola Gatti

Abstract: We focus on the following natural question: is it possible to influence the outcome of a voting process through the strategic provision of information to voters who update their beliefs rationally? We investigate whether it is computationally tractable to design a signaling scheme maximizing the probability with which the sender's preferred candidate is elected. We focus on the model recently introduced by Arieli and Babichenko (2019) (i.e., without inter-agent externalities), and consider, as explanatory examples, $k$-voting rule and plurality voting. There is a sharp contrast between the case in which private signals are allowed and the more restrictive setting in which only public signals are allowed. In the former, we show that an optimal signaling scheme can be computed efficiently both under a $k$-voting rule and plurality voting. In establishing these results, we provide two general (i.e., applicable to settings beyond voting) contributions. Specifically, we extend a well known result by Dughmi and Xu (2017) to more general settings, and prove that, when the sender's utility function is anonymous, computing an optimal signaling scheme is fixed parameter tractable w.r.t. the number of receivers' actions. In the public signaling case, we show that the sender's optimal expected return cannot be approximated to within any factor under a $k$-voting rule. This negative result easily extends to plurality voting and problems where utility functions are anonymous.

Submitted 23 September, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

Showing 1–50 of 81 results for author: Gatti, N