Collaborative Decision-Making and the $k$ -Strong Price of Anarchy in Common Interest Games

Bryce L. Ferguson, Student Member, IEEE, Dario Paccagnan, Member, IEEE, Bary S. R. Pradelski, and Jason R. Marden, Senior Member, IEEE

This work was supported in part by the Office of Naval Research under Grant # N00014-20-1-2359, the Air Force Office of Scientific Research under Grants # FA95550-20-1-0054 and # FA9550-21-1-0203, and the French National Research Agency (ANR) under Grant # ANR-19-CE48-0018-01. Bryce L. Ferguson and Jason R. Marden are with the Department of Electrical and Computer Engineering at the University of California, Santa Barbara, CA. {blferguson,jrmarden}@ece.ucsb.edu. Dario Paccagnan is with the Department of Computing at Imperial College London, UK. [email protected]. Bary S. R. Pradelski is with the National Centre for Scientific Research (CNRS) and the Department of Economics at Oxford, UK. [email protected].

Abstract

The control of large-scale, multi-agent systems often entails distributing decision-making across the system components. However, with advances in communication and computation technologies, we can consider new collaborative decision-making paradigms that bridge centralized and distributed control architectures. In this work, we seek to understand the benefits and costs of increased collaborative communication in multi-agent systems. We specifically study this in the context of common interest games in which groups of up to $k$ agents can coordinate their actions in maximizing a common objective function. The equilibria that emerge in these systems are the $k$-strong Nash equilibria of the common interest game; studying the properties of these states provides relevant insights into the efficacy of inter-agent collaboration. Our contributions are threefold: we 1) provide bounds on how well $k$-strong Nash equilibria approximate the optimal system welfare, formalized by the $k$-strong price of anarchy, 2) characterize the run time and transient performance of collaborative agent-based dynamics, and 3) introduce techniques for redesigning the objectives of groups of agents to improve system performance. We study these three facets generally as well as in the context of resource allocation problems, for which we provide tractable linear programs that give tight bounds on the $k$-strong price of anarchy.

1 Introduction

Designing effective control schemes for large-scale systems such as transportation services [1], robotic fleets [2], supply chains [3], or cloud computing services [4] can be challenging due to their many components and vast scale. The two prevailing paradigms for designing control schemes are centralized control [5, 6, 7], which guides behavior across the entire system, and distributed control [8, 9, 10], which allows local components to guide their own behavior. Each approach has its pros and cons: centralization allows for more direct manipulation of system behavior at the cost of greater communication and computation requirements, while decentralization reduces the communication and computation requirements but cannot always attain the desired system behavior. Advancements in embedded communication and computation [11, 12, 13, 14] enable the design of new paradigms that exist between centralized and distributed control.

Specifically, we study the efficacy of learning in multi-agent systems when individual system components (or agents) can partially communicate and thus coordinate their behavior. Many engineering domains are on the precipice of enabling these collaborative paradigms; for example, autonomous vehicle platoons with connected cruise control [15], unmanned aerial surveillance vehicles with range-limited communication [16], and cloud computing networks with emerging distributed learning techniques [17]. In each of these settings, inter-agent communication and collaboration offer the opportunity to improve the performance attainable by the system as a whole; however, implementing these frameworks incurs costs that are both monetary–in the form of the additional technology required–and computational–in the form of more complex decision-making algorithms. In this work, we provide tools to help better understand the benefits and costs associated with collaborative communication in multi-agent systems.

Figure 1: Illustration of the $k$-strong Nash equilibrium local optimality guarantee for a three-agent common-interest game where $k\in\{1,2,3\}$. In each case, if the dark cube is a $k$-strong Nash equilibrium, then it is optimal over the highlighted region with respect to the shared objective function $W$. As $k$ (the size of collaborative groups) increases, the local optimality guarantee is strengthened because it holds over all $k$-lateral deviations.

We model a multi-agent system as a common interest game where some (but not all) groups of agents can collaborate in selecting their actions to maximize the system welfare. We particularly focus on the case where a collaborative action takes the form of a group best response, i.e., a group of agents updating their actions in response to the remaining players’ actions. As the size and number of these collaborative groups increase, a coordinated group decision has a larger impact on system behavior. To range the level of collaboration between the fully distributed setting (where no agents can collaborate) and the fully centralized setting (where all agents can collaborate collectively), we consider the cases where groups of up to $k$ agents can collaborate. In these collaborative environments, a stable state of the system is that of the $k$ -strong Nash equilibrium [18]. Researchers have studied the existence [19] and computation [20] of strong Nash equilibria in settings including congestion games [21], lexicographical games [22], and Markov games [23]. This work applies these concepts to multi-agent systems. To understand the possible benefits of collaboration to system performance, we quantify how well $k$ -strong Nash equilibria approximate the optimal welfare, termed the $k$ -strong price of anarchy [24, 25]. To understand the possible cost of collaboration, we analyze the running time and transient performance of agent-based dynamics, which converge to $k$ -strong Nash equilibria.

Distributed learning in games has been a widely studied area in controls [26], but the ability to reach equilibrium with coalitional best responses has not yet been studied; we thus study the added run time of collaborative algorithms. Quantifying the $k$ -strong price of anarchy has been studied in network formation games [24, 25] and load balancing games [27, 28, 29], as well as more general utility maximizing games [30, 31]. In many of these, the bounds are either not tight (particularly for finitely many players) or hold for equilibria which need not exist. By focusing on the class of common interest games, we guarantee the existence of collaborative equilibria, provide tight approximation bounds, and develop new insights into collaborative multi-agent optimization.

Organization - This work provides tools to understand the benefits and costs of collaborative communication by studying the qualities of $k$ -strong Nash equilibria. In Section 3.1, we consider the case where groups of agents are designed to maximize the system welfare and introduce the notion of $(\lambda,\mu)$ - $k$ -coalitionally smooth games (a generalization of smooth games [32] and coalitionally smooth games [30]), and provide bounds on the $k$ -strong price of anarchy. Then, in Section 3.2, we focus on the well-studied setting of distributed resource allocation problems [33, 34, 35, 36, 37, 38], and provide tight bounds on the $k$ -strong price of anarchy via the solution of a tractable linear program. Fig. 3 plots these bounds and demonstrates how increased collaboration improves efficiency guarantees in several classes of resource allocation problems. In Section 4, we consider the effects of group decision-making on agent-based dynamics; specifically, we show the added run-time complexity of coalitional round-robin dynamics and provide transient performance guarantees of asynchronous best response dynamics. We support our findings with numerical examples. In Section 5, we consider that the system operator may be able to design the agents’ objective separately from the system welfare; we provide a generalized technique for bounding the $k$ -strong price of anarchy in this setting. In Section 5.2, we again focus on the setting of resource allocation and provide two linear programs to lower and upper bound the attainable $k$ -strong price of anarchy guarantee via utility design.

2 Preliminaries

Throughout, we will denote $[n]=\{1,\ldots,n\}$ . We will regularly use the binomial coefficient $\binom{n}{k}=\frac{n!}{(n-k)!k!}$ in constructing optimization problems; we define this value as $0$ when $n<k$ for ease of notation.

2.1 Collaborative Decision Making

Consider a finite set of agents $N=\{1,\ldots,n\}$. Each agent $i\in N$ selects an action $a_{i}$ from a finite action set $\mathcal{A}_{i}$. When each agent selects an action, we will denote their joint action by the tuple $a=(a_{1},\ldots,a_{n})\in\mathcal{A}=\mathcal{A}_{1}\times\cdots\times\mathcal{A}_{n}$. Let $G=(N,\mathcal{A})$ be a tuple encoding the components of the agent environment. The system’s performance is dictated by the agents’ actions; as such, for each joint action $a$, we assign a system welfare $W(a)$ where $W:\mathcal{A}\rightarrow\mathbb{R}_{\geq 0}$ is the system designer’s objective function. With this, we let the tuple $(G,W)$ denote a multi-agent system (often referred to as a system), which defines the primitives of the system designer’s problem of designing an effective control algorithm.

The system designer would like to configure the agents to reach a joint action that maximizes the system welfare, i.e.,

a^{\rm opt}\in\operatorname*{arg\,max}_{a\in\mathcal{A}}W(a).

(1)

Though this system state is ideal, it may be difficult to attain as 1) solving for the optimal allocation can be combinatorial and in some cases (including those from Section 3.2) NP-hard [33], and 2) it requires a centralized authority to control all agents, which may be practically or logistically difficult. To resolve this, we will consider that agents make decisions in a decentralized manner.

Fully distributing the decision-making involves designing each agent to update their action locally and has been widely studied and developed to guarantee reasonable system behavior [8]; however, fully distributing decision-making may often become unnecessary as emerging communication technologies enable collaborative inter-agent decision-making [11]. To implement one such collaborative system architecture, a system operator must make two decisions: 1) which groups of agents can collaborate on their decisions (possibly subject to some operational constraints), and 2) how the agents should collaborate on their decisions. A natural choice for the latter is a group best response. Let $\Gamma\subseteq N$ be a group of agents endowed with the ability to collaboratively select a group action $a_{\Gamma}\in\mathcal{A}_{\Gamma}={\prod}_{i\in\Gamma}\mathcal{A}_{i}$, which they select by maximizing the system welfare over their group action set,

a_{\Gamma}\in\operatorname*{arg\,max}_{a_{\Gamma}^{\prime}\in\mathcal{A}_{\Gamma}}W(a_{\Gamma}^{\prime},a_{-\Gamma}),

(2)

where $a_{-\Gamma}$ denotes the actions of the players $i\in N\setminus\Gamma$. If the argmax contains multiple elements, the group keeps its current action whenever that action is among the maximizers and otherwise breaks the tie at random.
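As an illustration of the group best response in (2), the following is a minimal sketch that enumerates the group action set by brute force. The function name `group_best_response`, the 0-based agent indices, and the representation of $W$ as a Python callable are assumptions made for this example. For simplicity, the sketch keeps the current action when it is already a maximizer and otherwise returns the first maximizer in enumeration order, rather than breaking ties at random.

```python
import itertools

def group_best_response(W, action_sets, a, group):
    """Return a joint action in which `group` plays a welfare-maximizing group action.

    W           : callable mapping a joint action (tuple) to a non-negative welfare
    action_sets : list of per-agent action lists, action_sets[i] plays the role of A_i
    a           : current joint action (tuple of length n)
    group       : iterable of 0-based agent indices, the coalition Gamma
    """
    group = sorted(group)
    best, best_val = a, W(a)  # keep the current action unless strictly improved
    # Enumerate every group action in A_Gamma while holding a_{-Gamma} fixed.
    for choice in itertools.product(*(action_sets[i] for i in group)):
        candidate = list(a)
        for i, a_i in zip(group, choice):
            candidate[i] = a_i
        candidate = tuple(candidate)
        val = W(candidate)
        if val > best_val:
            best, best_val = candidate, val
    return best
```

The enumeration visits $\prod_{i\in\Gamma}|\mathcal{A}_{i}|$ group actions, so the cost of a single group best response grows exponentially with the size of the group.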

Intuitively, a group best responding and collaboratively maximizing the system welfare should lead to direct improvements to system performance; however, one can consider other group decision-making rules as well. In particular, in Section 5, we will consider that the system designer can design the agents’ objective separately from the system objective as a means to further shape system behavior. In either case, one would imagine that the greater the collaborative structure, the greater the impact on emergent behavior.

For the system operator’s decision over which groups should collaborate, let $\mathcal{C}\subseteq 2^{N}$ denote the collaboration set, or the set of groups of agents ($\Gamma\in\mathcal{C}$) able to collaborate on their decisions. These collaborations can overlap–where agents can partake in multiple, disparate collaborations–and vary in size. For example, if agents send signals through a communication network [39], we will have $\mathcal{C}=\{(i,j)\in N^{2}\mid(i,j)\in E\}$ where $E$ are the edges in a communication graph. If agents are allowed to communicate with each other one at a time and make pairwise decisions [40], then $\mathcal{C}=\{(i,j)\in N^{2}\}$. If agents can only communicate with others within a local proximity [41], then $\mathcal{C}=\{\Gamma\subseteq N\mid\rho(i,j)\leq d\ \ \forall i,j\in\Gamma\}$ where $\rho$ measures the distance between two agents and $d$ is a maximum communication range. Once the system operator decides on the collaborative structure and the group decision-making protocol, the agents’ decision-making process forms a collaborative multi-agent system, denoted by the tuple $(G,W,\mathcal{C})$.

As we vary the number and size of collaborative sets, we can consider control paradigms somewhere between centralized (i.e., $\{N\}\in\mathcal{C}$) and fully distributed (i.e., $\mathcal{C}=\left\{\{1\},\{2\},\ldots,\{n\}\right\}$). This work seeks to understand the efficacy of different levels of communication and collaboration. To quantify this more effectively, we consider a specific type of collaboration set that lets us range between the centralized and distributed extremes.

2.2 k-Strong Nash Equilibria

We consider the collaboration sets that contain groups of agents up to size $k$ . Let $\mathcal{C}_{k}=\{\Gamma\subseteq N\mid|\Gamma|=k\}$ denote the subsets of exactly $k$ agents and $\mathcal{C}_{[k]}=\bigcup_{\zeta\in[k]}\mathcal{C}_{\zeta}$ be the subsets that contain at most $k$ agents. When $k=1$ , we recover the fully distributed setting, and when $k=n$ , we recover the fully centralized setting. As we vary $k$ between $1$ and $n$ , we sweep through different levels of communication and collaboration.

In the game-theoretic approach to multi-agent systems, a Nash equilibrium is a joint action where no agent can unilaterally deviate their action to improve the system welfare [18]. We generalize this concept to the setting of collaborative decision-making by considering a $k$-strong Nash equilibrium as a joint action where no group of up to $k$ agents can deviate their group’s actions to improve the welfare.

Definition 1.

A joint-action $a^{k{\rm SNE}}\in\mathcal{A}$ is a $k$ -strong Nash equilibrium for the common-interest game $(G,W,\mathcal{C}_{[k]})$ if

W(a^{k{\rm SNE}})\geq W(a^{\prime}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma}),\quad\forall a^{\prime}_{\Gamma}\in\mathcal{A}_{\Gamma},\ \Gamma\in\mathcal{C}_{[k]}.

(3)

Let $k{\rm SNE}(G,W)\subseteq\mathcal{A}$ denote the set of all $k$ -strong Nash equilibria. Note that when $k=1$ , we recover the classical definition of a Nash equilibrium, and when $k=n$ , the equilibrium condition implies global optimality. Definition 1 differs slightly from the literature, where $k$ -strong Nash equilibria are defined by no group of agents deviating to a new group action that is Pareto-optimal for the group (i.e., no agent receives a lower payoff with respect to their individual utility function) [18]; when the agents respond to a common interest objective, the definitions are equivalent. Additionally, in general games, $k$ -strong Nash equilibria need not exist; however, that is not the case in our setting due to the common-interest structure we impose on agent decision-making.

Proposition 2.1.

In a system $(G,W)$ with collaboration set $\mathcal{C}_{[k]}$ for any $k\in[n]$ , a $k$ -strong Nash equilibrium exists.

The proof appears in the appendix.

The main focus of this work is understanding how equilibrium performance changes with the level of collaborative communication. Notice that (3) serves as a local optimality guarantee in the neighborhood of $k$ -lateral deviations. Fig. 1 depicts this for a three-player matrix game; when $k=1$ , a $1$ -strong Nash equilibrium (or just Nash equilibrium) is optimal over the unilateral deviations, when $k=2$ a $2$ -strong Nash equilibrium is optimal over the bilateral deviations, and when $k=3=n$ , the $3$ -strong Nash equilibrium is optimal over the whole joint-action space. From this, we observe that the local optimality guarantee is strengthened as we increase the level of collaboration $k$ (i.e., $k^{\prime}{\rm SNE}\subseteq k{\rm SNE}$ for $k^{\prime}>k$ ).

To quantify the effect of varying $k$ on equilibrium performance, we consider the ratio of worst-case equilibrium welfare and the optimal attainable welfare, termed the $k$ -strong price of anarchy.

\mathrm{SPoA}_{k}(G,W)=\frac{\min_{a^{k{\rm SNE}}\in k{\rm SNE}(G,W)}W(a^{k{\rm SNE}})}{\max_{a^{\rm opt}\in\mathcal{A}}W(a^{\rm opt})}\in[0,1],

(4)

where we let $0/0$ be defined as $1$ to ignore the degenerate case when no welfare is attainable. In the multi-agent system $(G,W)$ with communication structure $\mathcal{C}_{[k]}$ , every $k$ -strong Nash equilibrium approximates the optimal solution at least as well as $\mathrm{SPoA}_{k}(G,W)$ . Accordingly, we will use the $k$ -strong price of anarchy to understand the efficiency associated with collaborative decision-making. For example, in Fig. 2, we depict the $k$ -strong price of anarchy in resource covering games [33] for $1\leq k\leq n$ , illustrating the performance guarantees attainable between centralized and distributed control paradigms.
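For small instances, Definition 1 and the ratio in (4) can be evaluated directly by enumeration. The sketch below is a brute-force illustration, not an efficient algorithm; the function names and the three-agent example welfare `example_welfare` are hypothetical and chosen only to show how the guarantee changes with $k$.

```python
import itertools

def deviations(a, action_sets, group):
    """Yield every joint action reachable when `group` changes its actions."""
    for choice in itertools.product(*(action_sets[i] for i in group)):
        b = list(a)
        for i, a_i in zip(group, choice):
            b[i] = a_i
        yield tuple(b)

def is_k_strong_ne(W, action_sets, a, k):
    """Check condition (3): no group of at most k agents can increase W."""
    n = len(action_sets)
    return all(
        W(b) <= W(a)
        for size in range(1, k + 1)
        for group in itertools.combinations(range(n), size)
        for b in deviations(a, action_sets, group)
    )

def strong_price_of_anarchy(W, action_sets, k):
    """Evaluate (4): worst k-strong equilibrium welfare over optimal welfare."""
    joint_actions = list(itertools.product(*action_sets))
    optimum = max(W(a) for a in joint_actions)
    if optimum == 0:
        return 1.0  # the 0/0 := 1 convention
    worst = min(W(a) for a in joint_actions
                if is_k_strong_ne(W, action_sets, a, k))
    return worst / optimum

def example_welfare(a):
    """Hypothetical three-agent coordination welfare used only for illustration."""
    if a == (1, 1, 1):
        return 3.0
    if a == (0, 0, 0):
        return 1.0
    return 0.0

example_action_sets = [[0, 1], [0, 1], [0, 1]]
```

In this example, the all-zeros action is stable against unilateral and bilateral deviations, so the ratio is $1/3$ for $k\in\{1,2\}$; only when all three agents can deviate together ($k=3$) is the optimum the unique equilibrium.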

2.3 Summary of Contributions

This work studies the benefits and costs of increased collaborative communication within multi-agent systems. Our contributions are threefold:
1) In Section 3, we provide tools to quantify the $k$ -strong price of anarchy when agents optimize the system objective. We introduce $(\lambda,\mu)$ - $k$ -coalitionally smooth games and provide a $k$ -strong price of anarchy guarantee using the parameters $\lambda$ and $\mu$ . We then focus on the class of resource allocation games, where in Proposition 3.2, we show that these parameters can be found via the solution to a tractable linear program. In Theorem 3.3, we show that combining the constraints of each of the $k$ linear programs gives a tight bound. Figure 3 depicts the $k$ -strong price of anarchy for several classes of resource allocation games.
2) In Section 4, we study collaborative dynamics that reach these equilibria. In Section 4.1, we introduce the coalitional round-robin dynamics and show that an equilibrium is reached in a finite number of best responses and that the number of welfare comparisons grows with a small-base exponential of $k$ . In Section 4.2, we introduce the asynchronous coalitional best response dynamics, which we show converge almost surely. Further, if the game is $(\lambda,\mu)$ - $k$ -coalitionally smooth, then we provide a bound on the transient performance (or the cumulative welfare along the dynamics). We support these findings with a numerical study in Section 4.3.
3) In Section 5, we consider how to improve the design of a group’s decision-making process. By providing the agents with a new, designed objective function, the system designer may alter the set of equilibria and ideally increase the $k$ -strong price of anarchy. In Section 5.1, we generalize the notion of coalitional smoothness to the setting where the agents’ objective differs from the system welfare, and in Theorem 5.2, we show how we can construct an optimal utility rule. Fig. 5 shows the $k$ -strong price of anarchy under the optimal utility design for resource allocation games, demonstrating the added benefit of designing how groups of agents make decisions.

3 Quantifying $k$ -Strong Price of Anarchy

3.1 Coalitionally Smooth Games

We first consider the efficiency of $k$ -strong Nash equilibria for general multi-agent systems. This efficiency–quantified by the $k$ -strong price of anarchy–is conditioned on the system welfare $W$ and the agent decision-making environment $G$ . In Definition 2, we provide a condition on a system $(G,W)$ that will be useful in bounding the $k$ -strong price of anarchy.

Definition 2.

A system $(G,W)$ is $(\lambda,\mu)$ - $k$ -coalitionally smooth, where $\lambda,\mu\in\mathbb{R}_{\geq 0}^{k}$ , if for all $a,a^{\prime}\in\mathcal{A}$

\frac{1}{\binom{n}{\zeta}}\sum_{\Gamma\in\mathcal{C}_{\zeta}}W(a_{\Gamma}^{\prime},a_{-\Gamma})\geq\lambda_{\zeta}W(a^{\prime})-\mu_{\zeta}W(a),\quad\forall\zeta\in[k].

(5)

In (5), we provide a constraint on the welfare function stating that the average effect of a group of size $\zeta$ deviating their action from $a$ to $a^{\prime}$ is lower bounded by a linear combination of the welfare of $a$ and $a^{\prime}$ . The term smooth is in reference to the welfare function’s change over the joint-action space being bounded by (5). Additionally, Definition 2 extends the classic notion of smooth games [32] and coalitional smoothness for strong equilibria [30] to the setting of $k$ -coalitions in common interest games.

In effect, every system $(G,W)$ is smooth with $\lambda_{\zeta}=\mu_{\zeta}=0$ for all $\zeta\in[k]$ , but some parameters $(\lambda,\mu)$ are more useful than others. In Proposition 3.1, we show that the parameters $\lambda$ and $\mu$ from Definition 2 can be used to lower bound the $k$ -strong price of anarchy.
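To make Definition 2 concrete, the following sketch checks condition (5) by enumerating all pairs of joint actions in a small game. The function name `is_coalitionally_smooth`, the list representation of $\lambda$ and $\mu$ (entry `zeta - 1` holds $\lambda_{\zeta}$ or $\mu_{\zeta}$), and the numerical tolerance are assumptions of this example.

```python
import itertools
from math import comb

def is_coalitionally_smooth(W, action_sets, lam, mu, tol=1e-12):
    """Brute-force check of (5) for every zeta in [k], where k = len(lam)."""
    n = len(action_sets)
    joint_actions = list(itertools.product(*action_sets))
    for a in joint_actions:
        for a_prime in joint_actions:
            for zeta in range(1, len(lam) + 1):
                # Sum W(a'_Gamma, a_{-Gamma}) over all groups of exactly zeta agents.
                total = 0.0
                for group in itertools.combinations(range(n), zeta):
                    mixed = tuple(a_prime[i] if i in group else a[i]
                                  for i in range(n))
                    total += W(mixed)
                lhs = total / comb(n, zeta)
                rhs = lam[zeta - 1] * W(a_prime) - mu[zeta - 1] * W(a)
                if lhs < rhs - tol:
                    return False
    return True
```

Because $W$ is non-negative, the check always passes for $\lambda=\mu=0$; the interesting question is how large $\lambda_{\zeta}/(1+\mu_{\zeta})$ can be made while the check still passes.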

Proposition 3.1.

A system $(G,W)$ that is $(\lambda,\mu)$ - $k$ -coalitionally smooth has $k$ -strong price of anarchy satisfying

\mathrm{SPoA}_{k}(G,W)\geq\frac{\lambda_{\zeta}}{1+\mu_{\zeta}},\quad\forall\zeta\in[k].

(6)

Proof.

Let $a^{k{\rm SNE}}\in\mathcal{A}$ denote a $k$ -strong Nash equilibrium in the system $(G,W)$ (i.e., satisfying Definition 1), and let $a^{\rm opt}\in\operatorname*{arg\,max}_{a\in\mathcal{A}}W(a)$ denote an optimal joint action. For any $\zeta\in[k]$ , we have


\begin{align}
W(a^{k{\rm SNE}})&=\frac{1}{\binom{n}{\zeta}}\sum_{\Gamma\in\mathcal{C}_{\zeta}}W(a^{k{\rm SNE}}) \tag{7a}\\
&\geq\frac{1}{\binom{n}{\zeta}}\sum_{\Gamma\in\mathcal{C}_{\zeta}}W(a^{\rm opt}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma}) \tag{7b}\\
&\geq\lambda_{\zeta}W(a^{\rm opt})-\mu_{\zeta}W(a^{k{\rm SNE}}). \tag{7c}
\end{align}

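Here, (7a) holds because $|\mathcal{C}_{\zeta}|=\binom{n}{\zeta}$, (7b) holds from Definition 1 applied to each group $\Gamma\in\mathcal{C}_{\zeta}\subseteq\mathcal{C}_{[k]}$, and (7c) holds from Definition 2. Rearranging gives $(1+\mu_{\zeta})W(a^{k{\rm SNE}})\geq\lambda_{\zeta}W(a^{\rm opt})$, i.e., $W(a^{k{\rm SNE}})/W(a^{\rm opt})\geq\lambda_{\zeta}/(1+\mu_{\zeta})$. ∎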

Inequality (6) provides $k$ lower bounds on the $k$-strong price of anarchy; accordingly, a $k$-strong Nash equilibrium approximates the system optimum at least as well as $\max_{\zeta\in[k]}\{\lambda_{\zeta}/(1+\mu_{\zeta})\}$. Often, the best lower bound is provided by $\zeta=k$; however, this is not true in general. As such, we must consider each of the constraints in (5) to derive the best bounds.

Efficiency bounds of this form are valuable for several reasons: 1) they can be used to provide insights on the transient guarantees of various multi-agent dynamics (see Section 4), 2) they easily generalize to broader equilibrium concepts (subject of future work), and 3) if parameters $(\lambda,\mu)$ can be shown to satisfy (5) for a set of systems $\mathcal{S}$, then each system $(G,W)\in\mathcal{S}$ inherits the efficiency guarantee of (6). This last point is particularly pertinent, as system models may be subject to noise, mischaracterizations, or changes over time. If the efficiency guarantee holds across many similar systems, then the guarantee is robust to these issues.

In Section 3.2, we will provide methods to find coalitional smoothness parameters for classes of resource allocation games via tractable linear programs.

3.2 Resource Allocation Games

In this subsection, we consider the well-studied class of resource allocation games [35, 33, 36, 37, 42]. Consider a set of resources or tasks $\mathcal{R}=\{1,\ldots,R\}$ , to which agents are assigned, i.e., agent $i\in N$ selects a subset of these resources as its action $a_{i}\subseteq\mathcal{R}$ from a constrained set of subsets $\mathcal{A}_{i}\subseteq 2^{\mathcal{R}}$ . Each resource $r\in\mathcal{R}$ has a value $v_{r}\geq 0$ ; the welfare contributed by a resource is $v_{r}w(|a|_{r})$ , where $w:\{0,\ldots,n\}\rightarrow\mathbb{R}_{\geq 0}$ captures the added benefit of having multiple agents assigned to the same resources and $|a|_{r}$ is the number of agents assigned to $r$ in allocation $a$ . Assume that $w(0)=0$ as no welfare is contributed by resources assigned to zero agents and further that $w(y)>0$ for all $y>0$ . The system welfare is thus

W(a)=\sum_{r\in\mathcal{R}}v_{r}w(|a|_{r}).

(8)

For ease of notation, we will refer to the system welfare by the local welfare rule $w$, noting that, in the agent environment $G$, it generates a welfare function $W$ via (8).
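The welfare in (8) is straightforward to evaluate once an allocation is represented as one set of resources per agent. The sketch below assumes this representation; the name `resource_welfare`, the dictionary of resource values, and the covering-style local welfare rule `covering_w` (equal to $1$ whenever at least one agent uses the resource) are illustrative choices, not definitions from the text.

```python
def resource_welfare(a, values, w):
    """Evaluate W(a) = sum_r v_r * w(|a|_r) as in (8).

    a      : tuple of sets; a[i] is the set of resources selected by agent i
    values : dict mapping each resource r to its value v_r >= 0
    w      : local welfare rule, a callable on {0, ..., n} with w(0) = 0
    """
    total = 0.0
    for r, v_r in values.items():
        load = sum(1 for a_i in a if r in a_i)  # |a|_r, the number of agents on r
        total += v_r * w(load)
    return total

def covering_w(y):
    """Illustrative local welfare rule: a resource counts once if it is used at all."""
    return 1.0 if y >= 1 else 0.0
```

For instance, with values `{1: 2.0, 2: 1.0, 3: 0.5}`, the allocation `({1}, {1, 2}, {3})` covers every resource and attains welfare $3.5$, while `({1}, {1}, {1})` covers only the first resource and attains $2.0$.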

As discussed in Section 3.1, we wish to find efficiency bounds that hold over a class of resource allocation problems. Let $G=(\mathcal{R},N,\mathcal{A},\{v_{r}\}_{r\in\mathcal{R}})$ denote a resource allocation problem, and let $\mathcal{G}_{n}$ denote the set of all such resource allocation problems with at most $n$ agents. In Proposition 3.2, we propose a tractable linear program whose solution provides parameters $(\lambda,\mu)$ which satisfy Definition 2 for every system $(G,w)\in\mathcal{G}_{n}\times\{w\}$. By Proposition 3.1, this also provides a lower bound on the $k$-strong price of anarchy for the class of resource allocation problems with local welfare $w$.

Proposition 3.2.

Each resource allocation problem $(G,w)\in\mathcal{G}_{n}\times\{w\}$ is $(\lambda,\mu)$ - $k$ -coalitionally smooth with $\lambda_{\zeta}=1/\nu_{\zeta}^{\star}$ and $\mu_{\zeta}=\rho_{\zeta}^{\star}/\nu_{\zeta}^{\star}-1$ , where $(\rho_{\zeta}^{\star},\nu_{\zeta}^{\star})$ is a solution to the linear program (P $\zeta$ ):

\begin{aligned}
(\rho_{\zeta}^{\star},\nu_{\zeta}^{\star})\in\operatorname*{arg\,min}_{\rho\geq\nu\geq 0}\quad&\rho\\
{\rm s.t.}\quad&0\geq w(o+x)-\rho w(e+x)+\nu\left(\binom{n}{\zeta}w(e+x)-\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)\right)\\
&\forall(e,x,o)\in\mathcal{I}
\end{aligned}

(P$\zeta$)

The constraints are parameterized by the triples $\mathcal{I}:=\{(e,x,o)\in\mathbb{N}_{\geq 0}^{3}\mid 1\leq e+x+o\leq n\}$. With the possibility of collaboration, an equilibrium becomes more difficult to characterize than in a fully distributed setting. We circumvent this by introducing a parameterization that lets us replace the $\mathcal{O}\left(\sum_{\zeta=1}^{k}{n\choose\zeta}m^{\zeta}\right)$ comparisons of (3) (where $m:=\max_{i\in N}|\mathcal{A}_{i}|$) with $\mathcal{O}(n^{3})$ linear inequalities. Further, satisfying these inequalities provides parameters $(\lambda_{\zeta},\mu_{\zeta})$ that satisfy Definition 2, leading to (P$\zeta$) as a search for such parameters with the best $k$-strong price of anarchy guarantee.
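The constraint set of (P$\zeta$) can be generated mechanically from $w$, $n$, and $\zeta$. The sketch below, which uses only the standard library, enumerates the triples in $\mathcal{I}$, evaluates the weighted sum inside the parentheses, and tests whether a candidate pair $(\rho,\nu)$ is feasible. It does not solve the linear program; the resulting inequalities could be handed to any LP solver. All function names, the tolerance, and the covering-style rule `covering_w` are assumptions of this example.

```python
from math import comb

def triples(n):
    """All labels (e, x, o) of non-negative integers with 1 <= e + x + o <= n."""
    return [(e, x, o)
            for e in range(n + 1)
            for x in range(n + 1 - e)
            for o in range(n + 1 - e - x)
            if e + x + o >= 1]

def deviation_sum(w, n, zeta, e, x, o):
    """Sum over (alpha, beta) of the coalition counts times w(e + x + beta - alpha)."""
    total = 0.0
    for alpha in range(min(e, zeta) + 1):
        for beta in range(min(o, zeta - alpha) + 1):
            count = (comb(e, alpha) * comb(o, beta)
                     * comb(n - e - o, zeta - alpha - beta))  # comb is 0 when k > n
            total += count * w(e + x + beta - alpha)
    return total

def is_feasible(w, n, zeta, rho, nu, tol=1e-9):
    """Check rho >= nu >= 0 and every constraint of (P zeta) for the pair (rho, nu)."""
    if not (rho >= nu >= 0):
        return False
    for (e, x, o) in triples(n):
        gap = comb(n, zeta) * w(e + x) - deviation_sum(w, n, zeta, e, x, o)
        if w(o + x) - rho * w(e + x) + nu * gap > tol:
            return False
    return True

def covering_w(y):
    """Illustrative local welfare rule: w(y) = 1 for y >= 1 and w(0) = 0."""
    return 1.0 if y >= 1 else 0.0
```

With `covering_w`, $n=2$, and $\zeta=1$, the pair $(\rho,\nu)=(2,1)$ passes the check while $(1.9,1)$ violates the constraint for $(e,x,o)=(1,0,1)$, which is consistent with a guarantee of $1/\rho=1/2$ for two agents acting unilaterally. For $\zeta=n=2$, the pair $(1,1)$ is feasible, matching the fact that an $n$-strong Nash equilibrium is optimal.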

Proof of Proposition 3.2: The proof largely relies on introducing a parameterization that lets us treat (5) as a set of linear constraints. Consider a resource allocation game $(G,w)\in\mathcal{G}_{n}\times\{w\}$ and any two actions $a,a^{\prime}\in\mathcal{A}$ . To each resource $r\in\mathcal{R}$ , we assign a label $(e_{r},x_{r},o_{r})$ , where

\begin{aligned}
e_{r}&=\lvert\{i\in N\mid r\in a_{i}\setminus a_{i}^{\prime}\}\rvert,\\
x_{r}&=\lvert\{i\in N\mid r\in a_{i}\cap a_{i}^{\prime}\}\rvert,\\
o_{r}&=\lvert\{i\in N\mid r\in a_{i}^{\prime}\setminus a_{i}\}\rvert.
\end{aligned}

This is to say, $e_{r}$ denotes the number of agents utilizing resource $r$ in joint action $a$ but not $a^{\prime}$ , $o_{r}$ is the number that uses resource $r$ in joint action $a^{\prime}$ but not $a$ , and $x_{r}$ is the number that uses $r$ in both $a$ and $a^{\prime}$ . In the set of games $\mathcal{G}_{n}$ , let $\mathcal{I}=\{(e,x,o)\in\mathbb{N}_{\geq 0}^{3}\mid 1\leq e+x+o\leq n\}$ denote the set of possible labels, and $\theta(e,x,o):=\sum_{r\in\mathcal{R}_{(e,x,o)}}v_{r},$ where $\mathcal{R}_{(e,x,o)}=\{r\in\mathcal{R}\mid e_{r}=e,x_{r}=x,o_{r}=o\}$ denotes the set of resources with label $(e,x,o)$ . The parameter $\theta\in\mathbb{R}_{\geq 0}^{|\mathcal{I}|}$ is a vector with elements for each label.

We will now express the terms in (5) using this parameterization. Because $W(a)=\sum_{r\in\mathcal{R}}v_{r}w(|a|_{r})$ depends only on the number of agents utilizing each resource, we can represent $|a|_{r}=e_{r}+x_{r}$ and write the system welfare as

\begin{aligned}
W(a)&=\sum_{r\in\mathcal{R}}v_{r}w(e_{r}+x_{r})\\
&=\sum_{(e,x,o)\in\mathcal{I}}\Bigg(\sum_{r\in\mathcal{R}_{(e,x,o)}}v_{r}\Bigg)w(e+x)\\
&=\sum_{e,x,o}\theta(e,x,o)w(e+x).
\end{aligned}

When not stated, the sum over $(e,x,o)$ is implied to be for each label in $\mathcal{I}$ . Similar steps can be followed to show $W(a^{\prime})=\sum_{e,x,o}\theta(e,x,o)w(o+x)$ .

Finally, the term $\sum_{\Gamma\in\mathcal{C}_{\zeta}}W(a_{\Gamma}^{\prime},a_{-\Gamma})$ can similarly be transcribed by this parameterization:

\begin{aligned}
\sum_{\Gamma\in\mathcal{C}_{\zeta}}W(a^{\prime}_{\Gamma},a_{-\Gamma})
&=\sum_{\Gamma\in\mathcal{C}_{\zeta}}\sum_{e,x,o}\sum_{r\in\mathcal{R}_{(e,x,o)}}v_{r}w(|(a^{\prime}_{\Gamma},a_{-\Gamma})|_{r})\\
&=\sum_{e,x,o}\sum_{r\in\mathcal{R}_{(e,x,o)}}v_{r}\sum_{\Gamma\in\mathcal{C}_{\zeta}}w(|(a^{\prime}_{\Gamma},a_{-\Gamma})|_{r})\\
&=\sum_{e,x,o}\sum_{r\in\mathcal{R}_{(e,x,o)}}v_{r}\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)\\
&=\sum_{e,x,o}\theta(e,x,o)\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)
\end{aligned}

where the set of coalitions $\mathcal{C}_{\zeta}$ was partitioned according to the action profile of the agents in each coalition. We let $\alpha$ denote the number of agents in $\Gamma$ that utilize resource $r$ only in joint action $a$ and $\beta$ the number of agents in $\Gamma$ that utilize $r$ only in joint action $a^{\prime}$ . By simple counting arguments, there are exactly ${e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}$ coalitions grouped with the same $\alpha$ and $\beta$ . This decomposition is possible as the number of agents utilizing resource $r$ after a group $\Gamma$ deviates is precisely $e+x+\beta-\alpha$ .
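The counting argument can be sanity-checked numerically on small instances. The sketch below compares a brute-force enumeration of all coalitions of size $\zeta$ against the closed-form count ${e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}$. The agent ordering (the first $e$ agents use $r$ only in $a$, the next $o$ only in $a^{\prime}$, and all remaining agents are grouped together) and the function names are assumptions made for this illustration.

```python
import itertools
from math import comb

def count_by_enumeration(n, zeta, e, o):
    """Count coalitions of size zeta by their (alpha, beta) profile via enumeration.

    Agents 0..e-1 use the resource only in a, agents e..e+o-1 only in a',
    and the remaining n - e - o agents either use it in both or in neither.
    """
    counts = {}
    for group in itertools.combinations(range(n), zeta):
        alpha = sum(1 for i in group if i < e)
        beta = sum(1 for i in group if e <= i < e + o)
        counts[(alpha, beta)] = counts.get((alpha, beta), 0) + 1
    return counts

def count_by_formula(n, zeta, e, o):
    """Closed-form count for each (alpha, beta), keeping only nonzero entries."""
    counts = {}
    for alpha in range(min(e, zeta) + 1):
        for beta in range(min(o, zeta - alpha) + 1):
            c = comb(e, alpha) * comb(o, beta) * comb(n - e - o, zeta - alpha - beta)
            if c > 0:
                counts[(alpha, beta)] = c
    return counts
```

For example, with $n=5$, $\zeta=2$, $e=2$, and $o=1$, both functions partition the $\binom{5}{2}=10$ coalitions into the same five $(\alpha,\beta)$ classes.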

Substituting these expressions, the smoothness constraint (5) is satisfied if and only if

\begin{aligned}
&\frac{1}{\binom{n}{\zeta}}\sum_{e,x,o}\theta(e,x,o)\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)\\
&\qquad\geq\lambda_{\zeta}\sum_{e,x,o}\theta(e,x,o)w(o+x)-\mu_{\zeta}\sum_{e,x,o}\theta(e,x,o)w(e+x).
\end{aligned}

As $\theta(e,x,o)\geq 0$ for all $(e,x,o)\in\mathcal{I}$ , it is sufficient to satisfy

\begin{aligned}
&\frac{1}{\binom{n}{\zeta}}\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)\\
&\qquad\geq\lambda_{\zeta}w(o+x)-\mu_{\zeta}w(e+x),\quad\forall(e,x,o)\in\mathcal{I}.
\end{aligned}

(9)

Observe that (9) is independent of $a$ , $a^{\prime}$ , and $G$ . As such, this set of constraints serves as a sufficient condition that any $G\in\mathcal{G}_{n}$ satisfies (5) for all respective $a,a^{\prime}\in\mathcal{A}$ .
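As a worked instance of (9), consider single-agent deviations, $\zeta=1$ . Only the pairs $(\alpha,\beta)\in\{(0,0),(1,0),(0,1)\}$ satisfy $\alpha+\beta\leq 1$ , and their coefficients are $n-e-o$ , $e$ , and $o$ , respectively. Under this specialization (a sketch derived directly from (9), with terms of zero coefficient understood to be absent), the constraint reads

\frac{1}{n}\Big[(n-e-o)\,w(e+x)+e\,w(e+x-1)+o\,w(e+x+1)\Big]\geq\lambda_{1}w(o+x)-\mu_{1}w(e+x),\quad\forall(e,x,o)\in\mathcal{I}.

The three terms correspond to the deviating agent leaving the load on $r$ unchanged, leaving $r$ , or joining $r$ .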

To find parameters $\lambda_{\zeta}$ and $\mu_{\zeta}$ that provide the best $k$ -strong price of anarchy guarantee, we formulate the following optimization problem:

	$\displaystyle\max_{\lambda_{\zeta},\mu_{\zeta}\geq 0}\quad$	$\displaystyle\frac{\lambda_{\zeta}}{1+\mu_{\zeta}}$		(P1 $\zeta$ )
	$\displaystyle{\rm s.t.}\quad$	(9)

We restrict $\lambda_{\zeta}$ to be non-negative, though this constraint is not active except in degenerate cases. Finally, we transform (P1 $\zeta$ ) by substituting new decision variables $\rho=(1+\mu_{\zeta})/\lambda_{\zeta}$ and $\nu=1/\left(\binom{n}{\zeta}\lambda_{\zeta}\right)\geq 0$ . The new objective becomes $1/\rho$ . Note that the constraint for $(e,x,o)=(1,0,0)$ implies $\rho\geq 0$ ; we can thus invert the objective and change the maximization to a minimization, giving (P $\zeta$ ). ∎

$\displaystyle P^{\star}=$	$\displaystyle\min_{\rho,\{\nu_{\zeta}\geq 0\}_{\zeta\in[k]}}$	$\displaystyle\rho$
	$\displaystyle\leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ % \leavevmode\nobreak\ \leavevmode\nobreak\ \leavevmode\nobreak\ {\rm s.t.}$	$\displaystyle 0\geq w(o+x)-\rho w(e+x)+\sum_{\zeta\in[k]}\nu_{\zeta}\left(% \binom{n}{\zeta}w(e+x)-\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-% o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)\right)$
$\displaystyle\hskip 303.53377pt\forall(e,x,o)\in\mathcal{I}$			(P $[k]$ )

The smoothness parameters found via Proposition 3.2 can be used with Proposition 3.1 to generate lower bounds on the $k$ -strong price of anarchy. However, these bounds need not be tight, i.e., there may be no system in the class $\mathcal{G}_{n}\times\{w\}$ that attains this inefficiency, and better bounds may be possible. To study what efficiency we can guarantee across a class of resource allocation problems, we define the $k$ -strong price of anarchy bound for $(\mathcal{G}_{n},w)$ as

\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)=\min_{G\in\mathcal{G}_{n}}\mathrm{SPoA}_{% k}(G,w).

(10)

This performance ratio is parameterized by our choice of welfare function $w$ and the size of collaborative coalitions $k$ . In Theorem 3.3, we provide a linear program whose optimal value gives $\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)$ exactly. We do this by showing that the constraints of the $k$ linear programs in Proposition 3.2 can be combined to give an exact quantification of the $k$ -strong price of anarchy bound.

Theorem 3.3.

For the class of resource allocation problems $\mathcal{G}_{n}$ with welfare function $w$ , when groups maximize the common interest welfare, then

\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)=1/P^{\star}(n,w,k),

(11)

where $P^{\star}(n,w,k)$ is the optimal value of (P $[k]$ ).

The proof appears in the appendix.
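Since (P $[k]$ ) is a minimization, any feasible pair $(\rho,\{\nu_{\zeta}\})$ satisfies $\rho\geq P^{\star}$ and therefore certifies $\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)\geq 1/\rho$ . The sketch below checks feasibility of a candidate point by enumerating the constraints. It rests on assumptions that are not stated in this section: the index set $\mathcal{I}$ is taken to be all non-negative integer triples with $1\leq e+x+o\leq n$ , the welfare function is a list `w` indexed by load, and the function names are hypothetical.

```python
from math import comb

def index_set(n):
    """Assumed index set: non-negative triples (e, x, o), 1 <= e + x + o <= n."""
    return [(e, x, o)
            for e in range(n + 1)
            for x in range(n + 1 - e)
            for o in range(n + 1 - e - x)
            if e + x + o >= 1]

def deviation_sum(n, zeta, e, x, o, w):
    """Inner sum of (P[k]): coalitions of size zeta grouped by (alpha, beta)."""
    total = 0.0
    for alpha in range(e + 1):
        for beta in range(o + 1):
            if alpha + beta <= zeta:
                count = (comb(e, alpha) * comb(o, beta)
                         * comb(n - e - o, zeta - alpha - beta))
                total += count * w[e + x + beta - alpha]
    return total

def is_feasible(n, w, rho, nu, tol=1e-9):
    """Check every constraint of (P[k]); nu[zeta - 1] holds nu_zeta."""
    for e, x, o in index_set(n):
        value = w[o + x] - rho * w[e + x]
        for zeta, nu_z in enumerate(nu, start=1):
            value += nu_z * (comb(n, zeta) * w[e + x]
                             - deviation_sum(n, zeta, e, x, o, w))
        if value > tol:  # constraint requires 0 >= value
            return False
    return True
```

As a usage example, for $n=2$ , $k=1$ , and the covering-type welfare `w = [0, 1, 1]`, the point $\rho=2$ , $\nu_{1}=1$ passes the check, whereas $\rho=1.5$ with the same $\nu_{1}$ violates the constraint at $(e,x,o)=(1,0,1)$ . A full solution of (P $[k]$ ) would require an LP solver, which this sketch does not include.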

In Fig. 3, we consider four welfare functions and plot the tight bounds on the $k$ -strong price of anarchy for $1\leq k\leq n$ . As expected, we observe that increased communication improves efficiency guarantees; the amount of this increase is useful in determining the benefits of inter-agent communication/collaboration. However, this collaboration comes at a cost; in Section 4, we will study the complexity of distributed dynamics reaching $k$ -strong Nash equilibria.

4 Coalitional Dynamics

Section 3 provided several tools for quantifying the efficiency guarantees of $k$ -strong Nash equilibria. In this section, we will study the qualities of group-based dynamics that reach these equilibria. In particular, we will discuss the convergence rate and transient performance when agents follow the Coalitional round-robin and Asynchronous Best Response, respectively. We will denote $a^{t}$ as the joint action occurring at time $t\in\mathbb{N}$ and $\Gamma^{t}\subseteq N$ as the group of agents updating their action at time $t$ .

4.1 Round Robin

We first consider the $k$ -coalitional round robin agent dynamics, in which each group of $k$ agents updates their actions sequentially, following a set order $\sigma\in\Sigma_{\binom{n}{k}}$ , where $\sigma(z)$ for $z\in\{1,\ldots,\binom{n}{k}\}$ is the index of a group $\Gamma\in\mathcal{C}_{[k]}$ . We will call a round one pass through $\sigma$ in which each group updates their action. At their turn, the group $\Gamma^{t}$ selects their best response to the current action, i.e., $a_{\Gamma^{t}}^{t+1}\in\operatorname*{arg\,max}_{a_{\Gamma^{t}}\in\mathcal{A}_% {\Gamma^{t}}}W(a_{\Gamma^{t}},a_{-\Gamma^{t}})$ , where ties are broken uniformly at random unless $a_{\Gamma^{t}}^{t}\in\operatorname*{arg\,max}_{a_{\Gamma^{t}}\in\mathcal{A}_{% \Gamma^{t}}}W(a_{\Gamma^{t}},a_{-\Gamma^{t}})$ , in which case the group selects their current action $a_{\Gamma^{t}}^{t+1}=a_{\Gamma^{t}}^{t}$ . The dynamics are more formally described in Algorithm 1.

Algorithm 1 $k$ -Round-Robin Dynamics

procedure $k$ RoundRobin( $W,\mathcal{A},N,\sigma,a$ )
    $\overline{a}\leftarrow$ NULL
    while $\overline{a}\neq a$
        $\overline{a}\leftarrow a$
        for $z\in\{1,\ldots,\binom{n}{k}\}$
            $\Gamma\leftarrow\mathcal{C}_{k}(\sigma(z))$   $\triangleright$ Get group
            for $a_{\Gamma}^{+}\in\mathcal{A}_{\Gamma}\setminus a_{\Gamma}$   $\triangleright$ Group deviations
                if $W(a_{\Gamma}^{+},a_{-\Gamma})>W(a)$ then
                    $a\leftarrow(a_{\Gamma}^{+},a_{-\Gamma})$
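Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: actions are represented as tuples of resource names, the welfare is the resource allocation form $W(a)=\sum_{r}v_{r}w(|a|_{r})$ with `w` a list indexed by load, the order $\sigma$ is taken to be lexicographic, and the names `welfare` and `k_round_robin` are hypothetical. The inner loop also visits the group's current action, which is harmless because only strict improvements are accepted.

```python
from itertools import combinations, product

def welfare(a, values, w):
    """W(a) = sum_r v_r * w(|a|_r); each action is a tuple of resource names."""
    return sum(v * w[sum(r in act for act in a)] for r, v in values.items())

def k_round_robin(action_sets, values, w, k, a):
    """Run rounds over all groups of size k until a full round changes nothing."""
    n = len(action_sets)
    a = list(a)
    order = list(combinations(range(n), k))  # one fixed order sigma over groups
    prev = None
    while prev != a:
        prev = list(a)
        for group in order:
            # Enumerate joint deviations of the group; keep strict improvements.
            for dev in product(*(action_sets[i] for i in group)):
                cand = list(a)
                for i, act in zip(group, dev):
                    cand[i] = act
                if welfare(cand, values, w) > welfare(a, values, w):
                    a = cand
    return a
```

On a two-agent example with resources `A` (value 1) and `B` (value 2) and `w = [0, 1, 3]`, starting from both agents on `A`, no unilateral deviation strictly improves the welfare, so the $k=1$ dynamics stay put, while the $k=2$ dynamics move both agents to `B`.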

These dynamics are synchronous (in that agents must follow a set order) but provide an understanding of how groups of agents can make decisions in a localized manner, and we can analyze the equilibrium hitting time. In the fully distributed setting ( $k=1$ ), it has been shown that these dynamics reach a Nash equilibrium in finite time and require $\mathcal{O}(m^{n})$ welfare evaluations [43]. In Proposition 4.1, we find that in the coalitional setting, finite convergence is maintained and the number of welfare comparisons grows only by a factor that is exponential in $k$ with a small base. Recent work has shown that the examples that realize these worst-case hitting times are fragile and that equilibria can be computed in polynomial time under smoothed running-time analysis [44]. As a first step, we consider the worst-case run time; we believe that similar findings on the added complexity of group decision-making will hold under smoothed running-time analysis, though this is the subject of ongoing work. Recall $n=|N|$ and $m:=\max_{i\in N}|\mathcal{A}_{i}|$ .

Proposition 4.1.

The $k$ -Coalitional-Round-Robin dynamics converge in finite time and require $\mathcal{O}\left(m^{n}\left(\frac{1}{1-\nicefrac{{1}}{{m}}}\right)^{k}\right)$ welfare evaluations.

Proof.

First, we verify that the output of Algorithm 1 is a $k$ -strong Nash equilibrium, then we consider how long it takes Algorithm 1 to converge. Algorithm 1 terminates after a round in which no group $\Gamma\in\mathcal{C}_{k}$ can select a new action that increases the welfare, i.e., $W(a)\geq W(a_{\Gamma},a_{-\Gamma})$ for all $a_{\Gamma}\in\mathcal{A}_{\Gamma}$ and $\Gamma\in\mathcal{C}_{k}$ where $a$ is the output of Algorithm 1. A deviation by any subgroup $\Gamma^{\prime}\in\mathcal{C}_{[k]}$ is subsumed by the joint action $(a_{\Gamma^{\prime}},a_{\Gamma\setminus\Gamma^{\prime}})\in\mathcal{A}_{\Gamma}$ . As such, a state $a$ terminates Algorithm 1 if and only if it satisfies (3), i.e., it is a $k$ -strong Nash equilibrium.

Without loss of generality, we assume each agent possesses $m$ actions; for each agent $i$ that has fewer actions, assign $m-|\mathcal{A}_{i}|$ dummy actions with minimum welfare. In one round of the $k$ -Round-Robin dynamics, each group of agents is given the opportunity to deviate. First, we note that no group $\Gamma$ will respond to the same complementary group action $a_{-\Gamma}$ in two consecutive rounds unless $a$ is a $k$ -strong Nash equilibrium. If the group $\Gamma$ rejects a group action $a_{\Gamma}$ in response to $a_{-\Gamma}$ , the joint action $(a_{\Gamma},a_{-\Gamma})$ is eliminated from consideration as an output of Algorithm 1. Accounting for overlaps between the groups, in any round that does not start in a $k$ -strong Nash equilibrium, at least $y=\sum_{\zeta=1}^{k}\binom{n}{\zeta}(m-1)^{\zeta}$ joint actions are eliminated as possible outputs of Algorithm 1. As there are $m^{n}$ joint actions in total, there can be at most $r\leq\lfloor\frac{m^{n}}{y}\rfloor+1$ rounds that do not start in a $k$ -strong Nash equilibrium; this proves the finite convergence time. In each round, there are exactly $\binom{n}{k}m^{k}$ welfare checks; thus, the total number of welfare checks is no more than $(\frac{m^{n}}{y}+1)\binom{n}{k}m^{k}$ . Removing lower order terms from $y$ gives the stated bound. ∎

From Proposition 4.1, we observe two things: 1) the coalitional dynamics do not require drastically more welfare evaluations than the fully distributed round-robin, but 2) the convergence rate is slow regardless of $k$ . In light of this, we turn our focus to understanding the transient performance of collaborative decision-making dynamics. Further, in many settings, it is desirable to allow agents or groups to update their actions asynchronously. In Section 4.2, we will consider both of these factors in the asynchronous best response dynamics.

4.2 Asynchronous Best-Response Dynamics

Motivated by settings where agents (or groups of agents) perform action revisions asynchronously or on their own time scales, we consider a dynamical system where the next group of agents to update is random.

We define the Asynchronous $k$ -Coalitional Best-Response Dynamics as follows: let $t\geq 0$ denote the number of agent (or group) updates that have occurred so far.¹ The updating group $\Gamma^{t}$ is selected at random, such that the size of the group $\zeta$ is picked with probability $p_{\zeta}=\mathbb{P}[|\Gamma^{t}|=\zeta]$ and the specific agents in the group are drawn uniformly at random. Once formed, the updating group, $\Gamma^{t}$ , chooses their best response in the same manner as the coalitional round robin described in Section 4.1.

¹ Counting time steps in terms of the number of updates subsumes cases where agents (or groups) update with respect to individual and independent random clocks. The rate of each clock is analogous to the selection probability for different groups.

Because decisions are distributed and updates are asynchronous, these dynamics capture the behavior of components in real-time multi-agent systems. In Theorem 4.2, we show these dynamics converge almost surely to a $k$ -strong Nash equilibrium, and further, if the system is $(\lambda,\mu)$ - $k$ -coalitionally smooth, we provide a bound on the cumulative welfare relative to the optimal.
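The asynchronous dynamics described above can be sketched as follows. The sketch makes several assumptions for illustration: actions are tuples of resource names, the welfare has the resource allocation form with `w` a list indexed by load, and the names `welfare` and `async_best_response` are hypothetical. It also simplifies the tie-breaking rule: the group keeps its current action unless a strictly better one exists, and among strictly better actions it takes the first maximizer found rather than sampling uniformly.

```python
import random
from itertools import product

def welfare(a, values, w):
    """W(a) = sum_r v_r * w(|a|_r); each action is a tuple of resource names."""
    return sum(v * w[sum(r in act for act in a)] for r, v in values.items())

def async_best_response(action_sets, values, w, p, a, steps, rng):
    """p[z - 1] is the probability that the updating group has size z."""
    n = len(action_sets)
    a = list(a)
    history = [welfare(a, values, w)]
    sizes = list(range(1, len(p) + 1))
    for _ in range(steps):
        size = rng.choices(sizes, weights=p)[0]   # draw the group size
        group = rng.sample(range(n), size)        # members uniformly at random
        best, best_val = a, welfare(a, values, w)
        for dev in product(*(action_sets[i] for i in group)):
            cand = list(a)
            for i, act in zip(group, dev):
                cand[i] = act
            val = welfare(cand, values, w)
            if val > best_val:                    # keep current action on ties
                best, best_val = cand, val
        a = best
        history.append(best_val)
    return a, history
```

Because a group only ever moves to a strictly better joint action, the recorded welfare trajectory is non-decreasing, which is the property used in the convergence argument.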

Theorem 4.2.

The Asynchronous $k$ -Coalitional Best-Response Dynamics converge almost surely to the set of $k$ -strong Nash equilibria. Further, if $(G,W)$ is a $(\lambda,\mu)$ - $k$ -coalitionally smooth system, then after $T\geq 1$ update steps, the expected average welfare satisfies

\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T}W(a^{t})\right]\geq\frac{T-1}{2T}% \frac{\sum_{\zeta=1}^{k}p_{\zeta}\lambda_{\zeta}}{1+\sum_{\zeta=1}^{k}p_{\zeta% }\mu_{\zeta}}W(a^{\rm opt}),

(12)

where $p_{\zeta}$ is the probability a group of size $\zeta$ best responds.

Interestingly, the bound on the average transient welfare depends on how frequently groups of different sizes are sampled to perform their best response. When the agents are designed to more regularly collaborate in larger groups, the transient guarantee will often be better.
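To see how the sampling distribution enters the guarantee, the factor multiplying $W(a^{\rm opt})$ in (12) can be evaluated directly. The helper below is a small sketch; the function name and the list-based representation of $p$ , $\lambda$ , and $\mu$ are assumptions made for illustration.

```python
def transient_bound_factor(p, lam, mu, T):
    """Factor in (12): (T-1)/(2T) * sum_z p_z*lam_z / (1 + sum_z p_z*mu_z)."""
    num = sum(pz * lz for pz, lz in zip(p, lam))
    den = 1 + sum(pz * mz for pz, mz in zip(p, mu))
    return (T - 1) / (2 * T) * num / den
```

For instance, with a single group size, $\lambda_{1}=\mu_{1}=1$ , and $T=2$ , the factor is $\frac{1}{4}\cdot\frac{1}{2}=0.125$ ; as $T$ grows, the leading term approaches $\frac{1}{2}$ .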

Proof.

First, we show that the Asynchronous $k$ -Coalitional Best-Response Dynamics converge in general. A group $\Gamma$ revises their action only to one of strictly higher welfare, if one exists. Consider the resulting Markov chain $\mathcal{M}$ with states $\mathcal{A}$ . Any state $a\in\mathcal{A}\setminus k{\rm SNE}$ has an outgoing edge with positive probability as there exists some group $\Gamma\in\mathcal{C}_{[k]}$ that is selected with probability $p_{|\Gamma|}/|\mathcal{C}_{|\Gamma|}|>0$ which would revise their action. Any state $a\in k{\rm SNE}$ has no outgoing edges with positive probability as no group $\Gamma\in\mathcal{C}_{[k]}$ can revise their action to strictly increase the welfare. Finally, there are no cycles (excluding self-loops) in $\mathcal{M}$ , as every outgoing edge is directed from a joint action of lower welfare to one of strictly higher welfare. As such, the set $k{\rm SNE}$ is absorbing and $\mathbb{P}[\lim_{t\rightarrow\infty}a^{t}\in k{\rm SNE}]=1$ .

Now, consider that the system $(G,W)$ is $(\lambda,\mu)$ - $k$ -coalitionally smooth. As the selection of the updating group is random, the welfare at time $t+1$ is a random variable, even when conditioned on $a^{t}$ ; the expectation of the succeeding welfare can be written

	$\displaystyle\mathbb{E}[W(a^{t+1})\mid$	$\displaystyle a^{t}=a]=\sum_{\zeta=1}^{k}p_{\zeta}\sum_{\Gamma\in\mathcal{C}_{% \zeta}}\frac{1}{\binom{n}{\zeta}}W(a_{\Gamma}^{+},a_{-\Gamma})$
		$\displaystyle\geq\sum_{\zeta=1}^{k}p_{\zeta}\sum_{\Gamma\in\mathcal{C}_{\zeta}% }\frac{1}{\binom{n}{\zeta}}W(a^{\rm opt}_{\Gamma},a_{-\Gamma})$
		$\displaystyle\geq\sum_{\zeta=1}^{k}p_{\zeta}\left(\lambda_{\zeta}W(a^{\rm opt}% )-\mu_{\zeta}W(a)\right)$
		$\displaystyle=\left(\sum_{\zeta=1}^{k}p_{\zeta}\lambda_{\zeta}\right)W(a^{\rm opt% })-\left(\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}\right)W(a),$

where $a_{\Gamma}^{+}\in\operatorname*{arg\,max}_{a_{\Gamma}\in\mathcal{A}_{\Gamma}}W% (a_{\Gamma},a_{-\Gamma})$ is the update state for the group $\Gamma$ following the dynamics; the welfare for each possible updated joint action is the same, so determining which group action is selected is irrelevant. As $a_{\Gamma}^{+}$ is a best response, the welfare is no better for selecting a different action, namely $a^{\rm opt}_{\Gamma}$ . The final inequality holds from (5). Taking the expectation of $\mathbb{E}[W(a^{t+1})\mid a^{t}=a]$ over $a^{t}$ gives

\mathbb{E}\left[W(a^{t+1})\right]\geq\left(\sum_{\zeta=1}^{k}p_{\zeta}\lambda_% {\zeta}\right)W(a^{\rm opt})-\left(\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}% \right)\mathbb{E}\left[W(a^{t})\right].

Rearranging terms shows

\mathbb{E}\left[W(a^{t+1})\right]-\frac{\sum_{\zeta=1}^{k}p_{\zeta}\lambda_{% \zeta}}{1+\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}}W(a^{\rm opt})\\ \geq\left(\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}\right)\left(\frac{\sum_{\zeta% =1}^{k}p_{\zeta}\lambda_{\zeta}}{1+\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}}W(a^% {\rm opt})-\mathbb{E}\left[W(a^{t})\right]\right).

Observe that either $\mathbb{E}\left[W(a^{t})\right]\geq\frac{\sum_{\zeta=1}^{k}p_{\zeta}\lambda_{% \zeta}}{1+\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}}W(a^{\rm opt})$ or $\mathbb{E}\left[W(a^{t+1})\right]-\frac{\sum_{\zeta=1}^{k}p_{\zeta}\lambda_{% \zeta}}{1+\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}}W(a^{\rm opt})\geq 0$ . Accordingly, in expectation, every other update must satisfy the bound, giving the average cumulative welfare bound in (12). ∎
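The last two steps of the proof can be spelled out as follows; this is a sketch of the reasoning, assuming the welfare is non-negative. Write $L=\sum_{\zeta=1}^{k}p_{\zeta}\lambda_{\zeta}$ , $M=\sum_{\zeta=1}^{k}p_{\zeta}\mu_{\zeta}$ , and $c=\frac{L}{1+M}W(a^{\rm opt})$ , so that $L\,W(a^{\rm opt})=(1+M)c$ . Subtracting $c$ from both sides of the one-step inequality gives

\mathbb{E}\left[W(a^{t+1})\right]-c\geq(1+M)c-M\,\mathbb{E}\left[W(a^{t})\right]-c=M\left(c-\mathbb{E}\left[W(a^{t})\right]\right).

Since $M\geq 0$ , whenever $\mathbb{E}[W(a^{t})]<c$ the right-hand side is non-negative and hence $\mathbb{E}[W(a^{t+1})]\geq c$ . Grouping the steps $t=1,\ldots,T$ into $\lfloor T/2\rfloor$ disjoint consecutive pairs, each pair contains at least one step with expected welfare at least $c$ , and the remaining terms are non-negative, so

\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T}W(a^{t})\right]\geq\frac{\lfloor T/2\rfloor}{T}\,c\geq\frac{T-1}{2T}\,c.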

Theorem 4.2 shows that the transient efficiency changes with the frequency with which different group sizes perform best responses. To attain the best transient guarantee, we can select $p$ carefully.

Corollary 1.

If a system $(G,w)$ is a resource allocation problem in $\mathcal{G}_{n}\times\{w\}$ , then selecting $p_{\zeta}\propto\frac{\nu_{\zeta}^{\star}}{\sum_{\psi\in[k]}\binom{n}{\psi}\nu% _{\psi}^{\star}}$ for all $\zeta\in[k]$ gives

\mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T}W(a^{t})\right]\geq\frac{T-1}{2T}% \mathrm{SPoA}_{k}(\mathcal{G}_{n},w)W(a^{\rm opt}).

The proof is omitted as it is straightforward by rearranging terms in the constraints of (D).

Together, Theorem 4.2 and Corollary 1 provide insight into the transient performance of non-deterministic multi-agent dynamical systems with collaborative communication. Future work will study the traits of non-best-response dynamics, namely regret-based decision-making.

4.3 Numerical Example

We support the findings of Section 4.2 by numerical example. We randomly generate resource allocation problems and simulate the coalitional asynchronous best response dynamics when groups of size $k\in\{1,2,3,4,5\}$ update.

The resource allocation problems are generated by creating 100 resources with values independently drawn uniformly at random on $[0,1]$ . Each of the 25 agents is endowed with between 1 and 10 actions (also sampled uniformly at random). For each action of each player, each resource is included in that particular action with probability 0.25. This defines a tuple $G$ . We use the local welfare function $w(x)=xe^{-x/5}$ to capture some added benefit from having multiple agents use the same resource and eventual diminishing returns and increased cost from over congestion.
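The instance generation described above can be sketched as follows. This is not the authors' simulation code: the function names and the representation of actions as frozensets of resource indices are assumptions, and `random.random()` draws from $[0,1)$ rather than the closed interval.

```python
import math
import random

def generate_problem(rng, n_agents=25, n_resources=100,
                     max_actions=10, p_include=0.25):
    """Sample resource values and agent action sets as described in the text."""
    values = [rng.random() for _ in range(n_resources)]
    action_sets = []
    for _ in range(n_agents):
        n_actions = rng.randint(1, max_actions)  # inclusive on both ends
        actions = [
            frozenset(r for r in range(n_resources) if rng.random() < p_include)
            for _ in range(n_actions)
        ]
        action_sets.append(actions)
    return values, action_sets

def local_welfare(x):
    """Local welfare w(x) = x * exp(-x / 5)."""
    return x * math.exp(-x / 5)
```

Note that $w(x)=xe^{-x/5}$ increases up to $x=5$ and decreases afterwards, which matches the intended shape of initial benefit followed by congestion cost.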

We select a random initial condition and run the asynchronous best response dynamics with $p_{k}=1$ for one value $k\in\{1,2,3,4,5\}$ (i.e., only groups of exactly size $k$ are sampled), and repeat the simulation for each $1\leq k\leq 5$ . We ran this simulation 100 times.

In Fig. 4(a), we plot the average welfare across the simulations over the number of group action revisions. We observe that the larger coalitions provide superior transient and long-run performance. However, a single group action revision requires more computation for larger coalitions. In Fig. 4(b), for each coalition size $k\in\{1,\ldots,5\}$ , we show a scatter plot of the number of cumulative welfare evaluations and the attained system welfare, along with a trend line fit to the data within two standard deviations of the average number of welfare evaluations. Here, we observe that for lower values of welfare, the smaller coalitions can attain similar welfare with fewer welfare evaluations but that the larger coalitions reach higher welfare much more regularly.

These conclusions help to identify the trade-off in designing systems with collaborative communication: better performance is attainable at the cost of greater computation.

5 Utility Design

Up until this point, agents and groups of agents have been set to optimize the system welfare $W$ over their respective individual or group actions. Though this is a reasonable approach, the system designer may seek to further improve system performance by designing how a group of agents makes a decision. Consider that groups of agents instead maximize the objective function $U:\mathcal{A}\rightarrow\mathbb{R}_{\geq 0}$ (henceforth referred to as the utility function), i.e.,

a_{\Gamma}\in\operatorname*{arg\,max}_{a_{\Gamma}^{\prime}\in\mathcal{A}_{% \Gamma}}U(a_{\Gamma}^{\prime},a_{-\Gamma}),

(13)

where ties are still broken at random unless the current group action is in the argmax. By designing the utility function $U$ , the system operator can alter how groups of agents make decisions and, ideally, improve the performance of the system. A multi-agent system is now captured by the tuple $(G,W,U)$ , where the previous results are the special case when $U=W$ .

By redefining the objective functions groups of agents seek to maximize, we additionally alter the equilibria that emerge from collaborative decision-making. We alter the definition of $k$ -strong Nash equilibria to hold with respect to the utility function, i.e.,

U(a^{k{\rm SNE}})\geq U(a^{\prime}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma}),% \leavevmode\nobreak\ \forall a^{\prime}_{\Gamma}\in\mathcal{A}_{\Gamma},% \leavevmode\nobreak\ \Gamma\in\mathcal{C}_{[k]}.

(14)

Let $k{\rm SNE}(G,U)$ denote the set of $k$ -strong Nash equilibria when agents optimize the objective $U$ . The new set of equilibria implies the equilibrium performance guarantee may also change. As such, we redefine the $k$ -strong price of anarchy as the approximation of the optimal welfare provided by the system equilibria under objective function $U$ ,

\mathrm{SPoA}_{k}(G,W,U)=\frac{\min_{a^{k{\rm SNE}}\in k{\rm SNE}(G,U)}W(a^{k{% \rm SNE}})}{\max_{a^{\rm opt}\in\mathcal{A}}W(a^{\rm opt})}.

(15)
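For small instances, (15) can be evaluated by exhaustive search. The sketch below is purely illustrative: the function name `k_strong_poa` is hypothetical, `W` and `U` are assumed to be Python callables on joint-action tuples, and the optimal welfare is assumed to be strictly positive. The cost is exponential in the number of agents, so this is only suitable for toy examples.

```python
from itertools import combinations, product

def k_strong_poa(action_sets, W, U, k):
    """Brute-force SPoA_k(G, W, U): worst equilibrium welfare over optimum."""
    n = len(action_sets)
    profiles = list(product(*action_sets))

    def is_ksne(a):
        # a is an equilibrium if no group of size <= k strictly improves U.
        for size in range(1, k + 1):
            for group in combinations(range(n), size):
                for dev in product(*(action_sets[i] for i in group)):
                    cand = list(a)
                    for i, act in zip(group, dev):
                        cand[i] = act
                    if U(tuple(cand)) > U(a):
                        return False
        return True

    # A maximizer of U is always an equilibrium, so this list is non-empty.
    equilibria = [a for a in profiles if is_ksne(a)]
    optimum = max(W(a) for a in profiles)
    return min(W(a) for a in equilibria) / optimum
```

With two agents choosing between resources `A` (value 1) and `B` (value 2), local welfare `[0, 1, 3]`, and $U=W$ , both agents on `A` is a Nash equilibrium with half the optimal welfare, so the ratio is $0.5$ for $k=1$ ; allowing pairs to deviate removes that equilibrium and the ratio becomes $1$ for $k=2$ .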

With this new design opportunity, we identify two goals in understanding the new attainable performance of collaborative decision-making: 1) quantifying the performance of a prescribed utility function, and 2) finding a utility function that provides the greatest $k$ -strong price of anarchy guarantees. We address these two points in general in Section 5.1 and more thoroughly within resource allocation problems in Section 5.2.

5.1 Generalized Coalitionally Smooth Games

In this section, we consider the general setting and particularly focus on quantifying the $k$ -strong price of anarchy of a system $(G,W,U)$ . As in Section 3.1, we introduce a notion of smooth systems now generalized to the setting where the agent objective $U$ differs from the system objective $W$ .

Definition 3.

A system $(G,W,U)$ is $(\lambda,\mu)$ - $k$ -generalized-coalitionally smooth, where $\lambda,\mu\in\mathbb{R}_{\geq 0}^{k}$ , if for all $a,a^{\prime}\in\mathcal{A}$

\frac{1}{\binom{n}{\zeta}}\sum_{\Gamma\in\mathcal{C}_{\zeta}}U(a_{\Gamma}^{% \prime},a_{-\Gamma})-U(a)+W(a)\\ \geq\lambda_{\zeta}W(a^{\prime})-\mu_{\zeta}W(a),\leavevmode\nobreak\ \forall% \zeta\in[k].

(16)

Like (5), (16) bounds the average effect of a deviation by a group of size $\zeta$ , but on the utility function instead of the welfare. In Proposition 5.1, we show that a $(\lambda,\mu)$ - $k$ -generalized-coalitionally smooth system admits a bound on the $k$ -strong price of anarchy.

Proposition 5.1.

A system $(G,W,U)$ that is $(\lambda,\mu)$ - $k$ -generalized-coalitionally smooth has $k$ -strong price of anarchy satisfying

\mathrm{SPoA}_{k}(G,W,U)\geq\frac{\lambda_{\zeta}}{1+\mu_{\zeta}},\leavevmode% \nobreak\ \forall\zeta\in[k].

(17)

Proof.

Let $a^{k{\rm SNE}}\in\mathcal{A}$ denote a $k$ -strong Nash equilibrium when agents follow objective function $U$ , and let $a^{\rm opt}\in\operatorname*{arg\,max}_{a\in\mathcal{A}}W(a)$ denote an optimal joint action. For any $\zeta\in[k]$ , we have


$\displaystyle W(a^{k{\rm SNE}})$	$\displaystyle\geq\frac{1}{\binom{n}{\zeta}}\sum_{\Gamma\in\mathcal{C}_{\zeta}}% U(a^{\rm opt}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma})-U(a^{k{\rm SNE}})+W(a^{k{\rm SNE% }})$	(18a)
	$\displaystyle\geq\lambda_{\zeta}W(a^{\rm opt})-\mu_{\zeta}W(a^{k{\rm SNE}}).$	(18b)

Here, (18a) holds because $\frac{1}{\binom{n}{\zeta}}\sum_{\Gamma\in\mathcal{C}_{\zeta}}U(a^{\rm opt}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma})-U(a^{k{\rm SNE}})\leq 0$ , as $a^{k{\rm SNE}}$ is a $k$ -strong Nash equilibrium with respect to $U$ , and (18b) holds from Definition 3. Rearranging, we get $W(a^{k{\rm SNE}})/W(a^{\rm opt})\geq\lambda_{\zeta}/(1+\mu_{\zeta})$ . ∎

Beyond quantifying the $k$ -strong price of anarchy for a system $(G,W,U)$ , one may wish to find the utility function which provides the best efficiency guarantee, i.e.,

U\in\operatorname*{arg\,max}_{U^{\prime}:\mathcal{A}\rightarrow\mathbb{R}_{% \geq 0}}\mathrm{SPoA}_{k}(G,W,U^{\prime}).

For a specific problem $(G,W)$ , it is possible to design a utility function which guarantees that a system optimal $a^{\rm opt}$ is a unique equilibrium and provides $\mathrm{SPoA}_{k}(G,W,U)=1$ (e.g., $U(a)=\sum_{i\in N}{\mathds{1}}[a_{i}=a^{\rm opt}_{i}]$ ). However, this would require knowing the optimal allocations a priori, which poses several problems, including: 1) computing an optimal allocation can be intractable, and 2) system parameters may be subject to modeling errors, noise, or changes over time, causing the optimal allocations to change. As such, we will consider the design of utility rules, which provide a set of instructions to construct a utility function across a class of systems and eliminate the computational burden of solving for a new utility function for each system while maintaining improved performance guarantees. Luckily, the approach in Proposition 5.1 is amenable to generating performance guarantees across a class of systems, and in Section 5.2, we will investigate optimal utility rules more thoroughly in resource allocation problems.

5.2 Resource Allocation Games

In this section, we consider the $k$ -strong price of anarchy in classes of resource allocation problems when the agents’ objective is derived from a utility rule $u\in\mathbb{R}^{n+1}_{\geq 0}$ . In an agent environment $G=(N,\mathcal{A},\mathcal{R},\{v_{r}\}_{r\in\mathcal{R}})$ , the utility rule $u$ can be applied to derive the utility function

U(a)=\sum_{r\in\mathcal{R}}v_{r}u(|a|_{r}).

To normalize the utility function, we set $u(0)=0$ . We ultimately consider the performance of a utility rule $u$ across all agent environments $G\in\mathcal{G}_{n}$ with welfare function $w$ . We slightly abuse notation to refer to a system by the tuple $(G,w,u)$ . To quantify this performance, we generalize the $k$ -strong price of anarchy bound defined in (10) to hold for cases where groups of agents optimize the utility function:

\mathrm{SPoA}_{k}(\mathcal{G}_{n},w,u)=\min_{G\in\mathcal{G}_{n}}\mathrm{SPoA}% _{k}(G,w,u).

(19)

The performance ratio is parameterized by the pair $(w,u)$ ; as such, we will discuss the effectiveness of a utility rule $u$ with respect to a given welfare function $w$ .

Taking the utility rule approach completely eliminates the computational cost of deriving a utility function for each problem instance; now we seek to understand the capabilities of this approach in two ways: 1) in Theorem 5.2 we demonstrate how we can construct utility rules with good performance guarantees, and 2) in Proposition 5.3 we provide an upper bound on the best attainable performance a utility rule can provide. In Corollary 2, we provide a formal condition on when the constructed utility rule is optimal.

Theorem 5.2.

Any resource allocation problem $(G,W)\in\mathcal{G}_{n}\times\{w\}$ with the utility rule $\widetilde{u}_{\zeta}$ is $(1,\widetilde{\rho}_{\zeta}-1)$ - $k$ -generalized-coalitionally smooth, where $\widetilde{u}_{\zeta}$ and $\widetilde{\rho}_{\zeta}$ are solutions to the linear program,

	$\displaystyle(\widetilde{\rho}_{\zeta},\widetilde{u}_{\zeta})\in\operatorname*% {arg\,min}_{\rho\geq 0,u\in\mathbb{R}^{n+1}_{\geq 0}}\leavevmode\nobreak\ % \leavevmode\nobreak\ \leavevmode\nobreak\ \rho$
	$\displaystyle{\rm s.t.}\hskip 8.0pt0\geq w(o+x)-\rho w(e+x)+$
	$\displaystyle\leavevmode\nobreak\ \leavevmode\nobreak\ \left(\binom{n}{\zeta}u% (e+x)-\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-% o}{\zeta-\alpha-\beta}u(e+x+\beta-\alpha)\right)$
	$\displaystyle\hskip 160.0pt\forall(e,x,o)\in\mathcal{I}.$		(Q $\zeta$ )

Proof.

Consider the parameterization described in the proof of Proposition 3.2, where for any two actions $a,a^{\prime}\in\mathcal{A}$ , we can rewrite $W(a)=\sum_{e,x,o}\theta(e,x,o)w(e+x)$ and $W(a^{\prime})=\sum_{e,x,o}\theta(e,x,o)w(o+x)$ . Now, we can additionally rewrite $U(a)=\sum_{e,x,o}\theta(e,x,o)u(e+x)$ and

\sum_{\Gamma\in\mathcal{C}_{\zeta}}U(a^{\prime}_{\Gamma},a_{-\Gamma})\\ =\sum_{e,x,o}\theta(e,x,o)\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}u(e+x+\beta-\alpha).

We can now write out (16), the $(\lambda,\mu)$ - $k$ -generalized-coalitionally smooth constraint, as

\sum_{e,x,o}\theta(e,x,o)\Bigg{(}\frac{1}{\binom{n}{\zeta}}\sum_{\begin{% subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-% o}{\zeta-\alpha-\beta}u(e+x+\beta-\alpha)\\ -u(e+x)\Bigg{)}\geq\sum_{e,x,o}\theta(e,x,o)\left(\lambda_{\zeta}w(o+x)-(\mu_{% \zeta}+1)w(e+x)\right).

As before, we can observe that this constraint is sufficiently satisfied when

\frac{1}{\binom{n}{\zeta}}\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-% o}{\zeta-\alpha-\beta}u(e+x+\beta-\alpha)-u(e+x)\\ \geq\lambda_{\zeta}w(o+x)-(\mu_{\zeta}+1)w(e+x),\quad\forall(e,x,o)\in\mathcal% {I}.

(20)

The task of finding smoothness parameters that give the best price of anarchy guarantee becomes the same problem as (P1 $\zeta$ ) but now with constraint set (20). By substituting the decision variables $\rho=(1+\mu_{\zeta})/\lambda_{\zeta}$ and $\nu=1/\left(\binom{n}{\zeta}\lambda_{\zeta}\right)\geq 0$ , we attain the new constraint set

0\geq w(o+x)-\rho w(e+x)+\\ \nu\left(\binom{n}{\zeta}u(e+x)-\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-% o}{\zeta-\alpha-\beta}u(e+x+\beta-\alpha)\right)\\ \forall(e,x,o)\in\mathcal{I}.

(21)

The new objective² becomes $1/\rho$ .

² As an aside, the transformed program up to this point can be used to evaluate the performance of a specified utility rule.

Finally, we let $u\in\mathbb{R}^{n+1}_{\geq 0}$ become a decision variable in the program. Observe that every occurrence of $u$ is multiplied by $\nu$ , and every occurrence of $\nu$ multiplies $u$ . As such, we can define the new decision variable $u^{\prime}=\nu u$ and recover the linear program (Q $\zeta$ ). ∎
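For completeness, the change of variables in the proof can be written out explicitly; this is a sketch that assumes $\lambda_{\zeta}>0$ . Dividing (20) by $\lambda_{\zeta}$ and using $1/\lambda_{\zeta}=\nu\binom{n}{\zeta}$ and $(1+\mu_{\zeta})/\lambda_{\zeta}=\rho$ gives

\nu\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-o}{\zeta-\alpha-\beta}u(e+x+\beta-\alpha)-\nu\binom{n}{\zeta}u(e+x)\geq w(o+x)-\rho w(e+x),

and moving all terms to the right-hand side yields (21). Because $u$ appears only through the product $\nu u$ , replacing $\nu u$ by a single variable $u^{\prime}$ leaves a constraint that is linear in $(\rho,u^{\prime})$ , which is the constraint of (Q $\zeta$ ).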

The utility rule $\widetilde{u}_{\zeta}$ obtained from (Q $\zeta$ ) provides some guarantee on the performance attainable by designing group decision-making in collaborative systems. However, it is not yet clear if these are the best possible utility rules. To understand the best possible performance of a collaborative system, we define the optimal $k$ -strong price of anarchy as

\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{n},w)=\sup_{u:[n]\rightarrow\mathbb{R}_{\geq 0}}\mathrm{SPoA}_{k}(\mathcal{G}_{n},w,u).

(22)

This quantity tells us the best efficiency we can hope for from a collaborative system. In Proposition 5.3, we provide an upper bound on it.

Proposition 5.3.

For the class of resource allocation problems $\mathcal{G}_{n}\times\{w\}$ , when agents maximize the optimal utility design objective $u^{\star}$ ,

\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{n},w)\leq 1/Q^{\star}(n,w,k),

(23)

where $Q^{\star}(n,w,k)$ is the value of the linear program

	$\displaystyle Q^{\star}(n,w,k)=\min_{\rho\geq 0,\{u_{\zeta}\in\mathbb{R}^{n+1}% _{\geq 0}\}_{\zeta\in[k]}}\leavevmode\nobreak\ \leavevmode\nobreak\ % \leavevmode\nobreak\ \rho$
	$\displaystyle{\rm s.t.}\hskip 8.0pt0\geq w(o+x)-\rho w(e+x)+$
	$\displaystyle\sum_{\zeta\in[k]}\left(\binom{n}{\zeta}u_{\zeta}(e+x)-\sum_{% \begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}{e\choose\alpha}{o\choose\beta}\binom{n-e-% o}{\zeta-\alpha-\beta}u_{\zeta}(e+x+\beta-\alpha)\right)$
	$\displaystyle\hskip 140.0pt\forall(e,x,o)\in\mathcal{I}.$		(Q $[k]$ )

The proof appears in the appendix.

Note that Theorem 5.2 provides a utility rule with associated performance guarantee which lower bounds $\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{n},w)$ , and Proposition 5.3 provides an upper bound. In Corollary 2, we note that when these two bounds match, we have a tight bound on $\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{n},w)$ as well as an optimal utility rule.

Corollary 2.

For the class of resource allocation problems $\mathcal{G}_{n}\times\{w\}$ , if the value of (Q $\zeta$ ) satisfies $\rho^{\star}_{\zeta}=Q^{\star}(n,w,k)$ , then $\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{n},w)=1/Q^{\star}(n,w,k)$ is a tight bound and a solution $\widetilde{u}_{\zeta}$ to (Q $\zeta$ ) is an optimal utility rule.

Proof.

This follows immediately from $1/\rho_{\zeta}^{\star}=\frac{\lambda_{\zeta}}{1+\mu_{\zeta}}$ being a lower bound on $\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{n},w)$ and the reciprocal of the value of (Q $[k]$ ), $1/Q^{\star}$ being an upper bound. When the two match, the bound must be tight. ∎

The two bounds are not guaranteed to coincide, but they do at the extremes ( $k=1$ and $k=n$ ); further, the gap between the two bounds (if present) is often small, and the lower bound attained by the utility rule constructed in Theorem 5.2 often demonstrates a significant improvement over the setting where agents simply optimize the system objective. Consider the four welfare functions from Fig. 3 again; for each, we compute the utility rule using Theorem 5.2 and the upper bound on $\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{n},w)$ using Proposition 5.3. In Fig. 5 we plot these lower and upper bounds on $\mathrm{SPoA}_{k}^{\star}(\mathcal{G}_{20},w)$ for each welfare function and for each value of $1\leq k\leq n$ ; these values are juxtaposed with the $k$ -strong price of anarchy when agents optimize the system objective $w$ to demonstrate the possible gain in performance from designing the agents’ objective in collaborative systems.

6 Conclusion

In this work, we provided a variety of tools for evaluating the benefits and costs of collaborative communication in multi-agent systems. A collaborative multi-agent system was modeled by a common interest game where groups of players collaboratively perform their best responses simultaneously. We specifically considered the $k$ -strong Nash equilibrium as a relevant equilibrium concept to gain insights into system behavior between the fully centralized and fully distributed settings. We introduced the notion of $(\lambda,\mu)$ - $k$ -coalitionally smooth systems and derived bounds on how well the $k$ -strong Nash equilibrium approximates the optimum in such systems. Further analysis studied the running time of collaborative multi-agent decision dynamics and their transient performance, as well as the possible performance gains from designing agents’ objectives separately from the system objective. Finally, we undertook a more thorough study of the class of resource allocation games, for which we provided tractable linear programs whose solutions give tight bounds on the $k$ -strong price of anarchy. Future work will study less extensive communication paradigms and dynamical systems that emerge when agents learn together.

References

[1] S. Wollenstein-Betech, A. Houshmand, M. Salazar, M. Pavone, C. G. Cassandras, and I. C. Paschalidis, “Congestion-aware Routing and Rebalancing of Autonomous Mobility-on-Demand Systems in Mixed Traffic,” in 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 2020, pp. 1–7.
[2] A. Khamis, A. Hussein, and A. Elmogy, “Multi-robot task allocation: A review of the state-of-the-art,” Cooperative robots and sensor networks 2015, pp. 31–51, 2015.
[3] V. Ranganathan, P. Kumar, U. Kaur, S. H. Li, T. Chakraborty, and R. Chandra, “Re-Inventing the Food Supply Chain with IoT: A Data-Driven Solution to Reduce Food Loss,” IEEE Internet of Things Magazine, vol. 5, no. 1, pp. 41–47, Mar. 2022.
[4] K. Tsakalozos, H. Kllapi, E. Sitaridi, M. Roussopoulos, D. Paparas, and A. Delis, “Flexible use of cloud resources through profit maximization and price discrimination,” in Proc. International Conference on Data Engineering, 2011, pp. 75–86.
[5] F. G. Filip, “Decision support and control for large-scale complex systems,” Annual Reviews in Control, vol. 32, no. 1, pp. 61–70, Apr. 2008.
[6] C. Daini, P. Goatin, M. L. D. Monache, and A. Ferrara, “Centralized Traffic Control via Small Fleets of Connected and Automated Vehicles,” in 2022 European Control Conference (ECC), Jul. 2022, pp. 371–376.
[7] L. Fang and H. Li, “Centralized resource allocation based on the cost–revenue analysis,” Computers & Industrial Engineering, vol. 85, pp. 395–401, Jul. 2015.
[8] G. Antonelli, “Interconnected dynamic systems: An overview on distributed control,” IEEE Control Systems Magazine, vol. 33, no. 1, pp. 76–88, 2013.
[9] J. R. Marden, G. Arslan, and J. S. Shamma, “Cooperative Control and Potential Games,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1393–1407, 2009.
[10] R. M. Murray, “Recent research in cooperative control of multivehicle systems,” Journal of Dynamic Systems, Measurement, and Control, vol. 129, no. 5, pp. 571–583, May 2007.
[11] A. Das, T. Gervet, J. Romoff, D. Batra, D. Parikh, M. Rabbat, and J. Pineau, “Tarmac: Targeted multi-agent communication,” in International Conference on Machine Learning. PMLR, 2019, pp. 1538–1546.
[12] B. L. Ferguson, D. Paccagnan, and J. R. Marden, “The cost of informing decision-makers in multi-agent maximum coverage problems with random resource values,” IEEE Control Systems Letters, vol. 7, pp. 2928–2933, 2023.
[13] D. D. Šiljak and A. I. Zečević, “Control of large-scale systems: Beyond decentralized feedback,” Annual Reviews in Control, vol. 29, no. 2, pp. 169–179, Jan. 2005.
[14] Z. Xu and V. Tzoumas, “Resource-Aware Distributed Submodular Maximization: A Paradigm for Multi-Robot Decision-Making,” in 2022 IEEE 61st Conference on Decision and Control (CDC), Dec. 2022, pp. 5959–5966.
[15] G. Orosz, “Connected cruise control: Modelling, delay effects, and nonlinear behaviour,” Vehicle System Dynamics, vol. 54, no. 8, pp. 1147–1176, 2016.
[16] H. Nawaz, H. M. Ali, and A. A. Laghari, “UAV Communication Networks Issues: A Review,” Archives of Computational Methods in Engineering, vol. 28, no. 3, pp. 1349–1369, May 2021.
[17] A. Lazaridou and M. Baroni, “Emergent multi-agent communication in the deep learning era,” arXiv preprint arXiv:2006.02419, 2020.
[18] R. J. Aumann, “Acceptable points in general cooperative n-person games,” Contributions to the Theory of Games, vol. 4, pp. 287–324, 1959.
[19] R. Nessah and G. Tian, “On the existence of strong Nash equilibria,” Journal of Mathematical Analysis and Applications, vol. 414, no. 2, pp. 871–885, 2014.
[20] N. Gatti, M. Rocco, and T. Sandholm, “On the verification and computation of strong Nash equilibrium,” arXiv preprint arXiv:1711.06318, 2017.
[21] R. Holzman and N. Law-Yone, “Strong equilibrium in congestion games,” Games and Economic Behavior, vol. 21, no. 1-2, pp. 85–101, 1997.
[22] T. Harks, M. Klimm, and R. H. Möhring, “Strong Nash equilibria in games with the lexicographical improvement property,” in Internet and Network Economics: 5th International Workshop, WINE 2009, Rome, Italy, December 14-18, 2009. Proceedings 5. Springer, 2009, pp. 463–470.
[23] J. B. Clempner and A. S. Poznyak, “Finding the strong Nash equilibrium: Computation, existence and characterization for Markov games,” Journal of Optimization Theory and Applications, vol. 186, pp. 1029–1052, 2020.
[24] A. Epstein, M. Feldman, and Y. Mansour, “Strong equilibrium in cost sharing connection games,” in Proceedings of the 8th ACM Conference on Electronic Commerce, ser. EC ’07. New York, NY, USA: Association for Computing Machinery, Jun. 2007, pp. 84–92.
[25] N. Andelman, M. Feldman, and Y. Mansour, “Strong price of anarchy,” Games and Economic Behavior, vol. 65, no. 2, pp. 289–317, Mar. 2009.
[26] J. Barreiro-Gomez, G. Obando, and N. Quijano, “Distributed Population Dynamics: Optimization and Control Applications,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, pp. 1–11, 2016.
[27] A. Fiat, H. Kaplan, M. Levy, and S. Olonetsky, “Strong price of anarchy for machine load balancing,” in ICALP, vol. 4596. Springer, 2007, pp. 583–594.
[28] L. Epstein and R. van Stee, “The price of anarchy on uniformly related machines revisited,” Information and Computation, vol. 212, pp. 37–54, 2012.
[29] S. Chien and A. Sinclair, “Strong and Pareto price of anarchy in congestion games,” in ICALP (1). Citeseer, 2009, pp. 279–291.
[30] Y. Bachrach, V. Syrgkanis, É. Tardos, and M. Vojnović, “Strong Price of Anarchy, Utility Games and Coalitional Dynamics,” in Algorithmic Game Theory, ser. Lecture Notes in Computer Science, R. Lavi, Ed. Berlin, Heidelberg: Springer, 2014, pp. 218–230.
[31] M. Feldman and O. Friedler, “A unified framework for strong price of anarchy in clustering games,” in Automata, Languages, and Programming: 42nd International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015, Proceedings, Part II. Springer, 2015, pp. 601–613.
[32] T. Roughgarden, “Intrinsic robustness of the price of anarchy,” Communications of the ACM, vol. 55, no. 7, pp. 116–123, 2012.
[33] M. Gairing, “Covering games: Approximation through non-cooperation,” in Internet and Network Economics, 2009, pp. 184–195.
[34] V. Bilò and C. Vinci, “Dynamic taxes for polynomial congestion games,” in EC 2016 - Proceedings of the 2016 ACM Conference on Economics and Computation. New York, New York, USA: ACM Press, 2016, pp. 839–856.
[35] D. Paccagnan, R. Chandan, and J. R. Marden, “Utility Design for Distributed Resource Allocation—Part I: Characterizing and Optimizing the Exact Price of Anarchy,” IEEE Transactions on Automatic Control, vol. 65, no. 11, pp. 4616–4631, Nov. 2020.
[36] R. Zhang, Y. Zhang, R. Konda, B. Ferguson, J. Marden, and N. Li, “Markov Games with Decoupled Dynamics: Price of Anarchy and Sample Complexity,” Apr. 2023.
[37] R. Konda, R. Chandan, D. Grimsman, and J. R. Marden, “Optimal Design of Best Response Dynamics in Resource Allocation Games,” Apr. 2022.
[38] B. L. Ferguson and J. R. Marden, “Robust Utility Design in Distributed Resource Allocation Problems with Defective Agents,” Dynamic Games and Applications, pp. 1–23, Aug. 2022.
[39] W. Saad, Z. Han, M. Debbah, A. Hjorungnes, and T. Basar, “Coalitional game theory for communication networks,” IEEE Signal Processing Magazine, vol. 26, no. 5, pp. 77–97, 2009.
[40] H. Bayram and H. I. Bozma, “Multirobot communication network topology via centralized pairwise games,” in 2013 IEEE International Conference on Robotics and Automation. IEEE, 2013, pp. 2521–2526.
[41] D. Cappello and T. Mylvaganam, “Distributed differential games for control of multi-agent systems,” IEEE Transactions on Control of Network Systems, vol. 9, no. 2, pp. 635–646, 2021.
[42] A. Vetta, “Nash equilibria in competitive societies, with applications to facility location, traffic routing and auctions,” The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., pp. 416–425, 2002.
[43] S. Durand and B. Gaujal, “Complexity and Optimality of the Best Response Algorithm in Random Potential Games,” in Algorithmic Game Theory, M. Gairing and R. Savani, Eds. Berlin, Heidelberg: Springer, 2016, pp. 40–51.
[44] Y. Giannakopoulos, “A Smoothed FPTAS for Equilibria in Congestion Games,” Jul. 2023.

Proof of Proposition 2.1: To show existence, we can simply observe that $a^{\rm opt}\in\operatorname*{arg\,max}_{a\in\mathcal{A}}W(a)$ is a $k$ -strong Nash equilibrium for any $k\in[n]$ . Because $W(a^{\rm opt})\geq W(a^{\prime})$ for all $a^{\prime}\in\mathcal{A}$ , the global optimum satisfies $W(a^{\rm opt})\geq W(a_{\Gamma}^{\prime},a^{\rm opt}_{-\Gamma})$ for all $a_{\Gamma}^{\prime}\in\mathcal{A}_{\Gamma}$ and all $\Gamma\in\mathcal{C}_{[k]}$ . ∎
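The existence argument can be checked by brute force on a small instance. The following is a minimal sketch, not part of the original analysis: the game (four agents, three actions each, random welfare values) and the helper name `is_k_strong_ne` are illustrative assumptions. It tests whether any coalition of at most $k$ agents can strictly increase $W$ by deviating from a welfare maximizer.

```python
import itertools
import random


def is_k_strong_ne(W, action_sets, a, k):
    """Return True if no coalition of at most k agents can strictly raise W."""
    n = len(action_sets)
    for size in range(1, k + 1):
        for coalition in itertools.combinations(range(n), size):
            # Enumerate every joint deviation available to the coalition.
            for dev in itertools.product(*(action_sets[i] for i in coalition)):
                b = list(a)
                for i, ai in zip(coalition, dev):
                    b[i] = ai
                if W[tuple(b)] > W[a]:
                    return False
    return True


# Hypothetical common interest game: one shared welfare value per joint action.
random.seed(0)
action_sets = [(0, 1, 2)] * 4
W = {a: random.random() for a in itertools.product(*action_sets)}

a_opt = max(W, key=W.get)  # a welfare-maximizing joint action
results = [is_k_strong_ne(W, action_sets, a_opt, k) for k in range(1, 5)]
print(results)  # [True, True, True, True]
```

The check succeeds for every $k$ because any coalitional deviation leads to another joint action in $\mathcal{A}$ , whose welfare cannot exceed the maximum.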

Proof of Theorem 3.3: The proof can be outlined in four parts: first, the problem of finding $\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)$ is transformed and relaxed; second, the parameterization used in the proof of Proposition 3.2 is used to turn the relaxed problem into a linear program. Next, an example is constructed to show the linear program provides a tight bound. Finally, we take the dual of said linear program.

$$\begin{aligned}
\max_{\theta\in\mathbb{R}^{|\mathcal{I}|}_{\geq 0}}\quad & \sum_{e,x,o}w(o+x)\,\theta(e,x,o)\\
{\rm s.t.}\quad & \sum_{e,x,o}\left(\binom{n}{\zeta}w(e+x)-\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}\binom{e}{\alpha}\binom{o}{\beta}\binom{n-e-o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)\right)\theta(e,x,o)\geq 0,\quad\forall\zeta\in\{1,\ldots,k\}\\
& \sum_{e,x,o}w(e+x)\,\theta(e,x,o)=1
\end{aligned}\tag{D}$$

1) Relaxing the problem: Quantifying $\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)$ can be expressed as taking the minimum $k$ -strong price of anarchy over all games in $\mathcal{G}_{n}$ , i.e.,

$$\min_{G\in\mathcal{G}_{n}}\quad\frac{\min_{a^{k{\rm SNE}}\in k{\rm SNE}(G)}W(a^{k{\rm SNE}})}{\max_{a^{\rm opt}\in\mathcal{A}}W(a^{\rm opt})}\tag{D1}$$

To make this problem more approachable, we introduce several transformations and relaxations. First, rather than searching over the entire set of games $\mathcal{G}_{n}$ , we search over the set of games $\hat{\mathcal{G}}_{n}$ , in which each agent has exactly two actions. This reduction of the search space can be done without loss of generality, i.e., $\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)=\mathrm{SPoA}_{k}(\hat{\mathcal{G}}_{n},w)$ . Trivially, $\hat{\mathcal{G}}_{n}\subset\mathcal{G}_{n}$ . Further, consider any game $G\in\mathcal{G}_{n}$ ; if every action of every player is removed except their action in the optimal allocation $a^{\rm opt}_{i}$ and their action in the worst $k$ -strong Nash equilibrium $a^{k{\rm SNE}}_{i}$ , the new game maintains the same $k$ -strong price of anarchy but now belongs to $\hat{\mathcal{G}}_{n}$ . With this reduction, we denote each player’s action set as $\hat{\mathcal{A}}_{i}=\{a^{\rm opt}_{i},a^{k{\rm SNE}}_{i}\}$ . Second, we normalize the resource values $v_{r}$ such that the equilibrium welfare is one. This, too, can be done without loss of generality by scaling each resource identically, which does not alter the $\mathrm{SPoA}$ ratio. Third, we invert the objective and consider the maximization of $W(a^{\rm opt})/W(a^{k{\rm SNE}})$ . Finally, we sum the $k$ -strong equilibrium constraints over coalitions of equal size: for each $\zeta\in[k]$ , rather than enforcing each inequality in (3) separately, we sum over every combination of $\zeta$ out of $n$ players, denoted $\mathcal{C}_{\zeta}$ . Applying these reductions to (D1) gives

$$\begin{aligned}
\max_{G\in\hat{\mathcal{G}}_{n}}\quad & W(a^{\rm opt})\\
{\rm s.t.}\quad & \binom{n}{\zeta}W(a^{k{\rm SNE}})\geq\sum_{\Gamma\in\mathcal{C}_{\zeta}}W(a^{\rm opt}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma}),\quad\forall\zeta\in[k]\\
& W(a^{k{\rm SNE}})=1
\end{aligned}\tag{D2}$$

(D2) provides a lower bound on $\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)$ through the reciprocal of its value, as the feasible set was expanded. Later, we will show that the bound is tight by constructing an example that realizes it.

2) Parameterization: We use the parameterization introduced in the proof of Proposition 3.2 with respect to the joint actions $a=a^{k{\rm SNE}}$ and $a^{\prime}=a^{\rm opt}$ . By considering any $\theta\in\mathbb{R}_{\geq 0}^{|\mathcal{I}|}$ , we can parameterize any game $G\in\hat{\mathcal{G}}_{n}$ ; to find the worst-case price of anarchy, we search over all such parameters, i.e., over the entire class of games. The linear program (D) is the result of the search for the vector $\theta$ that yields the worst-case (i.e., lowest) price of anarchy.

3) Constructing an example: Consider the following resource allocation problem: for each label $(e,x,o)\in\mathcal{I}$ and permutation of the $n$ players $\sigma\in{\Sigma}_{n}$ , define a ring of $n$ resources. In total, there are $nn!|\mathcal{I}|$ resources. Let $r_{i,j}^{(e,x,o)}$ denote the resource with label $(e,x,o)$ at position $i$ in the $j$ th ring. Consider, for instance, the $n!$ rings associated with the label $(e,x,o)=(2,1,1)$ as depicted in Fig. 6. We will construct the actions $a^{k{\rm SNE}}_{i}$ and $a^{\rm opt}_{i}$ so that for each resource in these rings, $e+x=3$ agents have it in their equilibrium action and $x+o=2$ agents have it in their optimal action, with $x=1$ agent having it in both. In the first ring (with the monotonic permutation $\sigma=(1,2,3,\ldots,n)$ ), agent $i$ has actions $a^{k{\rm SNE}}_{i}=\{r^{(2,1,1)}_{i,1},r^{(2,1,1)}_{i+1\%n,1},r^{(2,1,1)}_{i+2\%n,1}\}$ and $a^{\rm opt}_{i}=\{r^{(2,1,1)}_{i+2\%n,1},r^{(2,1,1)}_{i+3\%n,1}\}$ , where $\%$ denotes the modulo operator so the selected resources wrap around the ring. This pattern continues for each ring $j\in[n!]$ with a different permutation of players $\sigma\in{\Sigma}_{n}$ . At a ring with label $(e,x,o)$ and permutation $\sigma$ , player $i$ has the actions $a^{k{\rm SNE}}_{i}=\{r^{(e,x,o)}_{\sigma(i),j},\ldots,r^{(e,x,o)}_{\sigma(i)+e+x-1\%n,j}\}$ and $a^{\rm opt}_{i}=\{r^{(e,x,o)}_{\sigma(i)+e\%n,j},\ldots,r^{(e,x,o)}_{\sigma(i)+e+x+o-1\%n,j}\}$ . Finally, each resource of type $(e,x,o)$ has a value $\theta(e,x,o)$ , where $\theta$ is a fixed parameter. The function which encodes the welfare from player overlap is $w$ .
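The ring construction can be sanity-checked numerically. The sketch below is illustrative only: it uses 0-indexed positions rather than the 1-indexed positions of the text, assumes $e+x+o\leq n$ so that no window wraps onto itself, and the function names `ring_actions` and `coverage` are hypothetical. It builds one ring for the label $(2,1,1)$ and counts, for every resource, how many players hold it only in their equilibrium action, in both actions, and only in their optimal action.

```python
def ring_actions(n, e, x, o, sigma):
    """Build each player's two actions on one ring with label (e, x, o).

    sigma[i] is the (0-indexed) starting position of player i on the ring.
    Assumes e + x + o <= n so that no window wraps onto itself.
    """
    eq = [{(sigma[i] + t) % n for t in range(e + x)} for i in range(n)]
    opt = [{(sigma[i] + e + t) % n for t in range(x + o)} for i in range(n)]
    return eq, opt


def coverage(n, eq, opt, pos):
    """Count players holding `pos` only in eq, in both actions, only in opt."""
    only_eq = sum(pos in eq[i] and pos not in opt[i] for i in range(n))
    both = sum(pos in eq[i] and pos in opt[i] for i in range(n))
    only_opt = sum(pos in opt[i] and pos not in eq[i] for i in range(n))
    return only_eq, both, only_opt


n, label = 5, (2, 1, 1)
profiles = set()
# Two sample permutations: the monotonic one and an arbitrary one.
for sigma in [(0, 1, 2, 3, 4), (3, 0, 4, 1, 2)]:
    eq, opt = ring_actions(n, *label, sigma)
    profiles |= {coverage(n, eq, opt, pos) for pos in range(n)}
print(profiles)  # {(2, 1, 1)}
```

Every resource exhibits the profile $(e,x,o)$ regardless of the permutation, which is what makes the welfare expressions for $a^{k{\rm SNE}}$ and $a^{\rm opt}$ depend only on the label.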

In the joint action $a^{k{\rm SNE}}$ , each resource is covered by exactly $e+x$ agents, and the system welfare can be written

$$W(a^{k{\rm SNE}})=\sum_{e,x,o}nn!\,\theta(e,x,o)\,w(e+x).\tag{24}$$

Similarly, joint action $a^{\rm opt}$ satisfies

$$W(a^{\rm opt})=\sum_{e,x,o}nn!\,\theta(e,x,o)\,w(o+x).\tag{25}$$

Now, consider a coalition $\Gamma\in\mathcal{C}_{[k]}$ and denote by $\zeta$ its cardinality. The system welfare when this group deviates to $a^{\rm opt}_{\Gamma}$ is

$$\begin{aligned}
W(a^{\rm opt}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma})&=\sum_{e,x,o}\sum_{j=1}^{n!}\sum_{i=1}^{n}\theta(e,x,o)\,w(|a^{\rm opt}_{\Gamma},a^{k{\rm SNE}}_{-\Gamma}|_{r})\\
&=\sum_{e,x,o}\theta(e,x,o)\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}n\,\zeta!\,(n-\zeta)!\binom{e}{\alpha}\binom{o}{\beta}\binom{n-e-o}{\zeta-\alpha-\beta}w(e+x+\beta-\alpha)
\end{aligned}\tag{26}$$

where $r$ is shorthand for $r_{i,j}^{(e,x,o)}$ . The second equality holds by defining $\alpha$ and $\beta$ as the number of players in $\Gamma$ who invested in resource $r$ exclusively in their action $a^{k{\rm SNE}}$ or exclusively in their action $a^{\rm opt}$ , respectively. By a counting argument, for a resource at some fixed position in the ring there are exactly $\binom{e}{\alpha}\binom{o}{\beta}\binom{n-e-o}{\zeta-\alpha-\beta}$ sets of positions for the players in $\Gamma$ which yield the profile $(\alpha,\beta)$ ; in addition, there are $\zeta!$ ways to order the players in $\Gamma$ , $(n-\zeta)!$ ways to order the players not in $\Gamma$ , and $n$ resources in each ring.
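The counting argument can be verified by enumeration for small $n$ . The sketch below is an illustration under stated assumptions: positions are 0-indexed, $e+x+o\leq n$ , and the function name `profile_counts` as well as the chosen coalition are hypothetical. For a fixed resource position, it enumerates all $n!$ permutations, tallies how many produce each profile $(\alpha,\beta)$ , and compares the tallies with $\binom{e}{\alpha}\binom{o}{\beta}\binom{n-e-o}{\zeta-\alpha-\beta}\,\zeta!\,(n-\zeta)!$ .

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial


def profile_counts(n, e, x, o, coalition, position=0):
    """Tally permutations by the profile (alpha, beta) at one ring position."""
    # Starting positions whose owner holds `position` only in the equilibrium
    # action (e of them) or only in the optimal action (o of them).
    only_eq = {(position - t) % n for t in range(e)}
    only_opt = {(position - e - x - t) % n for t in range(o)}
    counts = Counter()
    for sigma in permutations(range(n)):
        alpha = sum(1 for i in coalition if sigma[i] in only_eq)
        beta = sum(1 for i in coalition if sigma[i] in only_opt)
        counts[(alpha, beta)] += 1
    return counts


n, (e, x, o), coalition = 5, (2, 1, 1), (0, 3)
zeta = len(coalition)
counts = profile_counts(n, e, x, o, coalition)

predicted = {
    (alpha, beta): comb(e, alpha) * comb(o, beta)
    * comb(n - e - o, zeta - alpha - beta) * factorial(zeta) * factorial(n - zeta)
    for alpha in range(e + 1) for beta in range(o + 1) if alpha + beta <= zeta
}
matches = all(counts[p] == v for p, v in predicted.items())
print(matches, sum(predicted.values()) == factorial(n))  # True True
```

The predicted counts sum to $n!$ , so every permutation is accounted for by exactly one profile.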

Verifying $a^{k{\rm SNE}}$ is a $k$ -strong Nash equilibrium boils down to showing (24) is greater than or equal to (26). We can see that this holds whenever $\theta$ is a feasible point in (D). Accordingly, the $k$ -strong price of anarchy satisfies

$$\frac{1}{Q^{\star}}\leq\mathrm{SPoA}_{k}(\mathcal{G}_{n},w)\leq\frac{1}{\sum_{e,x,o}\theta(e,x,o)\,w(o+x)},\tag{27}$$

where the first inequality holds from the reductions made in part 1, and the second holds as the $k$ -strong price of anarchy is upper bounded by any particular problem; comparing (24) and (25) gives the final expression. Letting $\theta$ take on the solution to (D) shows the bound is tight.

4) Taking the Dual: Before considering the dual program to (D), we first show that the primal is feasible. It is easy to verify the feasible set is non-empty by considering the point $\theta(1,0,0)=1/w(1)$ and zero otherwise. Now, we must show that the feasible set is compact, and thus, the value of (D) is bounded. From the equality constraint, we can obtain

$$1\geq\min_{y>0}w(y)\sum_{\begin{subarray}{c}e,x,o\\ e+x>0\end{subarray}}\theta(e,x,o).$$

Because we assume $w(y)>0$ for all $y>0$ , this shows that each value of $\theta(e,x,o)$ such that $e+x>0$ is bounded. For the remaining values of $\theta(0,0,o)$ , consider the equilibrium constraint when $\zeta=1$ (this constraint is present in (D) for all $k\geq 1$ ). By rearranging terms and using the bounds from the previous argument, we observe $L\geq w(1)\sum_{o\in[n]}o\,\theta(0,0,o)$ , where $L$ is a bounded value. Because $w(1)>0$ , the remaining decision variables are also bounded; the feasible set is thus bounded and, being defined by non-strict linear constraints, closed, hence compact.
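The feasibility claim can be checked directly from the coefficients of (D). The sketch below makes several assumptions that are not stated in this section: the label set $\mathcal{I}$ is taken to be all $(e,x,o)$ with $1\leq e+x+o\leq n$ , the welfare function is the illustrative choice $w(y)=\min(y,1)$ with $w(0)=0$ , and the helper names `labels` and `eq_coeff` are hypothetical. It evaluates the normalization constraint and the $k$ coalition constraints at the point $\theta(1,0,0)=1/w(1)$ .

```python
from math import comb


def labels(n):
    """Label set I, assumed here to be all (e, x, o) with 1 <= e + x + o <= n."""
    return [(e, x, o) for e in range(n + 1) for x in range(n + 1)
            for o in range(n + 1) if 1 <= e + x + o <= n]


def eq_coeff(n, w, zeta, e, x, o):
    """Coefficient of theta(e, x, o) in the zeta-th inequality of (D)."""
    total = comb(n, zeta) * w(e + x)
    for alpha in range(e + 1):
        for beta in range(o + 1):
            if alpha + beta <= zeta:
                total -= (comb(e, alpha) * comb(o, beta)
                          * comb(n - e - o, zeta - alpha - beta)
                          * w(e + x + beta - alpha))
    return total


def w(y):
    """Illustrative coverage-style welfare with w(0) = 0 (an assumption)."""
    return min(y, 1)


n, k = 4, 3
theta = {lab: 0.0 for lab in labels(n)}
theta[(1, 0, 0)] = 1 / w(1)  # the candidate feasible point from the text

normalization = sum(w(e + x) * t for (e, x, o), t in theta.items())
slacks = [sum(eq_coeff(n, w, z, *lab) * t for lab, t in theta.items())
          for z in range(1, k + 1)]
feasible = abs(normalization - 1) < 1e-12 and all(s >= 0 for s in slacks)
print(feasible, slacks)  # True [1.0, 3.0, 3.0]
```

For this point the $\zeta$ -th constraint evaluates to $\binom{n-1}{\zeta-1}\left(w(1)-w(0)\right)/w(1)$ , which is nonnegative whenever $w(1)\geq w(0)$ .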

Now, we find the dual program to (D). Because (D) is a linear program, we can rewrite it in the more concise form

$$\begin{aligned}
\max_{\theta\in\mathbb{R}^{|\mathcal{I}|}}\quad & b^{\top}\theta & &\\
{\rm s.t.}\quad & c_{\zeta}^{\top}\theta\geq 0,\quad\forall\zeta\in[k] & &(\nu_{\zeta})\\
& d^{\top}\theta-1=0 & &(\rho)\\
& \theta\geq 0 & &(\phi)
\end{aligned}$$

where $\nu\geq 0$ , $\rho$ , and $\phi\geq 0$ are the associated dual variables. The Lagrangian function is defined as $\mathcal{L}(\theta,\nu,\rho,\phi)=b^{\top}\theta+(\sum_{\zeta\in[k]}\nu_{\zeta}c_{\zeta}^{\top}\theta)-\rho(d^{\top}\theta-1)+\phi^{\top}\theta$ . Let $g(\nu,\rho,\phi)=\sup_{\theta\in\mathbb{R}^{|\mathcal{I}|}}\mathcal{L}(\theta,\nu,\rho,\phi)$ , which serves as an upper bound on the value of (D). The dual program is derived by minimizing $g(\nu,\rho,\phi)$ ; note that, because $\mathcal{L}$ is affine in $\theta$ , this value is unbounded above unless $b^{\top}+\sum_{\zeta\in[k]}\nu_{\zeta}c_{\zeta}^{\top}-\rho d^{\top}+\phi^{\top}=0$ , in which case $g(\nu,\rho,\phi)=\rho$ . Substituting this into the objective and eliminating the slack variable $\phi\geq 0$ so that the equality constraint becomes an inequality, the dual problem becomes

$$\begin{aligned}
\min_{\rho,\{\nu_{\zeta}\in\mathbb{R}_{\geq 0}\}_{\zeta\in[k]}}\quad & \rho\\
{\rm s.t.}\quad & b^{\top}-\rho d^{\top}+\sum_{\zeta\in[k]}\nu_{\zeta}c_{\zeta}^{\top}\leq 0
\end{aligned}\tag{P1}$$

From strong duality, (P1) provides the same value as (D). Expanding terms shows that (P1) is equivalent to (P $[k]$ ). ∎

Proof of Proposition 5.3: The proof is straightforward and simply requires generalizing the constraint set of (P $[k]$ ). Consider taking the same steps as the proof of Theorem 3.3 but with the equilibrium constraint defined by the utility rule $u$ . This will result in the same linear program as in (P $[k]$ ), but now with the constraint set

$$\begin{aligned}
0\geq{}&w(o+x)-\rho w(e+x)\\
&+\sum_{\zeta\in[k]}\nu_{\zeta}\left(\binom{n}{\zeta}u(e+x)-\sum_{\begin{subarray}{c}0\leq\alpha\leq e\\ 0\leq\beta\leq o\\ \alpha+\beta\leq\zeta\end{subarray}}\binom{e}{\alpha}\binom{o}{\beta}\binom{n-e-o}{\zeta-\alpha-\beta}u(e+x+\beta-\alpha)\right),\quad\forall(e,x,o)\in\mathcal{I}.
\end{aligned}\tag{28}$$

At this point, the new linear program provides tight bounds on the $k$ -strong price of anarchy for a specified utility rule $u$ .

Finally, we substitute the new decision variable $u_{\zeta}\in\mathbb{R}^{n}_{\geq 0}$ into each occurrence of $\nu_{\zeta}u$ . This enlarges the feasible set, which now subsumes all the feasible points that would evaluate a utility rule $u$ by satisfying $u=u_{\zeta}$ for all $\zeta\in[k]$ . As we do not enforce this constraint, the value of the final program (Q $[k]$ ) provides a lower bound on the value of the original program; equivalently, its reciprocal provides an upper bound on the $k$ -strong price of anarchy under the optimal utility design. ∎
