Collaborative Decision-Making and the $k$-Strong Price of Anarchy in Common Interest Games
Bryce L. Ferguson, Student Member, IEEE,
Dario Paccagnan, Member, IEEE,
Bary S. R. Pradelski,
and Jason R. Marden, Senior Member, IEEE
This work was supported in part by the Office of Naval Research under Grant # N00014-20-1-2359, the Air Force Office of Scientific Research under Grants # FA95550-20-1-0054 and # FA9550-21-1-0203, and the French National Research Agency (ANR) under Grant # ANR-19-CE48-0018-01.
Bryce L. Ferguson and Jason R. Marden are with the Department of Electrical and Computer Engineering at the University of California, Santa Barbara, CA. {blferguson,jrmarden}@ece.ucsb.edu.
Dario Paccagnan is with the Department of Computing at Imperial College London, UK. [email protected].
Bary S. R. Pradelski is with the National Centre for Scientific Research (CNRS) and the Department of Economics at Oxford, UK. [email protected].
Abstract
The control of large-scale, multi-agent systems often entails distributing decision-making across the system components.
However, with advances in communication and computation technologies, we can consider new collaborative decision-making paradigms that bridge centralized and distributed control architectures.
In this work, we seek to understand the benefits and costs of increased collaborative communication in multi-agent systems.
We specifically study this in the context of common interest games in which groups of up to $k$ agents can coordinate their actions to maximize a common objective function.
The equilibria that emerge in these systems are the $k$-strong Nash equilibria of the common interest game; studying the properties of these states provides relevant insights into the efficacy of inter-agent collaboration.
Our contributions are threefold: 1) we provide bounds on how well $k$-strong Nash equilibria approximate the optimal system welfare, formalized by the $k$-strong price of anarchy, 2) we characterize the run time and transient performance of collaborative agent-based dynamics, and 3) we introduce techniques for redesigning the objectives of groups of agents to improve system performance.
We study these three facets generally as well as in the context of resource allocation problems, in which we provide tractable linear programs that give tight bounds on the $k$-strong price of anarchy.
1 Introduction
Designing effective control schemes for large-scale systems such as transportation services [1], robotic fleets [2], supply chains [3], or cloud computing services [4] can be challenging due to their many components and vast scale.
The two prevailing paradigms for designing control schemes are centralized control [5, 6, 7], which guides behavior across the entire system, and distributed control [8, 9, 10], which allows local components to guide their own behavior.
Each of these approaches possesses respective pros and cons: centralization allows for more direct manipulation of system behavior at the cost of greater communication and computation requirements, while decentralization reduces the communication and computation requirements but cannot always attain the desired system behavior.
Advancements in embedded communication and computation [11, 12, 13, 14] enable the design of new paradigms that exist between centralized and distributed control.
Specifically, we study the efficacy of learning in multi-agent systems when individual system components (or agents) can partially communicate and thus coordinate their behavior.
Many engineering domains are on the precipice of enabling these collaborative paradigms; for example, autonomous vehicle platoons with connected cruise control [15], unmanned aerial surveillance vehicles with range-limited communication [16], and cloud computing networks with emerging distributed learning techniques [17].
In each of these settings, inter-agent communication and collaboration offer the opportunity to improve the performance attainable by the system as a whole; however, implementing these frameworks incurs costs that are both monetary (in the form of the additional technology required) and computational (in the form of more complex decision-making algorithms).
In this work, we provide tools to help better understand the benefits and costs associated with collaborative communication in multi-agent systems.
We model a multi-agent system as a common interest game where some (but not all) groups of agents can collaborate in selecting their actions to maximize the system welfare.
We particularly focus on the case where a collaborative action takes the form of a group best response, i.e., a group of agents updating their actions in response to the remaining players’ actions.
As the size and number of these collaborative groups increase, a coordinated group decision has a larger impact on system behavior.
To vary the level of collaboration between the fully distributed setting (where no agents can collaborate) and the fully centralized setting (where all agents can collaborate collectively), we consider the cases where groups of up to $k$ agents can collaborate.
In these collaborative environments, a stable state of the system is a $k$-strong Nash equilibrium [18].
Researchers have studied the existence [19] and computation [20] of strong Nash equilibria in settings including congestion games [21], lexicographical games [22], and Markov games [23].
This work applies these concepts to multi-agent systems.
To understand the possible benefits of collaboration to system performance, we quantify how well $k$-strong Nash equilibria approximate the optimal welfare, termed the $k$-strong price of anarchy [24, 25].
To understand the possible cost of collaboration, we analyze the running time and transient performance of agent-based dynamics, which converge to $k$-strong Nash equilibria.
Distributed learning in games has been a widely studied area in controls [26], but the ability to reach equilibrium with coalitional best responses has not yet been studied; we thus study the added run time of collaborative algorithms.
The $k$-strong price of anarchy has been quantified in network formation games [24, 25] and load balancing games [27, 28, 29], as well as more general utility maximizing games [30, 31].
In many of these, the bounds are either not tight (particularly for finitely many players) or hold for equilibria which need not exist.
By focusing on the class of common interest games, we guarantee the existence of collaborative equilibria, provide tight approximation bounds, and develop new insights into collaborative multi-agent optimization.
Organization - This work provides tools to understand the benefits and costs of collaborative communication by studying the qualities of $k$-strong Nash equilibria.
In Section 3.1, we consider the case where groups of agents are designed to maximize the system welfare, introduce the notion of $(\lambda_\zeta,\mu_\zeta)$-$k$-coalitionally smooth games (a generalization of smooth games [32] and coalitionally smooth games [30]), and provide bounds on the $k$-strong price of anarchy.
Then, in Section 3.2, we focus on the well-studied setting of distributed resource allocation problems [33, 34, 35, 36, 37, 38], and provide tight bounds on the $k$-strong price of anarchy via the solution of a tractable linear program.
Fig. 3 plots these bounds and demonstrates how increased collaboration improves efficiency guarantees in several classes of resource allocation problems.
In Section 4, we consider the effects of group decision-making on agent-based dynamics; specifically, we show the added run-time complexity of coalitional round-robin dynamics and provide transient performance guarantees of asynchronous best response dynamics.
We support our findings with numerical examples.
In Section 5, we consider the case where the system operator can design the agents’ objective separately from the system welfare; we provide a generalized technique for bounding the $k$-strong price of anarchy in this setting.
In Section 5.2, we again focus on the setting of resource allocation and provide two linear programs to lower and upper bound the attainable $k$-strong price of anarchy guarantee via utility design.
2 Preliminaries
Throughout, we will denote .
We will regularly use the binomial coefficient in constructing optimization problems; we define this value as when for ease of notation.
2.1 Collaborative Decision Making
Consider a finite set of agents $N = \{1, \ldots, n\}$.
Each agent $i \in N$ selects an action $a_i$ from a finite action set $\mathcal{A}_i$.
When each agent selects an action, we will denote their joint action by the tuple $a = (a_1, \ldots, a_n) \in \mathcal{A} = \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$.
Let $G = (N, \mathcal{A})$ be a tuple encoding the components of the agent environment.
The system’s performance is dictated by the agents’ actions; as such, for each joint action $a \in \mathcal{A}$, we assign a system welfare $W(a)$, where $W : \mathcal{A} \to \mathbb{R}_{\geq 0}$ is the system designer’s objective function.
With this, we let the tuple $(G, W)$ denote a multi-agent system (often referred to as a system), which defines the primitives of the system designer’s problem of designing an effective control algorithm.
The system designer would like to configure the agents to reach a joint action that maximizes the system welfare, i.e.,
$$a^{\rm opt} \in \arg\max_{a \in \mathcal{A}} W(a). \tag{1}$$
Though this system state is ideal, it may be difficult to attain as 1) solving for the optimal allocation can be combinatorial and in some cases (including those from Section3.2) NP-hard [33], and 2) it requires a centralized authority to control all agents, which may be practically or logistically difficult.
To resolve this, we will consider that agents make decisions in a decentralized manner.
Fully distributing the decision-making involves designing each agent to update their action locally and has been widely studied and developed to guarantee reasonable system behavior [8]; however, fully distributing decision-making may often become unnecessary as emerging communication technologies enable collaborative inter-agent decision-making [11].
To implement one such collaborative system architecture, a system operator must make two decisions: 1) which group of agents can collaborate on their decisions (possibly subject to some operational constraints), and 2) how the agents should collaborate on their decisions.
A natural choice for the latter is a group best response.
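Let $\Gamma \subseteq N$ be a group of agents endowed with the ability to collaboratively select a group action $a_\Gamma \in \mathcal{A}_\Gamma = \prod_{i \in \Gamma} \mathcal{A}_i$, which they select by maximizing the system welfare over their group action set,
$$a_\Gamma \in \arg\max_{a'_\Gamma \in \mathcal{A}_\Gamma} W(a'_\Gamma, a_{-\Gamma}), \tag{2}$$
where $a_{-\Gamma}$ denotes the actions of the players $N \setminus \Gamma$.
If there are multiple elements in the argmax, the group breaks ties at random unless its current action is among them, in which case it keeps that action.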
Intuitively, a group best responding and collaboratively maximizing the system welfare should lead to direct improvements to system performance; however, one can consider other group decision-making rules as well.
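As a minimal sketch of the group best response described above, assume a small game whose welfare is available as a Python function over joint-action tuples. The names `group_best_response`, `welfare`, and `action_sets` are illustrative and do not come from the paper. The group enumerates its joint deviations while the other agents stay fixed, and it keeps its current action whenever that action is already a maximizer.

```python
import itertools
import random

def group_best_response(welfare, action_sets, joint_action, group, rng=random):
    """Return the joint action after `group` best responds to everyone else.

    welfare:      function mapping a joint-action tuple to a number
    action_sets:  list with one list of actions per agent
    joint_action: current joint action (tuple, one entry per agent)
    group:        tuple of agent indices that may move together
    """
    best_value = welfare(joint_action)
    best = []  # strictly improving maximizers found so far
    # Enumerate every group action while the remaining agents stay fixed.
    for deviation in itertools.product(*(action_sets[i] for i in group)):
        candidate = list(joint_action)
        for agent, action in zip(group, deviation):
            candidate[agent] = action
        candidate = tuple(candidate)
        value = welfare(candidate)
        if value > best_value:
            best_value, best = value, [candidate]
        elif value == best_value and best:
            best.append(candidate)
    # Stay put when the current action already maximizes the welfare;
    # otherwise break ties among the maximizers uniformly at random.
    return rng.choice(best) if best else tuple(joint_action)
```

For example, in a two-agent coordination game with welfare 2 at (1, 1), 1 at (0, 0), and 0 elsewhere, a single agent starting from (0, 0) does not move, whereas the pair acting together moves to (1, 1).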
In particular, in Section 5, we will consider the case where the system designer can design the agents’ objective separately from the system objective as a means to further shape system behavior.
In either case, one would imagine that the greater the collaborative structure, the greater the impact on emergent behavior.
For the system operator’s decision over which groups should collaborate, let $\mathcal{C} \subseteq 2^{N}$ denote the collaboration set, or the set of groups of agents ($\Gamma \in \mathcal{C}$) able to collaborate on their decisions.
These collaborations can overlap (agents can partake in multiple, disparate collaborations) and vary in size.
For example, if agents send signals through a communication network [39], we will have where are the edges in a communication graph.
If agents are allowed to communicate with each other one at a time and make pairwise decisions [40], then .
If agents can only communicate with others within a local proximity [41], then where measures the distance between two agents and is a maximum communication range.
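The three example structures above can be sketched as sets of agent groups. This is only an illustration under stated assumptions: agents are indexed from 0, each group is a `frozenset`, only the two-agent groups named in the text are generated (singletons are not added), distance is taken to be Euclidean, and all function names are hypothetical.

```python
import itertools
import math

def edge_collaborations(edges):
    """Each edge of a communication graph becomes a two-agent group."""
    return {frozenset(edge) for edge in edges}

def pairwise_collaborations(n):
    """Every pair among n agents may make a joint decision."""
    return {frozenset(pair) for pair in itertools.combinations(range(n), 2)}

def proximity_collaborations(positions, max_range):
    """Pairs of agents whose Euclidean distance is at most max_range."""
    groups = set()
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= max_range:
            groups.add(frozenset((i, j)))
    return groups
```

With four agents, `pairwise_collaborations(4)` yields six groups, while agents placed at (0, 0), (1, 0), and (5, 0) with a range of 1.5 yield only the group containing the first two agents.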
Once the system operator decides on the collaborative structure and the group decision-making protocol, the agents’ decision-making process forms a collaborative multi-agent system, denoted by the tuple .
As we vary the number and size of collaborative sets, we can consider control paradigms somewhere between centralized (i.e., $\mathcal{C} = \{N\}$) and fully distributed (i.e., $\mathcal{C} = \{\{i\}\}_{i \in N}$).
This work seeks to understand the efficacy of different levels of communication/collaboration.
To more effectively quantify this, we consider a specific type of collaboration set in which we can range between the centralized and distributed extremes.
2.2 k-Strong Nash Equilibria
We consider the collaboration sets that contain groups of agents up to size $k$.
Let $\mathcal{C}_\zeta = \{\Gamma \subseteq N : |\Gamma| = \zeta\}$ denote the subsets of exactly $\zeta$ agents and $\mathcal{C}_{[k]} = \bigcup_{\zeta \in [k]} \mathcal{C}_\zeta$ be the subsets that contain at most $k$ agents.
When $k = 1$, we recover the fully distributed setting, and when $k = n$, we recover the fully centralized setting.
As we vary $k$ between $1$ and $n$, we sweep through different levels of communication and collaboration.
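To give a sense of how quickly the collaboration set grows with the group-size limit, the sketch below enumerates all groups of exactly a given size and all groups up to a given size. Agents are indexed from 0 and the function names are illustrative assumptions.

```python
import itertools
import math

def groups_of_size(n, size):
    """All groups of exactly `size` agents out of n."""
    return list(itertools.combinations(range(n), size))

def groups_up_to(n, k):
    """All groups of at most k agents out of n (sizes 1 through k)."""
    groups = []
    for size in range(1, k + 1):
        groups.extend(groups_of_size(n, size))
    return groups

# The number of groups is a partial sum of binomial coefficients.
n = 4
counts = [len(groups_up_to(n, k)) for k in range(1, n + 1)]
expected = [sum(math.comb(n, z) for z in range(1, k + 1)) for k in range(1, n + 1)]
```

For four agents this gives 4 groups when only singletons are allowed and 15 groups when every nonempty subset may collaborate.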
In the game-theoretic approach to multi-agent systems, a Nash equilibrium is a joint action where no agent can unilaterally deviate their action to improve the system welfare [18].
We generalize this concept to the setting of collaborative decision-making by considering a $k$-strong Nash equilibrium as a joint action where no group of at most $k$ agents can deviate their group’s actions to improve the welfare.
Definition 1.
A joint action $a^{\rm kNE} \in \mathcal{A}$ is a $k$-strong Nash equilibrium for the common-interest game $(G, W)$ if
$$W(a^{\rm kNE}) \geq W(a'_\Gamma, a^{\rm kNE}_{-\Gamma}), \quad \forall\, a'_\Gamma \in \mathcal{A}_\Gamma,\ \Gamma \in \mathcal{C}_{[k]}. \tag{3}$$
Let $\mathrm{NE}_k(G, W) \subseteq \mathcal{A}$ denote the set of all $k$-strong Nash equilibria.
Note that when $k = 1$, we recover the classical definition of a Nash equilibrium, and when $k = n$, the equilibrium condition implies global optimality.
Definition 1 differs slightly from the literature, where $k$-strong Nash equilibria are defined by no group of at most $k$ agents deviating to a new group action that is Pareto-optimal for the group (i.e., no agent receives a lower payoff with respect to their individual utility function) [18]; when the agents respond to a common interest objective, the definitions are equivalent.
Additionally, in general games, $k$-strong Nash equilibria need not exist; however, that is not the case in our setting due to the common-interest structure we impose on agent decision-making.
Proposition 2.1.
In a system $(G, W)$ with collaboration set $\mathcal{C}_{[k]}$ for any $k \in [n]$, a $k$-strong Nash equilibrium exists.
The proof appears in the appendix.
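For small games, the equilibrium condition of Definition 1 can be checked by brute force. The sketch below is illustrative only: the three-agent game and the function names are assumptions, and the enumeration is exponential in the group size. It also illustrates the intuition behind existence, since a welfare-maximizing joint action passes the check for every group-size limit.

```python
import itertools

def is_k_strong_nash(welfare, action_sets, joint_action, k):
    """True if no group of at most k agents can strictly raise the welfare."""
    n = len(action_sets)
    base = welfare(joint_action)
    for size in range(1, k + 1):
        for group in itertools.combinations(range(n), size):
            for deviation in itertools.product(*(action_sets[i] for i in group)):
                candidate = list(joint_action)
                for agent, action in zip(group, deviation):
                    candidate[agent] = action
                if welfare(tuple(candidate)) > base:
                    return False
    return True

def k_strong_nash_equilibria(welfare, action_sets, k):
    """All joint actions satisfying the k-strong equilibrium condition."""
    return [a for a in itertools.product(*action_sets)
            if is_k_strong_nash(welfare, action_sets, a, k)]

# Hypothetical three-agent game: welfare 3 if all play 1, 2 if all play 0, else 0.
def example_welfare(a):
    return {3: 3, 0: 2}.get(sum(a), 0)

example_sets = [[0, 1]] * 3
```

In this game, (0, 0, 0) and (1, 1, 1) are both equilibria when groups of one or two agents may deviate, but only the optimum (1, 1, 1) survives once all three agents can move together.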
The main focus of this work is understanding how equilibrium performance changes with the level of collaborative communication.
Notice that (3) serves as a local optimality guarantee in the neighborhood of $k$-lateral deviations.
Fig. 1 depicts this for a three-player matrix game; when $k = 1$, a $1$-strong Nash equilibrium (or just Nash equilibrium) is optimal over the unilateral deviations, when $k = 2$, a $2$-strong Nash equilibrium is optimal over the bilateral deviations, and when $k = 3$, the $3$-strong Nash equilibrium is optimal over the whole joint-action space.
From this, we observe that the local optimality guarantee is strengthened as we increase the level of collaboration (i.e., every $k'$-strong Nash equilibrium is also a $k$-strong Nash equilibrium for $k' \geq k$).
To quantify the effect of varying $k$ on equilibrium performance, we consider the ratio of the worst-case equilibrium welfare to the optimal attainable welfare, termed the $k$-strong price of anarchy,
$$\mathrm{SPoA}_k(G, W) = \frac{\min_{a \in \mathrm{NE}_k(G, W)} W(a)}{\max_{a \in \mathcal{A}} W(a)}, \tag{4}$$
where $\mathrm{NE}_k(G, W)$ is the set of $k$-strong Nash equilibria and we let $0/0$ be defined as $1$ to ignore the degenerate case when no welfare is attainable.
In the multi-agent system $(G, W)$ with communication structure $\mathcal{C}_{[k]}$, every $k$-strong Nash equilibrium approximates the optimal solution at least as well as $\mathrm{SPoA}_k(G, W)$.
Accordingly, we will use the $k$-strong price of anarchy to understand the efficiency associated with collaborative decision-making.
For example, in Fig. 2, we depict the $k$-strong price of anarchy in resource covering games [33] for , illustrating the performance guarantees attainable between centralized and distributed control paradigms.
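The ratio of worst-case equilibrium welfare to optimal welfare can be computed directly for a single small game. The following sketch is a brute-force illustration, not a method from the paper: the function names and the example game are hypothetical, and returning 1 when no welfare is attainable is an assumed convention for the degenerate case.

```python
import itertools

def can_improve(welfare, action_sets, a, k):
    """True if some group of at most k agents can strictly raise the welfare."""
    for size in range(1, k + 1):
        for group in itertools.combinations(range(len(a)), size):
            for deviation in itertools.product(*(action_sets[i] for i in group)):
                b = list(a)
                for agent, action in zip(group, deviation):
                    b[agent] = action
                if welfare(tuple(b)) > welfare(a):
                    return True
    return False

def k_strong_price_of_anarchy(welfare, action_sets, k):
    """Worst equilibrium welfare divided by the optimal welfare."""
    joint_actions = list(itertools.product(*action_sets))
    optimum = max(welfare(a) for a in joint_actions)
    if optimum == 0:
        return 1.0  # assumed convention when no welfare is attainable
    worst = min(welfare(a) for a in joint_actions
                if not can_improve(welfare, action_sets, a, k))
    return worst / optimum

# Hypothetical game: welfare 3 if all three play 1, 2 if all play 0, else 0.
def example_welfare(a):
    return {3: 3, 0: 2}.get(sum(a), 0)

example_sets = [[0, 1]] * 3
ratios = [k_strong_price_of_anarchy(example_welfare, example_sets, k) for k in (1, 2, 3)]
```

Here the ratio is 2/3 for group sizes one and two and reaches 1 only when all three agents can deviate together, so larger groups never weaken the guarantee in this example.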
2.3 Summary of Contributions
This work studies the benefits and costs of increased collaborative communication within multi-agent systems.
Our contributions come threefold:
1) In Section 3, we provide tools to quantify the $k$-strong price of anarchy when agents optimize the system objective.
We introduce $(\lambda_\zeta,\mu_\zeta)$-$k$-coalitionally smooth games and provide a $k$-strong price of anarchy guarantee using the parameters $\lambda_\zeta$ and $\mu_\zeta$. We then focus on the class of resource allocation games, where, in Proposition 3.2, we show that these parameters can be found via the solution to a tractable linear program. In Theorem 3.3, we show that combining the constraints of each of the linear programs gives a tight bound. Fig. 3 depicts the $k$-strong price of anarchy for several classes of resource allocation games.
2) In Section 4, we study collaborative dynamics that reach these equilibria. In Section 4.1, we introduce the coalitional round-robin dynamics and show that an equilibrium is reached in a finite number of best responses and that the number of welfare comparisons grows with a small-base exponential of $k$. In Section 4.2, we introduce the asynchronous coalitional best response dynamics, which we show converge almost surely. Further, if the game is $(\lambda_\zeta,\mu_\zeta)$-$k$-coalitionally smooth, then we provide a bound on the transient performance (or the cumulative welfare along the dynamics). We support these findings with a numerical study in Section 4.3.
3) In Section 5, we consider how to improve the design of a group’s decision-making process. By providing the agents with a new, designed objective function, the system designer may alter the set of equilibria and ideally increase the $k$-strong price of anarchy. In Section 5.1, we generalize the notion of coalitional smoothness to the setting where the agents’ objective differs from the system welfare, and in Theorem 5.2, we show how we can construct an optimal utility rule. Fig. 5 shows the $k$-strong price of anarchy under the optimal utility design for resource allocation games, demonstrating the added benefit of designing how groups of agents make decisions.
3 Quantifying the $k$-Strong Price of Anarchy
3.1 Coalitionally Smooth Games
We first consider the efficiency of $k$-strong Nash equilibria for general multi-agent systems.
This efficiency, quantified by the $k$-strong price of anarchy, depends on the system welfare $W$ and the agent decision-making environment $G$.
In Definition 2, we provide a condition on a system that will be useful in bounding the $k$-strong price of anarchy.
Definition 2.
A system $(G, W)$ is $(\lambda_\zeta,\mu_\zeta)$-$k$-coalitionally smooth, where $\zeta \in [k]$, if for all $a, a' \in \mathcal{A}$
(5)
In (5), we provide a constraint on the welfare function stating that the average effect of a group of size $\zeta$ deviating their action from $a$ to $a'$ is lower bounded by a linear combination of the welfare of $a$ and $a'$.
The term smooth is in reference to the welfare function’s change over the joint-action space being bounded by (5).
Additionally, Definition 2 extends the classic notion of smooth games [32] and coalitional smoothness for strong equilibria [30] to the setting of $k$-coalitions in common interest games.
In effect, every system is smooth with for all , but some parameters are more useful than others.
In Proposition 3.1, we show that the parameters $\lambda_\zeta$ and $\mu_\zeta$ from Definition 2 can be used to lower bound the $k$-strong price of anarchy.
Proposition 3.1.
A system $(G, W)$ that is $(\lambda_\zeta,\mu_\zeta)$-$k$-coalitionally smooth has $k$-strong price of anarchy satisfying
(6)
Proof.
Let $a^{\rm kNE}$ denote a $k$-strong Nash equilibrium in the system $(G, W)$ (i.e., satisfying Definition 1), and let $a^{\rm opt}$ denote an optimal joint action.
For any , we have
(6) provides lower bounds on the $k$-strong price of anarchy; accordingly, a $k$-strong Nash equilibrium approximates the system optimal at least as well as .
Often, the best lower bound is provided by $\zeta = k$; however, this is not true in general.
As such, we must consider each of the constraints in (5) to derive the best bounds.
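To make the role of the smoothness parameters concrete, the following sketch shows how an argument of this kind typically proceeds. It assumes that the smoothness condition takes the standard form used for welfare-maximization games, written in hypothetical notation: $W$ is the welfare, $\mathcal{C}_\zeta$ is the set of groups of exactly $\zeta$ agents, $(\lambda_\zeta, \mu_\zeta)$ are the parameters for group size $\zeta$, and $a^{ne}$, $a^{opt}$ are an equilibrium and an optimum. It is an illustration under these assumptions and is not a restatement of the displays in this section.

```latex
% Assumed form of the smoothness condition for a group size \zeta \le k:
\[
\frac{1}{\binom{n}{\zeta}} \sum_{\Gamma \in \mathcal{C}_\zeta} W(a'_\Gamma, a_{-\Gamma})
  \;\ge\; \lambda_\zeta W(a') - \mu_\zeta W(a)
  \qquad \text{for all } a, a'.
\]
% At a k-strong Nash equilibrium a^{ne}, no group of size \zeta \le k gains by
% deviating, so every term in the average is at most W(a^{ne}):
\[
W(a^{ne}) \;\ge\; \frac{1}{\binom{n}{\zeta}} \sum_{\Gamma \in \mathcal{C}_\zeta}
  W(a^{opt}_\Gamma, a^{ne}_{-\Gamma})
  \;\ge\; \lambda_\zeta W(a^{opt}) - \mu_\zeta W(a^{ne}).
\]
% Rearranging, provided 1 + \mu_\zeta > 0 and W(a^{opt}) > 0:
\[
\frac{W(a^{ne})}{W(a^{opt})} \;\ge\; \frac{\lambda_\zeta}{1 + \mu_\zeta}.
\]
```

Under these assumptions, each admissible group size contributes its own lower bound, which is why the constraints for every size up to $k$ need to be examined.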
The efficiency bounds of this form are valuable for several reasons, including:
1) they can be used to provide insights on the transient guarantees of various multi-agent dynamics (see Section 4),
2) they easily generalize to broader equilibrium concepts (subject of future work), and
3) if parameters can be shown to satisfy (5) for a set of systems , then each system inherits the efficiency guarantee of (6).
This last point is particularly pertinent, as system models may be subject to noise, mischaracterizations, or changes over time.
If the efficiency guarantee holds across many similar systems, then the guarantees are robust to these issues.
In Section 3.2, we will provide methods to find coalitional smoothness parameters for classes of resource allocation games via tractable linear programs.
3.2 Resource Allocation Games
In this subsection, we consider the well-studied class of resource allocation games [35, 33, 36, 37, 42].
Consider a set of resources or tasks $\mathcal{R}$ to which agents are assigned, i.e., agent $i$ selects a subset of these resources $a_i \subseteq \mathcal{R}$ as its action from a constrained set of subsets $\mathcal{A}_i \subseteq 2^{\mathcal{R}}$.
Each resource $r \in \mathcal{R}$ has a value $v_r \geq 0$; the welfare contributed by a resource is $v_r w(|a|_r)$, where $w$ captures the added benefit of having multiple agents assigned to the same resource and $|a|_r$ is the number of agents assigned to $r$ in allocation $a$.
Assume that $w(0) = 0$, as no welfare is contributed by resources assigned to zero agents, and further that $w(x) > 0$ for all $x > 0$.
The system welfare is thus
$$W(a) = \sum_{r \in \mathcal{R}} v_r\, w(|a|_r). \tag{8}$$
For ease of notation, we will refer to the system welfare by the local welfare rule $w$, noting that in the agent environment $G$, it generates a welfare function $W$ via (8).
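A resource allocation welfare of this kind can be evaluated in a few lines. In the sketch below, each agent's action is a set of resource names, `values` maps each resource to its value, and `w` is the local welfare rule applied to the number of agents on a resource. The function name, the resource names, and the covering rule used in the example are illustrative assumptions.

```python
from collections import Counter

def resource_welfare(joint_action, values, w):
    """Sum over resources of value times w(number of agents on the resource)."""
    load = Counter(r for chosen in joint_action for r in chosen)
    return sum(values[r] * w(load[r]) for r in values)

# Covering rule (assumed for illustration): a resource pays off once it is
# used by at least one agent, and extra agents add nothing.
def covering(x):
    return min(x, 1)

values = {"r1": 2.0, "r2": 1.0}
overlapping = ({"r1"}, {"r1"})   # both agents pick the same resource
spread_out = ({"r1"}, {"r2"})    # agents cover different resources
```

Under the covering rule, the overlapping allocation earns 2.0 while the spread-out allocation earns 3.0, which is the kind of gap that coordinated deviations can close.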
As discussed in Section 3.1, we wish to find efficiency bounds that hold over a class of resource allocation problems.
Let denote a resource allocation problem, and let denote the set of all such resource allocation problems with at most agents.
In Proposition 3.2, we propose a tractable linear program whose solution provides parameters which satisfy Definition 2 for every system .
From Proposition 3.1, this also provides a lower bound on the $k$-strong price of anarchy for the class of resource allocation problems with local welfare $w$.
Proposition 3.2.
Each resource allocation problem is --coalitionally smooth with and , where is a solution to the linear program (P):
(P)
The constraints are parameterized by the triples .
With the possibility of collaboration, an equilibrium becomes more difficult to characterize than in a fully distributed setting.
We circumvent this by introducing a parameterization which allows us to generalize the comparisons of (3) (where ) into linear inequalities.
Further, satisfying these inequalities provides parameters that satisfy Definition 2, leading to (P) as a search for such parameters with the best $k$-strong price of anarchy guarantee.
Proof of Proposition 3.2:
The proof largely relies on introducing a parameterization that lets us treat (5) as a set of linear constraints.
Consider a resource allocation game and any two actions .
To each resource , we assign a label , where
This is to say, denotes the number of agents utilizing resource in joint action but not , is the number that uses resource in joint action but not , and is the number that uses in both and .
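The labeling step can be illustrated with a short sketch. For a resource and two joint actions, the label counts the agents that use the resource only in the first joint action, only in the second, and in both. The function name, the ordering of the three counts, and the example allocations are assumptions made for illustration.

```python
def resource_label(resource, first, second):
    """Count agents using `resource` only in `first`, only in `second`, or in both."""
    only_first = only_second = both = 0
    for choice_first, choice_second in zip(first, second):
        in_first = resource in choice_first
        in_second = resource in choice_second
        if in_first and in_second:
            both += 1
        elif in_first:
            only_first += 1
        elif in_second:
            only_second += 1
    return only_first, only_second, both

# Three agents choosing subsets of the hypothetical resources "r1" and "r2".
first = ({"r1"}, {"r1", "r2"}, {"r2"})
second = ({"r1"}, {"r2"}, {"r1"})
label_r1 = resource_label("r1", first, second)
label_r2 = resource_label("r2", first, second)
```

The number of agents on a resource in either joint action is recovered from its label alone (the count for that joint action plus the count for both), which is what lets the welfare terms be grouped by label.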
In the set of games , let denote the set of possible labels, and
where denotes the set of resources with label .
The parameter is a vector with elements for each label.
We will now express the terms in (5) using this parameterization.
Because depends only on the number of agents utilizing a resource; we can represent and write the system welfare as
When not stated, the sum over is implied to be for each label in .
Similar steps can be followed to show .
Finally, the term can similarly be transcribed by this parameterization:
where the set of coalitions was partitioned according to the action profile of the agents in each coalition.
We let denote the number of agents in that utilize resource only in joint action and the number of agents in that utilize only in joint action .
By simple counting arguments, there are exactly coalitions grouped with the same and .
This decomposition is possible as the number of agents utilizing resource after a group deviates is precisely .
The smoothness constraint (5) is satisfied only if
As for all , it is sufficient to satisfy
(9)
Observe that (9) is independent of , , and .
As such, this set of constraints serves as a sufficient condition that any satisfies (5) for all respective .
To find parameters $\lambda_\zeta$ and $\mu_\zeta$ that provide the best $k$-strong price of anarchy guarantee, we formulate the following optimization problem:
(P1)
We restrict to be non-negative, though this constraint is not active except in degenerate cases.
Finally, we transform (P1) by substituting new decision variables and .
The new objective becomes .
Note that the constraint implies ; we can thus invert the objective and change the minimization to a maximization, giving (P).
∎
(P)
The smoothness parameters found via Proposition 3.2 can be used with Proposition 3.1 to generate lower bounds on the $k$-strong price of anarchy.
However, these bounds need not be tight, i.e., there may be no system in the class that attains this inefficiency, and better bounds may be possible.
To study what efficiency we can guarantee across a class of resource allocation problems, we define the $k$-strong price of anarchy bound for as
(10)
This performance ratio is parameterized by our choice of welfare function $w$ and the size of collaborative coalitions $k$.
In Theorem 3.3, we provide a linear program whose optimal value equals this bound exactly.
We do this by showing that the constraints of the linear programs in Proposition 3.2 can be combined to give an exact quantification of the $k$-strong price of anarchy bound.
Theorem 3.3.
For the class of resource allocation problems with welfare function , when groups maximize the common interest welfare, then
In Fig. 3, we consider four welfare functions and plot the tight bounds on the $k$-strong price of anarchy for .
As expected, we observe that increased communication improves efficiency guarantees; the amount of this increase is useful in determining the benefits of inter-agent communication/collaboration.
However, this collaboration comes at a cost; in Section 4, we will study the complexity of distributed dynamics reaching $k$-strong Nash equilibria.
4 Coalitional Dynamics
Section 3 provided several tools for quantifying the efficiency guarantees of $k$-strong Nash equilibria.
In this section, we will study the qualities of group-based dynamics that reach these equilibria.
In particular, we will discuss the convergence rate and transient performance when agents follow the coalitional round-robin and asynchronous best-response dynamics, respectively.
We will denote as the joint action occurring at time and as the group of agents updating their action at time .
4.1 Round Robin
We first consider the $k$-coalitional round-robin agent dynamics, in which each group of agents updates their actions sequentially, following a set order , where for is the index of a group .
We will call a round one pass through in which each group updates their action.
At their turn, the group selects their best response to the current action, i.e., , where ties are broken uniformly at random unless , in which case the group selects their current action .
The dynamics are more formally described in Algorithm 1.
These dynamics are synchronous (in that agents must follow a set order) but provide an understanding of how groups of agents can make decisions in a localized manner, and we can analyze the equilibrium hitting time.
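The round-robin procedure can be sketched as follows for small games. This is a simplified illustration and not the paper's Algorithm 1: the update order is fixed to all groups sorted by size, ties are resolved by keeping the first maximizer found (rather than uniformly at random), and the function and variable names are hypothetical. The loop stops after the first round in which no group changes its action.

```python
import itertools

def coalitional_round_robin(welfare, action_sets, start, k):
    """Cycle through all groups of at most k agents until a full round changes nothing.

    Returns the final joint action and the number of rounds performed.
    """
    n = len(action_sets)
    order = [group for size in range(1, k + 1)
             for group in itertools.combinations(range(n), size)]
    current = tuple(start)
    rounds = 0
    while True:
        rounds += 1
        changed = False
        for group in order:
            best = current
            # Group best response; only strict improvements replace `best`.
            for deviation in itertools.product(*(action_sets[i] for i in group)):
                candidate = list(current)
                for agent, action in zip(group, deviation):
                    candidate[agent] = action
                candidate = tuple(candidate)
                if welfare(candidate) > welfare(best):
                    best = candidate
            if best != current:
                current, changed = best, True
        if not changed:
            return current, rounds

# Hypothetical game: welfare 3 if all three play 1, 2 if all play 0, else 0.
def example_welfare(a):
    return {3: 3, 0: 2}.get(sum(a), 0)

example_sets = [[0, 1]] * 3
```

Because every accepted update strictly increases the welfare and there are finitely many joint actions, the loop terminates. Starting from (0, 0, 0), singleton groups stay put, while allowing groups of three reaches (1, 1, 1) in the first round and confirms it in the second.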
In the fully distributed setting ($k = 1$), it has been shown that these dynamics reach a Nash equilibrium in finite time and require welfare evaluations [43].
In Proposition 4.1, we find that in the coalitional settings, we maintain the finite convergence time and incur a small-base exponential increase in the number of welfare comparisons required.
Recent work has shown that the examples that realize these worst-case hitting times are fragile and that equilibria can be computed in polynomial-time under smoothed running-time analysis [44].
As a first step, we consider the worst-case run time, but the authors believe that similar findings on the added complexity of group decision-making will hold under smoothed running-time analysis, though this is the subject of ongoing work.
Recall and .
Proposition 4.1.
The $k$-Coalitional-Round-Robin dynamics converge in finite time and require welfare evaluations.
Proof.
First, we verify that the output of Algorithm 1 is a $k$-strong Nash equilibrium; then we consider how long it takes Algorithm 1 to converge.
Algorithm 1 terminates after a round in which no group can select a new action that increases the welfare, i.e., for all and where is the output of Algorithm 1.
A deviation for any subgroup is subsumed by the joint action .
As such, a state terminates Algorithm 1 if and only if it satisfies (3) and is a $k$-strong Nash equilibrium.
Without loss of generality, we assume each agent possesses actions; for each agent that has fewer actions, assign dummy actions with minimum welfare.
In one round of the $k$-Round-Robin dynamics, each group of agents is given the opportunity to deviate their action.
First, we note that no group will respond to the same complementary group action in two consecutive rounds unless is a $k$-strong Nash equilibrium.
If the group rejects a group action in response to , the joint action is eliminated from consideration as an output of Algorithm 1.
Accounting for overlaps between the groups, in any round that does not start in a $k$-strong Nash equilibrium, at least
joint actions are eliminated as possible outputs of Algorithm 1.
As there are joint actions in total, there can be at most rounds that do not start in a $k$-strong Nash equilibrium; this proves the finite convergence time.
In each round, there are exactly welfare checks; thus, the total number of welfare checks is no more than .
Removing lower order terms from gives the stated bound.
∎
From Proposition 4.1, we observe two things: 1) the coalitional dynamics do not require drastically more welfare evaluations than the fully distributed round-robin, but 2) the convergence rate is slow regardless of $k$.
In light of this, we turn our focus to understanding the transient performance of collaborative decision-making dynamics.
Further, in many settings, it is desirable to allow agents or groups to update their actions asynchronously.
In Section 4.2, we will consider both of these factors in the asynchronous best response dynamics.
4.2 Asynchronous Best-Response Dynamics
Motivated by settings where agents (or groups of agents) perform action revisions asynchronously or on their own time scales, we consider a dynamical system where the next group of agents to update is random.
We define the Asynchronous $k$-Coalitional Best-Response Dynamics as follows: let $t$ denote the number of agent (or group) updates that have occurred so far [footnote 1].
The updating group is selected at random, such that the size $\zeta \in [k]$ of the group is picked with probability $p_\zeta$ and the specific agents in the group are drawn uniformly at random.
Once formed, the updating group chooses their best response in the same manner as the coalitional round robin described in Section 4.1.
Through their distributed decision-making and asynchronicity, these dynamics capture the behavior of real-time multi-agent system components.
Footnote 1: Counting time steps in terms of the number of updates subsumes cases where agents (or groups) update with respect to individual and independent random clocks. The rate of each clock is analogous to the selection probability for different groups.
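A minimal simulation of these asynchronous dynamics for small games might look as follows. The function name, the `size_probs` list (entry `z - 1` is the probability of drawing a group of size `z`), and the fixed seed are illustrative assumptions. The group keeps its current action when that action is already a best response and otherwise picks uniformly among the maximizers.

```python
import itertools
import random

def asynchronous_best_response(welfare, action_sets, start, size_probs, steps, seed=0):
    """Simulate random group updates; return the final joint action and welfare history."""
    rng = random.Random(seed)
    n = len(action_sets)
    sizes = list(range(1, len(size_probs) + 1))
    current = tuple(start)
    history = [welfare(current)]
    for _ in range(steps):
        # Draw the group size, then the members uniformly at random.
        size = rng.choices(sizes, weights=size_probs)[0]
        group = tuple(sorted(rng.sample(range(n), size)))
        best_value, best = welfare(current), []
        for deviation in itertools.product(*(action_sets[i] for i in group)):
            candidate = list(current)
            for agent, action in zip(group, deviation):
                candidate[agent] = action
            candidate = tuple(candidate)
            value = welfare(candidate)
            if value > best_value:
                best_value, best = value, [candidate]
            elif value == best_value and best:
                best.append(candidate)
        if best:  # move only when a strict improvement exists
            current = rng.choice(best)
        history.append(welfare(current))
    return current, history

# Hypothetical game: welfare 3 if all three play 1, 2 if all play 0, else 0.
def example_welfare(a):
    return {3: 3, 0: 2}.get(sum(a), 0)

example_sets = [[0, 1]] * 3
```

Since a group only moves to a strictly better joint action, the recorded welfare never decreases along a run, and the average of `history` gives an empirical view of the transient performance for a chosen `size_probs`.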
In Theorem 4.2, we show these dynamics converge almost surely to a $k$-strong Nash equilibrium, and further, if the system is $(\lambda_\zeta,\mu_\zeta)$-$k$-coalitionally smooth, we provide a bound on the cumulative welfare relative to the optimal.
Theorem 4.2.
The Asynchronous $k$-Coalitional Best-Response Dynamics converge almost surely to the set of $k$-strong Nash equilibria.
Further, if $(G, W)$ is a $(\lambda_\zeta,\mu_\zeta)$-$k$-coalitionally smooth system, then after update steps, the cumulative expected welfare satisfies
(12)
where $p_\zeta$ is the probability a group of size $\zeta$ best responds.
Interestingly, the bound on the average transient welfare depends on how frequently groups of different sizes are sampled to perform their best response.
When the agents are designed to more regularly collaborate in larger groups, the transient guarantee will often be better.
Proof.
First, we show that the Asynchronous -Coalitional Best-Response Dynamics converges in general.
A group revises their action only to one of strictly higher payoff if one exists.
Consider the resulting Markov chain with states .
Any state has an outgoing edge with positive probability as there exists some group that is selected with probability which would revise their action.
Any state has no outgoing edges with positive probability as no group can revise their action to strictly increase the welfare.
Finally, there are no cycles (excluding self-loops) in , as every outgoing edge is directed from a joint action of lower welfare to one of strictly higher welfare.
As such, the set is absorbing and .
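The convergence argument can be spelled out as follows. The notation is introduced only for this sketch and is an assumption: $\mathcal{A}$ is the finite set of joint actions, $W$ the welfare, $a(t)$ the joint action after $t$ updates, and $p_{\min} > 0$ a lower bound on the probability that, from any non-equilibrium state, the sampled group strictly improves the welfare (such a bound exists because there are finitely many states).

```latex
% The welfare acts as a potential: it never decreases, and it increases
% strictly whenever the joint action changes.
W\big(a(t+1)\big) \;\ge\; W\big(a(t)\big), \qquad
a(t+1) \ne a(t) \;\Longrightarrow\; W\big(a(t+1)\big) > W\big(a(t)\big).

% A joint action is never revisited once it is left, so at most
% |\mathcal{A}| - 1 updates can change the state. From a non-equilibrium
% state, the chance of m consecutive non-improving updates vanishes:
\Pr\big[\, a(t+m) = a(t) \,\big] \;\le\; (1 - p_{\min})^{m}
\;\xrightarrow[m \to \infty]{}\; 0 .
```

Together, these two facts imply that the chain leaves every non-equilibrium state with probability one and can do so only finitely many times.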
Now, consider that the system is --coalitionally smooth.
As the selection of the updating group is random, the welfare at time is a random variable, even when conditioned on ; the expectation of the succeeding welfare can be written
where is the updated state for the group following the dynamics; the welfare for each possible updated joint action is the same, so determining which group action is selected is irrelevant.
As is a best response, the welfare is no better for selecting a different action, namely .
The final inequality holds from (5).
Taking the expectation of over gives
Rearranging terms shows
Observe that either or .
Accordingly, in expectation, every other update must satisfy the bound, giving the average cumulative welfare bound in (12).
∎
Theorem 4.2 shows that the transient efficiency changes with the frequency with which different group sizes perform best responses.
To attain the best transient guarantee, we can select carefully.
Corollary 1.
If a system is a resource allocation problem in , then selecting for all gives
The proof is omitted as it is straightforward by rearranging terms in the constraints of (D).
Together, Theorem 4.2 and Corollary 1 provide insight into the transient performance of non-deterministic multi-agent dynamical systems with collaborative communication.
Future work will study the traits of non-best-response dynamics, namely regret-based decision-making.
4.3 Numerical Example
We support the findings of Section 4.2 with a numerical example.
We randomly generate resource allocation problems and simulate the coalitional asynchronous best response dynamics when groups of size update.
The resource allocation problems are generated by creating 100 resources with values independently drawn uniformly at random on .
Each of the 25 agents is endowed with between 1 and 10 actions (also sampled uniformly at random).
For each action of each player, each resource is included in that particular action with probability 0.25.
This defines a tuple .
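The generation procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the interval from which resource values are drawn is not recoverable from the text, so `value_range=(0.0, 1.0)` is a placeholder, and the separable form of `system_welfare` (resource value times a caller-supplied local welfare of the number of users) is likewise assumed.

```python
import random


def generate_problem(n_agents=25, n_resources=100, max_actions=10,
                     include_prob=0.25, value_range=(0.0, 1.0), seed=0):
    """Randomly generate resource values and agent action sets."""
    rng = random.Random(seed)
    # ASSUMPTION: value_range is a placeholder interval.
    values = [rng.uniform(*value_range) for _ in range(n_resources)]
    action_sets = []
    for _ in range(n_agents):
        n_actions = rng.randint(1, max_actions)  # between 1 and max_actions
        actions = [
            frozenset(r for r in range(n_resources) if rng.random() < include_prob)
            for _ in range(n_actions)
        ]
        action_sets.append(actions)
    return values, action_sets


def system_welfare(values, joint, local_welfare):
    """Sum over resources of value times local welfare of the user count."""
    counts = [0] * len(values)
    for action in joint:
        for r in action:
            counts[r] += 1
    return sum(v * local_welfare(c) for v, c in zip(values, counts))


values, action_sets = generate_problem(seed=1)
first_actions = tuple(actions[0] for actions in action_sets)
# Hypothetical local welfare: a resource counts once if anyone uses it.
coverage = system_welfare(values, first_actions, lambda c: min(c, 1))
```

Each action is a set of resource indices, so a joint action is a tuple with one such set per agent.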
We use the local welfare function to capture some added benefit from having multiple agents use the same resource and eventual diminishing returns and increased cost from over congestion.
We select a random initial condition and run the asynchronous best response dynamics with for one value (i.e., only groups of exactly size are sampled); the simulation is repeated for .
We ran this simulation 100 times.
In Fig. 4(a), we plot the average welfare across the simulations over the number of group action revisions.
We observe that the larger coalitions provide superior transient and long-run performance.
However, a single group action revision requires more computation for larger coalitions.
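To make this cost concrete, the sketch below counts the welfare evaluations needed for one group revision, assuming the group finds its best response by exhaustively enumerating its members' action combinations. The exhaustive search and the name `evaluations_per_revision` are assumptions of this illustration; the simulations may have used a different procedure.

```python
import math


def evaluations_per_revision(action_sets, group):
    """Welfare evaluations for one exhaustive best response by `group`.

    action_sets[i] lists the actions available to agent i; the group
    compares every combination of its members' actions.
    """
    return math.prod(len(action_sets[i]) for i in group)


# Hypothetical instance: 25 agents with 10 actions each, the largest
# action-set size used in the numerical example.
action_sets = [list(range(10)) for _ in range(25)]
costs = {k: evaluations_per_revision(action_sets, range(k)) for k in (1, 2, 3)}
```

Under this assumption, the cost of a single revision grows exponentially with the coalition size.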
In Fig. 4(b), for each coalition size , we show a scatter plot of the number of cumulative welfare evaluations and the attained system welfare, along with a trend line fit to the data within two standard deviations of the average number of welfare evaluations.
Here, we observe that for lower values of welfare, the smaller coalitions can attain similar welfare with fewer welfare evaluations but that the larger coalitions reach higher welfare much more regularly.
These conclusions help to identify the trade-off in designing systems with collaborative communication: better performance is attainable at the cost of greater computation.
Figure 5: Bounds on the -strong Price of Anarchy using the optimal utility function in the class of resource allocation games with welfare function . The upper bound on is generated by Proposition 5.3; the lower bound, and the utility rule that attains it, are generated by Theorem 5.2. Compared with the -strong price of anarchy when agents optimize the system welfare (lighter line), we demonstrate the possible and guaranteed gain in equilibrium performance attainable by designing group decision-making for collaborative multi-agent systems.
5 Utility Design
Up until this point, agents and groups of agents have been set to optimize the system welfare over their respective individual or group actions.
Though this is a reasonable approach, the system designer may seek to further improve system performance by designing how a group of agents makes a decision.
Consider that groups of agents instead maximize the objective function (henceforth referred to as the utility function), i.e.,
(13)
where ties are still broken at random unless the current group action is in the argmax.
By designing the utility function , the system operator can alter how groups of agents make decisions and, ideally, improve the performance of the system.
A multi-agent system is now captured by the tuple , where the previous results are the special case when .
By redefining the objective functions groups of agents seek to maximize, we additionally alter the equilibria that emerge from collaborative decision-making.
We alter the definition of -strong Nash equilibria to hold with respect to the utility function, i.e.,
(14)
Let denote the set of -strong Nash equilibria when agents optimize the objective .
The new set of equilibria implies the equilibrium performance guarantee may also change.
As such, we redefine the -strong price of anarchy as the approximation of the optimal welfare provided by the system equilibria under objective function ,
(15)
With this new design opportunity, we identify two goals in understanding the new attainable performance of collaborative decision-making: 1) quantifying the performance of a prescribed utility function, and 2) finding a utility function that provides the greatest -strong price of anarchy guarantees.
We address these two points in general in Section 5.1 and more thoroughly within resource allocation problems in Section 5.2.
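As an illustration of definitions (14) and (15), the sketch below enumerates the -strong Nash equilibria of a small game by brute force and computes the resulting ratio of worst equilibrium welfare to optimal welfare. The function names, the parameter `k` for the maximum coalition size, and the toy game are hypothetical, and exhaustive enumeration is only feasible for very small instances.

```python
import itertools


def is_k_strong(utility, action_sets, joint, k):
    """True if no group of at most k agents can strictly raise the utility."""
    base = utility(joint)
    for size in range(1, k + 1):
        for group in itertools.combinations(range(len(action_sets)), size):
            for choice in itertools.product(*(action_sets[i] for i in group)):
                candidate = list(joint)
                for agent, action in zip(group, choice):
                    candidate[agent] = action
                if utility(tuple(candidate)) > base:
                    return False
    return True


def k_strong_price_of_anarchy(welfare, utility, action_sets, k):
    """Worst equilibrium welfare (under `utility`) over the optimal welfare."""
    profiles = list(itertools.product(*action_sets))
    optimum = max(welfare(a) for a in profiles)
    equilibria = [a for a in profiles if is_k_strong(utility, action_sets, a, k)]
    return min(welfare(a) for a in equilibria) / optimum


# Hypothetical toy game: three agents each pick 0 or 1.
toy_actions = [(0, 1)] * 3


def toy_welfare(joint):
    if all(x == 1 for x in joint):
        return 3.0
    if all(x == 0 for x in joint):
        return 1.0
    return 0.0


def toy_utility(joint):
    # Hypothetical designed objective: reward every agent that picks 1.
    return float(sum(joint))


poa_welfare = {k: k_strong_price_of_anarchy(toy_welfare, toy_welfare, toy_actions, k)
               for k in (1, 2, 3)}
poa_designed = k_strong_price_of_anarchy(toy_welfare, toy_utility, toy_actions, 1)
```

In this toy game, optimizing the welfare itself leaves the inefficient all-zeros equilibrium in place until all three agents can coordinate, whereas the designed objective removes it even for individual deviations.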
5.1 Generalized Coalitionally Smooth Games
In this section, we consider the general setting and particularly focus on quantifying the -strong price of anarchy of a system .
As in Section 3.1, we introduce a notion of smooth systems now generalized to the setting where the agent objective differs from the system objective .
Definition 3.
A system is --generalized-coalitionally smooth, where , if for all
(16)
Like (5), (16) bounds the average effect of a deviation by a group of size , but in terms of the utility function rather than the welfare.
In Proposition 5.1, we show that a --generalized-coalitionally smooth system permits a bound on the -strong price of anarchy.
Proposition 5.1.
A system that is --generalized-coalitionally smooth has -strong price of anarchy satisfying
(17)
Proof.
Let denote a -strong Nash equilibrium when agents follow objective function , and let denote an optimal joint action.
For any , we have
(18a)
(18b)
where (18a) holds from being a -strong Nash equilibrium and (18b) holds from Definition 3.
Rearranging, we get .
∎
Beyond quantifying the -strong price of anarchy for a system , one may wish to find the utility function which provides the best efficiency guarantee, i.e.,
For a specific problem , it is possible to design a utility function which guarantees that a system optimal is a unique equilibrium and provides (e.g., ).
However, this would require knowing the optimal allocations a priori, which poses several problems, including: 1) computing an optimal allocation can be intractable, and 2) system parameters may be subject to modeling errors, noise, or changes over time, causing the optimal allocations to change.
As such, we will consider the design of utility rules, which provide a set of instructions to construct a utility function across a class of systems and eliminate the computational burden of solving for a new utility function for each system while maintaining improved performance guarantees.
Luckily, the approach in Proposition 5.1 is amenable to generating performance guarantees across a class of systems, and in Section 5.2, we will investigate optimal utility rules more thoroughly in resource allocation problems.
5.2 Resource Allocation Games
In this section, we consider the -strong price of anarchy in classes of resource allocation problems when the agents’ objective is derived from a utility rule .
In an agent environment , the utility rule can be applied to derive the utility function
To normalize the utility function, we set .
We ultimately consider the performance of a utility rule across all agent environments with welfare function .
We slightly abuse notation to refer to a system by the tuple .
To quantify this performance, we generalize the -strong price of anarchy bound defined in Section 4.1 to hold for cases where groups of agents optimize the utility function.
(19)
The performance ratio is parameterized by the pair ; as such, we will discuss the effectiveness of a utility rule with respect to a given welfare function .
Taking the utility rule approach completely eliminates the computational cost of deriving a utility function for each problem instance; now we seek to understand the capabilities of this approach in two ways: 1) in Theorem 5.2, we demonstrate how we can construct utility rules with good performance guarantees, and 2) in Proposition 5.3, we provide an upper bound on the best attainable performance a utility rule can provide.
In Corollary 2, we provide a formal condition on when the constructed utility rule is optimal.
Theorem 5.2.
Any resource allocation problem with the utility rule is --generalized-coalitionally smooth, where and are solutions to the linear program,
(Q)
Proof.
Consider the parameterization described in the proof of Proposition 3.2, where for any two actions , we can rewrite and .
Now, we can additionally rewrite and
We can now write out (16), the --generalized-coalitionally smooth constraint, as
As before, we can observe that a sufficient condition for this constraint to hold is
(20)
The task of finding smoothness parameters that give the best price of anarchy guarantee becomes the same problem as (P1) but now with constraint set (20).
By substituting the decision variables and , we attain the new constraint set
(21)
The new objective (see footnote 2) becomes .
Footnote 2: As an aside, the transformed program up to this point can be used to evaluate the performance of a specified utility rule.
Finally, we let become a decision variable in the program.
Observe that every occurrence of is multiplied by , and every occurrence of multiplies .
As such, we can define the new decision variable and retrieve the linear program (Q).
∎
The utility rule that (Q) provides gives some guarantee on the performance attainable by designing group decision-making in collaborative systems.
However, it is not yet clear if these are the best possible utility rules.
To understand what the best possible performance is of a collaborative system, we define the optimal -strong price of anarchy as
(22)
This quantity is an upper bound on the efficiency one can hope to attain in a collaborative system.
In Proposition 5.3, we bound this quantity.
Proposition 5.3.
For the class of resource allocation problems , when agents maximize the optimal utility design objective ,
(23)
where is the value of the linear program
(Q)
The proof appears in the appendix.
Note that Theorem 5.2 provides a utility rule with associated performance guarantee which lower bounds , and Proposition 5.3 provides an upper bound.
In Corollary 2, we note that when these two bounds match, we have a tight bound on as well as an optimal utility rule.
Corollary 2.
For the class of resource allocation problems , if the value of (Q) satisfies , then is a tight bound and a solution to (Q) is an optimal utility rule.
Proof.
This follows immediately from being a lower bound on and the reciprocal of the value of (Q) being an upper bound.
When the two match, the bound must be tight.
∎
The two bounds are not guaranteed to coincide, but they do at the extremes ( and ); further, the gap between the two bounds (if present) is often small, and the lower bound attained by the utility rule constructed in Theorem 5.2 often demonstrates a significant improvement over the setting where agents simply optimize the system objective.
Consider the four welfare functions from Fig. 3 again; for each, we compute the utility rule using Theorem 5.2 and the upper bound on using Proposition 5.3.
In Fig. 5, we plot these lower and upper bounds on for each welfare function and for each value of ; these values are juxtaposed with the -strong price of anarchy when agents optimize the system objective to demonstrate the possible gain in performance from designing the agents’ objective in collaborative systems.
6 Conclusion
In this work, we provided a variety of tools for evaluating the benefits and costs of collaborative communication in multi-agent systems.
A collaborative multi-agent system was modeled by a common interest game where groups of players collaboratively perform their best responses simultaneously.
We specifically considered the -strong Nash equilibrium as a relevant equilibrium concept to gain insights into system behavior between the fully centralized and fully distributed settings.
We introduced the notion of --coalitionally smooth systems and derived bounds on how well the -strong Nash equilibrium approximates the optimum in such systems.
Further analysis studied the running time of collaborative multi-agent decision dynamics and their transient performance, as well as the possible performance gains from designing agents’ objectives separately from the system objective.
Finally, we conducted a more thorough study of the class of resource allocation games, for which we provided tractable linear programs whose solutions give tight bounds on the -strong price of anarchy.
Future work will study less extensive communication paradigms and dynamical systems that emerge when agents learn together.
References
References
[1]
S. Wollenstein-Betech, A. Houshmand, M. Salazar, M. Pavone, C. G. Cassandras,
and I. C. Paschalidis, “Congestion-aware Routing and Rebalancing of
Autonomous Mobility-on-Demand Systems in Mixed Traffic,” in
2020 IEEE 23rd International Conference on Intelligent
Transportation Systems (ITSC), 2020, pp. 1–7.
[2]
A. Khamis, A. Hussein, and A. Elmogy, “Multi-robot task allocation: A
review of the state-of-the-art,” Cooperative robots and sensor
networks 2015, pp. 31–51, 2015.
[3]
V. Ranganathan, P. Kumar, U. Kaur, S. H. Li, T. Chakraborty, and R. Chandra,
“Re-Inventing the Food Supply Chain with IoT: A Data-Driven
Solution to Reduce Food Loss,” IEEE Internet of Things
Magazine, vol. 5, no. 1, pp. 41–47, Mar. 2022.
[4]
K. Tsakalozos, H. Kllapi, E. Sitaridi, M. Roussopoulos, D. Paparas, and
A. Delis, “Flexible use of cloud resources through profit maximization and
price discrimination,” in Proc. International Conference on Data
Engineering, 2011, pp. 75–86.
[5]
F. G. Filip, “Decision support and control for large-scale complex systems,”
Annual Reviews in Control, vol. 32, no. 1, pp. 61–70, Apr. 2008.
[6]
C. Daini, P. Goatin, M. L. D. Monache, and A. Ferrara, “Centralized Traffic
Control via Small Fleets of Connected and Automated Vehicles,”
in 2022 European Control Conference (ECC), Jul. 2022, pp.
371–376.
[7]
L. Fang and H. Li, “Centralized resource allocation based on the cost–revenue
analysis,” Computers & Industrial Engineering, vol. 85, pp.
395–401, Jul. 2015.
[8]
G. Antonelli, “Interconnected dynamic systems: An overview on distributed
control,” IEEE Control Systems Magazine, vol. 33, no. 1, pp. 76–88,
2013.
[9]
J. R. Marden, Gü. Arslan, and J. S. Shamma, “Cooperative
Control and Potential Games,” IEEE Transactions on Systems,
Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1393–1407,
2009.
[10]
R. M. Murray, “Recent research in cooperative control of multivehicle
systems,” Journal of Dynamic Systems, Measurement, and Control, vol.
129, no. 5, pp. 571–583, May 2007.
[11]
A. Das, T. Gervet, J. Romoff, D. Batra, D. Parikh, M. Rabbat, and J. Pineau,
“Tarmac: Targeted multi-agent communication,” in International
Conference on Machine Learning. PMLR,
2019, pp. 1538–1546.
[12]
B. L. Ferguson, D. Paccagnan, and J. R. Marden, “The cost of informing
decision-makers in multi-agent maximum coverage problems with random resource
values,” IEEE Control Systems Letters, vol. 7, pp. 2928–2933, 2023.
[13]
D. D. Šiljak and A. I. Zečević, “Control of large-scale systems:
Beyond decentralized feedback,” Annual Reviews in Control,
vol. 29, no. 2, pp. 169–179, Jan. 2005.
[14]
Z. Xu and V. Tzoumas, “Resource-Aware Distributed Submodular Maximization:
A Paradigm for Multi-Robot Decision-Making,” in 2022 IEEE
61st Conference on Decision and Control (CDC), Dec. 2022,
pp. 5959–5966.
[15]
G. Orosz, “Connected cruise control: Modelling, delay effects, and nonlinear
behaviour,” Vehicle System Dynamics, vol. 54, no. 8, pp. 1147–1176,
2016.
[16]
H. Nawaz, H. M. Ali, and A. A. Laghari, “UAV Communication Networks
Issues: A Review,” Archives of Computational Methods in
Engineering, vol. 28, no. 3, pp. 1349–1369, May 2021.
[17]
A. Lazaridou and M. Baroni, “Emergent multi-agent communication in the deep
learning era,” arXiv preprint arXiv:2006.02419, 2020.
[18]
R. J. Aumann, “Acceptable points in general cooperative n-person games,”
Contributions to the Theory of Games, vol. 4, pp. 287–324, 1959.
[19]
R. Nessah and G. Tian, “On the existence of strong Nash equilibria,”
Journal of Mathematical Analysis and Applications, vol. 414, no. 2,
pp. 871–885, 2014.
[20]
N. Gatti, M. Rocco, and T. Sandholm, “On the verification and computation of
strong Nash equilibrium,” arXiv preprint arXiv:1711.06318, 2017.
[21]
R. Holzman and N. Law-Yone, “Strong equilibrium in congestion games,”
Games and economic behavior, vol. 21, no. 1-2, pp. 85–101, 1997.
[22]
T. Harks, M. Klimm, and R. H. Möhring, “Strong Nash equilibria in
games with the lexicographical improvement property,” in Internet and
Network Economics: 5th International Workshop, WINE 2009, Rome, Italy,
December 14-18, 2009. Proceedings 5. Springer, 2009, pp. 463–470.
[23]
J. B. Clempner and A. S. Poznyak, “Finding the strong nash equilibrium:
Computation, existence and characterization for markov games,”
Journal of Optimization Theory and Applications, vol. 186, pp.
1029–1052, 2020.
[24]
A. Epstein, M. Feldman, and Y. Mansour, “Strong equilibrium in cost sharing
connection games,” in Proceedings of the 8th ACM Conference on
Electronic Commerce, ser. EC ’07. New York, NY, USA: Association for Computing Machinery, Jun.
2007, pp. 84–92.
[25]
N. Andelman, M. Feldman, and Y. Mansour, “Strong price of anarchy,”
Games and Economic Behavior, vol. 65, no. 2, pp. 289–317, Mar. 2009.
[26]
J. Barreiro-Gomez, G. Obando, and N. Quijano, “Distributed Population
Dynamics: Optimization and Control Applications,” IEEE
Transactions on Systems, Man, and Cybernetics: Systems, pp. 1–11, 2016.
[27]
A. Fiat, H. Kaplan, M. Levy, and S. Olonetsky, “Strong price of anarchy for
machine load balancing,” in ICALP, vol. 4596. Springer, 2007, pp. 583–594.
[28]
L. Epstein and R. van Stee, “The price of anarchy on uniformly related
machines revisited,” Information and Computation, vol. 212, pp.
37–54, 2012.
[29]
S. Chien and A. Sinclair, “Strong and pareto price of anarchy in congestion
games.” in ICALP (1). Citeseer, 2009, pp. 279–291.
[30]
Y. Bachrach, V. Syrgkanis, É. Tardos, and M. Vojnović, “Strong
Price of Anarchy, Utility Games and Coalitional Dynamics,”
in Algorithmic Game Theory, ser. Lecture Notes in Computer
Science, R. Lavi, Ed. Berlin,
Heidelberg: Springer, 2014, pp. 218–230.
[31]
M. Feldman and O. Friedler, “A unified framework for strong price of anarchy
in clustering games,” in Automata, Languages, and Programming: 42nd
International Colloquium, ICALP 2015, Kyoto, Japan, July 6-10, 2015,
Proceedings, Part II. Springer,
2015, pp. 601–613.
[32]
T. Roughgarden, “Intrinsic robustness of the price of anarchy,”
Communications of the ACM, vol. 55, no. 7, pp. 116–123, 2012.
[33]
M. Gairing, “Covering games: Approximation through non-cooperation,” in
Internet and Economics, 2009, pp. 184–195.
[34]
V. Bilò and C. Vinci, “Dynamic taxes for polynomial congestion games,” in
EC 2016 - Proceedings of the 2016 ACM Conference on
Economics and Computation. New York, New York, USA: ACM Press, 2016, pp. 839–856.
[35]
D. Paccagnan, R. Chandan, and J. R. Marden, “Utility Design for
Distributed Resource Allocation—Part I: Characterizing and
Optimizing the Exact Price of Anarchy,” IEEE Transactions
on Automatic Control, vol. 65, no. 11, pp. 4616–4631, Nov. 2020.
[36]
R. Zhang, Y. Zhang, R. Konda, B. Ferguson, J. Marden, and N. Li, “Markov
Games with Decoupled Dynamics: Price of Anarchy and Sample
Complexity,” Apr. 2023.
[37]
R. Konda, R. Chandan, D. Grimsman, and J. R. Marden, “Optimal Design of
Best Response Dynamics in Resource Allocation Games,” Apr. 2022.
[38]
B. L. Ferguson and J. R. Marden, “Robust Utility Design in Distributed
Resource Allocation Problems with Defective Agents,” Dynamic
Games and Applications, pp. 1–23, Aug. 2022.
[39]
W. Saad, Z. Han, M. Debbah, A. Hjorungnes, and T. Basar, “Coalitional game
theory for communication networks,” Ieee signal processing magazine,
vol. 26, no. 5, pp. 77–97, 2009.
[40]
H. Bayram and H. I. Bozma, “Multirobot communication network topology via
centralized pairwise games,” in 2013 IEEE International Conference
on Robotics and Automation. IEEE,
2013, pp. 2521–2526.
[41]
D. Cappello and T. Mylvaganam, “Distributed differential games for control of
multi-agent systems,” IEEE Transactions on Control of Network
Systems, vol. 9, no. 2, pp. 635–646, 2021.
[42]
A. Vetta, “Nash equilibria in competitive societies, with applications to
facility location, traffic routing and auctions,” The 43rd Annual IEEE
Symposium on Foundations of Computer Science, 2002. Proceedings., pp.
416–425, 2002.
[43]
S. Durand and B. Gaujal, “Complexity and Optimality of the Best Response
Algorithm in Random Potential Games,” in Algorithmic Game
Theory, M. Gairing and R. Savani, Eds. Berlin, Heidelberg: Springer, 2016, pp. 40–51.
[44]
Y. Giannakopoulos, “A Smoothed FPTAS for Equilibria in Congestion
Games,” Jul. 2023.
Proof of Proposition 2.1:
To show existence, we can simply observe that is a -strong Nash equilibrium for any .
Because for all , the global optimal satisfies
. ∎
Proof of Theorem 3.3:
The proof has four parts: first, the problem of finding is transformed and relaxed; second, the parameterization used in the proof of Proposition 3.2 is used to turn the relaxed problem into a linear program; third, an example is constructed to show the linear program provides a tight bound; finally, we take the dual of said linear program.
Figure 6: Game construction for worst-case -strong price of anarchy. Three of the players’ action sets are shown (color-coded in red, green, and blue, respectively) on three of the rings for the label . A ring has positions, one for each player. For a label , we generate rings for all the orderings of players over positions. This is repeated for each label. Players still only have two actions, but each action covers resources from each ring. The value of a resource is equal to the value of , a solution to (D), for the label with which it is associated.
(D)
1) Relaxing the problem: Quantifying can be expressed as taking the minimum -strong price of anarchy over all games in , i.e.,
(D1)
To make this problem more approachable, we introduce several transformations and relaxations.
First, rather than searching over the entire set of games , we search over the set of games , in which each agent has exactly two actions.
This reduction of the search space can be done without loss of generality, i.e., .
Trivially, .
Further, consider any game ; if for every player, each of their actions is removed except their action in the optimal allocation and their action in their worst -strong Nash equilibrium , the new problem will maintain the same -strong price of anarchy, but will now exist in .
With this reduction, we will denote each player’s action set as .
Second, we normalize each resource value such that the equilibrium welfare is one.
This, too, can be done without loss of generality by scaling each resource identically, thus not altering the ratio.
Third, we invert the objective and consider the maximization of .
Finally, we sum over each of the -coalition equilibrium constraints.
For each , rather than satisfying each inequality in (3), sum over every combination of the out of players, denoted .
Applying these reductions to (D1) gives,
(D2)
(D2) provides a lower bound on as the feasible set was expanded.
Later, we will show that the bound is tight by constructing an example that realizes it.
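The normalization in the second step of the relaxation can be checked directly, assuming the welfare of a resource allocation problem is linear in the resource values. The separable form below, with $v_r$ the value of resource $r$, $w$ the local welfare function, and $|a|_r$ the number of agents selecting $r$, is an assumption of this sketch, as are the superscripts marking the equilibrium and optimal joint actions.

```latex
W(a; v) = \sum_{r} v_r \, w\big(|a|_r\big)
\quad\Longrightarrow\quad
W(a; c\,v) = c \, W(a; v) \quad \text{for any } c > 0 .

% Every equilibrium inequality is preserved under the scaling, so the
% set of equilibria is unchanged, and so is the ratio:
\frac{W(a^{\mathrm{ne}}; c\,v)}{W(a^{\mathrm{opt}}; c\,v)}
= \frac{W(a^{\mathrm{ne}}; v)}{W(a^{\mathrm{opt}}; v)},
\qquad
c = \frac{1}{W(a^{\mathrm{ne}}; v)}
\;\Longrightarrow\; W(a^{\mathrm{ne}}; c\,v) = 1 .
```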
2) Parameterization: We use the parameterization introduced in the proof of Proposition 3.2 with respect to the joint actions and .
By considering any , we can parameterize any game ; to find the worst-case price of anarchy, we search over all such parameters, i.e., look over the entire class of games.
The linear program (D) is the result of the search for the vector that results in the highest price of anarchy.
3) Constructing an example: Consider the following resource allocation problem: for each label and permutation of the players , define a ring of resources.
In total, there are resources.
Let denote the resource with label at position in the th ring.
Consider, for instance, the rings associated with the label as depicted in Fig. 6.
We will construct the actions and so that for each resource in these rings, agents have it in only their equilibrium action, and agents have it only in their optimal action.
In the first ring (with the monotonic permutation ), agent has actions and , where denotes the modulo operator so the selected resources wrap around the ring.
This pattern continues for each ring with a different permutation of players .
At a ring with label and permutation , player has the actions and .
Finally, each resource of type has a value where is a fixed parameter.
The function which encodes the welfare from player overlap is .
In the joint action , each resource is covered by exactly agents, and the system welfare can be written
(24)
Similarly, joint action satisfies
(25)
Now, consider a coalition and denote by its cardinality.
The system welfare of this group deviating their action to is
(26)
where we let be the shorthand for .
The second equality holds by defining and as the number of players in who invested in resource exclusively in their action or respectively.
By counting arguments, there are exactly positions for the players in which yield the profile for a resource at some fixed position in the ring, there are ways to order the players in , ways to order the players not in , and resources in each ring.
Verifying is a -strong Nash equilibrium boils down to showing (24) is greater than or equal to (26).
We can see that this holds whenever is a feasible point in (D).
Accordingly, the -strong price of anarchy satisfies
(27)
where the first inequality holds from the reductions made in part 1, and the second holds as the -strong price of anarchy is upper bounded by any particular problem; comparing (24) and (25) gives the final expression.
Letting take on the solution to (D) shows the bound is tight.
4) Taking the Dual: Before considering the dual program to (D), we first show that the primal is feasible.
It is easy to verify the feasible set is non-empty by considering the point and zero otherwise.
Now, we must show that the feasible set is compact, and thus, the value of (D) is bounded.
From the equality constraint, we can obtain
Because we assume for all , we show that each value of such that is bounded.
For the remaining values of , consider the equilibrium constraint (see footnote 3) when .
Footnote 3: The constraint is present in (D) for all .
By rearranging terms and observing the bounded terms from the previous argument, we observe , where is a bounded value.
Because , the remaining decision variables are also bounded, and thus the feasible set is bounded.
Now, we find the dual program to (D).
Because (D) is a linear program, we can rewrite it in the more concise form
where , , and are the associated dual variables.
The Lagrangian function is defined as .
Let serve as an upper bound to (D).
The dual program is derived by minimizing ; note that this value is unbounded above unless .
Substituting this into the objective and removing the free variable so that the equality constraint becomes an inequality, the dual problem becomes
(P1)
From strong duality, (P1) provides the same value as (D).
Expanding terms shows that (P1) is equivalent to (P).
∎
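For readers less familiar with the final step of the proof above, the mechanics of Lagrangian duality can be illustrated on a generic linear program in standard inequality form. This is not the program (D) itself: the data $A$, $b$, $c$ and the variables $y$, $\nu$ are placeholders chosen for this sketch.

```latex
\text{Primal:}\quad \max_{y \ge 0} \; c^{\top} y
\quad \text{s.t.} \quad A y \le b .

\mathcal{L}(y, \nu) = c^{\top} y + \nu^{\top} (b - A y), \qquad \nu \ge 0 .

% For any primal-feasible y, L(y, nu) >= c^T y, so g(nu) upper bounds
% the primal value; it is finite only when A^T nu >= c.
g(\nu) = \sup_{y \ge 0} \mathcal{L}(y, \nu)
= \begin{cases}
b^{\top} \nu, & A^{\top} \nu \ge c, \\
+\infty, & \text{otherwise.}
\end{cases}

\text{Dual:}\quad \min_{\nu \ge 0} \; b^{\top} \nu
\quad \text{s.t.} \quad A^{\top} \nu \ge c .
```

When the primal is feasible and bounded, strong duality guarantees that the primal and dual values coincide.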
Proof of Proposition 5.3:
The proof is straightforward and simply requires generalizing the constraint set of (P).
Consider taking the same steps as the proof of Theorem 3.3 but with the equilibrium constraint defined by the utility rule .
This will result in the same linear program as in (P), but now with the constraint set
(28)
At this point, the new linear program will provide tight bounds on the performance of a specified utility rule .
Finally, we substitute the new decision variable into each occurrence of .
This enlarges the feasible set, which now subsumes all the feasible points that would evaluate a utility rule by satisfying for all .
As we do not enforce this constraint, the value of the final program (Q) provides a lower bound on the original program; equivalently, its reciprocal provides an upper bound on the -strong price of anarchy under the optimal utility design.
∎