FedCompetitors: Harmonious Collaboration in Federated Learning with Competing Participants

Shanli Tan¹*, Hao Cheng²*, Xiaohu Wu¹*★, Han Yu³*, Tiantian He⁴★, Yew-Soon Ong³,⁴, Chongjun Wang², Xiaofeng Tao¹

(* These authors contributed equally.)

Abstract

Federated learning (FL) provides a privacy-preserving approach for collaborative training of machine learning models. Given the potential data heterogeneity, it is crucial to select appropriate collaborators for each FL participant (FL-PT) based on data complementarity. Recent studies have addressed this challenge. Similarly, it is imperative to consider the inter-individual relationships among FL-PTs where some FL-PTs engage in competition. Although FL literature has acknowledged the significance of this scenario, practical methods for establishing FL ecosystems remain largely unexplored. In this paper, we extend a principle from the balance theory, namely “the friend of my enemy is my enemy”, to ensure the absence of conflicting interests within an FL ecosystem. The extended principle and the resulting problem are formulated via graph theory and integer linear programming. A polynomial-time algorithm is proposed to determine the collaborators of each FL-PT. The solution guarantees high scalability, allowing even competing FL-PTs to smoothly join the ecosystem without conflict of interest. The proposed framework jointly considers competition and data heterogeneity. Extensive experiments on real-world and synthetic data demonstrate its efficacy compared to five alternative approaches, and its ability to establish efficient collaboration networks among FL-PTs.

Introduction

Federated Learning (FL) represents a paradigm within distributed machine learning (ML) that facilitates the collaborative training of ML models by leveraging data from multiple parties while upholding privacy considerations (Yang et al. 2019). Each participant in FL (referred to as FL-PT) acts as a custodian of data and directly employs its dataset to locally train a model. In the well-established Federated Averaging (FedAvg) framework (McMahan et al. 2017), a central server (CS) periodically gathers model updates from individual FL-PTs, which are then aggregated to refine a global model. Similarly, each FL-PT regularly acquires the latest global model from the CS and further enhances it through local training. This iterative interplay between the CS and FL-PTs persists until the global model achieves convergence. FL has demonstrated significant promise across diverse domains, including healthcare, digital banking, ridesharing, recommender systems, and drug discovery (Sheller et al. 2020; Long et al. 2020; Yang et al. 2020; Wang et al. 2022; Oldenhof et al. 2023; Sun et al. 2023).
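The aggregation step described above can be sketched as follows. This is a minimal, hypothetical illustration of the server-side averaging in FedAvg only: it assumes each model update is a flat list of floats and that each FL-PT is weighted by its number of local samples, as in McMahan et al. (2017). Local training and communication are omitted, and the name `fedavg_aggregate` is ours.

```python
from typing import List, Sequence


def fedavg_aggregate(updates: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """One server-side FedAvg step: average client parameter vectors, weighted by local sample counts."""
    if not updates or len(updates) != len(num_samples):
        raise ValueError("expected one sample count per client update")
    total = sum(num_samples)
    global_model = [0.0] * len(updates[0])
    for params, n_k in zip(updates, num_samples):
        weight = n_k / total  # fraction of all training samples held by this client
        for d, value in enumerate(params):
            global_model[d] += weight * value
    return global_model


# Two clients; the first holds three times as much data, so it pulls the average toward itself.
print(fedavg_aggregate([[1.0, 0.0], [3.0, 4.0]], [300, 100]))  # [1.5, 1.0]
```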

For example, consider a clinical research network of multiple hospitals (Fleurence et al. 2014). These hospitals can collaboratively construct ML models. Ideally, the global model derived from FL should outperform the models built by individual FL-PTs. However, a potential complication arises from the non-independent and non-identically distributed (Non-IID) nature of data across these FL-PTs (Zhu et al. 2021). Each FL-PT trains its model locally, which may drive it toward a local optimum distinct from the global optimum. Consequently, the model performance of an FL-PT may degrade due to the FL process (Wang et al. 2019). The differences in data characteristics among FL-PTs can be represented by a directed benefit graph denoted as $\mathcal{G}_{b}$ (Cui et al. 2022). In this graph, an edge from FL-PT $v_{i}$ to $v_{j}$ signifies that the data from $v_{i}$ can potentially enhance the learning outcomes of $v_{j}$ through the FL process.

Besides data heterogeneity, another important factor is the relationships among FL-PTs. For instance, hospitals located in different cities serve distinct populations. As depicted in Figure 1, the hospital in city $C$ solely focuses on improving its own ML model, and its utility is independent of any FL-PT in other cities. Two such FL-PTs are considered “independent”, and the shared global model in FL functions as a public good, similar to a radio signal whose listeners each value only the received signal quality (Tang and Wong 2021). In contrast, hospitals within the same city (e.g., city $B$) serve the same population, which can include both public and private hospitals. Competition then arises, where the utility of an FL-PT also depends on the model performance of its competitors (Brekke, Siciliani, and Straume 2011). Such FL-PTs are considered “competitive”. The relationship between any two FL-PTs can be represented by an undirected graph $\mathcal{G}_{c}$.

Figure 1: Illustration of the relationships among hospitals: a black line denotes a competing relationship between two hospitals.

In the presence of both data heterogeneity and competition, selecting suitable collaborators for each FL-PT is a crucial challenge. Recently, Cui et al. (2022) considered the data heterogeneity case (i.e., the edge set of $\mathcal{G}_{b}$ is non-empty and the edge set of $\mathcal{G}_{c}$ is empty) and leveraged the concept of core-stable coalitions from cooperative games to address it. All FL-PTs are partitioned into disjoint groups (coalitions). Let $\pi(i)$ denote the coalition to which $v_{i}$ belongs, where $\pi$ is called a coalition structure; $v_{i}$’s utility depends on the FL-PTs in $\pi(i)$. A coalition structure $\pi$ is core-stable if there is no other coalition $\mathcal{C}$ such that every FL-PT $v_{i}$ in $\mathcal{C}$ prefers $\mathcal{C}$ over $\pi(i)$ (Aziz and Savani 2016). Nevertheless, no existing work addresses competition among a subset of FL-PTs when establishing collaborations in FL ecosystems.

In this paper, we propose the FedCompetitors approach to bridge this gap. It is general in the sense that (i) the edge set of $\mathcal{G}_{c}$ may be empty or non-empty, as long as $\mathcal{G}_{c}$ is not a complete graph, and (ii) the edge set of $\mathcal{G}_{b}$ is non-empty. The presence of competing FL-PTs has been recognized as an important aspect in the FL literature (Kairouz et al. 2021; Zhan et al. 2022; Shi, Yu, and Leung 2023). In balance theory, the principle “the friend of my enemy is my enemy” helps avoid conflicts of interest (Leskovec, Huttenlocher, and Kleinberg 2010; Cartwright and Harary 1956). We apply an extended version of this principle to establish collaboration among FL-PTs. Specifically, suppose $v_{i}$ and $v_{k}$ compete, and $v_{j}$ is a friend of $v_{i}$ (i.e., $v_{i}$ benefits from the data of $v_{j}$ in FL training). Then $v_{i}$, its friend $v_{j}$, and the other FL-PTs that benefit $v_{i}$ and $v_{j}$ form an alliance, and the CS ensures that $v_{k}$ contributes to no FL-PT in this alliance. Hence, no FL-PT directly or indirectly assists its competitors. Two FL-PTs can collaborate only if they are independent of each other. Within a group of independent FL-PTs, each FL-PT can freely collaborate with the others, thereby maximizing the social welfare of the entire FL ecosystem.

The extended principle and the resulting problem can be formulated via graph theory and integer linear programming. We further propose a polynomial-time algorithm to determine the collaborators of each FL-PT. With the proposed solution, even competing FL-PTs can seamlessly join the FL ecosystem without conflicts of interest, so the ecosystem exhibits a high level of scalability and can be trusted by FL-PTs with conflicting interests (Tariq et al. 2023; Yu et al. 2014). Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of FedCompetitors compared with state-of-the-art approaches.

Related Work

We focus on cross-silo FL, where FL-PTs are typically companies or organizations that both contribute their data and use the trained ML models. Existing research has extensively investigated two scenarios: (i) any two FL-PTs in the FL ecosystem are independent of each other, and each FL-PT solely focuses on improving its own model performance without considering potential competition; and (ii) any two FL-PTs in the FL ecosystem compete against each other, i.e., $\mathcal{G}_{c}$ is a complete graph. In this paper, we mainly consider the scenario where competition exists among a subset of FL-PTs, and an FL-PT will not collaborate with its competitors or with other FL-PTs that could create a conflict of interest.

Firstly, in the independent scenario, prior studies focus on alleviating the side effects of data heterogeneity. By applying hedonic games, a class of cooperative games (Aziz and Savani 2016), these studies seek stable coalition structures to establish collaboration among FL-PTs. Donahue and Kleinberg (2021) provide an analytical understanding of which partitions of FL-PTs lead to stable coalition structures for mean estimation and linear regression. Chaudhury et al. (2022) treat all FL-PTs as a grand coalition and optimize a common model for all of them; this model is considered core-stable if no other coalition $\mathcal{S}$ of FL-PTs could significantly benefit by training a model on only their own data. Another line of work learns personalized models for FL-PTs as follows (Tan et al. 2022): (i) use the CS to train a global model, and (ii) adapt the model to the local data of each FL-PT. Several approaches, such as meta-learning and multi-task learning, have been employed for personalization (Fallah, Mokhtari, and Ozdaglar 2020; Smith et al. 2017). Ding and Wang (2022) study the case where the FL ecosystem expands to include numerous independent FL-PTs, and regard FL-PTs with similar contributors as a group of collaboration partners. The authors propose to partition all FL-PTs into $K$ groups and adaptively learn a small number $K$ of models for $n$ FL-PTs, where $1\ll K\ll n$.

Secondly, in the competition scenario, all FL-PTs are assumed to offer the same service in a given market. Wu and Yu (2022) aim to keep the change in market share negligible after FL-PTs join the FL ecosystem (Farris et al. 2010; Wu, De Pellegrini, and Casale 2023), and analyze the achievability of this objective. Afterwards, two other works study the profitability of FL-PTs in the given market after they join the FL ecosystem, under different assumptions on the source of the extra profit brought by FL. Specifically, Tsoy and Konstantinov (2023) assume that (i) each consumer has a fixed budget that is allocated to multiple services from different markets, and (ii) if an FL-PT has a higher model quality, its service quality is higher and consumers allocate more of their budgets to its service. Huang, Ke, and Liu (2023) consider duopoly business competition between two FL-PTs and assume that, if FL improves the model-related service, customers will be willing to pay more, giving FL-PTs opportunities to increase their profits.

Model and Assumptions

We use graph theory to describe our model of interest and to mathematically formulate the extended principle. Specifically, consider a set of $n$ FL-PTs denoted by $\mathcal{V}=\{v_{1},v_{2},\cdots,v_{n}\}$. Each FL-PT $v_{i}$ possesses a local dataset $\mathcal{D}_{i}$. The FL-PTs contemplate joining a collaborative FL network facilitated by the CS. However, challenges such as data heterogeneity and competition arise among the FL-PTs. To characterize the various relationships among the FL-PTs, we employ three graphs.

Competing graph $\mathcal{G}_{c}$. An undirected graph $\mathcal{G}_{c}=(\mathcal{V},E_{c})$ represents the competing relations among FL-PTs, where $\mathcal{V}$ is the set of nodes (FL-PTs) and $E_{c}$ is the set of edges. An edge $(v_{i},v_{j})\in E_{c}$ signifies a competitive relationship between FL-PTs $v_{i}$ and $v_{j}$. The adjacency matrix of $\mathcal{G}_{c}$ is denoted as $S_{n\times n}$: its main diagonal elements are zero, i.e., $s_{i,i}=0$; when $i\neq j$, $s_{i,j}=1$ if $v_{i}$ competes with $v_{j}$, and $s_{i,j}=0$ if $v_{i}$ is independent of $v_{j}$. Each FL-PT $v_{i}$ reports its competitors to the CS, expecting the CS to use this information to prevent its competitors from benefiting from its data. Thus, the CS knows $\mathcal{G}_{c}$.
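A minimal sketch of how the CS might assemble $S_{n\times n}$ from the reported competitors. Indices are 0-based, the function name `competing_matrix` is hypothetical, and merging reports symmetrically (a report from either side creates the edge) is our assumption; the text does not specify how inconsistent reports are reconciled.

```python
from typing import Dict, List, Set


def competing_matrix(n: int, reports: Dict[int, Set[int]]) -> List[List[int]]:
    """Build the adjacency matrix S of the competing graph G_c from competitor reports.

    reports[i] is the set of (0-based) FL-PTs that FL-PT i declares as competitors.
    Competition is treated as symmetric: a report from either side sets s[i][j] = s[j][i] = 1.
    """
    S = [[0] * n for _ in range(n)]
    for i, competitors in reports.items():
        for j in competitors:
            if i != j:  # the diagonal s_{i,i} always stays 0
                S[i][j] = S[j][i] = 1
    return S


# FL-PTs 0 and 1 compete, and so do 2 and 3; only one side of each pair reports.
S = competing_matrix(4, {0: {1}, 3: {2}})
for row in S:
    print(row)
```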

Benefit graph $\mathcal{G}_{b}$ . A benefit graph is employed to depict the impact of sample distribution discrepancies among the $n$ FL-PTs. For any two FL-PTs $v_{i}$ and $v_{j}$ , if $w_{j,i}=0$ , it indicates that $v_{i}$ cannot benefit from the data of $v_{j}$ . Conversely, if $w_{j,i}>0$ , it implies that $v_{i}$ can benefit from $v_{j}$ ’s data, with larger values of $w_{j,i}$ signifying greater benefit to $v_{i}$ . These values $w_{j,i}$ define a directed graph denoted as $\mathcal{G}_{b}=(\mathcal{V},E_{b})$ , referred to as the benefit graph: $(v_{j},v_{i})\in E_{b}$ if and only if $i\neq j$ and $w_{j,i}>0$ . The adjacency matrix of $\mathcal{G}_{b}$ is denoted as $W_{n\times n}$ , where the $i$ -th column comprises the weights $w_{1,i},w_{2,i},\cdots,w_{n,i}$ , representing the importance of the $n$ FL-PTs to $v_{i}$ . The level of potential (LoP) of an FL-PT $v_{i}$ contributing to the other FL-PTs $\mathcal{V}-\{v_{i}\}$ is defined as

$$w_{i}=\sum\nolimits_{j\neq i}{w_{i,j}}, \qquad (1)$$

which measures the importance of $v_{i}$ to the FL ecosystem. The graph $\mathcal{G}_{b}$ can be obtained by the hypernetwork technique in (Cui et al. 2022; Navon et al. 2021).
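Eq. (1) is an off-diagonal row sum of $W_{n\times n}$, since row $i$ holds $w_{i,1},\cdots,w_{i,n}$. The sketch below assumes $W$ is a dense list of lists with 0-based indices; the helper name `level_of_potential` is hypothetical.

```python
from typing import List, Sequence


def level_of_potential(W: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (1): w_i = sum_{j != i} w_{i,j}, the off-diagonal sum of row i of W.

    W[i][j] stores w_{i,j}, the benefit that v_i's data brings to v_j, so row i
    aggregates how much v_i can contribute to all other FL-PTs.
    """
    n = len(W)
    return [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]


W = [
    [0.0, 0.5, 0.25],
    [0.125, 0.0, 0.0],
    [0.0, 0.75, 0.0],
]
print(level_of_potential(W))  # [0.75, 0.125, 0.75]
```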

Data usage graph $\mathcal{G}_{u}$. Although $v_{i}$ may benefit from $v_{j}$’s data ($w_{j,i}>0$), the CS has the authority to decide whether $v_{i}$ can actually use $v_{j}$’s local model update information (i.e., indirectly use $v_{j}$’s data) in the FL training process. Let $X=(x_{j,i})$ be an $n\times n$ matrix where

$$x_{j,i}\in\{0,1\} \qquad (2)$$

is a decision variable: for two different FL-PTs $v_{i}$ and $v_{j}$, $x_{j,i}$ is set to one if $v_{j}$ will contribute to $v_{i}$ (i.e., $v_{i}$ will use $v_{j}$’s local model update information) in the FL training process, and to zero otherwise. $X$ defines a directed graph $\mathcal{G}_{u}=(\mathcal{V},E_{u})$, called the data usage graph: $(v_{j},v_{i})\in E_{u}$ if and only if $j\neq i$ and $x_{j,i}=1$; in this case, $v_{j}$ is said to be a collaborator or friend of $v_{i}$. Consider any pair of FL-PTs $v_{i}$ and $v_{j}$. If $v_{j}$’s data cannot benefit $v_{i}$ ($w_{j,i}=0$), we set $x_{j,i}=0$. Only when $v_{j}$’s data can benefit $v_{i}$ is it possible that $x_{j,i}=1$. Consequently, $E_{u}$ is a subset of $E_{b}$, which directly leads to the following conclusion.

Lemma 1.

For any two nodes $v_{j}$ and $v_{i}$ , if there is no path from $v_{j}$ to $v_{i}$ in the benefit graph $\mathcal{G}_{b}$ , then this also holds in the data usage graph $\mathcal{G}_{u}$ .
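Lemma 1 can be checked mechanically on small graphs. In this sketch (0-based indices, hypothetical helper names), `reachable_from` runs a breadth-first search, and `usage_within_benefit` tests the containment $E_{u}\subseteq E_{b}$ from which the lemma follows; the loop then confirms that nothing unreachable in $\mathcal{G}_{b}$ becomes reachable in $\mathcal{G}_{u}$.

```python
from collections import deque
from typing import List, Sequence


def reachable_from(adj: Sequence[Sequence[float]], src: int) -> List[bool]:
    """BFS over the directed graph with an edge j -> i iff j != i and adj[j][i] > 0."""
    n = len(adj)
    seen = [False] * n
    seen[src] = True
    queue = deque([src])
    while queue:
        j = queue.popleft()
        for i in range(n):
            if i != j and adj[j][i] > 0 and not seen[i]:
                seen[i] = True
                queue.append(i)
    return seen


def usage_within_benefit(W: Sequence[Sequence[float]], X: Sequence[Sequence[int]]) -> bool:
    """Check E_u ⊆ E_b: an off-diagonal x_{j,i} = 1 is only allowed where w_{j,i} > 0."""
    n = len(W)
    return all(X[j][i] == 0 or i == j or W[j][i] > 0 for j in range(n) for i in range(n))


W = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]  # benefit graph: v0 -> v1 -> v2
X = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]  # data usage graph keeps only v0 -> v1
assert usage_within_benefit(W, X)
# Lemma 1: if v_i is unreachable from v_j in G_b, it is also unreachable in G_u.
for src in range(3):
    in_b, in_u = reachable_from(W, src), reachable_from(X, src)
    assert all(in_b[i] or not in_u[i] for i in range(3))
print(reachable_from(W, 0), reachable_from(X, 0))  # [True, True, True] [True, True, False]
```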

Principle for avoiding conflict of interest

Below, we extend the principle that “the friend of my enemy is my enemy”.

Assumption 1.

For any two competing FL-PTs $v_{i}$ and $v_{j}$ (i.e., $(v_{i},v_{j})\in E_{c}$), $v_{i}$ is unreachable from $v_{j}$ in the data usage graph $\mathcal{G}_{u}$, i.e., there is no path from $v_{j}$ to $v_{i}$ in $\mathcal{G}_{u}$.

Assumption 1 is enforced when establishing the collaboration relationships among FL-PTs. Suppose there is a path of length $p_{i,j}$ from $v_{j}$ to $v_{i}$ in the benefit graph $\mathcal{G}_{b}$. We use Figure 2 to explain the implication of Assumption 1. If $p_{i,j}=1$, Assumption 1 means that an FL-PT refuses to contribute to its competitor. If $p_{i,j}=2$, let $v_{k}$ denote the intermediate node between $v_{j}$ and $v_{i}$. If $v_{i}$ benefits from $v_{k}$, then $v_{k}$ is $v_{i}$’s friend; $v_{j}$ does not want $v_{i}$’s model to improve and threatens not to contribute to $v_{k}$. Assumption 1 requires that, if $(v_{k},v_{i})\in E_{u}$, then $(v_{j},v_{k})\notin E_{u}$, i.e., $v_{j}$ does not help the friend $v_{k}$ of its enemy $v_{i}$. In general, for any $p_{i,j}$, the path from $v_{j}$ to $v_{i}$ in $\mathcal{G}_{b}$ is denoted as

$$P_{j}^{i}=(v_{j_{0}},v_{j_{1}},\cdots,v_{j_{p_{i,j}}}), \qquad (3)$$

where $j_{0}=j$ and $j_{p_{i,j}}=i$. If it exists, let $t$ be the minimum integer in $[1,p_{i,j}-1]$ such that $(v_{j_{l}},v_{j_{l+1}})\in E_{u}$ (i.e., $v_{j_{l}}$ helps $v_{j_{l+1}}$) for every $l\in[t,p_{i,j}-1]$. Then, the FL-PTs $v_{j_{t}},v_{j_{t+1}},\cdots,v_{j_{p_{i,j}}}$ are said to be in an alliance, and $v_{j}$ will not help any member of this alliance. Assumption 1 reflects the common real-world logic that nobody wants to see others help their enemy or their enemy’s friends. Applying Assumption 1 strictly guarantees that no FL-PT contributes to its competitors, directly or indirectly.
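The alliance on a single path can be computed by scanning the path backward from $v_{i}$ for as long as the edges are used in $\mathcal{G}_{u}$. The sketch assumes a path is given as a list of 0-based node indices and $X$ as a 0/1 matrix; `alliance_on_path` and the helper `with_edges` are hypothetical names.

```python
from typing import List, Optional, Sequence


def alliance_on_path(path: Sequence[int], X: Sequence[Sequence[int]]) -> Optional[List[int]]:
    """Return the alliance (v_{j_t}, ..., v_{j_p}) on a benefit-graph path, or None if no t exists.

    path = (j_0, ..., j_p) runs from v_j = v_{j_0} to its competitor v_i = v_{j_p} in G_b.
    t is the smallest integer in [1, p-1] such that every edge (j_l, j_{l+1}) with l >= t
    is used in G_u, i.e., X[j_l][j_{l+1}] == 1.
    """
    p = len(path) - 1
    t = None
    l = p - 1
    while l >= 1 and X[path[l]][path[l + 1]] == 1:
        t = l  # the used suffix of the path now starts at j_l
        l -= 1
    return None if t is None else list(path[t:])


def with_edges(n: int, edges) -> List[List[int]]:
    """Hypothetical helper: identity matrix (Eq. (6)) plus the given usage edges (j, i)."""
    X = [[int(a == b) for b in range(n)] for a in range(n)]
    for j, i in edges:
        X[j][i] = 1
    return X


path = [0, 1, 2, 3]  # v0 competes with v3, and G_b contains v0 -> v1 -> v2 -> v3
print(alliance_on_path(path, with_edges(4, [(1, 2), (2, 3)])))  # [1, 2, 3]
print(alliance_on_path(path, with_edges(4, [(2, 3)])))          # [2, 3]
print(alliance_on_path(path, with_edges(4, [(1, 2)])))          # None
```

In the first case, $v_{0}$ must not help $v_{1}$, $v_{2}$, or $v_{3}$; in particular, $x_{0,1}$ has to stay zero.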

For any competing FL-PTs $v_{i}$ and $v_{j}$, let $\mathcal{P}_{j,i}$ denote the set of all paths from $v_{j}$ to $v_{i}$ in the graph $\mathcal{G}_{b}$. Assumption 1 can then be characterized in terms of $\mathcal{G}_{c}$, $\mathcal{G}_{b}$, and $\mathcal{G}_{u}$.

Proposition 1.

Assumption 1 holds if and only if the following condition is satisfied:

$$x_{j,j_{1}}+x_{j_{1},j_{2}}+\cdots+x_{j_{p_{i,j}-1},i}\leqslant p_{i,j}-1,\quad\forall(v_{i},v_{j})\in E_{c},\ \forall P_{j}^{i}\in\mathcal{P}_{j,i}. \qquad (4)$$

Proof.

Firstly, we prove the reverse direction, i.e., that Eq. (4) implies Assumption 1. By Lemma 1, we only need to consider pairs $v_{j}$ and $v_{i}$ such that $v_{i}$ is reachable from $v_{j}$ in $\mathcal{G}_{b}$. Let $P_{j}^{i}$ be defined as in Eq. (3). If Eq. (4) holds, then every $P_{j}^{i}\in\mathcal{P}_{j,i}$ contains two adjacent nodes $v_{j_{l}}$ and $v_{j_{l+1}}$, where $l\in[0,p_{i,j}-1]$, such that $x_{j_{l},j_{l+1}}=0$, i.e., $(v_{j_{l}},v_{j_{l+1}})\notin E_{u}$. Since $E_{u}\subseteq E_{b}$, every path from $v_{j}$ to $v_{i}$ in $\mathcal{G}_{u}$ would also belong to $\mathcal{P}_{j,i}$; hence there is no path from $v_{j}$ to $v_{i}$ in $\mathcal{G}_{u}$, and Assumption 1 is satisfied. Secondly, we prove the forward direction by contradiction. Suppose Assumption 1 holds but Eq. (4) does not. Then there exists a path $P_{j}^{i}\in\mathcal{P}_{j,i}$ of length $p_{i,j}$ such that $x_{j_{l},j_{l+1}}=1$ for every $l\in[0,p_{i,j}-1]$, so every edge of $P_{j}^{i}$ is in $\mathcal{G}_{u}$ and $v_{i}$ is reachable from $v_{j}$ in $\mathcal{G}_{u}$. This contradicts Assumption 1. ∎
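Because of Proposition 1, Eq. (4) can be verified without enumerating $\mathcal{P}_{j,i}$, whose size can grow exponentially with $n$: it suffices to compute the reachability of $\mathcal{G}_{u}$ and check every competing pair. The sketch below does this with Warshall's transitive-closure algorithm; indices are 0-based and the function names are hypothetical.

```python
from typing import List, Sequence


def transitive_closure(X: Sequence[Sequence[int]]) -> List[List[int]]:
    """Warshall's algorithm: C[j][i] = 1 iff v_i is reachable from v_j in G_u (C[i][i] = 1)."""
    n = len(X)
    C = [[int(a == b or X[a][b] == 1) for b in range(n)] for a in range(n)]
    for k in range(n):
        for a in range(n):
            if C[a][k]:
                for b in range(n):
                    if C[k][b]:
                        C[a][b] = 1
    return C


def satisfies_assumption1(S: Sequence[Sequence[int]], X: Sequence[Sequence[int]]) -> bool:
    """Assumption 1 (equivalently Eq. (4), by Proposition 1): no FL-PT reaches a competitor in G_u."""
    C = transitive_closure(X)
    n = len(S)
    return all(not (S[i][j] == 1 and C[j][i] == 1) for i in range(n) for j in range(n))


S = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]      # v0 and v2 compete; v1 is independent of both
X_bad = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]  # v0 -> v1 -> v2: v0 indirectly helps its competitor
X_ok = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]   # only v0 -> v1
print(satisfies_assumption1(S, X_bad), satisfies_assumption1(S, X_ok))  # False True
```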

In this paper, we aim to propose a framework that can construct an FL ecosystem without conflict of interest. Mathematically, our problem is to determine the matrix $X_{n\times n}$ of decision variables that satisfy Eq. (2) and (4), which determines the collaborators of FL-PTs. Eq. (4) is equivalent to Assumption 1 by Proposition 1. The absence of conflicting interests among FL-PTs is guaranteed by Eq. (4).

Polynomial-Time Algorithm

We propose a polynomial-time algorithm to determine the matrix $X_{n\times n}$ of decision variables subject to Eq. (2) and (4). We begin by describing the algorithm’s initial state. The LoP $w_{i}$ in Eq. (1) measures the importance of $v_{i}$ to the FL ecosystem. We sort the LoPs of all FL-PTs in non-increasing order and, without loss of generality, assume:

$$w_{1}\geqslant w_{2}\geqslant\cdots\geqslant w_{n}. \qquad (5)$$

The initial values of $X_{n\times n}$ are set as follows:

$$x_{j,i}=1\text{ if }i=j,\text{ and }x_{j,i}=0\text{ if }i\neq j. \qquad (6)$$

This defines the initial $\mathcal{G}_{u}$ , which will be updated as the algorithm runs. We also define a connectivity matrix $C_{n\times n}$ of $\mathcal{G}_{u}$ : when $i\neq j$ , $c_{j,i}=1$ if there is a path from $v_{j}$ to $v_{i}$ and $c_{j,i}=0$ otherwise; $c_{i,i}$ is always set to one trivially. Initially, $C_{n\times n}$ is set as an identity matrix, i.e., a diagonal matrix whose main diagonal elements are all one.
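The initial state can be written down directly. This sketch (0-based indices, hypothetical names) computes the LoPs from Eq. (1), orders the FL-PTs as in Eq. (5), and builds $X$ and $C$ as identity matrices; breaking LoP ties by index is our assumption, since Eq. (5) only fixes the order up to ties.

```python
from typing import List, Sequence, Tuple


def initial_state(W: Sequence[Sequence[float]]) -> Tuple[List[int], List[List[int]], List[List[int]]]:
    """Return (order, X, C): FL-PTs in non-increasing LoP order (Eq. (5)), X from Eq. (6), C = identity."""
    n = len(W)
    lop = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]  # Eq. (1)
    # sorted() is stable, so tied LoPs keep their index order (an assumed tie-breaking rule).
    order = sorted(range(n), key=lambda i: -lop[i])
    X = [[int(j == i) for i in range(n)] for j in range(n)]  # Eq. (6): X[j][i] = 1 iff j == i
    C = [row[:] for row in X]  # with no edges yet, each node only reaches itself
    return order, X, C


W = [[0.0, 0.25, 0.0], [0.5, 0.0, 0.5], [0.25, 0.0, 0.0]]
order, X, C = initial_state(W)
print(order)  # [1, 0, 2]
```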

Algorithm 1 Collaborator Selection
Data: $S_{n\times n}$ and $W_{n\times n}$
Result: $X_{n\times n}$
1 Initialize $X_{n\times n}$ by Eq. (6) and $C_{n\times n}$ to be an identity matrix;
2 Generate the sorted sequence (i.e., Eq. (5));
3 for $v_{i}$ in the sorted sequence do
4     Solve the ILP problem (7) by Algorithm 2;

The proposed algorithm is presented as Algorithm 1. The $n$ FL-PTs are considered sequentially from $v_{1}$ to $v_{n}$ (line 3). At the step for $v_{i}$ (line 4), the decision variables to be determined are $\{x_{j,i}\}_{j\neq i}$ and we maximize the benefit of $v_{i}$ :

$$\text{maximize}\enskip\sum\nolimits_{j\neq i}{w_{j,i}\cdot x_{j,i}} \qquad (7)$$

subject to Eq. (2) and (4). Afterwards, $X_{n\times n}$ is updated and the collaborators of $v_{i}$ are determined. Next, we describe how to solve this integer linear programming (ILP) problem (7). Let $\mathcal{B}_{i}$ denote the set of all FL-PTs that can benefit $v_{i}$ and are independent of $v_{i}$, which can be defined by the adjacency matrix $W_{n\times n}$ of $\mathcal{G}_{b}$ and the adjacency matrix $S_{n\times n}$ of $\mathcal{G}_{c}$:

$$\mathcal{B}_{i}=\left\{v_{j}\in\mathcal{V}\,|\,j\neq i,\,w_{j,i}>0,\,s_{j,i}=0\right\}. \qquad (8)$$

$\mathcal{B}_{i}$ includes all possible collaborators of $v_{i}$ .
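A direct reading of Eq. (8) as code, assuming 0-based indices and dense matrices with `W[j][i]` $=w_{j,i}$. The function name is hypothetical; for convenience it also returns the candidates sorted by non-increasing $w_{j,i}$, the order in which line 1 of Algorithm 2 considers them.

```python
from typing import List, Sequence


def candidate_collaborators(i: int, W: Sequence[Sequence[float]], S: Sequence[Sequence[int]]) -> List[int]:
    """Eq. (8): B_i = {v_j : j != i, w_{j,i} > 0, s_{j,i} = 0}, sorted by non-increasing w_{j,i}."""
    n = len(W)
    B = [j for j in range(n) if j != i and W[j][i] > 0 and S[j][i] == 0]
    return sorted(B, key=lambda j: -W[j][i])


# Column 0 of W: v1, v2, v3 can all benefit v0, but v3 competes with v0.
W = [[0.0] * 4 for _ in range(4)]
W[1][0], W[2][0], W[3][0] = 0.25, 0.75, 0.5
S = [[0] * 4 for _ in range(4)]
S[0][3] = S[3][0] = 1
print(candidate_collaborators(0, W, S))  # [2, 1]
```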

For any $v_{j}\in\mathcal{B}_{i}$ , let $\mathcal{V}_{j}^{-}$ denote a set consisting of all nodes that are reachable to $v_{j}$ in $\mathcal{G}_{u}$ , as well as $v_{j}$ itself, which can be defined by the connectivity matrix $C_{n\times n}$ :

$$\mathcal{V}_{j}^{-}=\left\{v_{k}\in\mathcal{V}\,|\,c_{k,j}=1\right\}. \qquad (9)$$

Let $\mathcal{S}_{j}^{-}$ denote all competitors of the nodes in $\mathcal{V}_{j}^{-}$ , and $\mathcal{S}_{i,j}^{-}$ denote the nodes of $\mathcal{S}_{j}^{-}$ that are reachable from $v_{i}$ in $\mathcal{G}_{u}$ :

$$\mathcal{S}_{j}^{-}=\left\{v_{k}\in\mathcal{V}\,|\,\exists v_{p}\in\mathcal{V}_{j}^{-}:s_{k,p}=1\right\}, \qquad (10)$$
$$\mathcal{S}_{i,j}^{-}=\left\{v_{k}\in\mathcal{S}_{j}^{-}\,|\,c_{i,k}=1\right\}\subseteq\mathcal{S}_{j}^{-}. \qquad (11)$$

As illustrated in Figure 3(a), if $\mathcal{S}_{i,j}^{-}\neq\emptyset$, we must set $x_{j,i}=0$; otherwise, setting $x_{j,i}=1$ would make some node in $\mathcal{V}_{j}^{-}$ reach one of its competitors (e.g., the node in the oval) in $\mathcal{G}_{u}$, which violates Eq. (4). Let $\mathcal{V}_{i}^{+}$ denote the set consisting of all nodes that are reachable from $v_{i}$ in $\mathcal{G}_{u}$, as well as $v_{i}$ itself:

$$\mathcal{V}_{i}^{+}=\{v_{k}\in\mathcal{V}\,|\,c_{i,k}=1\}. \qquad (12)$$

Let $\mathcal{S}_{i}^{+}$ denote all competitors of the nodes in $\mathcal{V}_{i}^{+}$ , and $\mathcal{S}_{i,j}^{+}$ denote the nodes of $\mathcal{S}_{i}^{+}$ that are reachable to $v_{j}$ in $\mathcal{G}_{u}$ :

$$\mathcal{S}_{i}^{+}=\{v_{k}\in\mathcal{V}\,|\,\exists v_{p}\in\mathcal{V}_{i}^{+}:s_{p,k}=1\}, \qquad (13)$$
$$\mathcal{S}_{i,j}^{+}=\{v_{k}\in\mathcal{S}_{i}^{+}\,|\,c_{k,j}=1\}\subseteq\mathcal{S}_{i}^{+}. \qquad (14)$$

Here, by Eq. (9), (11), (12), and (14), we have

$$\mathcal{S}_{i,j}^{-}=\mathcal{V}_{i}^{+}\cap\mathcal{S}_{j}^{-}\subseteq\mathcal{V}_{i}^{+}\text{ and }\mathcal{S}_{i,j}^{+}=\mathcal{V}_{j}^{-}\cap\mathcal{S}_{i}^{+}\subseteq\mathcal{V}_{j}^{-}. \qquad (15)$$

As illustrated in Figure 3(b), if $\mathcal{S}_{i,j}^{+}\neq\emptyset$, we must set $x_{j,i}=0$; otherwise, setting $x_{j,i}=1$ would make some node in $\mathcal{V}_{i}^{+}$ reachable from one of its competitors (e.g., the node in the oval) in $\mathcal{G}_{u}$, violating Eq. (4).
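The test on $\mathcal{S}_{i,j}^{+}$ from Eqs. (12)–(14) follows the same pattern. On a small hypothetical instance in which $\mathcal{G}_{u}$ contains $v_{3}\to v_{0}$ and $v_{1}\to v_{2}$, $v_{2}$ and $v_{3}$ compete, and the candidate edge is $v_{0}\to v_{1}$ (0-based), the sketch computes $\mathcal{S}_{1,0}^{+}$ and also checks the second identity of Eq. (15). Function names are ours.

```python
from typing import Sequence, Set


def in_set(j: int, C: Sequence[Sequence[int]]) -> Set[int]:
    """Eq. (9): nodes that reach v_j in G_u, v_j included."""
    return {k for k in range(len(C)) if C[k][j] == 1}


def out_set(i: int, C: Sequence[Sequence[int]]) -> Set[int]:
    """Eq. (12): V_i^+ = {v_k : c_{i,k} = 1}, nodes reachable from v_i in G_u, v_i included."""
    return {k for k in range(len(C)) if C[i][k] == 1}


def s_plus(i: int, j: int, C: Sequence[Sequence[int]], S: Sequence[Sequence[int]]) -> Set[int]:
    """Eqs. (13)-(14): competitors of V_i^+ that can already reach v_j in G_u."""
    n = len(C)
    v_plus = out_set(i, C)
    s_i = {k for k in range(n) if any(S[p][k] == 1 for p in v_plus)}  # Eq. (13)
    return {k for k in s_i if C[k][j] == 1}                           # Eq. (14)


n = 4
C = [[int(a == b) for b in range(n)] for a in range(n)]
C[3][0] = C[1][2] = 1                  # G_u already has v3 -> v0 and v1 -> v2
S = [[0] * n for _ in range(n)]
S[2][3] = S[3][2] = 1                  # v2 and v3 compete
i, j = 1, 0                            # candidate edge v0 -> v1
S_i_plus = {k for k in range(n) if any(S[p][k] == 1 for p in out_set(i, C))}
assert s_plus(i, j, C, S) == in_set(j, C) & S_i_plus  # second identity of Eq. (15)
print(out_set(i, C), s_plus(i, j, C, S))  # {1, 2} {3}
```

Here $\mathcal{S}_{1,0}^{+}=\{v_{3}\}$: adding $v_{0}\to v_{1}$ would let $v_{3}$ reach its competitor $v_{2}$.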

Based on the above understanding, we propose Algorithm 2 to solve the ILP problem (7). For a node $v_{j}\in\mathcal{B}_{i}$, $w_{j,i}$ represents the importance of $v_{j}$ to $v_{i}$. We sort the nodes of $\mathcal{B}_{i}$ in non-increasing order of $w_{j,i}$ (line 1) and consider them sequentially in this order (line 2). For each node $v_{j}\in\mathcal{B}_{i}$, if $\mathcal{S}_{i,j}^{+}=\emptyset$ and $\mathcal{S}_{i,j}^{-}=\emptyset$, the algorithm sets $v_{j}$ as a collaborator of $v_{i}$ (i.e., $x_{j,i}=1$) and updates the connectivity from $v_{j}$ to $v_{i}$ (lines 3–4). Finally, we account for the effect of setting $x_{j,i}=1$ on the connectivity between every other pair of nodes $v_{p}$ and $v_{q}$ in $\mathcal{G}_{u}$, i.e., $(v_{p},v_{q})\neq(v_{j},v_{i})$ (line 5). If, before line 4 is executed, $v_{p}$ cannot reach $v_{q}$, $v_{p}$ can reach $v_{j}$, and $v_{i}$ can reach $v_{q}$ in $\mathcal{G}_{u}$, then $v_{p}$ becomes able to reach $v_{q}$ through the new edge $(v_{j},v_{i})$ (lines 6–7).
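Lines 4–7 of Algorithm 2 amount to an incremental transitive-closure update. The sketch below (0-based indices, in-place updates, hypothetical function name) adds the edge $v_{j}\to v_{i}$ and marks $v_{p}\to v_{q}$ as connected whenever $c_{p,j}=c_{i,q}=1$.

```python
from typing import List


def add_usage_edge(j: int, i: int, X: List[List[int]], C: List[List[int]]) -> None:
    """Algorithm 2, lines 4-7: set x_{j,i} = 1 and propagate reachability through C in place.

    A new pair (p, q) becomes connected exactly when p reaches v_j and v_i reaches q.
    Updates inside the loop never change column j or row i of C (that would require
    c_{p,j} or c_{i,q} to be 0 and 1 at once), so the loop order does not matter.
    """
    n = len(C)
    X[j][i] = 1
    C[j][i] = 1
    for p in range(n):
        for q in range(n):
            if p != q and (p, q) != (j, i) and C[p][q] == 0 and C[p][j] == 1 and C[i][q] == 1:
                C[p][q] = 1


n = 4
X = [[int(a == b) for b in range(n)] for a in range(n)]
X[2][0] = X[1][3] = 1          # existing edges v2 -> v0 and v1 -> v3
C = [row[:] for row in X]      # C equals X here because no paths of length >= 2 exist yet
add_usage_edge(0, 1, X, C)     # new edge v0 -> v1: v2 now reaches v1 and v3, and v0 reaches v3
for row in C:
    print(row)
```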

Lemma 2.

Given $W_{n\times n}$ , $S_{n\times n}$ and $C_{n\times n}$ , the time complexity of finding $\mathcal{B}_{i}$ is $\mathcal{O}(n)$ while the time complexity of finding $\mathcal{S}_{i,j}^{-}$ or $\mathcal{S}_{i,j}^{+}$ is $\mathcal{O}(n^{2})$ .

Proof.

By Eq. (8), the time complexity of finding $\mathcal{B}_{i}$ is $\mathcal{O}(n)$ where $|\mathcal{B}_{i}|\leqslant n$ . By Eq. (9), the time complexity of finding $\mathcal{V}_{j}^{-}$ is $\mathcal{O}(n)$ where $|\mathcal{V}_{j}^{-}|\leqslant n$ . By Eq. (10), $\mathcal{S}_{j}^{-}$ can be found by (i) checking every $v_{k}\in\mathcal{V}$ and (ii) judging whether there exists a node $v_{p}\in\mathcal{V}_{j}^{-}$ such that $s_{k,p}=1$ ; the resulting time complexity is $\mathcal{O}(n^{2})$ ; here, $|\mathcal{S}_{j}^{-}|\leqslant n$ . Given $\mathcal{S}_{j}^{-}$ , by Eq. (11), the time complexity of finding $\mathcal{S}_{i,j}^{-}$ is $\mathcal{O}(n)$ . Finally, the time complexity of finding $\mathcal{S}_{i,j}^{-}$ is $\mathcal{O}(n^{2})$ . Similarly to $\mathcal{S}_{i,j}^{-}$ , the time complexity of finding $\mathcal{S}_{i,j}^{+}$ is also $\mathcal{O}(n^{2})$ . ∎

Algorithm 2 ILP Solver
Data: $W_{n\times n}$, $S_{n\times n}$, and $C_{n\times n}$
Result: the updated $X_{n\times n}$ and $C_{n\times n}$
1 Sort the nodes of $\mathcal{B}_{i}$ in non-increasing order of their values $w_{j,i}$, generating a sorted sequence;
2 for $v_{j}$ in the sorted sequence do
3     if $\mathcal{S}_{i,j}^{+}=\emptyset\wedge\mathcal{S}_{i,j}^{-}=\emptyset$ then
4         $x_{j,i}\leftarrow 1$; $c_{j,i}\leftarrow 1$;
5         for any two integers $p\in[1,n]$ and $q\in[1,n]$ with $p\neq q$ and $(p,q)\neq(j,i)$ do
6             if $c_{p,q}=0\wedge c_{p,j}=1\wedge c_{i,q}=1$ then
7                 $c_{p,q}\leftarrow 1$;

Proposition 2.

Suppose $X_{n\times n}$ satisfies Eq. (2) and (4) before $v_{i}$ is considered. Algorithm 2 gives a feasible solution to the ILP problem (7) with a time complexity $\mathcal{O}(n^{3})$ when $v_{i}$ is considered.

Proof.

By Proposition 1, Eq. (4) is equivalent to Assumption 1. Firstly, we prove by contradiction that Algorithm 2 gives a feasible solution. Before $v_{i}$ is considered, no FL-PT in $\mathcal{V}$ can reach any of its competitors in $\mathcal{G}_{u}$, by Assumption 1. Setting $x_{j,i}=1$ is equivalent to adding an edge $(v_{j},v_{i})$ to $\mathcal{G}_{u}$. By the definitions of $\mathcal{V}_{j}^{-}$ and $\mathcal{V}_{i}^{+}$, adding $(v_{j},v_{i})$ can only create new reachability from the nodes of $\mathcal{V}_{j}^{-}$ to the nodes of $\mathcal{V}_{i}^{+}$ in $\mathcal{G}_{u}$. Suppose there exists a node $v_{j}\in\mathcal{B}_{i}$ satisfying $\mathcal{S}_{i,j}^{+}=\emptyset$ and $\mathcal{S}_{i,j}^{-}=\emptyset$ such that Assumption 1 is violated after setting $x_{j,i}=1$. Then the addition of $(v_{j},v_{i})$ makes some node $v_{a}\in\mathcal{V}_{j}^{-}$ reach one of its competitors $v_{b}\in\mathcal{V}_{i}^{+}$ in $\mathcal{G}_{u}$. Since $v_{a}\in\mathcal{V}_{j}^{-}$ competes with a node of $\mathcal{V}_{i}^{+}$, we have $v_{a}\in\mathcal{S}_{i,j}^{+}$ by Eq. (13) and (15); since $v_{b}\in\mathcal{V}_{i}^{+}$ competes with a node of $\mathcal{V}_{j}^{-}$, we have $v_{b}\in\mathcal{S}_{i,j}^{-}$ by Eq. (10) and (15). Thus, $\mathcal{S}_{i,j}^{-}$ and $\mathcal{S}_{i,j}^{+}$ are non-empty, which contradicts the condition in line 3 that leads to $x_{j,i}=1$.

Secondly, we analyze the complexity of Algorithm 2. By Lemma 2, $\mathcal{B}_{i}$ can be found in $\mathcal{O}(n)$ time, and sorting its nodes takes $\mathcal{O}(n\log{n})$ time, e.g., with mergesort; thus, line 1 takes $\mathcal{O}(n\log{n})$ time. The for-loop in line 2 runs at most $|\mathcal{B}_{i}|\leqslant n$ iterations. In each iteration, line 3 takes $\mathcal{O}(n^{2})$ time by Lemma 2, and the for-loop in lines 5–7 takes $\mathcal{O}(n^{2})$ time. Hence, lines 2–7 take $\mathcal{O}(n^{3})$ time in total, and Algorithm 2 has a time complexity of $\mathcal{O}(n^{3})$. ∎

We now show the correctness of Algorithm 1. At the beginning of Algorithm 1, $X_{n\times n}$ satisfies Eq. (2) and (4) by Eq. (6), since all of its off-diagonal entries are zero. After each step for $v_{i}$ in line 4, $X_{n\times n}$ still satisfies these constraints by Proposition 2. When Algorithm 1 ends, the final collaboration relationships among all FL-PTs are determined by $X_{n\times n}$. By Eq. (1), computing $w_{i}$ for each FL-PT $v_{i}$ takes $\mathcal{O}(n)$ time; thus, computing $w_{1},w_{2},\cdots,w_{n}$ takes $\mathcal{O}(n^{2})$ time. Sorting $w_{1},w_{2},\cdots,w_{n}$ takes $\mathcal{O}(n\log{n})$ time. Thus, line 2 of Algorithm 1 takes $\mathcal{O}(n^{2})$ time. By Proposition 2, lines 3–4 execute $n$ steps of $\mathcal{O}(n^{3})$ time each, i.e., $\mathcal{O}(n^{4})$ in total. Therefore, the time complexity of Algorithm 1 is $\mathcal{O}(n^{4})$.
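The following end-to-end sketch puts Algorithms 1 and 2 together so that the correctness argument can be exercised on a random instance. It is a straightforward transcription under our own assumptions (0-based indices, dense list-of-lists matrices, LoP ties broken by index through Python's stable sort, hypothetical function names), not the authors' implementation. After selection, it recomputes reachability from scratch and asserts that the incrementally maintained $C$ matches, that no FL-PT reaches a competitor, and that $E_{u}\subseteq E_{b}$.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def collaborator_selection(W: Sequence[Sequence[float]], S: Sequence[Sequence[int]]) -> Tuple[Matrix, Matrix]:
    """Sketch of Algorithm 1 with Algorithm 2 inlined; 0-based indices, returns (X, C).

    W[j][i] = w_{j,i} (benefit graph G_b); S is the symmetric 0/1 matrix of G_c.
    """
    n = len(W)
    X = [[int(a == b) for b in range(n)] for a in range(n)]               # Eq. (6)
    C = [row[:] for row in X]                                             # identity
    lop = [sum(W[a][b] for b in range(n) if b != a) for a in range(n)]   # Eq. (1)
    for i in sorted(range(n), key=lambda a: -lop[a]):                     # Eq. (5); Alg. 1, line 3
        B = [j for j in range(n) if j != i and W[j][i] > 0 and S[j][i] == 0]  # Eq. (8)
        for j in sorted(B, key=lambda b: -W[b][i]):                       # Alg. 2, lines 1-2
            v_minus = [k for k in range(n) if C[k][j]]                    # Eq. (9)
            v_plus = [k for k in range(n) if C[i][k]]                     # Eq. (12)
            # Is S^-_{i,j} (Eqs. (10)-(11)) or S^+_{i,j} (Eqs. (13)-(14)) non-empty?
            s_minus = any(S[k][p] for k in v_plus for p in v_minus)
            s_plus = any(S[p][k] for k in v_minus for p in v_plus)
            if s_minus or s_plus:
                continue                                                  # x_{j,i} stays 0
            X[j][i] = C[j][i] = 1                                         # Alg. 2, line 4
            for p in range(n):                                            # Alg. 2, lines 5-7
                for q in range(n):
                    if p != q and (p, q) != (j, i) and not C[p][q] and C[p][j] and C[i][q]:
                        C[p][q] = 1
    return X, C


def closure(X: Sequence[Sequence[int]]) -> Matrix:
    """Reachability of G_u recomputed from scratch with Warshall's algorithm, for checking."""
    n = len(X)
    R = [[int(a == b or X[a][b] == 1) for b in range(n)] for a in range(n)]
    for k in range(n):
        for a in range(n):
            if R[a][k]:
                for b in range(n):
                    if R[k][b]:
                        R[a][b] = 1
    return R


rng = random.Random(0)
n = 8
W = [[0.0 if a == b or rng.random() < 0.5 else rng.random() for b in range(n)] for a in range(n)]
S = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (2, 3), (4, 5)]:
    S[a][b] = S[b][a] = 1
X, C = collaborator_selection(W, S)
R = closure(X)
assert C == R                                                              # incremental updates are exact
assert all(not (S[a][b] and R[b][a]) for a in range(n) for b in range(n))  # Assumption 1 holds
assert all(X[j][i] == 0 or j == i or W[j][i] > 0 for j in range(n) for i in range(n))  # E_u ⊆ E_b
print(sum(X[j][i] for j in range(n) for i in range(n) if i != j), "collaboration edges selected")
```

The nested loops mirror the complexity analysis above: each candidate check and each closure update costs $\mathcal{O}(n^{2})$, there are at most $n$ candidates per FL-PT, and $n$ FL-PTs are processed, giving $\mathcal{O}(n^{4})$ overall.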

Table 1: Experiments with synthetic data under fixed competing graphs

Weakly Non-IID setting (MSE)

| Method | $v_{1}$ | $v_{2}$ | $v_{3}$ | $v_{4}$ | $v_{5}$ | $v_{6}$ | $v_{7}$ | $v_{8}$ |
|---|---|---|---|---|---|---|---|---|
| Local | 0.23 ± 0.08 | 0.23 ± 0.09 | 0.87 ± 0.41 | 0.82 ± 0.26 | 0.23 ± 0.10 | 0.23 ± 0.07 | 0.82 ± 0.24 | 0.78 ± 0.30 |
| FedAvg | 0.20 ± 0.06 | 0.20 ± 0.06 | 0.20 ± 0.10 | 0.19 ± 0.07 | 0.19 ± 0.06 | 0.19 ± 0.06 | 0.19 ± 0.08 | 0.19 ± 0.10 |
| FedProx | 0.16 ± 0.06 | 0.17 ± 0.07 | 0.15 ± 0.09 | 0.17 ± 0.08 | 0.17 ± 0.06 | 0.17 ± 0.06 | 0.16 ± 0.09 | 0.18 ± 0.07 |
| SCAFFOLD | 0.17 ± 0.07 | 0.17 ± 0.07 | 0.16 ± 0.09 | 0.16 ± 0.07 | 0.18 ± 0.06 | 0.18 ± 0.07 | 0.18 ± 0.08 | 0.18 ± 0.08 |
| CE | 0.14 ± 0.10 | 0.14 ± 0.11 | 1.14 ± 0.67 | 1.20 ± 0.88 | 0.15 ± 0.08 | 0.16 ± 0.09 | 1.23 ± 0.37 | 1.22 ± 0.81 |
| FedCompetitors | 0.14 ± 0.12 | 0.14 ± 0.07 | 0.13 ± 0.06 | 0.15 ± 0.06 | 0.15 ± 0.08 | 0.14 ± 0.06 | 0.14 ± 0.07 | 0.14 ± 0.07 |

Strongly Non-IID setting (MSE)

| Method | $v_{1}$ | $v_{2}$ | $v_{3}$ | $v_{4}$ | $v_{5}$ | $v_{6}$ | $v_{7}$ | $v_{8}$ |
|---|---|---|---|---|---|---|---|---|
| Local | 0.23 ± 0.08 | 0.23 ± 0.08 | 0.22 ± 0.07 | 0.23 ± 0.08 | 0.23 ± 0.06 | 0.22 ± 0.06 | 0.22 ± 0.08 | 0.23 ± 0.07 |
| FedAvg | 24.47 ± 4.98 | 24.85 ± 4.82 | 24.85 ± 5.03 | 24.73 ± 5.67 | 24.15 ± 3.00 | 24.47 ± 2.78 | 24.17 ± 4.40 | 24.97 ± 3.81 |
| FedProx | 17.80 ± 7.54 | 17.82 ± 6.42 | 17.88 ± 7.68 | 17.86 ± 7.64 | 17.69 ± 7.14 | 17.76 ± 6.23 | 17.68 ± 5.94 | 17.73 ± 7.04 |
| SCAFFOLD | 17.22 ± 2.85 | 17.44 ± 2.17 | 17.39 ± 4.02 | 17.20 ± 3.58 | 16.87 ± 2.75 | 17.13 ± 2.79 | 17.00 ± 2.41 | 17.33 ± 2.59 |
| CE | 0.15 ± 0.12 | 0.14 ± 0.11 | 0.14 ± 0.07 | 0.14 ± 0.07 | 0.14 ± 0.06 | 0.14 ± 0.06 | 0.12 ± 0.05 | 0.12 ± 0.05 |
| FedCompetitors | 0.14 ± 0.07 | 0.13 ± 0.06 | 0.13 ± 0.06 | 0.14 ± 0.09 | 0.13 ± 0.07 | 0.14 ± 0.06 | 0.11 ± 0.04 | 0.13 ± 0.07 |

Experimental Evaluation

We conduct experiments on synthetic data and the CIFAR-10 dataset. To investigate the practicality of FedCompetitors, we also adopt the electronic health record (EHR) dataset eICU (Pollard et al. 2018) to illustrate the collaboration relationships of FL-PTs on a real-world network of multiple hospitals.

Comparison baselines

The previous section proposed our approach. We now describe a more intuitive procedure for handling the competing relationships among FL-PTs. This procedure makes existing FL approaches (e.g., FedAvg) applicable to the scenario of this paper. At a high level, we partition all FL-PTs into several disjoint groups so that the FL-PTs in each group are pairwise independent, i.e., free of conflicts of interest. Baselines are then obtained by applying existing FL approaches directly to each group. Specifically, the competing graph $\mathcal{G}_{c}$ describes the competing relationships among FL-PTs. Let $\mathcal{G}_{c}^{-}$ denote the complement of $\mathcal{G}_{c}$: an edge $(v_{i},v_{j})$ exists in $\mathcal{G}_{c}^{-}$ if and only if it does not exist in $\mathcal{G}_{c}$. Each edge of $\mathcal{G}_{c}^{-}$ thus indicates that the two FL-PTs it connects are independent. A clique is a subset of nodes of $\mathcal{G}_{c}^{-}$ in which every two nodes are adjacent; that is, a clique induces a complete subgraph. A clique cover of $\mathcal{G}_{c}^{-}$ is a partition of all nodes into cliques, so every two FL-PTs in the same clique are independent of each other (Tomita, Tanaka, and Takahashi 2006). A minimum clique cover is a clique cover that uses as few cliques as possible.
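The text does not state how the minimum clique cover is computed. A clique of $\mathcal{G}_{c}^{-}$ is exactly a set of FL-PTs with no competing edge between them in $\mathcal{G}_{c}$. A minimum clique cover of $\mathcal{G}_{c}^{-}$ is therefore the same as a proper coloring of $\mathcal{G}_{c}$ with the fewest colors. This problem is NP-hard in general but easy at the scale of these experiments.

The Python sketch below relies on that equivalence and uses exact backtracking. The function name `min_clique_cover_of_complement` is hypothetical, and FL-PTs are numbered 1..n.

```python
def min_clique_cover_of_complement(n, compete_edges):
    """Return a minimum clique cover of the complement of G_c.

    FL-PTs are numbered 1..n; compete_edges lists the edges of G_c.
    A clique of the complement is an independent set of G_c, so we search
    for a proper coloring of G_c with as few colors as possible.
    Exact backtracking: exponential in the worst case, fine for n ~ 10.
    """
    adj = {v: set() for v in range(1, n + 1)}
    for a, b in compete_edges:
        adj[a].add(b)
        adj[b].add(a)

    def color_with(k):
        color = {}

        def assign(v):
            if v > n:  # every FL-PT has been placed in some clique
                return True
            for c in range(k):
                # v may join clique c only if it competes with no member of c
                if all(color.get(u) != c for u in adj[v]):
                    color[v] = c
                    if assign(v + 1):
                        return True
                    del color[v]
            return False

        return color if assign(1) else None

    for k in range(1, n + 1):  # try 1 clique, then 2, ... until one fits
        color = color_with(k)
        if color is not None:
            cliques = [sorted(v for v in color if color[v] == c) for c in range(k)]
            return [c for c in cliques if c]
    return []


# Competing graph of the weakly Non-IID synthetic setting
weak = [(1, 5), (1, 6), (2, 5), (2, 6), (1, 7), (2, 8), (3, 5), (4, 6)]
print(min_clique_cover_of_complement(8, weak))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

When several minimum covers exist, the search order decides which one is returned. For the three fixed competing graphs used later in this section, it returns the covers stated in the text.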

The FL-PTs in each clique are trained together with FL, without involving FL-PTs from other cliques. We apply four typical FL approaches directly to the nodes of each clique, namely FedAvg, CE, FedProx (Li et al. 2020) and SCAFFOLD (Karimireddy et al. 2020), which yields four baselines. The collaboration equilibrium (CE) approach was proposed in (Cui et al. 2022), where each coalition is defined as a strongly connected component of the benefit graph; its effectiveness has been well validated against several other approaches. FedProx and SCAFFOLD are two representative approaches that keep the aggregated model at the CS close to the global optimum. They serve as two benchmarks in (Li et al. 2022) for evaluating FL performance under Non-IID data settings. The fifth baseline is Local, where each FL-PT trains its model locally without collaboration.
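To make the baseline construction concrete, the Python sketch below shows only the server-side aggregation of a FedAvg baseline when each clique trains separately. The CS averages the parameter vectors of a clique's members, weighted by their sample counts as in FedAvg (McMahan et al. 2017), and no update crosses clique boundaries.

This is a sketch under stated assumptions: local training is omitted, models are plain lists of floats, and the names `fedavg_aggregate` and `per_clique_aggregate` are hypothetical. CE, FedProx and SCAFFOLD would replace or extend this step with their own procedures.

```python
def fedavg_aggregate(models, sizes):
    """FedAvg server step: sample-size-weighted average of parameter vectors."""
    total = sum(sizes)
    dim = len(models[0])
    return [sum(m[j] * s for m, s in zip(models, sizes)) / total for j in range(dim)]


def per_clique_aggregate(cover, models, sizes):
    """Aggregate separately inside each clique; cliques never exchange updates.

    cover:  list of cliques (lists of FL-PT ids)
    models: dict id -> local parameter vector (list of floats)
    sizes:  dict id -> number of local samples
    Returns dict id -> a copy of the aggregated model of that FL-PT's clique.
    """
    out = {}
    for clique in cover:
        agg = fedavg_aggregate([models[v] for v in clique], [sizes[v] for v in clique])
        for v in clique:
            out[v] = list(agg)
    return out
```

For example, with the cover `[[1, 2], [3]]`, FL-PT 3 simply keeps its own model, while FL-PTs 1 and 2 receive the same weighted average.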

General experimental setting. Following (Cui et al. 2022), we use the hypernetwork technique of (Navon et al. 2021) to compute the benefit graph $\mathcal{G}_{b}$, with the hypernetwork implemented as a multilayer perceptron (MLP). For each dataset, all approaches use the same network structure at each FL-PT to execute the learning tasks.

Synthetic experiments

We present experimental results on synthetic data with fixed competing graphs. Specifically, we consider 8 FL-PTs $\{v_{1},v_{2},\cdots,v_{8}\}$. The synthetic features are generated by $x\sim\mathcal{U}[-1.0,1.0]$. For FL-PT $v_{i}$, the ground-truth weights are $u_{i,l}=v_{l}+r_{i,l}$, where $v_{l}\sim\mathcal{U}[0.0,1.0]$ and $r_{i,l}\sim\mathcal{N}(0.0,\rho^{2})$ for $l\in\{1,2,3\}$. Noise $\epsilon\sim\mathcal{N}(0.0,0.1^{2})$ is added to each label.
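A minimal Python sketch of this generator follows, using only the standard library. Three details are assumptions not fixed by the text: the feature dimension `dim`, the seeding, and reading $x^{l}$ as the element-wise $l$-th power of $x$. The `sizes` argument sets the per-FL-PT sample counts. The optional `flipped` set (1-based FL-PT indices) negates the noiseless part of the label, mirroring the label flip of the strongly Non-IID setting. The function name is hypothetical.

```python
import random


def make_synthetic(n_ptps=8, dim=5, rho=0.01, sizes=None, flipped=(), seed=0):
    """Per-FL-PT polynomial regression data: y = sign * sum_l u_{i,l}^T x^l + eps.

    Assumed (not fixed by the text): feature dimension `dim`, the seed, and
    element-wise powers x^l. Returns one list of (x, y) pairs per FL-PT.
    """
    rng = random.Random(seed)
    sizes = sizes or [2000] * n_ptps
    # shared base weights v_l ~ U[0, 1]^dim for l = 1, 2, 3
    base = [[rng.uniform(0.0, 1.0) for _ in range(dim)] for _ in range(3)]
    data = []
    for i in range(n_ptps):
        # u_{i,l} = v_l + r_{i,l}, with r_{i,l} ~ N(0, rho^2) per coordinate
        u = [[b + rng.gauss(0.0, rho) for b in base[l]] for l in range(3)]
        sign = -1.0 if (i + 1) in flipped else 1.0  # label flip for conflicting tasks
        samples = []
        for _ in range(sizes[i]):
            x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            signal = sum(u[l][k] * x[k] ** (l + 1) for l in range(3) for k in range(dim))
            samples.append((x, sign * signal + rng.gauss(0.0, 0.1)))  # eps ~ N(0, 0.1^2)
        data.append(samples)
    return data
```

For the weakly Non-IID setting one would call `make_synthetic(sizes=[2000, 2000, 100, 100, 2000, 2000, 100, 100])`. For the strongly Non-IID setting, the call would be `make_synthetic(flipped={5, 6, 7, 8})`.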

Weakly Non-IID setting. $\rho^{2}$ controls the discrepancy of data distributions among FL-PTs. We set $\rho=0.01$, so the generated data are only weakly non-IID in terms of sample features and labels. All FL-PTs learn the same type of polynomial regression task, with synthetic labels defined as $y=\sum_{l=1}^{3}{u_{i,l}^{T}x^{l}}+\epsilon$. The network used for predicting the label at each FL-PT is an MLP with one hidden layer. FL-PTs $v_{1}$, $v_{2}$, $v_{5}$ and $v_{6}$ have 2000 samples each, while the other FL-PTs have 100 samples each. Thus, there is quantity skew, i.e., a significant difference in the sample quantities of FL-PTs.

The two large FL-PTs $v_{1}$ and $v_{2}$ are independent of each other. They compete with the other two large FL-PTs $v_{5}$ and $v_{6}$, which are also independent of each other. Each small FL-PT competes with one large FL-PT: $(v_{1},v_{7})$, $(v_{2},v_{8})$, $(v_{3},v_{5})$ and $(v_{4},v_{6})$ are edges in the competing graph $\mathcal{G}_{c}$. This $\mathcal{G}_{c}$ yields a unique minimum clique cover of $\mathcal{G}_{c}^{-}$, namely $\{v_{i}\}_{i=1}^{4}$ and $\{v_{i}\}_{i=5}^{8}$. Within each clique, the small FL-PTs contribute little to the large FL-PTs.

The experimental results, measured by mean squared error (MSE), are given in Table 1. On average, CE performs worst, since the small FL-PTs $v_{3}$, $v_{4}$, $v_{7}$ and $v_{8}$ cannot benefit from the large FL-PTs. FedCompetitors performs best among all six approaches: it attains the lowest MSE for every FL-PT, tying with CE on $v_{1}$, $v_{2}$ and $v_{5}$.
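The clique-cover claims in this setting can be checked mechanically. The Python sketch below builds the edge set of $\mathcal{G}_{c}^{-}$ from $\mathcal{G}_{c}$ as defined in the baseline construction. It then tests whether a proposed partition is a clique cover, using the competing edges listed above for the weakly Non-IID setting. The helper names are hypothetical.

```python
from itertools import combinations


def complement_edges(n, compete_edges):
    """Edges of G_c^-: every pair of FL-PTs (1..n) that does NOT compete in G_c."""
    comp = {frozenset(e) for e in compete_edges}
    return {frozenset(p) for p in combinations(range(1, n + 1), 2)} - comp


def is_clique_cover(n, compete_edges, cover):
    """True if `cover` partitions 1..n into cliques of G_c^-."""
    nodes = [v for clique in cover for v in clique]
    if sorted(nodes) != list(range(1, n + 1)):
        return False  # not a partition: some FL-PT is missing or repeated
    indep = complement_edges(n, compete_edges)
    return all(frozenset(p) in indep for clique in cover for p in combinations(clique, 2))


# Competing graph of the weakly Non-IID setting, read off the text above
large_vs_large = [(1, 5), (1, 6), (2, 5), (2, 6)]
small_vs_large = [(1, 7), (2, 8), (3, 5), (4, 6)]
G_c = large_vs_large + small_vs_large
print(is_clique_cover(8, G_c, [[1, 2, 3, 4], [5, 6, 7, 8]]))  # True
print(is_clique_cover(8, G_c, [[1, 2, 5, 6], [3, 4, 7, 8]]))  # False: v1 competes with v5
```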

Table 2: Experiments with eICU under a fixed competing graph

| AUC | $v_{1}$ | $v_{2}$ | $v_{3}$ | $v_{4}$ | $v_{5}$ | $v_{6}$ | $v_{7}$ | $v_{8}$ | $v_{9}$ | $v_{10}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| Local | 76.12 | 69.46 | 68.94 | 68.04 | 76.46 | 40.00 | 69.30 | 60.53 | 56.94 | 49.12 |
| FedAvg | 75.26 | 72.09 | 68.87 | 74.13 | 83.72 | 41.67 | 79.37 | 54.41 | 66.67 | 38.10 |
| CE | 83.53 | 75.64 | 74.38 | 74.46 | 80.89 | 82.61 | 71.43 | 66.67 | 66.67 | 80.00 |
| FedCompetitors | 81.50 | 78.23 | 69.18 | 83.52 | 85.91 | 89.58 | 80.70 | 68.89 | 90.48 | 95.24 |

Table 3: Experiments with CIFAR-10 under randomly generated competing graphs

| Method | MTA |
|---|---|
| Local | 86.46 ± 4.12 |
| FedAvg | 52.99 ± 4.38 |
| FedProx | 51.13 ± 7.10 |
| SCAFFOLD | 51.20 ± 7.09 |
| CE | 87.80 ± 7.18 |
| FedCompetitors | 91.33 ± 4.14 |

Strongly Non-IID setting. This setting is the same as the one above except in three aspects. First, each FL-PT has 2000 samples, so there is no quantity skew. Second, we generate conflicting learning tasks by flipping the labels of some FL-PTs: $y=-\sum_{l=1}^{3}{u_{i,l}^{T}x^{l}}+\epsilon$ for $i\in\{5,6,7,8\}$. This makes the data of the eight FL-PTs strongly Non-IID in terms of labels. Third, we test on a different competing graph with two groups of FL-PTs, $\{v_{i}\}_{i=1}^{4}$ and $\{v_{i}\}_{i=5}^{8}$, with no competition between the groups. For $i\in\{1,5\}$, the FL-PTs $v_{i}$ and $v_{i+1}$ are independent of each other and compete with $v_{i+2}$ and $v_{i+3}$, which are also independent of each other.

Under this setting, all FL-PTs in the same group can benefit each other. A minimum clique cover of $\mathcal{G}_{c}^{-}$ is $\{v_{1},v_{2},v_{5},v_{6}\}$ and $\{v_{3},v_{4},v_{7},v_{8}\}$. The experimental results are given in Table 1. FedAvg, FedProx and SCAFFOLD perform the worst, since a single global model cannot simultaneously satisfy FL-PTs in the same clique whose learning tasks conflict. FedCompetitors achieves the best overall performance, with the lowest average MSE across the eight FL-PTs.
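An idealized calculation suggests why a single shared model struggles here. It rests on three assumptions beyond what the text states:

- The shared predictor $g$ can represent any function.
- $g$ minimizes the pooled squared error over the clique $\{v_{1},v_{2},v_{5},v_{6}\}$, whose members hold 2000 samples each.
- The $\rho=0.01$ perturbations are negligible.

Writing $u_{l}$ for the common weights and $f(x)=\sum_{l=1}^{3}u_{l}^{T}x^{l}$, the labels are $f(x)+\epsilon$ at $v_{1},v_{2}$ and $-f(x)+\epsilon$ at $v_{5},v_{6}$. Using that $\epsilon$ has zero mean, is independent of $x$, and has variance $0.1^{2}$:

```latex
\begin{aligned}
\mathcal{L}(g) &= \tfrac{1}{2}\,\mathbb{E}\big[(g(x) - f(x) - \epsilon)^{2}\big]
               + \tfrac{1}{2}\,\mathbb{E}\big[(g(x) + f(x) - \epsilon)^{2}\big] \\
&= \tfrac{1}{2}\,\mathbb{E}\big[(g(x) - f(x))^{2} + (g(x) + f(x))^{2}\big] + 0.1^{2} \\
&= \mathbb{E}\big[g(x)^{2}\big] + \mathbb{E}\big[f(x)^{2}\big] + 0.1^{2}
\;\ge\; \mathbb{E}\big[f(x)^{2}\big] + 0.1^{2}.
\end{aligned}
```

The bound is attained by $g\equiv 0$. Every member of the clique then incurs an expected MSE of $\mathbb{E}[f(x)^{2}]+0.1^{2}$, whereas a model fit to its own task alone could approach $0.1^{2}$. This is a qualitative argument only and is not meant to reproduce the values in Table 1.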

Benchmark experiments

We conduct experiments on CIFAR-10 with randomly generated competing graphs. CIFAR-10 is an image classification dataset with 10 classes of 6000 images each. We follow the setting of (Cui et al. 2022) for CIFAR-10 to construct Non-IID data and network structures and to measure performance. There are 10 FL-PTs, and each FL-PT randomly obtains 2 of the 10 classes to simulate the Non-IID setting. Model performance is measured by the mean test accuracy (MTA). To simulate competition, we set the probability that any two FL-PTs compete with each other to 0.2. This yields a random competing graph $\mathcal{G}_{c}$ that constrains the collaboration between some FL-PTs.

Table 3 shows the experimental results. FedCompetitors achieves the best performance, outperforming CE, the strongest baseline, by 3.53 percentage points (91.33% vs. 87.80%). FedAvg, FedProx and SCAFFOLD perform worst, since a single global model cannot simultaneously satisfy FL-PTs in the same clique whose data are heterogeneous.
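A minimal Python sketch of the randomized part of this setup is given below. It assumes two things:

- Each pair of FL-PTs competes independently with probability 0.2, i.e., $\mathcal{G}_{c}$ is an Erdős–Rényi $G(n,p)$ graph.
- Each FL-PT's two classes are drawn independently and uniformly.

How (Cui et al. 2022) balances classes and splits images across FL-PTs is not reproduced, and the function names are hypothetical.

```python
import random
from itertools import combinations


def random_competing_graph(n=10, p=0.2, seed=0):
    """Each unordered pair of FL-PTs (1..n) competes independently with
    probability p, i.e. an Erdos-Renyi G(n, p) competing graph."""
    rng = random.Random(seed)
    return [(a, b) for a, b in combinations(range(1, n + 1), 2) if rng.random() < p]


def assign_classes(n=10, num_classes=10, per_ptp=2, seed=0):
    """Give each FL-PT `per_ptp` distinct classes drawn uniformly at random."""
    rng = random.Random(seed)
    return {v: sorted(rng.sample(range(num_classes), per_ptp)) for v in range(1, n + 1)}


edges = random_competing_graph(seed=3)
classes = assign_classes(seed=3)
print(len(edges), classes[1])  # number of competing pairs and v1's two classes
```

With 10 FL-PTs there are 45 possible pairs, so about 9 competing edges are expected per graph.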

Hospital collaboration example

eICU is a dataset of EHRs of patients admitted to intensive care units (ICUs) at many hospitals across the United States. The task is to predict in-hospital mortality. We use this dataset to illustrate a benefit graph $\mathcal{G}_{b}$ and a data usage graph $\mathcal{G}_{u}$ in a real-world setting. We follow the eICU setting of (Cui et al. 2022), including the data pre-processing procedure, the selection of hospitals, the network structures and the performance metric. There are 10 hospitals. The first 5 hospitals $\{v_{i}\}_{i=1}^{5}$ are large, with about 1000 patients each, and the others are small, with about 100 patients each. The labels are imbalanced, since more than 90% of the samples are negative; thus, AUC is used to measure the utility of each FL-PT. The generated benefit graph $\mathcal{G}_{b}$ is illustrated in Figure 4(a).
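The following Python sketch illustrates why AUC is preferred under this imbalance. It computes AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (the Mann–Whitney formulation, with ties counted as 1/2). A predictor that always outputs the same score reaches 95% accuracy on a 95/5 split yet only 0.5 AUC.

The function name is hypothetical, and the text does not say which AUC implementation was used. Table 2 reports AUC in percent.

```python
def auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    labels: 0/1 outcomes (1 = in-hospital death); scores: predicted risk.
    Quadratic in the number of samples, which is fine for ~1000 patients.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


labels = [0] * 95 + [1] * 5   # heavily negative, as in eICU
constant = [0.0] * 100        # always predicts "survives"
accuracy = sum(y == 0 for y in labels) / len(labels)
print(accuracy, auc(labels, constant))  # 0.95 0.5
```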

Consider the case where several large hospitals may be located in the same city, while small hospitals are dispersed across rural areas with lower population densities. Competition then mainly occurs among large hospitals. We assume that $v_{2}$ competes with $v_{5}$, and that $v_{3}$ competes with both $v_{4}$ and $v_{5}$. For all baselines except Local, the clique cover is generated without using $\mathcal{G}_{b}$, and the FL-PTs in each clique collaborate with each other. The generated clique cover is $\{v_{4},v_{5}\}$ and $\{v_{i}\}_{i=1}^{3}\cup\{v_{i}\}_{i=6}^{10}$.

For FedCompetitors, Algorithm 1 generates the data usage graph $\mathcal{G}_{u}$ illustrated in Figure 4(b), which fully utilizes the information in $\mathcal{G}_{b}$. Figure 4(b) shows an advantage over the baselines. The local model updates of $v_{4}$ and $v_{5}$ can also be utilized by other FL-PTs $\{v_{1},v_{7},v_{8},v_{9},v_{10}\}$, while $v_{4}$ and $v_{5}$ can in turn benefit from $v_{1}$ during FL training. This advantage of FedCompetitors is reflected in the experimental results given in Table 2. Overall, FedCompetitors achieves the best performance, with the highest AUC on eight of the ten hospitals (CE is higher on $v_{1}$ and $v_{3}$).
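For contrast with $\mathcal{G}_{u}$, the short Python sketch below lists, for each hospital, the FL-PTs whose updates it can use under the baselines' clique cover stated above. It also checks that no clique contains one of the competing pairs $(v_{2},v_{5})$, $(v_{3},v_{4})$ and $(v_{3},v_{5})$. It shows that under the baselines, $v_{4}$ and $v_{5}$ can only use each other's updates. The helper name is hypothetical.

```python
def collaborators_under_cover(cover):
    """For each FL-PT, the other FL-PTs whose updates it can use when a
    baseline trains each clique of the cover separately."""
    return {v: sorted(set(clique) - {v}) for clique in cover for v in clique}


# Baseline clique cover for the hospital example, as given in the text
cover = [[4, 5], [1, 2, 3, 6, 7, 8, 9, 10]]
competing = {frozenset(e) for e in [(2, 5), (3, 4), (3, 5)]}

# Sanity check: no clique may contain a competing pair
assert not any(frozenset((a, b)) in competing for c in cover for a in c for b in c if a < b)

partners = collaborators_under_cover(cover)
print(partners[4])  # [5]
print(partners[1])  # [2, 3, 6, 7, 8, 9, 10]
```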

Conclusions

In this paper, we consider an open research problem in which a subset of FL-PTs in the FL ecosystem engage in competition. We extend the principle “the friend of my enemy is my enemy” from balance theory to guarantee that no conflict of interest occurs among FL-PTs. The resulting FL ecosystem is therefore highly scalable, since even competing FL-PTs can join it smoothly. We formulate the problem and show that it can be solved in polynomial time. Accordingly, we propose an efficient algorithm to determine the collaboration relationships of FL-PTs. The framework is also general, since it considers both competition and data heterogeneity, another important aspect of FL. Extensive experiments demonstrate the effectiveness of the proposed framework.

Acknowledgments

This research was supported in part by the National Key R&D Program of China (No. 2022YFB2902900). This research/project is also supported, in part, by the National Research Foundation Singapore and DSO National Laboratories under the AI Singapore Programme (AISG Award No: AISG2-RP-2020-019); the RIE 2020 Advanced Manufacturing and Engineering (AME) Programmatic Fund (No. A20G8b0102), Singapore; and the Center for Frontier AI Research (CFAR), Agency for Science, Technology and Research (A*STAR), Singapore. The work of Hao Cheng and Chongjun Wang was supported by the National Natural Science Foundation of China (Grant No. 62192783, 62376117). The work of Shanli Tan was done when he was a research intern with Xiaohu Wu at the National Engineering Research Center of Mobile Network Technologies, Beijing University of Posts and Telecommunications, China.

References

Aziz and Savani (2016) Aziz, H.; and Savani, R. 2016. Hedonic Games. In Brandt, F.; Conitzer, V.; Endriss, U.; Lang, J.; and Procaccia, A. D., eds., Handbook of Computational Social Choice, 356–376. Cambridge University Press.
Brekke, Siciliani, and Straume (2011) Brekke, K. R.; Siciliani, L.; and Straume, O. R. 2011. Hospital competition and quality with regulated prices. Scandinavian Journal of Economics, 113(2): 444–469.
Cartwright and Harary (1956) Cartwright, D.; and Harary, F. 1956. Structural balance: A generalization of Heider’s theory. Psychological Review, 63(5): 277.
Chaudhury et al. (2022) Chaudhury, B. R.; Li, L.; Kang, M.; Li, B.; and Mehta, R. 2022. Fairness in federated learning via core-stability. In Advances in Neural Information Processing Systems (NeurIPS’22), volume 35, 5738–5750.
Cui et al. (2022) Cui, S.; Liang, J.; Pan, W.; Chen, K.; Zhang, C.; and Wang, F. 2022. Collaboration equilibrium in federated learning. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’22), 241–251.
Ding and Wang (2022) Ding, S.; and Wang, W. 2022. Collaborative learning by detecting collaboration partners. In Advances in Neural Information Processing Systems (NeurIPS’22), volume 35, 15629–15641.
Donahue and Kleinberg (2021) Donahue, K.; and Kleinberg, J. 2021. Model-sharing games: Analyzing federated learning under voluntary participation. Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI’21), 35(6): 5303–5311.
Fallah, Mokhtari, and Ozdaglar (2020) Fallah, A.; Mokhtari, A.; and Ozdaglar, A. 2020. Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach. In Advances in Neural Information Processing Systems (NeurIPS’20), volume 33, 3557–3568.
Farris et al. (2010) Farris, P. W.; Bendle, N.; Pfeifer, P. E.; and Reibstein, D. 2010. Marketing metrics: The definitive guide to measuring marketing performance. Pearson Education.
Fleurence et al. (2014) Fleurence, R. L.; Curtis, L. H.; Califf, R. M.; Platt, R.; Selby, J. V.; and Brown, J. S. 2014. Launching PCORnet, a national patient-centered clinical research network. Journal of the American Medical Informatics Association, 21(4): 578–582.
Huang, Ke, and Liu (2023) Huang, C.; Ke, S.; and Liu, X. 2023. Duopoly business competition in cross-silo federated learning. IEEE Transactions on Network Science and Engineering, 1–13.
Kairouz et al. (2021) Kairouz, P.; McMahan, H. B.; Avent, B.; Bellet, A.; Bennis, M.; Nitin Bhagoji, A.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; D’Oliveira, R. G. L.; Eichner, H.; El Rouayheb, S.; Evans, D.; Gardner, J.; Garrett, Z.; Gascón, A.; Ghazi, B.; Gibbons, P. B.; Gruteser, M.; Harchaoui, Z.; He, C.; He, L.; Huo, Z.; Hutchinson, B.; Hsu, J.; Jaggi, M.; Javidi, T.; Joshi, G.; Khodak, M.; Konečný, J.; Korolova, A.; Koushanfar, F.; Koyejo, S.; Lepoint, T.; Liu, Y.; Mittal, P.; Mohri, M.; Nock, R.; Özgür, A.; Pagh, R.; Qi, H.; Ramage, D.; Raskar, R.; Raykova, M.; Song, D.; Song, W.; Stich, S. U.; Sun, Z.; Suresh, A. T.; Tramèr, F.; Vepakomma, P.; Wang, J.; Xiong, L.; Xu, Z.; Yang, Q.; Yu, F. X.; Yu, H.; and Zhao, S. 2021. Advances and Open Problems in Federated Learning. Foundations and Trends in Machine Learning, 14(1–2): 1–210.
Karimireddy et al. (2020) Karimireddy, S. P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; and Suresh, A. T. 2020. Scaffold: Stochastic controlled averaging for federated learning. In Proceedings of the 37th International Conference on Machine Learning (ICML’20), volume 119, 5132–5143.
Leskovec, Huttenlocher, and Kleinberg (2010) Leskovec, J.; Huttenlocher, D.; and Kleinberg, J. 2010. Predicting positive and negative links in online social networks. In Proceedings of the 19th International Conference on World Wide Web (WWW’10), 641–650.
Li et al. (2022) Li, Q.; Diao, Y.; Chen, Q.; and He, B. 2022. Federated learning on non-iid data silos: An experimental study. In Proceedings of the IEEE 38th International Conference on Data Engineering (ICDE’22), 965–978.
Li et al. (2020) Li, T.; Sahu, A. K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; and Smith, V. 2020. Federated optimization in heterogeneous networks. In Proceedings of Machine Learning and Systems, volume 2, 429–450.
Long et al. (2020) Long, G.; Tan, Y.; Jiang, J.; and Zhang, C. 2020. Federated learning for open banking. In Federated Learning, 240–254. Springer.
McMahan et al. (2017) McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; and Arcas, B. A. 2017. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS’17), 1273–1282.
Navon et al. (2021) Navon, A.; Shamsian, A.; Fetaya, E.; and Chechik, G. 2021. Learning the Pareto Front with Hypernetworks. In International Conference on Learning Representations (ICLR’21).
Oldenhof et al. (2023) Oldenhof, M.; Ács, G.; Pejó, B.; Schuffenhauer, A.; Holway, N.; Sturm, N.; Dieckmann, A.; Fortmeier, O.; Boniface, E.; Mayer, C.; et al. 2023. Industry-scale orchestrated federated learning for drug discovery. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, 15576–15584.
Pollard et al. (2018) Pollard, T. J.; Johnson, A. E.; Raffa, J. D.; Celi, L. A.; Mark, R. G.; and Badawi, O. 2018. The eICU collaborative research database, a freely available multi-center database for critical care research. Scientific Data, 5(1): 1–13.
Sheller et al. (2020) Sheller, M. J.; Edwards, B.; Reina, G. A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R. R.; et al. 2020. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Scientific Reports, 10(1): 1–12.
Shi, Yu, and Leung (2023) Shi, Y.; Yu, H.; and Leung, C. 2023. Towards fairness-aware federated learning. IEEE Transactions on Neural Networks and Learning Systems, 1–17.
Smith et al. (2017) Smith, V.; Chiang, C.-K.; Sanjabi, M.; and Talwalkar, A. S. 2017. Federated multi-task learning. In Advances in Neural Information Processing Systems (NIPS’17), volume 30.
Sun et al. (2023) Sun, C.; Huang, C.; Shou, B.; and Huang, J. 2023. Federated Learning in Competitive EV Charging Market. arXiv preprint arXiv:2310.08794.
Tan et al. (2022) Tan, A. Z.; Yu, H.; Cui, L.; and Yang, Q. 2022. Towards personalized federated learning. IEEE Transactions on Neural Networks and Learning Systems, 1–17.
Tang and Wong (2021) Tang, M.; and Wong, V. W. 2021. An incentive mechanism for cross-silo federated learning: A public goods perspective. In Proceedings of the 2021 IEEE Conference on Computer Communications (INFOCOM’21), 1–10. IEEE.
Tariq et al. (2023) Tariq, A.; Serhani, M. A.; Sallabi, F.; Qayyum, T.; Barka, E. S.; and Shuaib, K. A. 2023. Trustworthy Federated Learning: A Survey. arXiv preprint arXiv:2305.11537.
Tomita, Tanaka, and Takahashi (2006) Tomita, E.; Tanaka, A.; and Takahashi, H. 2006. The worst-case time complexity for generating all maximal cliques and computational experiments. Theoretical Computer Science, 363(1): 28–42.
Tsoy and Konstantinov (2023) Tsoy, N.; and Konstantinov, N. 2023. Strategic data sharing between competitors. arXiv preprint arXiv:2305.16052.
Wang et al. (2022) Wang, Y.; Tong, Y.; Zhou, Z.; Ren, Z.; Xu, Y.; Wu, G.; and Lv, W. 2022. Fed-LTD: Towards Cross-Platform Ride Hailing via Federated Learning to Dispatch. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’22), 4079–4089.
Wang et al. (2019) Wang, Z.; Dai, Z.; Póczos, B.; and Carbonell, J. 2019. Characterizing and avoiding negative transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR’19), 11293–11302.
Wu, De Pellegrini, and Casale (2023) Wu, X.; De Pellegrini, F.; and Casale, G. 2023. Delay and price differentiation in cloud computing: A service model, supporting architectures, and performance. ACM Trans. Model. Perform. Eval. Comput. Syst. Just Accepted.
Wu and Yu (2022) Wu, X.; and Yu, H. 2022. MarS-FL: Enabling competitors to collaborate in federated learning. IEEE Transactions on Big Data, 1–11.
Yang et al. (2020) Yang, L.; Tan, B.; Zheng, V. W.; Chen, K.; and Yang, Q. 2020. Federated recommendation systems. In Federated Learning: Privacy and Incentive, 225–239. Springer.
Yang et al. (2019) Yang, Q.; Liu, Y.; Chen, T.; and Tong, Y. 2019. Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2): 12:1–12:19.
Yu et al. (2014) Yu, H.; Miao, C.; An, B.; Shen, Z.; and Leung, C. 2014. Reputation-aware task allocation for human trustees. In Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems, 357–364.
Zhan et al. (2022) Zhan, Y.; Zhang, J.; Hong, Z.; Wu, L.; Li, P.; and Guo, S. 2022. A survey of incentive mechanism design for federated learning. IEEE Transactions on Emerging Topics in Computing, 10(2): 1035–1044.
Zhu et al. (2021) Zhu, H.; Xu, J.; Liu, S.; and Jin, Y. 2021. Federated learning on non-IID data: A survey. Neurocomputing, 465: 371–390.