Maverick-Aware Shapley Valuation for
Client Selection in Federated Learning

Mengwei Yang¹, Ismat Jarin¹, Baturalp Buyukates³, Salman Avestimehr³, Athina Markopoulou¹
¹Department of EECS and CS, University of California, Irvine, CA, USA
³Ming Hsieh Department of ECE, University of Southern California, CA, USA
Emails: {mengwey, ijarin, athina}@uci.edu, {buyukate, avestime}@usc.edu
Abstract

Federated Learning (FL) allows clients to train a model collaboratively without sharing their private data. One key challenge in practical FL systems is data heterogeneity, particularly in handling clients with rare data, also referred to as Mavericks. These clients own one or more data classes exclusively, and the model performance becomes poor without their participation. Thus, utilizing Mavericks throughout training is crucial. In this paper, we first design a Maverick-aware Shapley valuation that fairly evaluates the contribution of Mavericks. The main idea is to compute the clients’ Shapley values (SV) class-wise, i.e., per label. Next, we propose FedMS, a Maverick-Shapley client selection mechanism for FL that intelligently selects the clients that contribute the most in each round, by employing our Maverick-aware SV-based contribution score. We show that, compared to an extensive list of baselines, FedMS achieves better model performance and fairer Shapley Rewards distribution.

I Introduction

As the pace of legislation on user privacy accelerates, regulations such as the General Data Protection Regulation (GDPR) [1] and the California Consumer Privacy Act (CCPA) [2] have been enacted to give users more control over their personal information. In this landscape, Federated Learning (FL) has been proposed [3] to facilitate machine learning (ML) over decentralized user data, taking the place of traditional centralized training approaches, which pose significant privacy challenges. In FL, many clients collaboratively train a model by transmitting only their model updates instead of their private data. Despite this improved privacy, practical FL systems usually face the challenge of data heterogeneity. Unlike in idealized data center environments, participating clients in FL usually have heterogeneous data, which can easily cause poor accuracy and slow convergence. Even though many works have tackled data heterogeneity from model performance, client selection, and rewarding perspectives in FL [4, 5, 6, 7], a prevalent scenario remains largely understudied: clients with rare data. Clients providing rare and previously unseen data are crucial to the success of the trained ML models. Training on diverse data avoids the common bias in algorithms, leading to fairer and more trustworthy ML systems.

Figure 1: Multiple devices participate in FL for a voice AI task. A few devices that exclusively own rare data, i.e., non-native accent data, are the Mavericks and crucial for training.

In [8], the term Mavericks was coined to refer to clients with rare data in FL, and more specifically to clients that exclusively own one or more classes (i.e., labels) of data, whereas the non-Maverick clients have a balanced distribution from the remaining classes. An example is shown in Fig. 1. When training an FL model for a disease classification task, most hospitals (i.e., clients) possess data indicating common diseases such as flu or cold. However, very few hospitals possess rare disease datasets, such as those for leukemia or thyroid cancer, making them Mavericks in this learning task. Another example of Mavericks is people with rare accents in training voice-activated AI systems like Amazon’s Alexa and Google’s Home Assistant. While the majority of these devices contain native accent data, a few of them contain data from users with non-native accents. Recent studies report that these assistant devices struggle to understand non-native accents, with more than a 6% performance gap between the Western accents and the minority accents [9]. This disparity indicates biased performance and demonstrates the importance of training with rare (or less common) data from Mavericks to create models that “speak” to everyone.

Prior work in FL has not sufficiently addressed this problem. The random sampling of clients at each round adopted by the conventional FL scheme, FedAvg [10], does not fully exploit rare data and can cause slow convergence, low model performance, and degraded fairness [11]. Existing techniques for selecting clients in FL include contribution-based approaches such as S-FedAvg [12] and GreedyFed [13], and distance-based methods such as FedEMD [8]. In contribution-based client selection methods, the Shapley value (SV) [14] is widely applied for measuring clients’ contributions during training. Previous works [8] and [15] have shown that, despite their accurate performance in i.i.d. scenarios, SV-based methods systematically undervalue the Mavericks (although they are paramount for achieving high accuracy on certain classes of data), and therefore suffer from unfairness and performance loss due to under-utilization of rare data.

In this paper, we offer a principled way to value and utilize the Mavericks in FL. We design (i) a novel Maverick-aware Shapley valuation and (ii) a corresponding client selection mechanism that can more fairly assess the contribution of Mavericks and can effectively utilize them in each round. Our main contributions can be summarized as follows:

  • We propose a class-wise SV-based contribution score to value the contributions of clients in FL. To compute this score, we define the class difficulty in order to combine the class-wise SVs and fairly evaluate the contribution of the clients (Mavericks and non-Mavericks).

  • We then introduce FedMS, a Maverick-Shapley client selection mechanism for FL, to effectively utilize the Mavericks during training based on the contribution scores. We show that FedMS significantly increases the model accuracy compared to an extensive list of baselines.

II Problem Setup and Background

In this section, we first formalize the FL framework [10] and define Mavericks [8]. We then give an overview of the SV-based methods for evaluating the contribution of clients.

Federated Learning (FL). We consider a general FL system with multiple clients and one server. We let $\mathcal{K}$ denote the set of clients such that $\mathcal{K}=\{1,2,\ldots,I\}$. Each client $i$ owns dataset $\mathcal{D}_i$, where $n_i=|\mathcal{D}_i|$. Each data point is a pair $(\bm{x},y)$, where $\bm{x}$ is the feature vector and $y$ is the corresponding label. We let $\mathcal{M}=\{1,2,\ldots,C\}$ denote the set of class labels. $\bm{w}$ denotes the learnable weights of the global model, and each client $i$ has a local model $\bm{w}_i$. The training objective is defined as

$$\min_{\bm{w}}\mathcal{L}(\bm{w})=\min\sum_{i\in\mathcal{K}}\frac{n_i}{n}\,\mathcal{L}_i(\bm{w}_i), \tag{1}$$

where $n=\sum_{i\in\mathcal{K}}n_i$ and the loss at client $i$ is $\mathcal{L}_i(\bm{w}_i)=\frac{1}{n_i}\sum_{d\in\mathcal{D}_i}\mathcal{L}_d(\bm{w}_i)$. The FL training process includes the following steps: (i) Initialization: The server initializes the global model parameters $\bm{w}$ and broadcasts them to the clients. (ii) Client Selection: In round $t$, the server selects a set of clients $\mathcal{K}^t=\{1,2,\ldots,I^t\}$ with selection strategy $\pi$. (iii) Local Update and Model Aggregation: Each selected client $i$ in round $t$ performs local training and sends $\bm{w}_i^t$ to the server. Then, the server updates the global model using the model updates of the clients as $\bm{w}^{t+1}=\sum_{i=1}^{I^t}\frac{n_i^t}{\sum_{i=1}^{I^t}n_i^t}\bm{w}_i^t$. Steps (ii) and (iii) are repeated until the convergence of the global model.
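The aggregation in step (iii) is a dataset-size-weighted average of the selected clients' models. The following is a minimal sketch of that step only, assuming models are flat lists of floats; the names `aggregate`, `client_models`, and `client_sizes` are hypothetical and chosen for illustration.

```python
from typing import Dict, List


def aggregate(client_models: Dict[int, List[float]],
              client_sizes: Dict[int, int]) -> List[float]:
    """Weighted average of client models, with weights n_i^t / sum of n_i^t."""
    total = sum(client_sizes[i] for i in client_models)
    dim = len(next(iter(client_models.values())))
    new_global = [0.0] * dim
    for i, w_i in client_models.items():
        share = client_sizes[i] / total  # relative dataset size of client i
        for k in range(dim):
            new_global[k] += share * w_i[k]
    return new_global


# Toy round: two selected clients holding 30 and 10 samples.
client_models = {1: [1.0, 0.0], 2: [0.0, 1.0]}
client_sizes = {1: 30, 2: 10}
w_next = aggregate(client_models, client_sizes)
print(w_next)  # [0.75, 0.25]
```

The client with three times as much data pulls the global model three times as strongly, which is one reason a rarely selected client with little data has limited influence under plain averaging.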

Input: $T$: number of training rounds; $E$: number of local epochs; $\mathcal{K}$: set of clients; $\mathcal{D}_i$: dataset of client $i$; $B$: minibatch size; $n_i^t$: dataset size of the $i$-th client in round $t$; $\mathcal{M}$: set of class labels; $\mathcal{D}_{\textrm{val}}$: validation dataset; $\mathcal{V}_{class}(\cdot)$: class-wise accuracy function; $\eta_i$: learning rate at client $i$.
Server executes:
    Initialize $\bm{w}^0$, $\beta$, $\hat{S}$
    for each round $t = 0, \ldots, T-1$ do
        // Compute contribution score $\hat{S}_i$.
        $\hat{S}_i=\sum_{c\in\mathcal{M}}\beta^c\cdot S_i^c,\ \forall i\in\mathcal{K}$
        // Sample clients from $P_{\hat{S},i}$.
        $P_{\hat{S},i}=\frac{\exp(\hat{S}_i)}{\sum_{i\in\mathcal{K}}\exp(\hat{S}_i)},\ \forall i\in\mathcal{K}$
        $\mathcal{K}^t\leftarrow$ sample clients $i\sim P_{\hat{S},i}$
        for each client $i\in\mathcal{K}^t$ in parallel do
            $\bm{w}_i^t\leftarrow$ UserUpdate$(\bm{w}^t,i)$
        // Calculate class-wise Shapley value $\phi_i$, class difficulty $\beta$ and the best clients set $\hat{\mathcal{K}}^t$.
        $\phi,\beta,\hat{\mathcal{K}}^t\leftarrow$ Maverick-Shapley$(\{\bm{w}_i^t\}_{i\in\mathcal{K}^t},\bm{w}^t,\mathcal{D}_{\textrm{val}},\mathcal{V}_{class}(\cdot),\mathcal{M})$
        // Compute the accumulated class-wise Shapley value $S_i$.
        $S_i^c=\alpha\cdot S_i^c+(1-\alpha)\cdot\phi_i^c,\ \forall i\in\mathcal{K}^t,\forall c\in\mathcal{M}$
        $\bm{w}^{t+1}\leftarrow\sum_{i\in\hat{\mathcal{K}}^t}\frac{n_i^t}{\sum_{i\in\hat{\mathcal{K}}^t}n_i^t}\bm{w}_i^t$

function UserUpdate$(\bm{w}^t,i)$:
    for each local epoch $e = 1, \ldots, E$ do
        $\mathcal{D}_i^B\leftarrow$ select a minibatch of size $B$ from $\mathcal{D}_i$
        $\bm{w}_i^t\leftarrow\bm{w}_i^t-\eta_i\nabla\mathcal{L}_i(\mathcal{D}_i^B,\bm{w}_i^t)$
    return $\bm{w}_i^t$ to server
Algorithm 1 FedMS: a Maverick-Shapley Client Selection Mechanism for FL
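The client-sampling step of Algorithm 1 turns contribution scores into selection probabilities with a softmax and then draws the participating clients. The sketch below illustrates this under two assumptions that the listing does not spell out: clients are drawn without replacement, and the number of selected clients is passed in as `num_selected`. All function and variable names are hypothetical.

```python
import math
import random
from typing import Dict, List


def selection_probabilities(scores: Dict[int, float]) -> Dict[int, float]:
    """Softmax over contribution scores (max subtracted for numerical stability)."""
    top = max(scores.values())
    exps = {i: math.exp(s - top) for i, s in scores.items()}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


def sample_clients(scores: Dict[int, float], num_selected: int,
                   rng: random.Random) -> List[int]:
    """Draw distinct clients one at a time, proportionally to their probability."""
    remaining = selection_probabilities(scores)
    chosen: List[int] = []
    for _ in range(min(num_selected, len(remaining))):
        ids = list(remaining)
        # random.choices accepts unnormalized relative weights.
        pick = rng.choices(ids, weights=[remaining[i] for i in ids], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # assumption: sampling without replacement
    return chosen


# Toy scores: client 3 has by far the highest contribution score.
scores = {1: 0.05, 2: 0.10, 3: 0.90}
probs = selection_probabilities(scores)
selected = sample_clients(scores, num_selected=2, rng=random.Random(0))
```

Because selection is probabilistic rather than a hard top-k, clients with low scores still have a nonzero chance of participating in each round.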

Mavericks. A Maverick is a client that owns one or more classes exclusively [8]. Let $\mathcal{M}_{mav}$ denote the set of class labels exclusively owned by Mavericks. If a client is a Maverick, then its dataset satisfies $\mathcal{D}_i=\{\{x^c,y^c\}^i_{c\in\mathcal{M}_{mav}},\{x^c,y^c\}^i_{c\notin\mathcal{M}_{mav}}\}$. Here, $\{x^c,y^c\}^i$ denotes the data points in $\mathcal{D}_i$ with label $c$. If a client is not a Maverick, then its dataset satisfies $\mathcal{D}_i=\{\{x^c,y^c\}^i_{c\notin\mathcal{M}_{mav}}\}$. As in [8], we assume the data samples $\{x^c,y^c\}_{c\notin\mathcal{M}_{mav}}$ are evenly distributed among all clients but the data samples $\{x^c,y^c\}_{c\in\mathcal{M}_{mav}}$ are exclusively owned by the Mavericks. We note that there can be multiple Mavericks jointly owning the rare labels, which we call the shared Mavericks.
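To make the data assumption concrete, the sketch below builds such a partition for a toy label list: samples of non-Maverick classes are dealt round-robin to all clients, while samples of Maverick classes go only to the Maverick clients. This is an illustrative construction, not the partitioning code used in the experiments; the function name, the 0-based client ids, and the round-robin rule are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def partition_with_mavericks(labels: Sequence[int],
                             num_clients: int,
                             maverick_clients: List[int],
                             maverick_classes: Set[int]) -> Dict[int, List[int]]:
    """Return a mapping from client id to the sample indices it owns."""
    all_clients = list(range(num_clients))
    shards: Dict[int, List[int]] = {i: [] for i in all_clients}
    seen_per_class: Dict[int, int] = defaultdict(int)
    for idx, y in enumerate(labels):
        # Rare classes are shared only among Mavericks; others among everyone.
        owners = maverick_clients if y in maverick_classes else all_clients
        owner = owners[seen_per_class[y] % len(owners)]
        seen_per_class[y] += 1
        shards[owner].append(idx)
    return shards


# Three classes, three clients; class 2 is rare and client 0 is the only Maverick.
labels = [0, 1, 2] * 4
shards = partition_with_mavericks(labels, 3, [0], {2})
client_classes = {i: sorted({labels[k] for k in idx}) for i, idx in shards.items()}
print(client_classes)  # {0: [0, 1, 2], 1: [0, 1], 2: [0, 1]}
```

Passing several ids in `maverick_clients` gives the shared-Mavericks case, where the rare classes are split among more than one client.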

Shapley Value (SV) for Client Valuation in FL. The SV [14, 16] of client $i$ is given by

$$\phi_i(\mathcal{K},\mathcal{V})=\sum_{\mathcal{Q}\subseteq\mathcal{K}\backslash\{i\}}\frac{\mathcal{V}(\mathcal{Q}\cup\{i\})-\mathcal{V}(\mathcal{Q})}{\binom{|\mathcal{K}|-1}{|\mathcal{Q}|}}, \tag{2}$$

where $\phi_i$ is the SV for client $i$ and $\mathcal{Q}$ denotes a subset of participants from $\mathcal{K}$. The utility function $\mathcal{V}(\cdot)$ can take any form that evaluates the utility of its input. The conventional SV in (2) requires retraining the FL model for all subsets of clients, which is computationally prohibitive [17]. For client contribution assessment in FL, gradient-based SV approximation techniques such as MR [18], TMR [19], and GTG [17] are employed (see Appendix A for an overview).
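For intuition, (2) can be evaluated exactly when the number of clients is tiny. The sketch below enumerates every subset and follows (2) as written, so it omits the additional $1/|\mathcal{K}|$ normalization found in the common textbook form of the Shapley value. The additive toy utility and the names `shapley_values` and `worth` are assumptions made purely for illustration; a real utility would evaluate a model built from the coalition.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, Iterable


def shapley_values(clients: Iterable[int],
                   utility: Callable[[FrozenSet[int]], float]) -> Dict[int, float]:
    """Exact evaluation of Eq. (2) by enumerating all subsets Q of K minus {i}."""
    clients = list(clients)
    n = len(clients)
    values: Dict[int, float] = {}
    for i in clients:
        others = [j for j in clients if j != i]
        total = 0.0
        for size in range(n):  # |Q| ranges over 0, ..., n-1
            for subset in combinations(others, size):
                q = frozenset(subset)
                marginal = utility(q | {i}) - utility(q)
                total += marginal / comb(n - 1, size)
        values[i] = total
    return values


# Hypothetical additive utility: each client adds a fixed amount of accuracy.
worth = {1: 0.1, 2: 0.2, 3: 0.5}


def toy_utility(coalition: FrozenSet[int]) -> float:
    return sum(worth[j] for j in coalition)


phi = shapley_values(worth.keys(), toy_utility)
```

With an additive utility every marginal contribution of client $i$ equals its own worth, so the result is $|\mathcal{K}|$ times that worth under (2) as written. The nested loops evaluate the utility on $2^{|\mathcal{K}|-1}$ subsets per client, which illustrates why approximation techniques are needed in practice.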

III Proposed Method: FedMS

In this section, we describe the proposed Maverick-Shapley client selection mechanism, FedMS, which fairly computes the contributions of all clients using a class-wise SV-based contribution score and selects the most contributing clients in each round. The steps are outlined in Algorithm 1, and the list of variables is given in Appendix C.

Maverick-Shapley Contribution Score. When training a model for multi-class tasks, the difficulty of learning each class is different. Particularly in the presence of Mavericks, rare classes are harder to learn than the others. In order to differentiate between classes and accurately compute the contribution of each client (Mavericks and non-Mavericks alike), we propose a class-wise SV-based contribution score. In particular, we use the class-wise accuracy as the utility function in SV computations to better capture the difficulty level of each class. Class-wise accuracy is calculated as

$$\mathcal{V}_{class}^{c}(w;\mathcal{D}_{\textrm{val}})=\frac{N^{cc}}{\sum_{j\in\mathcal{M}}N^{cj}},\quad\forall c\in\mathcal{M}, \tag{3}$$

where $w$ is a given model, $\mathcal{D}_{\textrm{val}}$ is the validation dataset at the server, $N^{cj}$ represents the number of validation data points of class $c$ predicted as class $j$, and $\mathcal{M}$ is the set of class labels. Our main distinction in SV computation is that we compute it in a class-wise manner to better capture the diverse resources of Mavericks (hence the name Maverick-Shapley).
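The class-wise accuracy in (3) is the diagonal entry of the confusion matrix divided by its row sum, i.e., per-class recall on the validation set. A minimal sketch, assuming labels and predictions are given as plain lists and that a class with no validation samples is assigned accuracy 0 (an assumption, since (3) is undefined in that case):

```python
from typing import Dict, Sequence


def class_wise_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                        classes: Sequence[int]) -> Dict[int, float]:
    """For each class c, return N^{cc} divided by the sum over j of N^{cj}."""
    acc: Dict[int, float] = {}
    for c in classes:
        # Row sum of the confusion matrix: all validation points of class c.
        total = sum(1 for y in y_true if y == c)
        # Diagonal entry: points of class c that were predicted as c.
        correct = sum(1 for y, p in zip(y_true, y_pred) if y == c and p == c)
        acc[c] = correct / total if total else 0.0  # assumption for empty classes
    return acc


# Toy validation set in which the model never predicts the rare class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 0, 1]
acc = class_wise_accuracy(y_true, y_pred, classes=[0, 1, 2])
print(acc)  # {0: 1.0, 1: 0.5, 2: 0.0}
```

Overall accuracy on this toy set is 0.5, which hides the fact that class 2 is never recognized; the per-class view makes that gap visible.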

In each FL round, after receiving model updates from the participating clients, the server computes the SV of client $i$ for class $c$, $\phi_i^c$, by applying a gradient-based SV approximation method of its choice with (3) as the utility function. It then computes the accumulated SVs $S_i^c$ using a decay factor $\alpha$ as

$$S_i^c=\alpha\cdot S_i^c+(1-\alpha)\cdot\phi_i^c,\quad\forall i\in\mathcal{K}^t,\forall c\in\mathcal{M}. \tag{4}$$
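The update in (4) is an exponential moving average applied only to the clients that participated in the current round. A minimal sketch, assuming the accumulated values are stored in nested dictionaries and start at 0 for entries not seen before (both assumptions; the names are hypothetical):

```python
from typing import Dict

Scores = Dict[int, Dict[int, float]]  # client id -> class label -> value


def accumulate(S: Scores, phi: Scores, alpha: float) -> Scores:
    """Apply S_i^c = alpha * S_i^c + (1 - alpha) * phi_i^c for clients in phi."""
    for i, per_class in phi.items():
        for c, value in per_class.items():
            prev = S.setdefault(i, {}).get(c, 0.0)  # assumption: start from 0
            S[i][c] = alpha * prev + (1 - alpha) * value
    return S


# Client 1 took part in this round; client 2 did not, so its entries are untouched.
S = {1: {0: 0.2, 1: 0.0}, 2: {0: 0.1, 1: 0.4}}
phi_round = {1: {0: 0.0, 1: 0.6}}
S = accumulate(S, phi_round, alpha=0.5)
```

A larger $\alpha$ makes the accumulated value change more slowly, so a single round has less effect on a client's standing.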

Finally, the server computes the contribution score of each client $i$ as a weighted sum of its class-wise accumulated SVs (in the first round, the server initializes the contribution scores by calculating the cosine distance between each client model and the aggregate model):

$$\hat{S}_i=\sum_{c\in\mathcal{M}}\beta^c\cdot S_i^c,\quad\forall i\in\mathcal{K}, \tag{5}$$

where $\beta$ denotes the class difficulty, adaptively adjusting the impact of each class in the contribution scores such that

$$\beta^{c}=\frac{\exp\left(\frac{1-\mathcal{V}_{class}^{c}(w;\mathcal{D}_{\textrm{val}})}{T}\right)}{\sum_{c\in\mathcal{M}}\exp\left(\frac{1-\mathcal{V}_{class}^{c}(w;\mathcal{D}_{\textrm{val}})}{T}\right)},\quad\forall c\in\mathcal{M}, \tag{6}$$

where the temperature $T$ controls the sharpness of the distribution. Since the difficulty of learning each class changes dynamically during training, the server updates the class difficulty $\beta$ and the contribution score $\hat{S}$ in each round.
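The effect of (5) and (6) can be seen on a toy example: the class difficulty is a softmax over per-class error rates, and it reweights the accumulated class-wise SVs. The sketch below uses made-up accuracies and accumulated SVs, chosen only to show how a client that contributes solely to a hard class can outrank one that contributes to easy classes; none of these numbers come from the paper.

```python
import math
from typing import Dict


def class_difficulty(class_acc: Dict[int, float], temperature: float) -> Dict[int, float]:
    """Eq. (6): softmax over (1 - accuracy) / T, so harder classes weigh more."""
    logits = {c: (1.0 - a) / temperature for c, a in class_acc.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    norm = sum(exps.values())
    return {c: e / norm for c, e in exps.items()}


def contribution_scores(S: Dict[str, Dict[int, float]],
                        beta: Dict[int, float]) -> Dict[str, float]:
    """Eq. (5): difficulty-weighted sum of accumulated class-wise SVs."""
    return {i: sum(beta[c] * s for c, s in per_class.items())
            for i, per_class in S.items()}


# Hypothetical state: class 2 is rare and poorly learned.
class_acc = {0: 0.9, 1: 0.9, 2: 0.1}
beta = class_difficulty(class_acc, temperature=0.5)

# Both clients have the same unweighted total (0.3), but on different classes.
S_acc = {"maverick": {0: 0.0, 1: 0.0, 2: 0.3},
         "regular": {0: 0.15, 1: 0.15, 2: 0.0}}
scores_hat = contribution_scores(S_acc, beta)
```

An unweighted sum would rank the two clients equally here, whereas the difficulty weighting ranks the Maverick higher because its contribution falls on the class that the model currently handles worst.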

The proposed Maverick-Shapley approach is applicable to existing SV approximation algorithms. Algorithm 2 describes the procedure for the MR [18] technique. (Another example, for the GTG-Shapley [17] technique, is given in Appendix B.)

1  Input: updated client models $\{\bm{w}_i^t\}_{i\in\mathcal{K}^t}$; current server model $\bm{w}^t$; validation dataset at server $\mathcal{D}_{\textrm{val}}$; class-wise accuracy function $\mathcal{V}_{class}(\cdot)$; set of class labels $\mathcal{M}$
2  Hyperparameter: temperature $T$
3  Initialize: $\phi_i = 0, \ \forall i \in \mathcal{K}^t$
4  for each subset $Q \subseteq \mathcal{K}^t$ do
5      $\widetilde{w}_Q = \texttt{ModelAverage}(\{\bm{w}_i^t\}_{i\in Q}, \bm{w}^t)$
6  for client $i \in \mathcal{K}^t$ do
7      for class $c \in \mathcal{M}$ do
8          $\phi_i^c = \sum\limits_{Q \subseteq \mathcal{K}^t \backslash \{i\}} \frac{\mathcal{V}_{class}^c(\widetilde{w}_{Q\cup\{i\}}; \mathcal{D}_{\textrm{val}}) - \mathcal{V}_{class}^c(\widetilde{w}_{Q}; \mathcal{D}_{\textrm{val}})}{\tbinom{|\mathcal{K}^t|-1}{|Q|}}$
9  // Find the best client set $\hat{\mathcal{K}}^t$ and its class-wise accuracy $\hat{v}$
10 $\hat{\mathcal{K}}^t, \hat{v} \leftarrow \operatorname*{argmax}_{Q \subseteq \mathcal{K}^t} \sum\limits_{c\in\mathcal{M}} \mathcal{V}_{class}^c(\widetilde{w}_Q; \mathcal{D}_{\textrm{val}})$
11 // Obtain class difficulty $\beta$
12 $\beta^c = \frac{\exp\left(\frac{1-\hat{v}^c}{T}\right)}{\sum\limits_{c'\in\mathcal{M}} \exp\left(\frac{1-\hat{v}^{c'}}{T}\right)}, \ \forall c \in \mathcal{M}$
13 return $\phi, \beta, \hat{\mathcal{K}}^t$
Algorithm 2 Maverick-Shapley
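
The steps of Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the callable `class_acc` is a hypothetical stand-in for $\mathcal{V}_{class}(\texttt{ModelAverage}(\cdot); \mathcal{D}_{\textrm{val}})$, the toy utility and the function name `maverick_shapley` are assumptions made for illustration, and the subset weighting follows Algorithm 2 as printed (the textbook Shapley value additionally divides by the number of clients, which rescales all values equally).

```python
from itertools import combinations
from math import comb, exp


def maverick_shapley(clients, class_acc, num_classes, temperature=1.0):
    """Class-wise Shapley values by full subset enumeration (exponential cost).

    class_acc(subset) is a hypothetical stand-in for evaluating the averaged
    sub-model of `subset` on the validation set; it returns one accuracy per class.
    """
    n = len(clients)
    # Evaluate every subset once and cache its class-wise accuracy.
    acc = {}
    for size in range(n + 1):
        for q in combinations(clients, size):
            acc[frozenset(q)] = class_acc(frozenset(q))

    # Accumulate class-wise marginal contributions of each client.
    phi = {i: [0.0] * num_classes for i in clients}
    for i in clients:
        others = [k for k in clients if k != i]
        for size in range(n):
            weight = 1.0 / comb(n - 1, size)
            for q in combinations(others, size):
                q = frozenset(q)
                for c in range(num_classes):
                    phi[i][c] += weight * (acc[q | {i}][c] - acc[q][c])

    # Best subset: highest accuracy summed over classes.
    best = max(acc, key=lambda q: sum(acc[q]))
    v_hat = acc[best]

    # Class difficulty: softmax of (1 - accuracy) / T over classes.
    z = [exp((1.0 - v) / temperature) for v in v_hat]
    beta = [x / sum(z) for x in z]
    return phi, beta, best


def toy_class_acc(subset):
    # Toy utility: clients 0 and 1 cover class 0; only client 2 (a Maverick) covers class 1.
    return [0.9 if subset & {0, 1} else 0.1, 0.8 if 2 in subset else 0.0]


phi, beta, best = maverick_shapley([0, 1, 2], toy_class_acc, num_classes=2)
```

In this toy setting, only the Maverick receives a non-zero value for class 1, and class 1 obtains the larger difficulty weight because its best achievable accuracy is lower.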

Client Selection. Based on the contribution scores, the server selects the most contributing clients in each FL training round. To this end, it calculates the selection probability of each client according to their contribution scores $\hat{S}$ as

$$P_{\hat{S},i} = \frac{\exp(\hat{S}_i)}{\sum\limits_{j\in\mathcal{K}} \exp(\hat{S}_j)}, \quad \forall i \in \mathcal{K}, \tag{7}$$

and samples clients based on $P_{\hat{S}}$ in each round. As Mavericks exclusively own certain classes, they are the most contributing clients for those rare classes and have a higher probability of being selected when the model performs poorly on rare classes.
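
As a minimal sketch of this selection step, the Python snippet below computes the softmax probabilities in (7) and draws a set of clients from them. Sampling without replacement, the helper names, and the example scores are assumptions made for illustration; the text above only specifies that clients are sampled according to the selection probabilities.

```python
import math
import random


def selection_probabilities(scores):
    """Softmax over per-client contribution scores, as in Eq. (7)."""
    m = max(scores.values())  # subtracting the max leaves the softmax unchanged but avoids overflow
    exps = {i: math.exp(s - m) for i, s in scores.items()}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sample_clients(probs, k, rng):
    """Draw k distinct clients; each draw is proportional to the remaining probabilities."""
    remaining = dict(probs)
    chosen = []
    for _ in range(min(k, len(remaining))):
        ids = list(remaining)
        pick = rng.choices(ids, weights=[remaining[i] for i in ids], k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


# Hypothetical contribution scores: the Maverick currently has the highest score.
scores = {"c0": 0.2, "c1": 0.1, "maverick": 1.5}
probs = selection_probabilities(scores)
selected = sample_clients(probs, k=2, rng=random.Random(0))
```

Because the probabilities are a softmax, every client keeps a non-zero chance of being selected, while clients with larger contribution scores are favored.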

As the server computes the class-wise accuracy considering multiple client permutations during the contribution score computation (e.g., line 8 in Algorithm 2), it can further refine the client selection. In each training round in FedMS, the server finds the subset of clients leading to the highest total class-wise accuracy increase, i.e., the best client set $\hat{\mathcal{K}}^t$ in line 10 in Algorithm 2, and aggregates only their updates.

Shapley Rewards (SR). In each round, the server computes the SV of a selected client $i$ for class $c$, $\phi_i^c$. It then calculates the Shapley Rewards of each client $i$ for round $t$ as a weighted sum of its class-wise SVs using the current class difficulty $\beta$ as

$$R_i^t = \sum\limits_{c\in\mathcal{M}} \beta^c \cdot \phi_i^c, \quad \forall i \in \mathcal{K}^t. \tag{8}$$
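
A small worked example of (8) in Python shows how the difficulty weighting changes the ranking of clients. The numbers are invented for illustration and the function name `shapley_reward` is hypothetical.

```python
def shapley_reward(phi_i, beta):
    """R_i^t = sum over classes of beta^c * phi_i^c for one client, as in Eq. (8)."""
    if len(phi_i) != len(beta):
        raise ValueError("phi_i and beta must have one entry per class")
    return sum(b * p for b, p in zip(beta, phi_i))


beta = [0.2, 0.8]                # class 1 is currently the harder class
phi_non_maverick = [0.30, 0.00]  # contributes only to the easy class
phi_maverick = [0.00, 0.20]      # contributes only to the hard class

r_non = shapley_reward(phi_non_maverick, beta)  # 0.2 * 0.30
r_mav = shapley_reward(phi_maverick, beta)      # 0.8 * 0.20
```

With an unweighted sum of class-wise SVs, the non-Maverick would rank higher (0.30 versus 0.20); with the difficulty weights, the Maverick receives the larger reward.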

IV Evaluation

In this section, we evaluate the effectiveness of our algorithm, FedMS, on two datasets against six baselines. We demonstrate improved accuracy and fairer Shapley Rewards for both Mavericks and non-Mavericks.

Datasets and Models. We use two benchmark datasets: (i) MNIST [20], consisting of handwritten digits, with 60,000 samples for training and 10,000 for testing, and (ii) CIFAR-10 [21], consisting of colored images of 10 classes, with 50,000 samples for training and 10,000 for testing. We utilize a lightweight MLP neural network [22] for MNIST and a commonly employed CNN [23] for the CIFAR-10 dataset.

Implementation Details. Both MNIST and CIFAR-10 datasets are uniformly distributed across all 10 class labels. Here, to satisfy our Mavericks setting, we split the dataset into two scenarios: (i) 5 clients (4 non-Mavericks and 1 Maverick) without client selection and (ii) 50 clients (48 non-Mavericks and 2 Mavericks) with 10% selection rate of 50 clients in each round. Each Maverick exclusively owns one class in both scenarios (i and ii). The training process involves 100 global training rounds for MNIST and 200 for CIFAR-10, both with a batch size of 64; the learning rate is 0.05 for both datasets. We employ 1 local training on MNIST and 10 local training on CIFAR-10. We choose $\alpha$ as 0.6 for both MNIST and CIFAR-10. In the proposed FedMS, we use class-wise GTG-Shapley (shown in Appendix -B) for Shapley Rewards computation. We note that our class-wise approach is applicable to other SV approximation methods such as MR and TMR as well (see Tables II and III for performance under different methods).
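
One possible way to realize such a Maverick split is sketched below in Python. The partitioning details are assumptions and are not taken from the experiments above: each Maverick is assumed to hold only the samples of its exclusive class, and all remaining samples are assumed to be shuffled and dealt round-robin to the non-Mavericks. The function name `maverick_partition` is hypothetical.

```python
import random


def maverick_partition(labels, num_clients, maverick_classes, seed=0):
    """Return {client_id: [sample indices]} for a Maverick setting.

    The first len(maverick_classes) clients are Mavericks. Maverick m receives
    every sample of maverick_classes[m] and nothing else (an assumption);
    all other samples are shuffled and dealt round-robin to the remaining clients.
    """
    rng = random.Random(seed)
    num_mavericks = len(maverick_classes)
    if num_mavericks >= num_clients:
        raise ValueError("need at least one non-Maverick client")
    owner = {c: m for m, c in enumerate(maverick_classes)}
    parts = {cid: [] for cid in range(num_clients)}
    shared = []
    for idx, y in enumerate(labels):
        if y in owner:
            parts[owner[y]].append(idx)
        else:
            shared.append(idx)
    rng.shuffle(shared)
    non_mavericks = list(range(num_mavericks, num_clients))
    for pos, idx in enumerate(shared):
        parts[non_mavericks[pos % len(non_mavericks)]].append(idx)
    return parts


# Toy stand-in for a balanced 10-class dataset with 100 samples per class.
labels = [i % 10 for i in range(1000)]
# Scenario (i): 5 clients, client 0 is the Maverick and owns class 9 exclusively.
parts = maverick_partition(labels, num_clients=5, maverick_classes=[9])
```

Scenario (ii) would correspond to `num_clients=50` with two entries in `maverick_classes`.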

Evaluation Metrics. To assess the effectiveness of evaluated mechanisms, we consider the test accuracy as the utility metric. In addition, we evaluate different schemes based on their Shapley Rewards (SR) to the Mavericks. A larger SR is associated with higher contributions. While we compute SR as in (8), previous works use SVs of the clients simply as SR.

Baselines. We consider six client selection baselines: FedAvg [10], S-FedAvg [12], FedEMD [8], FedProx [24], GreedyFed [13], and PoC [25]. FedAvg applies random sampling in each round. S-FedAvg and GreedyFed combine SV-based methods with client selection. FedProx and PoC propose mechanisms regarding data heterogeneity in FL. FedEMD combines EMD distance with client selection in the presence of Mavericks.

Figure 2: Comparison of test accuracy and Shapley rewards with 5 clients (w/o client selection) for the MNIST dataset using GTG-Shapley. Panels: (a) FedAvg (Original); (b) FedMS (Our method).

Fairer Shapley (Reward) Distribution. Fig. 2 illustrates how the test accuracy and SR change during training in the 5-client setting (w/o client selection). Fig. 2 shows that Mavericks help increase the model accuracy. Despite this benefit of training with Mavericks, we observe that the average SR of Mavericks is considerably lower than that of non-Mavericks in FedAvg when the rewards are based on the SVs. In contrast, Fig. 2(b) exhibits a fairer SR for Mavericks. In Fig. 3, which considers the scenario with 50 clients (w/ client selection), FedMS assigns higher rewards to Mavericks than all baseline methods. In these experiments, we use GTG-Shapley [17] for SV computation. We conclude that FedMS yields a fairer SR distribution for Mavericks and non-Mavericks in both settings (w/ and w/o client selection), thanks to the class-wise SV-based rewards in (8).

Figure 3: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the MNIST dataset using GTG-Shapley for various client selection techniques. Panels: (a) FedAvg (Original); (b) S-FedAvg; (c) FedEMD; (d) FedMS (Our method).

Improved Model Performance. Figs. 3(a), 3(b) and 3(c) display a similar test accuracy in the All Clients and Without Mavericks settings for FedAvg, S-FedAvg, and FedEMD, respectively. On the other hand, in Fig. 3(d), our method FedMS demonstrates a higher accuracy in the All Clients setting than in the Without Mavericks setting, illustrating the effective utilization of Mavericks during FL training under the proposed approach.

Comparisons with Baselines. Our proposed method, FedMS, outperforms the baselines in both SR and utility metrics. With regard to the SR, FedMS computes a fairer SR for all clients by considering class-wise SVs and class difficulties $\beta$. If rare classes owned by Mavericks perform poorly on the validation dataset, our mechanism increases the $\beta$ associated with these rare classes. Hence, our system yields a fairer SR for the Mavericks. Among the baseline methods, in contrast, only S-FedAvg and GreedyFed adopt SV in their client selection process, and neither of them considers the Mavericks setting. In those SV-based methods, the low SR of Mavericks decreases their selection probability during training, resulting in under-utilization of Mavericks. Since FedMS can effectively select the most contributing clients, it successfully selects Mavericks and shows an increased model accuracy, as shown in Fig. 3(d). FedEMD applies a decreasing selection probability to Mavericks as iterations progress; our approach differs from FedEMD in that it does not rely on the distance between local and global data distributions. Instead, we prioritize the class-wise contribution of each client during the selection process; thus, our method achieves fairer SR and improved accuracy compared to FedEMD (see more comparisons in Appendix -D).
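
The effect of the class difficulty on poorly performing classes can be made concrete with a short Python sketch of the softmax used in Algorithm 2. The accuracies and temperature values below are invented for illustration, and the function name `class_difficulty` is hypothetical.

```python
import math


def class_difficulty(v_hat, temperature):
    """beta^c = softmax over classes of (1 - v_hat^c) / T."""
    logits = [(1.0 - v) / temperature for v in v_hat]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Class-wise validation accuracy of the best client set; class 2 is the poorly learned rare class.
v_hat = [0.95, 0.93, 0.40]
beta_soft = class_difficulty(v_hat, temperature=1.0)
beta_sharp = class_difficulty(v_hat, temperature=0.1)
```

In both cases the rare class receives the largest weight, and a smaller temperature concentrates almost all of the weight on it, which in turn raises the reward in (8) of any client that contributes to that class.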

V Conclusion and Future Directions

The selection of clients plays a pivotal role in achieving success in FL, as it allows for the optimization of the utility derived from diverse model updates, specifically in the presence of Mavericks. In this work, we propose FedMS, a Maverick-aware Shapley valuation mechanism for client selection in FL that not only fairly evaluates the contributions of the Mavericks but also effectively selects the most contributing clients in each training round. Our proposed FedMS achieves better model performance and fairer Shapley Rewards distribution compared to the existing methods.

Future Directions: FedMS does not consider potential attacks in the presence of Mavericks, particularly how attackers might exploit the system (e.g., attackers or outliers can pretend to be Mavericks, or the Mavericks themselves can take advantage of the system). Future research should investigate potential attacks, such as poisoning and adversarial attacks, that target Maverick-friendly FL systems.

References

  • [1] Paul Voigt and Axel Von dem Bussche. The EU General data protection regulation (GDPR). volume 10, pages 10–5555. Springer, 2017.
  • [2] Preston Bukaty. The California Consumer Privacy Act (CCPA): An Implementation Guide. IT Governance Ltd, 2019.
  • [3] H Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, et al. Communication-efficient learning of deep networks from decentralized data. 2016.
  • [4] Bing Luo, Wenli Xiao, Shiqiang Wang, Jianwei Huang, and Leandros Tassiulas. Tackling system and statistical heterogeneity for federated learning with adaptive client sampling. In IEEE INFOCOM- IEEE Conference on Computer Communications, London, United Kingdom, pages 1739–1748. IEEE, 2022.
  • [5] Fan Xin, Jinghui Zhang, Junzhou Luo, and Fang Dong. Federated Learning Client Selection Mechanism Under System and Data Heterogeneity. In 25th IEEE International Conference on Computer Supported Cooperative Work in Design, CSCWD, Hangzhou, China, pages 1239–1244. IEEE, May 2022.
  • [6] Yutong Dai, Zeyuan Chen, Junnan Li, Shelby Heinecke, Lichao Sun, and Ran Xu. Tackling data heterogeneity in federated learning with class prototypes. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 7314–7322, 2023.
  • [7] Jianyi Zhang, Ang Li, Minxue Tang, Jingwei Sun, Xiang Chen, Fan Zhang, Changyou Chen, Yiran Chen, and Hai Li. Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction. In International Conference on Machine Learning, ICML, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 41354–41381. PMLR, July 2023.
  • [8] Jiyue Huang, Chi Hong, Yang Liu, Lydia Y Chen, and Stefanie Roos. Tackling Mavericks in Federated Learning via Adaptive Client Selection Strategy. AAAI, 2022.
  • [9] The Washington Post. Why Alexa Can’t Understand Some Accents. The Washington Post, 2018.
  • [10] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics, pages 1273–1282. PMLR, 2017.
  • [11] Lei Fu, Huanle Zhang, Ge Gao, Mi Zhang, and Xin Liu. Client selection in federated learning: Principles, challenges, and opportunities. IEEE Internet of Things Journal, 2023.
  • [12] Lokesh Nagalapatti and Ramasuri Narayanam. Game of Gradients: Mitigating irrelevant clients in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9046–9054, 2021.
  • [13] Pranava Singhal, Shashi Raj Pandey, and Petar Popovski. Greedy Shapley Client Selection for Communication-Efficient Federated Learning. IEEE Networking Letters, 2024.
  • [14] Lloyd S Shapley. Cores of convex games. International journal of game theory, 1:11–26, 1971.
  • [15] Baturalp Buyukates, Chaoyang He, Shanshan Han, Zhiyong Fang, Yupeng Zhang, Jieyi Long, Ali Farahanchi, and Salman Avestimehr. Proof-of-Contribution-Based Design for Collaborative Machine Learning on Blockchain. In IEEE International Conference on Decentralized Applications and Infrastructures (IEEE DAPPS 2023), July 2023.
  • [16] Amirata Ghorbani and James Zou. Data shapley: Equitable valuation of data for machine learning. In International conference on machine learning, pages 2242–2251. PMLR, 2019.
  • [17] Zelei Liu, Yuanyuan Chen, Han Yu, Yang Liu, and Lizhen Cui. GTG-Shapley: Efficient and accurate participant contribution evaluation in federated learning. ACM Transactions on Intelligent Systems and Technology (TIST), 13(4):1–21, 2022.
  • [18] Tianshu Song, Yongxin Tong, and Shuyue Wei. Profit allocation for federated learning. In 2019 IEEE International Conference on Big Data (Big Data), pages 2577–2586. IEEE, 2019.
  • [19] Shuyue Wei, Yongxin Tong, Zimu Zhou, and Tianshu Song. Efficient and fair data valuation for horizontal federated learning. Federated Learning: Privacy and Incentive, pages 139–152, 2020.
  • [20] Li Deng. The mnist database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine, 29(6):141–142, 2012.
  • [21] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. Cifar-10 (canadian institute for advanced research). Technical Report 1, Technical Report, 2009.
  • [22] Marius-Constantin Popescu, Valentina E Balas, Liliana Perescu-Popescu, and Nikos Mastorakis. Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems, 8(7):579–588, 2009.
  • [23] Saad Albawi, Tareq Abed Mohammed, and Saad Al-Zawi. Understanding of a convolutional neural network. In 2017 international conference on engineering and technology (ICET), pages 1–6. Ieee, 2017.
  • [24] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. Federated optimization in heterogeneous networks. volume 2, pages 429–450, 2020.
  • [25] Yae Jee Cho, Jianyu Wang, and Gauri Joshi. Towards understanding biased client selection in federated learning. In International Conference on Artificial Intelligence and Statistics, pages 10351–10375. PMLR, 2022.

-A Related Works

Client Selection. In an FL system, clients show different degrees of heterogeneity in data distribution and system resources. The vanilla mechanism FedAvg [10], which randomly samples clients in each round, may not fully leverage the diverse local updates from heterogeneous clients [11]. Various selection methods have been proposed to deal with heterogeneity in FL and improve the model performance [12, 8, 13, 25]. S-FedAvg [12] combines FedAvg with SV and empowers the server to select relevant clients with high probability. GreedyFed [13] greedily selects the most contributing clients in each round by employing a fast Shapley approximation algorithm, GTG-Shapley [17]. In Power-of-Choice (PoC) [25], the authors propose a client scheduling strategy that selects the client models with the highest loss in each round. None of these methods considers the Mavericks. Recently, the authors of [8] introduced the concept of Mavericks and proposed FedEMD to adaptively select clients based on the Wasserstein distance between the local and global data distributions. Although FedEMD increases the probability of selecting the Mavericks, it does not provide a solution to fairly evaluate the contribution of the Mavericks.

Contribution Evaluation via Gradient Shapley Methods. SV-based methods are widely employed in FL to compute the contributions of the participating clients [19, 18, 17]. Despite its prominence in the game theory literature, in the context of ML, SV [14] is not practical as it requires retraining from scratch for each client permutation. Gradient Shapley methods aim to eliminate the lengthy retraining of FL models by utilizing gradient updates of the clients to approximate the FL sub-models for various client permutations in the SV computation. Reference [18] proposes two gradient Shapley methods: one-round (OR) and multi-round (MR). OR calculates the SV once after the training, while MR calculates the SV in every FL round. Truncated Multi-Rounds Construction (TMR) [19] eliminates unnecessary FL sub-model reconstructions by adding a decay factor. In Guided Truncation Gradient Shapley (GTG-Shapley) [17], the authors design a guided Monte Carlo sampling approach combined with truncation techniques to further improve the computation efficiency. Despite these efforts to efficiently and accurately approximate the SV, previous works [8, 15] showed that the current SV-based methods are unable to fairly assess the contributions of the Mavericks. Motivated by this, one of our goals in this work is to propose an SV-based contribution score that can appreciate the contributions of both Maverick and non-Maverick clients. We then use the accumulated contribution scores of the clients to perform intelligent client selection in each round to better utilize the Mavericks during training and improve the model performance.

-B Maverick GTG-Shapley

The employed class-wise Shapley value computation technique in FedMS, i.e., Maverick-Shapley, is compatible with the existing SV approximation approaches. In Algorithm 3, we describe the class-wise Shapley computation by using the GTG-Shapley [17] technique.

Input: updated client models $\{\bm{w}_i^t\}_{i\in\mathcal{K}^t}$; current server model $\bm{w}^t$; validation dataset at server $\mathcal{D}_{\textrm{val}}$; class-wise accuracy function $\mathcal{V}_{class}(\cdot)$; set of class labels $\mathcal{M}$
Hyperparameters: error thresholds $\epsilon_b$, $\epsilon_i$; temperature $T$
Initialize: $\phi_i = 0, \ \forall i \in \mathcal{K}^t$; $r = 0$
Compute $w^{t+1} = \texttt{ModelAverage}(n_i, \{\bm{w}_i^t\}_{i\in\mathcal{K}^t})$
$v_0 = \mathcal{V}_{class}(w^t; \mathcal{D}_{\textrm{val}})$, $v_N = \mathcal{V}_{class}(w^{t+1}; \mathcal{D}_{\textrm{val}})$
# between-round truncation
if $|v_N - v_0| > \epsilon_b$ then
    while convergence criteria not met do
        $r = r + 1$
        for client $i \in \mathcal{K}^t$ do
            permute $\mathcal{K}^t \setminus \{i\}$: $\pi^r[0] = i$, $\pi^r[1:n]$
            $v_0^r = v_0$
            # within-round truncation
            for $j = 1, \dots, n$ do
                if $|v_N - v_{j-1}^r| \geq \epsilon_i$ then
                    $H = \pi^r[:j]$
                    $\widetilde{w}_H = \texttt{ModelAverage}(\{\bm{w}_i^t\}_{i\in H}, \bm{w}^t)$
                    $v_j^r = \mathcal{V}_{class}(\widetilde{w}_H; \mathcal{D}_{\textrm{val}})$
                else
                    $v_j^r = v_{j-1}^r$
                for class $c \in \mathcal{M}$ do
                    $\phi_{\pi^r[j]}^c = \frac{r-1}{r}\phi_{\pi^r[j]}^c + \frac{v_j^{r,c} - v_{j-1}^{r,c}}{r}$
# Find the best client set $\hat{\mathcal{K}}^t$ and its class-wise accuracy $\hat{v}$
$\hat{\mathcal{K}}^t, \hat{v} \leftarrow \operatorname*{argmax}_{H} \sum\limits_{c\in\mathcal{M}} \mathcal{V}_{class}^c(\widetilde{w}_H; \mathcal{D}_{\textrm{val}})$
# Obtain class difficulty $\beta$
$\beta^c = \frac{\exp\left(\frac{1-\hat{v}^c}{T}\right)}{\sum\limits_{c'\in\mathcal{M}} \exp\left(\frac{1-\hat{v}^{c'}}{T}\right)}, \ \forall c \in \mathcal{M}$
return $\phi, \beta, \hat{\mathcal{K}}^t$
Algorithm 3 Maverick GTG-Shapley
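
The sampling loop of Algorithm 3 can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' implementation: `class_acc` is a hypothetical stand-in for evaluating the averaged sub-model on the validation set, class-wise accuracy vectors are compared through their largest per-class gap, a fixed number of passes replaces the convergence test, and the counter `r` advances once per sampled permutation so that the update is an exact running mean. The search for the best client set and the computation of $\beta$ are omitted.

```python
import random


def classwise_gtg_shapley(clients, class_acc, num_classes,
                          eps_b=1e-3, eps_i=1e-3, passes=200, seed=0):
    """Monte Carlo class-wise Shapley estimate with guided sampling and truncation."""
    rng = random.Random(seed)
    n = len(clients)
    phi = {i: [0.0] * num_classes for i in clients}
    v0 = class_acc(frozenset())         # accuracy of the current server model
    v_n = class_acc(frozenset(clients))  # accuracy after aggregating all clients

    def gap(a, b):
        # Assumption: two class-wise vectors differ by their largest per-class gap.
        return max(abs(x - y) for x, y in zip(a, b))

    # Between-round truncation: skip the round if aggregation barely changes accuracy.
    if gap(v_n, v0) <= eps_b:
        return phi

    r = 0  # number of permutations sampled so far
    for _ in range(passes):  # assumption: fixed budget instead of a convergence test
        for first in clients:
            r += 1
            rest = [k for k in clients if k != first]
            rng.shuffle(rest)
            perm = [first] + rest  # guided sampling: each client leads one permutation per pass
            v_prev = v0
            for j in range(1, n + 1):
                # Within-round truncation: stop re-evaluating once close to v_n.
                if gap(v_n, v_prev) >= eps_i:
                    v_cur = class_acc(frozenset(perm[:j]))
                else:
                    v_cur = v_prev
                who = perm[j - 1]  # client whose marginal contribution is measured
                for c in range(num_classes):
                    phi[who][c] = (r - 1) / r * phi[who][c] + (v_cur[c] - v_prev[c]) / r
                v_prev = v_cur
    return phi


def toy_class_acc(subset):
    # Toy utility: clients 0 and 1 cover class 0; only client 2 (a Maverick) covers class 1.
    return [0.9 if subset & {0, 1} else 0.1, 0.8 if 2 in subset else 0.0]


phi = classwise_gtg_shapley([0, 1, 2], toy_class_acc, num_classes=2)
```

In this toy setting, the whole class-1 gain is attributed to the Maverick, while the class-0 gain is shared between the two non-Mavericks.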

-C FedMS Parameters & Notation

Table I lists the parameters and notation we use in FedMS.

| Notation | Description |
| --- | --- |
| $(\bm{x}, y)$ | $\bm{x}$ is the feature vector and $y$ is the corresponding label |
| $\mathcal{K}$ | Set of clients such that $\mathcal{K}=\{1,2,\dots,I\}$ |
| $\mathcal{M}$ | Set of class labels $\mathcal{M}=\{1,2,\dots,C\}$ |
| $\eta_{i}$ | Learning rate at client $i$ |
| $\mathcal{D}_{i}$ | Local dataset of client $i$ |
| $\mathcal{D}_{val}$ | Validation dataset at the server |
| $\bm{w}$ | Learnable weights of the global model |
| $\bm{w}_{i}$ | Local model of client $i$ |
| $\mathcal{V}_{class}(\cdot)$ | Class-wise accuracy function |
| $\phi$ | Class-wise Shapley value vector (including all clients) |
| $\phi_{i}$ | Class-wise Shapley value of the $i$-th client |
| $\beta$ | Class difficulty vector (including all classes) |
| $\beta^{c}$ | Class difficulty of class $c$ |
| $S_{i}^{c}$ | Accumulated Shapley value of client $i$ for class $c$ |
| $\alpha$ | Decay factor for the accumulated Shapley value $S_{i}^{c}$ |
| $\hat{S}_{i}^{c}$ | Contribution score of client $i$ |
| $P_{\hat{S}}$ | Selection probability vector for client selection (including all clients) |
| $P_{\hat{S},i}$ | Selection probability of client $i$ for client selection |
| $T$ | Number of FL training rounds |
| $t$ | Index of FL round, $t=0,1,2,\dots,T-1$ |
| $E$ | Number of local epochs |
| $B$ | Mini-batch size |
| $\bm{w}^{t}$ | Learnable weights of the global model in round $t$ |
| $\bm{w}_{i}^{t}$ | Local model of client $i$ in round $t$ |
| $\mathcal{K}^{t}$ | Set of selected clients in round $t$ with selection strategy $\pi$ |
| $\hat{\mathcal{K}}^{t}$ | Best clients set with the highest class-accuracy in round $t$ |
| $n_{i}^{t}$ | Dataset size of the $i$-th client in round $t$ |
TABLE I: Main parameters and notation.

-D Additional Experiments

In this section, we present additional experimental results on the MNIST and CIFAR-10 datasets for our reward and utility metrics. For the reward metric, which captures the fairness of the Shapley Rewards distribution, Figs. 4, 6, 8, and 10 show that FedMS assigns higher rewards to Mavericks than to non-Mavericks on both MNIST and CIFAR-10, in line with the observed accuracy benefit of training with the Mavericks. In contrast, the state-of-the-art (SOTA) techniques assign lower rewards to Mavericks than to non-Mavericks (for example, FedAvg in Figs. 5(b), 7(b), and 9(b), or FedEMD in Figs. 5(d), 7(d), and 9(d)). For the utility metric, FedMS makes better use of Mavericks, which yields higher overall model accuracy than the SOTA methods on both datasets. Taken together, these two observations indicate that our method not only selects Mavericks effectively, thereby improving model performance, but also yields a fairer Shapley Rewards distribution among Mavericks and non-Mavericks. Tables II and III report the accuracy of FedMS and the SOTA baselines under various SV approximation techniques for the MNIST and CIFAR-10 datasets, respectively.

(a) FedMS (Our method), (b) FedProx, (c) GreedyFed, (d) PoC
Figure 4: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the MNIST dataset using GTG-Shapley for various client selection techniques.
(a) FedMS (Our method), (b) FedAvg (original), (c) S-FedAvg, (d) FedEMD
Figure 5: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the CIFAR-10 dataset using GTG-Shapley for various client selection techniques.
(a) FedMS (Our method), (b) FedProx, (c) GreedyFed, (d) PoC
Figure 6: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the CIFAR-10 dataset using GTG-Shapley for various client selection techniques.
(a) FedMS (Our method), (b) FedAvg (original), (c) S-FedAvg, (d) FedEMD
Figure 7: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the CIFAR-10 dataset using MR Shapley for various client selection techniques.
(a) FedMS (Our method), (b) FedProx, (c) GreedyFed, (d) PoC
Figure 8: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the CIFAR-10 dataset using MR Shapley for various client selection techniques.
(a) FedMS (Our method), (b) FedAvg (original), (c) S-FedAvg, (d) FedEMD
Figure 9: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the CIFAR-10 dataset using TMR Shapley for various client selection techniques.
(a) FedMS (Our method), (b) FedProx, (c) GreedyFed, (d) PoC
Figure 10: Comparison of test accuracy and Shapley rewards with 50 clients (w/ client selection) for the CIFAR-10 dataset using TMR Shapley for various client selection techniques.
| SHAP / CS-Alg | FedMS | FedAvg | FedProx | S-FedAvg | GreedyFed | PoC | FedEMD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GTG-Shapley | 82.91 ± 0.5 | 73.77 ± 0.1 | 73.52 ± 0.4 | 74.00 ± 0.5 | 73.91 ± 0.2 | 74.49 ± 0.7 | 73.79 ± 0.2 |
| MR | 82.81 ± 1.8 | 73.99 ± 0.4 | 73.79 ± 0.4 | 73.94 ± 0.2 | 73.66 ± 0.2 | 73.94 ± 0.1 | 73.87 ± 0.1 |
| TMR | 80.27 ± 2.4 | 73.74 ± 0.1 | 74.02 ± 0.4 | 73.67 ± 0.1 | 73.99 ± 0.1 | 73.42 ± 0.4 | 75.49 ± 1.0 |
TABLE II: Model performance (test accuracy in %) of different client selection algorithms (CS-Alg), including FedMS, on the MNIST dataset under various Shapley value approximation methods.
| SHAP / CS-Alg | FedMS | FedAvg | FedProx | S-FedAvg | GreedyFed | PoC | FedEMD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GTG-Shapley | 64.79 ± 0.5 | 60.87 ± 0.1 | 60.25 ± 0.4 | 61.53 ± 0.3 | 59.87 ± 0.1 | 61.65 ± 0.3 | 62.7 ± 0.1 |
| MR | 64.56 ± 1.5 | 61.84 ± 0.2 | 60.97 ± 0.4 | 61.24 ± 0.2 | 58.81 ± 0.2 | 62.29 ± 0.3 | 62.25 ± 0.1 |
| TMR | 64.5 ± 0.6 | 61.84 ± 0.1 | 59.9 ± 0.1 | 61.5 ± 0.5 | 57.7 ± 0.2 | 61.25 ± 0.2 | 61.85 ± 0.15 |
TABLE III: Model performance (test accuracy in %) of different client selection algorithms (CS-Alg), including FedMS, on the CIFAR-10 dataset under various Shapley value approximation methods.
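As a reading aid for Tables II and III, the margin of FedMS over the strongest baseline in each row can be computed directly from the reported mean accuracies. The sketch below is only illustrative: the means are copied from the two tables, the standard deviations are ignored, and the names `TABLES`, `BASELINES`, and `margin_over_best_baseline` are hypothetical.

```python
BASELINES = ["FedAvg", "FedProx", "S-FedAvg", "GreedyFed", "PoC", "FedEMD"]

# Mean test accuracy (%) per row: FedMS first, then the baselines in the
# order listed in BASELINES. Values copied from Tables II and III.
TABLES = {
    "MNIST": {
        "GTG-Shapley": [82.91, 73.77, 73.52, 74.00, 73.91, 74.49, 73.79],
        "MR": [82.81, 73.99, 73.79, 73.94, 73.66, 73.94, 73.87],
        "TMR": [80.27, 73.74, 74.02, 73.67, 73.99, 73.42, 75.49],
    },
    "CIFAR-10": {
        "GTG-Shapley": [64.79, 60.87, 60.25, 61.53, 59.87, 61.65, 62.7],
        "MR": [64.56, 61.84, 60.97, 61.24, 58.81, 62.29, 62.25],
        "TMR": [64.5, 61.84, 59.9, 61.5, 57.7, 61.25, 61.85],
    },
}


def margin_over_best_baseline(row):
    """Return (best baseline name, FedMS mean minus that baseline's mean)."""
    fedms, baselines = row[0], row[1:]
    best_idx = max(range(len(baselines)), key=baselines.__getitem__)
    return BASELINES[best_idx], round(fedms - baselines[best_idx], 2)


margins = {
    (dataset, sv_method): margin_over_best_baseline(row)
    for dataset, rows in TABLES.items()
    for sv_method, row in rows.items()
}

for (dataset, sv_method), (baseline, gap) in margins.items():
    print(f"{dataset:9s} {sv_method:12s} best baseline: {baseline:9s} margin: {gap:.2f}")
```

Because these margins are differences of means only, they should be read together with the standard deviations reported in the tables.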