
License: CC BY-NC-SA 4.0
arXiv:2403.04753v1 [cs.GT] 07 Mar 2024
Mechanism for Decision-aware Collaborative Federated Learning: A Pitfall of Shapley Values

Meng Qi, SC Johnson College of Business, Cornell University, Ithaca, NY, 14850
Mingxi Zhu, Scheller College of Business, Georgia Institute of Technology

Abstract

This paper investigates mechanism design for decision-aware collaboration via federated learning (FL) platforms. Our framework consists of a digital platform and multiple decision-aware agents, each endowed with proprietary data sets. The platform offers an infrastructure that enables access to the proprietary data, creates incentives for collaborative learning aimed at operational decision-making, and conducts FL to avoid direct raw data sharing. The computation and communication efficiency of the FL process is inherently influenced by the agent participation equilibrium induced by the mechanism. Therefore, assessing the collaborative learning system’s efficiency involves two critical factors: the surplus created by coalition formation and the communication costs incurred across the coalition during FL. To evaluate the system efficiency under the intricate interplay between mechanism design, agent participation, operational decision-making, and the performance of FL algorithms, we introduce a multi-action collaborative federated learning (MCFL) framework for decision-aware agents. Under the MCFL framework, we further analyze the equilibrium for the well-known Shapley value based mechanisms. Specifically, we examine the issue of false-name manipulation, a form of dishonest behavior where participating agents create duplicate fake identities and split their original data among these identities. By solving the agent participation decisions in equilibrium, we demonstrate that, while the Shapley value effectively maximizes coalition-generated surplus by encouraging full participation, it inadvertently promotes false-name manipulation. This, in turn, significantly increases the communication costs when the platform conducts federated learning across the coalition members. Thus, we highlight a significant pitfall of Shapley value based mechanisms: they implicitly incentivize data splitting and identity duplication, ultimately impairing the overall efficiency of federated learning systems.

Keywords: collaborative federated learning, mechanism design, operational analytics, optimization algorithms

1 Introduction

Digitalization has brought revolutionary changes across traditional sectors such as retail, finance, and healthcare, allowing companies to offer services via online platforms and thus highlighting the importance of data-driven approaches. Moreover, the advances in large-scale machine learning models underscore the necessity for collaboration in developing machine learning methods, where individuals and organizations unite, sharing their data for mutual advancement.

Our framework models the collaborative learning system consisting of a platform and multiple decision-aware agents. The platform offers an infrastructure that enables access to each agent’s proprietary data set and coordinates the collaboration by incentivizing agent participation in collaborative learning through mechanism design. There are two main types of collaborative learning approaches: centralized learning, which involves training on a centralized machine; and federated learning, which is a decentralized approach with local training. Centralized learning involves raw data sharing, which is often less preferred due to privacy concerns (AbdulRahman et al. 2020, Choudhury 2023). In contrast, federated learning (FL) does not require any of the participants to reveal their raw data to the platform, safeguarding the privacy of local agents’ data (McMahan et al. 2017). Therefore, FL techniques have become the prevailing choice for platforms seeking to address the privacy and safety concerns of the agents.

Normally, the decision-aware agents have their specific operational objectives when participating in collaborative federated learning. For example, the primary goal of sellers in e-commerce platforms is to make informed data-driven pricing or inventory decisions, rather than concentrating on purely statistical predictive objectives, as often assumed in prior research (McMahan et al. 2017). Moreover, the decision-aware agents could choose how to participate in collaborative learning by partially contributing a subset of their datasets or splitting their datasets to participate under fake identities. This type of dishonest conduct is known as false-name manipulation (Conitzer and Yokoo 2010). Hence, the platform must carefully design a mechanism to encourage full participation among the decision-aware agents with heterogeneous data volumes, while making the mechanism more robust to false-name manipulation.

It is worth emphasizing that there is an intricate interplay between mechanism design, agent participation, operational decision-making, and the performance of FL algorithms, as captured by the following questions: 1) how the allocation mechanism is designed based on the performance guarantees of learning algorithms; 2) how a mechanism influences the agent participation equilibrium; and 3) how the computation and communication of the FL algorithm are influenced by the equilibrium. This interaction has been overlooked in existing studies: prior research on encouraging local agent participation often neglects its impact on algorithm performance, while studies aimed at improving FL algorithm efficiency and reducing communication costs tend to disregard the crucial role of mechanism design in improving algorithm performance.

1.1 A Motivating Example

One application of this setting is the customer-to-manufacturer (C2M) initiative. For online e-commerce platforms such as Amazon, FlipKart, Alibaba, and JD.com, products and services are provided by sellers/manufacturers to customers over such online platforms. These platforms’ digital infrastructure is the key to accessing a great amount of granular and precise data (e.g., searches and clicks on products, customer reviews and ratings) compared to traditional brick-and-mortar retailers (Qi et al. 2020). There is a clear incentive to leverage this data further up the supply chain, enabling sellers and manufacturers to make more informed decisions in planning and operations. In practice, platforms like Walmart have begun sharing these datasets with their sellers (Masters 2019, Arora and Jain 2023).

The C2M paradigm aims to build digital connections between end consumers and upstream manufacturers, often through online retailing platforms, as has been witnessed at JD.com, Alibaba, and Pinduoduo (PDD) (Mak and Max Shen 2021). Therefore, many operational decisions, such as pricing, product design, and inventory management, depend critically on the resources provided by these digital platforms and cannot be effectively managed by sellers or manufacturers alone. Furthermore, by aggregating the information potentially obtained from their individual data, sellers/manufacturers will significantly benefit from more refined and accurate data-driven decisions. To assist sellers, the platform provides collaboration opportunities for them to form a coalition to aggregate their data for more informed decision-making (Masters 2019). Such a collaboration is essential for harnessing the full potential of digital transformation.

1.2 Outline and Main Contributions

In this part, we present an outline of our paper and summarize our main methodological contributions.

In Section 3, we introduce a multi-action collaborative federated learning framework (MCFL) with decision-aware agents. The MCFL framework models the collaborative federated learning system by characterizing the mechanism designed by the platform, agent participation cooperative game, operational decision-making, and the performance of FL algorithms conducted within the formed coalition. This fills the gap in modeling all these key factors and investigating the complex interplay between them.

In Section 4, we investigate the surplus allocation mechanism. We specifically investigate the widely recognized Shapley value based mechanism, a prominent method in multi-agent collaborative learning (Shapley et al. 1953). We further analyze the agent participation equilibrium while accounting for a possible form of dishonest behavior, false-name manipulation. By solving the agent participation decisions in equilibrium, we demonstrate that the Shapley value is not robust against false-name manipulation, which further impacts the performance of MCFL.

In Section 5, we analyze the system efficiency, which consists of two critical factors: the surplus generated through collaborative federated learning and the communication costs incurred across the coalition while performing FL algorithms. We focus on the Federated Averaging algorithm (FedAvg), where the agents perform local training and the platform periodically collects interim results from each agent and synchronizes all agents. The communication cost, which occurs during the synchronization, is affected by the number of agent identities within the coalition. We demonstrate that, while the Shapley value effectively maximizes coalition-generated surplus by encouraging full participation, it inadvertently promotes false-name manipulation. As a result, agents tend to participate with data split among fake identities, which significantly increases the communication costs for performing FL across agents. This pitfall of Shapley value based mechanisms ultimately reduces the system efficiency under the MCFL framework.
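The dependence of communication cost on the number of identities can be illustrated with a minimal Python sketch. It assumes a toy squared loss (so the learning target is a mean), full-batch local gradient steps, aggregation weighted by data size, and a cost model of two messages per identity per round. These choices, along with the name `fedavg_mean` and its parameters, are illustrative assumptions and not the setup analyzed in Section 5.

```python
from typing import List, Tuple


def fedavg_mean(datasets: List[List[float]], rounds: int = 20,
                local_steps: int = 5, lr: float = 0.1) -> Tuple[float, int]:
    """Toy FedAvg for the loss 0.5 * (theta - y)^2; returns (estimate, messages).

    Each entry of `datasets` is the non-empty local data of one identity.
    """
    total = sum(len(data) for data in datasets)
    theta = 0.0      # global model held by the platform
    messages = 0     # assumed cost model: one download + one upload per identity
    for _ in range(rounds):
        local_models = []
        for data in datasets:
            local = theta                      # identity receives the global model
            local_mean = sum(data) / len(data)
            for _ in range(local_steps):
                local -= lr * (local - local_mean)   # full-batch gradient step
            local_models.append((len(data), local))
            messages += 2
        # Synchronization: average local models, weighted by local data size.
        theta = sum(n * local for n, local in local_models) / total
    return theta, messages


honest = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0]]          # two agents
split = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]         # first agent uses two identities
est_honest, msgs_honest = fedavg_mean(honest)
est_split, msgs_split = fedavg_mean(split)
print(est_honest, msgs_honest)
print(est_split, msgs_split)
```

In this toy run, splitting the first agent's four samples across two identities leaves the estimate unchanged up to floating-point rounding, while the message count rises from 80 to 120.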

Thus, we highlight a significant pitfall of Shapley value based mechanisms: they implicitly incentivize data splitting and identity duplication, ultimately impairing the overall efficiency of the collaborative federated learning system.

2 Literature Review

Our paper is closely related to collaborations involving multiple agents. We model such collaboration as a cooperative game, which studies the coalitions formed by players and their cooperative actions (Branzei et al. 2008). A well-known allocation rule within this framework is based on the Shapley value (Shapley et al. 1953), which allocates the payoff to each player based on her marginal contribution to each coalition she is a member of, ensuring a fair and efficient distribution. Although the Shapley value initially considers only binary participation actions of agents, Hsiao and Raghavan (1993) extend the Shapley value to scenarios where agents have multiple actions. Building on these works, we present a multi-action collaborative federated learning (MCFL) framework and summarize the related literature below.
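For reference, the classical Shapley value described above can be written explicitly. The expression below is the standard textbook form for binary participation, not the multi-action extension; the symbols $\phi_k$, $\mathcal{N}$, and $S$ are notation introduced here for illustration only.

```latex
\phi_k(v) \;=\; \sum_{S \subseteq \mathcal{N}\setminus\{k\}}
  \frac{|S|!\,(K-|S|-1)!}{K!}\,\bigl[\,v(S\cup\{k\}) - v(S)\,\bigr],
\qquad \mathcal{N}=\{1,\dots,K\}.
```

Each bracketed term is the marginal contribution of player $k$ to a coalition $S$, and the weight is the probability that exactly the players in $S$ precede $k$ in a uniformly random ordering. Whenever $v(\emptyset)=0$, the allocations sum to $v(\mathcal{N})$, which is the efficiency property mentioned above.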

Shapley value in operations management.

Shapley value is extensively used to incentivize collaboration among decision-aware agents with specific operational objectives. This topic is heavily studied in the field of operations management, where agents typically operate within distinct business contexts. Specifically, in supply chain management, Leng and Parlar (2009) explores the Shapley value for distributing surplus generated from shared demand information among a manufacturer, a distributor, and a retailer. Kemahlıoğlu-Ziya and Bartholdi III (2011) applies the Shapley mechanism for allocating surplus generated from inventory pooling among retailers. Gopalakrishnan et al. (2021) use the Shapley value to allocate carbon emission responsibilities among firms in a supply chain. Beyond supply chain management, Anily and Haviv (2010) employs the Shapley mechanism to divide the surplus from pooling service capacities in service systems. Singal et al. (2019) tackles the challenge of allocating eventual conversion among online advertisers through a modified counterfactual adjusted Shapley value. Bergantinos and Moreno-Ternero (2020) considers the equal-split rule, aligning with Shapley values in this specific context of splitting revenues from broadcasting sports events. Leng et al. (2021) investigates a game class with diminishing marginal contributions and analyzes Shapley value mechanism properties. Gopalakrishnan and Sankaranarayanan (2023) considers a Shapley mechanism variant for firm security cost-sharing arrangements. Several works also consider other mechanisms besides Shapley value for decision-aware agents. Gopalakrishnan et al. (2014) provides a summary of commonly used mechanisms in cost-sharing games. Our paper, while also considering decision-aware agents, specifically examines a scenario where the agents engage in informed decision-making through collaborative learning. In our work, each agent possesses a proprietary dataset. 
The collaboration is facilitated by a platform that provides the infrastructure for learning across multiple agents without sharing raw data. We refer to this setup as the multi-action collaborative federated learning (MCFL) framework for decision-aware agents.

Shapley value and machine learning.

The Shapley value is also widely used in the machine learning and computer science communities. These works consider a collaborative learning framework where multiple agents jointly minimize a global loss function by contributing individual data sets. Ghorbani and Zou (2019) provides a metric based on Shapley value to evaluate individual data contribution to empirical risk minimization. Jia et al. (2019) uses Shapley value to fairly distribute profits among multiple data contributors in collaborative machine learning. Sim et al. (2020) uses a variant of Shapley value to incentivize collaboration in data sharing for obtaining high-quality machine learning models. Rozemberczki et al. (2022) presents an overview of the applications of Shapley value in machine learning. Our paper differs from this stream of research in the following aspects. First, while our paper also considers mechanisms that incentivize data provision and data sharing among participating agents, the goal of our agents is not to minimize a global loss function, but to make well-informed business decisions. Moreover, we consider a scenario where agents do not have incentives to directly share raw data, and the platform must provide an infrastructure for privacy-preserving collaborative learning, which leads to the analysis under the MCFL framework.

Federated learning.

Federated learning (FL) is a machine learning approach where a model is trained across multiple decentralized agents holding local data samples, without exchanging them, thus enhancing privacy and efficiency. McMahan et al. (2017) introduces a unified framework for federated learning, and Kairouz et al. (2021) offers a comprehensive survey on different aspects of the FL approach. A fundamental algorithm that is widely used in FL is FedAvg (or local SGD), which is based on the parallel stochastic gradient descent method (Zinkevich et al. 2010). Since then, several works have aimed to quantify the performance of FedAvg (Stich 2018, Yu et al. 2019, Khaled et al. 2019). The majority of the cost of performing FL normally occurs in its communication process, when the platform queries the local interim results across decentralized agents (Kairouz et al. 2021). Many studies have focused on reducing communication costs, either by improving the algorithm design for faster convergence (Shamir et al. 2014, Yuan and Ma 2020) or by reducing the communication bandwidth (Konečný et al. 2016, Chraibi et al. 2019, Hamer et al. 2020). However, to our knowledge, no previous work considers the impact of mechanism design on the communication cost of FL algorithms. In our work, we adopt the FL technology in privacy-preserving collaborative learning. We specifically consider the impact of mechanism design on the FL algorithm performance.

Incentive design in federated learning.

Lastly, our work is closely related to mechanism design in FL. Zhan et al. (2021) and Zeng et al. (2021) provide surveys of recent works on incentives and mechanism design in FL. Most of the works in this area focus on incentivizing agents to participate in federated learning, where the agents’ participation decisions are binary, and the cost of federated learning is not considered. Few works have considered partial provision of data, with the cost that occurs either in the FL process or through data provision. Karimireddy et al. (2022) considers a model with partial participation due to the cost of data provision, and constructs a mechanism to incentivize collaborative learning. In our paper, we do not directly consider the cost of data provision, but our main result on the pitfall of Shapley would still hold with data provision cost. Gafni and Tennenholtz (2022) considers an FL platform with non-collaborative agents and investigates how to manage the conflicting incentives, whereas we focus on a collaborative environment. Zhang et al. (2022) analyzes the incentive mechanism design while considering partial participation and the cost of computation for FL algorithms. We differ from this paper in the following aspects. Firstly, they assume specific exogenous functional forms of learning benefit and FL cost, while our work models the FL cost as an outcome of equilibrium participation, influenced by the mechanism. Secondly, we account for the decision-aware agents, where coalition surplus is also shaped by the operational decisions. More importantly, we address the potential for dishonest behaviors under the MCFL framework, such as data splitting and fake identity creation, which are not covered in prior research.

To our knowledge, our paper is the first work that considers decision-aware collaborative learning through the FL approach, with an analysis of the impact of mechanism design on both the coalition surplus and FL efficiency. Specifically, we consider that in equilibrium, the agents may engage in false-name manipulation (Iwasaki et al. 2010, Conitzer and Yokoo 2010, Aziz et al. 2011), where an agent creates fake identities and splits her data among two or more identities to participate in collaborative learning. We show that, while a Shapley value based mechanism encourages full agent participation, it inadvertently promotes false-name manipulation, which substantially increases the FL training cost. This eventually leads to a pitfall of Shapley value under the MCFL framework.
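A small numerical illustration may help make false-name manipulation concrete. The Python sketch below computes classical (binary-participation) Shapley values by enumerating player orderings, under the assumption that coalition surplus depends only on total data size. The concave surplus `toy_surplus` (a square root) and the data sizes are hypothetical choices for illustration; they are not the surplus function or the multi-action mechanism studied in this paper.

```python
from itertools import permutations
from math import factorial, sqrt
from typing import Callable, List


def shapley(data_sizes: List[int], surplus: Callable[[int], float]) -> List[float]:
    """Exact Shapley values when surplus depends only on the coalition's total data."""
    n = len(data_sizes)
    values = [0.0] * n
    for order in permutations(range(n)):
        running = 0  # total data of the identities that arrived earlier
        for player in order:
            gain = surplus(running + data_sizes[player]) - surplus(running)
            values[player] += gain            # marginal contribution in this ordering
            running += data_sizes[player]
    return [v / factorial(n) for v in values]


def toy_surplus(total: int) -> float:
    """Hypothetical concave surplus; an assumption made only for this example."""
    return sqrt(total)


# Honest play: two agents with 2 samples each.
honest = shapley([2, 2], toy_surplus)
# False-name play: agent 1 splits her 2 samples across two identities.
split = shapley([1, 1, 2], toy_surplus)

honest_payoff = honest[0]
split_payoff = split[0] + split[1]
print(honest_payoff, split_payoff)
```

In this toy game the total surplus is the same in both cases, yet the manipulating agent's combined payoff rises from 1.0 to roughly 1.089, while the platform now has three identities to synchronize instead of two.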

3 The Multi-Action Collaborative Federated Learning (MCFL) Framework

In this section, we describe a multi-action collaborative federated learning (MCFL) framework for decision-aware agents. The system consists of a digital platform, which is the coordinator, and $K$ agents on the digital platform. These agents share a common operational objective and aim to make informed decisions after collaborative learning. The platform provides an infrastructure that enables cross-agent learning and seeks to design a mechanism that incentivizes all agents to form a joint coalition.

Each agent holds a specific quantity of proprietary data samples. We use the vector $\mathbf{m}=[m_{1},\dots,m_{K}]\in\mathbb{R}^{K}$ to denote the amount of data each agent possesses. Specifically, an agent $k$ possesses a data set of the form $S_{k}:=\{\mathbf{y}_{j},\ \forall j=1,\dots,m_{k}\}$, where each observation $\mathbf{y}_{j}\in\mathbb{R}^{p}$ represents a data sample. All data samples are generated independently from an unknown ground-truth distribution $F_{\boldsymbol{\theta}^{*}}$. We consider a parametric family $\mathcal{F}_{\Theta}:=\{F_{\boldsymbol{\theta}}:\boldsymbol{\theta}\in\boldsymbol{\Theta}\}$. In this case, $\boldsymbol{\theta}^{*}$ is unknown and could be estimated from data. The goal of each agent is to learn the unknown $\boldsymbol{\theta}^{*}$ from the data and make informed decisions, ideally with a guaranteed performance for a decision-aware objective.

Since we assume that all the data samples are i.i.d., each data sample contributes equally to the learning precision of unknown $\boldsymbol{\theta}^{*}$. Moreover, in the absence of the infrastructure provided by the platform, an agent on her own lacks the capability to train the learning model. Therefore, individual agents are naturally incentivized to collaborate towards a common goal, which involves forming a coalition to aggregate data samples from all participants for improved estimation. Unlike most of the previous literature, we assume that agents can form a coalition by contributing only a subset of their data samples, rather than all. Specifically, for an agent $k$, we let $\tau_{k}\in[0,m_{k}]$ represent the size of data that agent $k$ contributes to the coalition. We further define the agent participation decision profile, $\boldsymbol{\tau}_{\mathcal{A}}=[\tau_{1},\dots,\tau_{K}]$, for the coalition $\mathcal{A}$. When $\tau_{k}=0$, the agent $k$ is not in any coalition. We say that an agent $k$ belongs to coalition $\mathcal{A}$ if and only if $\tau_{k}>0$. Hence $\mathcal{A}=\{k:\tau_{k}>0\}$. Given $\boldsymbol{\tau}_{\mathcal{A}}$, we denote $S_{\boldsymbol{\tau}_{\mathcal{A}}}=\{S_{\tau_{k}}:\forall k\in\mathcal{A}\}$, the data samples from coalition $\mathcal{A}$ with participation decision profile $\boldsymbol{\tau}_{\mathcal{A}}$, where $S_{\tau_{k}}=\{\mathbf{y}_{i},\ i=1,\dots,\tau_{k}\}\subseteq S_{k}$, with $S_{\tau_{k}=0}=\emptyset$.

Additional Notation.

For any $\boldsymbol{\tau}$, by slightly abusing notation, we let $|\boldsymbol{\tau}|:=\sum_{k}\tau_{k}$ denote the total size of data within a coalition that is characterized by $\boldsymbol{\tau}$, and $|\mathbf{m}|:=\sum_{k}m_{k}$ be the total number of data samples. We let $\mathbb{P}(A)$ denote the probability of any event $A$. We let $2^{\mathcal{N}(K)}$ denote the set of all subsets of $\{1,\dots,K\}$. We let $\mathbf{m}_{[2:K]}=[m_{2},\dots,m_{K}]$ be the maximum number of data points for agent $k=2,\dots,K$, and $\boldsymbol{\tau}_{[2:K]}=[\tau_{2},\dots,\tau_{K}]$ with $\tau_{k}\in\{0,\dots,m_{k}\}$ be the participation decision profile of agent $k=2,\dots,K$. Lastly, let $M_{k}(\boldsymbol{\tau})=\{v \mid \tau_{v}\neq m_{v},\ v\neq k\}$ denote the set of agents in profile $\boldsymbol{\tau}$ who have not contributed their full data sets, excluding agent $k$.
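The notation above maps directly onto a few small helper functions. The following Python sketch is purely illustrative: the function names (`coalition`, `total_size`, `not_at_max`) and the sample vectors are hypothetical, and agents are indexed from 1 to match the text.

```python
from typing import List, Set


def coalition(tau: List[int]) -> Set[int]:
    """The coalition A = {k : tau_k > 0}, with agents indexed from 1."""
    return {k for k, t in enumerate(tau, start=1) if t > 0}


def total_size(tau: List[int]) -> int:
    """|tau| = sum_k tau_k, the total data contributed under profile tau."""
    return sum(tau)


def not_at_max(tau: List[int], m: List[int], k: int) -> Set[int]:
    """M_k(tau) = {v : tau_v != m_v, v != k}."""
    return {v for v in range(1, len(tau) + 1) if v != k and tau[v - 1] != m[v - 1]}


m = [5, 3, 4]      # data each agent possesses
tau = [5, 0, 2]    # agent 1 contributes everything, agent 2 nothing, agent 3 part
print(coalition(tau), total_size(tau), not_at_max(tau, m, k=1))
```

Here agent 2 is outside the coalition because $\tau_{2}=0$, and both agents 2 and 3 appear in $M_{1}(\boldsymbol{\tau})$ because neither contributes her full data set.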

3.1 A Decision-Aware Learning Objective

On this platform, all agents face a common decision-making problem under uncertainty, where the objective function is jointly determined by an agent’s decision $\mathbf{w}$ and a random parameter $\mathbf{y}$ with unknown distribution. Specifically, we assume each agent aims to develop a good data-driven solution for the following problem

$$z^{*}(\boldsymbol{\theta}^{*}):=\max_{\mathbf{w}\in\mathcal{C}}\big[\pi(\mathbf{w},F_{\boldsymbol{\theta}^{*}}):=\mathbb{E}_{\mathbf{y}\sim F_{\boldsymbol{\theta}^{*}}}[r(\mathbf{w},\mathbf{y})]\big],$$

where $F_{\boldsymbol{\theta}^{*}}$ is the underlying true distribution of the random parameter $\mathbf{y}\in\mathcal{Y}$, $\mathbf{w}\in\mathbb{R}^{d}$ is the decision variable restricted to a convex feasible region $\mathcal{C}$, and $r:\mathcal{C}\times\mathcal{Y}\rightarrow\mathbb{R}$ is the objective/reward function. Here, we consider a reward function $r(\mathbf{w},\mathbf{y})$ that is strictly concave in $\mathbf{w}$ for all $\mathbf{y}\in\mathcal{Y}$. Thus, there exists a unique optimal decision

$$\mathbf{w}^{*}(\boldsymbol{\theta}^{*}):=\operatorname*{arg\,max}_{\mathbf{w}\in\mathcal{C}}\pi(\mathbf{w},F_{\boldsymbol{\theta}^{*}}).$$

It is worth noting that we do not consider competition among agents in this model for now. Given any estimator $\hat{\boldsymbol{\theta}}$, the agents make informed decisions

$$\mathbf{w}^{*}(\hat{\boldsymbol{\theta}}):=\operatorname*{arg\,max}_{\mathbf{w}\in\mathcal{C}}\pi(\mathbf{w},\hat{\boldsymbol{\theta}})=\operatorname*{arg\,max}_{\mathbf{w}\in\mathcal{C}}\mathbb{E}_{\mathbf{y}\sim F_{\hat{\boldsymbol{\theta}}}}[r(\mathbf{w},\mathbf{y})].$$

With the jointly learned parameter $\hat{\boldsymbol{\theta}}$, the characteristic function of a coalition with decision profile $\boldsymbol{\tau}_{\mathcal{A}}$ can be defined as

$$v(\boldsymbol{\tau}_{\mathcal{A}})=\pi(\mathbf{w}^{*}(\hat{\boldsymbol{\theta}}_{\boldsymbol{\tau}_{\mathcal{A}}}),\boldsymbol{\theta}^{*}).$$

Below, we present an example of the decision-making problem along with its associated characteristic function.

Example 3.1 (The Newsvendor Problem.)

In the Newsvendor example, $\mathbf{y}\in\mathcal{Y}\subseteq\mathbb{R}$ is the random demand, and $\mathbf{w}\in\mathbb{R}$ is the order quantity decision. The decision-maker aims to minimize the cost $r(\mathbf{w},\mathbf{y})=h(\mathbf{w}-\mathbf{y})^{+}+b(\mathbf{y}-\mathbf{w})^{+}$, where $h$ and $b$ represent the unit overstock and understock costs, respectively. Given any estimator $\hat{\boldsymbol{\theta}}$, $\mathbf{w}^{*}(\hat{\boldsymbol{\theta}})$ is the $\frac{b}{b+h}$ quantile of $F_{\hat{\boldsymbol{\theta}}}$.
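To see how an estimation error turns into a decision loss in this example, the Python sketch below works through a concrete case. It assumes, purely for illustration, that demand is exponentially distributed with rate $\theta$; the parametric family, the cost parameters, and the function names are not taken from the paper. Because the example is stated as cost minimization, a higher value here means a worse decision.

```python
import math


def order_quantity(rate: float, h: float, b: float) -> float:
    """The b/(b+h) quantile of an Exponential(rate) demand distribution."""
    p = b / (b + h)
    return -math.log(1.0 - p) / rate


def expected_cost(w: float, rate: float, h: float, b: float) -> float:
    """E[h (w - y)^+ + b (y - w)^+] for y ~ Exponential(rate) and w >= 0."""
    shortfall = math.exp(-rate * w) / rate      # E[(y - w)^+]
    leftover = w - 1.0 / rate + shortfall       # E[(w - y)^+]
    return h * leftover + b * shortfall


h, b = 1.0, 3.0                        # unit overstock and understock costs (assumed)
true_rate, estimated_rate = 1.0, 1.3   # theta* and a hypothetical estimate

w_oracle = order_quantity(true_rate, h, b)       # decision with known theta*
w_data = order_quantity(estimated_rate, h, b)    # decision from the estimate

# Both decisions are evaluated under the true distribution.
oracle_cost = expected_cost(w_oracle, true_rate, h, b)
data_cost = expected_cost(w_data, true_rate, h, b)
print(w_oracle, w_data, data_cost - oracle_cost)
```

The printed difference is the gap between the oracle performance and the performance of the data-driven decision. It shrinks to zero as the estimate approaches the true parameter.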

Since every data sample equally enhances the learning precision of the unknown $\boldsymbol{\theta}^{*}$, only the total number of samples in the decision profile impacts the learning quality. Therefore, we can further express $v(\boldsymbol{\tau}_{\mathcal{A}})$ and $\hat{\boldsymbol{\theta}}_{\boldsymbol{\tau}_{\mathcal{A}}}$ as functions that depend on $|\boldsymbol{\tau}_{\mathcal{A}}|$, the total number of data samples in coalition $\mathcal{A}$,

$$v(\boldsymbol{\tau}_{\mathcal{A}}) = \pi\big(\boldsymbol{w}^{*}(\hat{\boldsymbol{\theta}}_{|\boldsymbol{\tau}_{\mathcal{A}}|}), \boldsymbol{\theta}^{*}\big) := v(|\boldsymbol{\tau}_{\mathcal{A}}|).$$

In order to encourage agent participation, the platform provides a performance guarantee on the coalition surplus $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$ for a coalition $\mathcal{A}$, in the form of a small performance gap between the oracle surplus $z^{*}$ and the actual coalition surplus $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$. Under certain standard assumptions provided in Appendix 8, the surplus gap between $z^{*}$ and $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$ can be translated into the estimation error $\|\hat{\boldsymbol{\theta}}_{FL} - \boldsymbol{\theta}^{*}\|$. Given the data size $|\boldsymbol{\tau}_{\mathcal{A}}|$, it is often possible to obtain a high-probability statistical performance guarantee of the following form:

$$\mathbb{P}\big(\|\hat{\boldsymbol{\theta}}_{FL} - \boldsymbol{\theta}^{*}\| \geq \varepsilon(|\boldsymbol{\tau}_{\mathcal{A}}|, \delta_0)\big) \leq \delta_0, \tag{1}$$

where $\varepsilon(|\boldsymbol{\tau}_{\mathcal{A}}|, \delta_0)$ depends on the pre-specified probability $\delta_0$ and the sample size $|\boldsymbol{\tau}_{\mathcal{A}}|$. Typically, $\varepsilon(|\boldsymbol{\tau}_{\mathcal{A}}|, \delta_0)$ decreases as $|\boldsymbol{\tau}_{\mathcal{A}}|$ grows. The platform then guarantees that, with probability greater than $1 - \delta_0$, the surplus gap between $z^{*}$ and $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$ is bounded by $L_{r,w}\,\varepsilon(|\boldsymbol{\tau}_{\mathcal{A}}|, \delta_0)$,

$$\mathbb{P}\big(z^{*} - v(|\boldsymbol{\tau}_{\mathcal{A}}|) \geq L_{r,w}\,\varepsilon(|\boldsymbol{\tau}_{\mathcal{A}}|, \delta_0)\big) \leq \delta_0, \tag{2}$$

where $L_{r,w}$ is a given constant that depends on the Lipschitz constants of the reward and decision functions. In Appendix 8, we also show that the performance guarantee bounds in (1) and (2) are equivalent under standard assumptions.
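To make the shape of guarantees (1) and (2) concrete, the sketch below uses one possible choice of $\varepsilon$. The Hoeffding-type rate $\varepsilon(n, \delta_0) = \sqrt{\log(2/\delta_0)/(2n)}$ is an assumption made here for illustration (it applies to a scalar mean of $n$ observations bounded in $[0, 1]$) and is not the paper's $\varepsilon$; the function names and the Lipschitz constant value are likewise hypothetical.

```python
import math


def epsilon(n, delta0):
    """Illustrative error radius epsilon(n, delta0).

    Assumption: Hoeffding-type rate for the mean of n observations in [0, 1].
    The paper keeps epsilon generic; this form is only an example.
    """
    return math.sqrt(math.log(2.0 / delta0) / (2.0 * n))


def surplus_gap_bound(n, delta0, lipschitz):
    """Bound L_{r,w} * epsilon(n, delta0) on the surplus gap z* - v(n), as in (2)."""
    return lipschitz * epsilon(n, delta0)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The radius shrinks as the coalition's total sample size grows.
        print(n, epsilon(n, 0.05), surplus_gap_bound(n, 0.05, lipschitz=2.0))
```

Under this assumed rate, the error radius shrinks at order $1/\sqrt{n}$, so additional samples improve the guarantee with diminishing returns.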

3.2 Federated Learning within Agent Coalition

After forming a coalition, the platform aims to conduct collaborative learning that is privacy-preserving. One common approach is through Federated Learning (FL) frameworks, a representative and widely adopted example being the Federated Averaging algorithm (FedAvg) (Kairouz et al. 2021). In FedAvg, agents keep their raw data locally to preserve privacy and perform local training on their own data. Periodically, the platform collects interim results from each agent and synchronizes all agents by distributing the average of these local outcomes. The details of the algorithm are provided in Section 5.

Given a coalition $\mathcal{A}$, agent participation is characterized by the participation decision profile $\boldsymbol{\tau}_{\mathcal{A}}$, and $|\boldsymbol{\tau}_{\mathcal{A}}|$ denotes the total size of data within the coalition. We let $\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}$ denote the estimator produced by the platform conducting FL with coalition data $S_{\boldsymbol{\tau}_{\mathcal{A}}}$.

FL requires multiple rounds of synchronization to obtain an estimator $\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}$ that satisfies the performance guarantee in (1). The major cost of performing FL lies in this synchronization process, in which the platform must aggregate and communicate results across agents (Kairouz et al. 2021). A measure of this cost is the total number of synchronizations needed to obtain an estimator meeting the performance criterion, denoted $N_{sync}(\delta_0, \mathcal{M}, \Phi)$, where $\Phi$ specifies FL algorithm parameters such as the initial point and the step-size choices. The number of synchronizations required to converge depends intrinsically on the announced mechanism $\mathcal{M}$ through the agent participation profile $\boldsymbol{\tau}_{\mathcal{A}}$, which records how many agents participate and how many local data samples each contributes.

3.3 MCFL System Synergy

The decision timeline of the MCFL system is as follows:

  • The platform specifies a guaranteed performance based on the size of the aggregated data from the coalition and the revenue surplus division rule. Specifically, the platform specifies the probability bound $p_0$, the FL learning parameter set $\Phi$, and the mechanism $\mathcal{M}$ for surplus allocation.

  • The agents participate by deciding how to share, and how much to share. The coalition $\mathcal{A}$ is formed with the associated participation profile $\boldsymbol{\tau}_{\mathcal{A}}$.

  • The platform conducts the learning task through federated learning and shares the learning results with the agents. The result is guaranteed to satisfy the performance guarantee bound in (1).

  • The platform announces $\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}$, and the agents in coalition $\mathcal{A}$ make the informed decision $\boldsymbol{w}^{*}(\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|})$. The actual coalition surplus $v(|\boldsymbol{\tau}_{\mathcal{A}}|) = \pi\big(\boldsymbol{w}^{*}(\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}), \boldsymbol{\theta}^{*}\big)$ is realized, and the platform redistributes the coalition surplus according to the mechanism $\mathcal{M}$.

In this work, we investigate the synergy between mechanism design, agent participation, operational decision-making, and the performance of FL algorithms in the MCFL system. The surplus allocation mechanism is designed based on possible statistical performance guarantees of learning algorithms. Moreover, such a mechanism influences the agent participation equilibrium, which in turn impacts the computation and communication cost of the FL algorithm through the total number of synchronizations required in FL, $N_{sync}$.

4 A Shapley Value Based Mechanism for MCFL

In this section, we investigate the surplus allocation mechanism announced by the platform in the MCFL framework. In particular, we focus on the Shapley-value-based mechanism. We first introduce the MCFL Shapley value, and then discuss the equilibrium induced by the MCFL Shapley value under false-name manipulation.

4.1 The MCFL Shapley Value

A natural idea for a fair and efficient payoff allocation mechanism in cooperative games is based on the renowned Shapley value. The original definition of Shapley value traces back to Shapley et al. (1953). We state the definition as follows.

Definition 4.1 (Shapley value Shapley et al. (1953))

Suppose a cooperative game consists of $K$ agents and the characteristic function is $v: 2^{\mathcal{N}(K)} \to \mathbb{R}$. Then the payoff allocated to agent $k$ is defined as

$$\psi_k(v) = \sum_{\mathcal{S} \subseteq \mathcal{N} \backslash \{k\}} \frac{|\mathcal{S}|!\,(K - |\mathcal{S}| - 1)!}{K!} \big(v(\mathcal{S} \cup \{k\}) - v(\mathcal{S})\big).$$
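The formula in Definition 4.1 can be evaluated directly by enumerating coalitions. The following Python sketch does so for a small game; the function name `shapley_value`, the agent labels, and the square-root surplus are hypothetical choices made for illustration, and the enumeration is exponential in $K$, so it is only practical for small games.

```python
from itertools import combinations
from math import factorial


def shapley_value(k, agents, v):
    """Shapley value of agent k for characteristic function v (Definition 4.1).

    agents: list of hashable agent labels.
    v: maps a frozenset of agents to a real number.
    """
    K = len(agents)
    others = [a for a in agents if a != k]
    value = 0.0
    for size in range(K):
        # Weight |S|! (K - |S| - 1)! / K! shared by all coalitions of this size.
        weight = factorial(size) * factorial(K - size - 1) / factorial(K)
        for subset in combinations(others, size):
            S = frozenset(subset)
            # Marginal contribution of agent k to coalition S.
            value += weight * (v(S | {k}) - v(S))
    return value


if __name__ == "__main__":
    # Hypothetical game: surplus is the square root of the coalition's total data.
    data = {"a": 4, "b": 1, "c": 1}
    v = lambda S: sum(data[x] for x in S) ** 0.5
    agents = list(data)
    payoffs = {k: shapley_value(k, agents, v) for k in agents}
    print(payoffs, sum(payoffs.values()), v(frozenset(agents)))
```

In the printed output, the payoffs sum to the grand-coalition surplus, and the two agents with identical data receive identical payoffs.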

Recall that the mechanism specified by the platform announces the allocation rule $\boldsymbol{\psi}(v) \in \mathbb{R}^{K}$, which specifies the share of the payoff allocated to each player $k = 1, \dots, K$. Agents thus decide the participation profiles $\boldsymbol{\tau}$ after observing the allocation rule $\boldsymbol{\psi}(v)$.

As mentioned in Section 3, we model the decision-aware collaboration problem under MCFL, which allows agents to choose among various levels of participation by selecting the number of samples they would like to contribute to the coalition, as quantified by the participation profile. To incorporate the multiple levels of participation, we adopt the multi-choice Shapley value proposed in Hsiao and Raghavan (1993), Hsiao (2004). Note that to define the multi-choice Shapley value, a weight function must be specified first (Hsiao and Raghavan 1993, Hsiao 2004). In particular, the weight function maps any possible action to a non-negative number and satisfies $\alpha(0) = 0$ and $\alpha(i) \leq \alpha(i+1)$ for any $i = 1, \dots, K-1$. The weight function encodes prior knowledge of the power (or importance) of each action. Moreover, given the action space $\{0, 1, \dots, m_k\}^{K}$, in order to guarantee fairness in effort exertion, an ideal mechanism $\psi$ should satisfy the following axiom:

Axiom 1

(Axiom 1 in Hsiao (2004), Hsiao and Raghavan (1993)) Given any $\boldsymbol{\tau}$, for the unanimity game whose value function is defined as

$$V^{\boldsymbol{\tau}}(\boldsymbol{\tau}') = \begin{cases} 1 & \text{if } \boldsymbol{\tau}' \geq \boldsymbol{\tau}, \\ 0 & \text{otherwise,} \end{cases}$$

the payoff allocated to agent $k$ is proportional to $\alpha(\tau_k)$.

We are now ready to define the Shapley value under MCFL, which further defines the platform allocation mechanism based on the MCFL Shapley value.

Definition 4.2 (MCFL Shapley Value)

For any agent participation decision profile $\boldsymbol{\tau}$, we define $M_k(\boldsymbol{\tau}) = \{v \mid \tau_v \neq m_v,\ v \neq k\}$ as the set of players other than agent $k$ who do not share all of their data. Let $\boldsymbol{b}(k) = [0, 0, \dots, 1, 0, \dots, 0] \in \mathbb{R}^{K}$, where the $k^{th}$ element of $\boldsymbol{b}(k)$ equals 1. Then, for an agent $k$ sharing $i$ observations, the allocated payoff is given by

$$\psi^{\alpha}_{i,k}(v) = \sum_{j=1}^{i} \sum_{\boldsymbol{\tau}:\, \tau_k = j,\, \boldsymbol{\tau} \neq 0} \left[ \sum_{T \subseteq M_k(\boldsymbol{\tau})} (-1)^{|T|} \frac{\alpha(j)}{\|\boldsymbol{\tau}\|_{\alpha} + \sum_{r \in T} [\alpha(\tau_r + 1) - \alpha(\tau_r)]} \right] \big[v(\boldsymbol{\tau}) - v(\boldsymbol{\tau} - \boldsymbol{b}(k))\big],$$

where for any $\boldsymbol{\tau} \in \{0, 1, \dots, m_k\}^{K}$, $\|\boldsymbol{\tau}\|_{\alpha} := \sum_{k=1}^{K} \alpha(\tau_k)$.

Remark 1 (Interpreting the weight function)

The weight of the action of sharing a sample of size $i$ can be interpreted as the importance or power of this action. In the problem context of data sharing, there often exists a unit effort to obtain data samples (Karimireddy et al. 2022). Thus, it is natural to consider linear weights; specifically, sharing a sample of size $i$ has a weight that is a linear function of $i$.

For simplicity, we set $\alpha(\tau_k) = \tau_k$ for all $\tau_k \in \mathbb{N}^{+}$. (This is without loss of generality, as any proportional weight function $\alpha(\tau_k) = \alpha \tau_k$ reduces to $\alpha(\tau_k) = \tau_k$.) Thus the Shapley value defined in Definition 4.2 can be simplified as follows.

Definition 4.3 (MCFL Shapley Value with Linear Weights)

For any agent participation decision profile $\boldsymbol{\tau}$, for an agent $k$ sharing $i$ observations, the allocated payoff is given by

$$\psi_{i,k}(v) = \sum_{j=1}^{i} \sum_{\boldsymbol{\tau}:\, \tau_k = j,\, \boldsymbol{\tau} \neq 0} \left[ \sum_{T \subseteq M_k(\boldsymbol{\tau})} (-1)^{|T|} \frac{j}{|\boldsymbol{\tau}| + |T|} \right] \big[v(\boldsymbol{\tau}) - v(\boldsymbol{\tau} - \boldsymbol{b}(k))\big].$$
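The triple sum in Definition 4.3 can be evaluated by brute-force enumeration for small instances. The Python sketch below is one way to do this, under the assumption (consistent with Section 3) that $v$ depends on a profile only through the total number of shared samples $|\boldsymbol{\tau}|$. Because the inner summand depends on $T$ only through $|T|$, the sum over subsets is collapsed with binomial coefficients. The function name `mcfl_shapley_linear` and the endowments in the usage example are hypothetical.

```python
from itertools import product
from math import comb, sqrt


def mcfl_shapley_linear(k, i, m, v):
    """Payoff psi_{i,k}(v) from Definition 4.3 (linear weights).

    k: 0-based agent index; i: number of samples agent k shares.
    m: list with m[r] = number of samples owned by agent r.
    v: surplus as a function of the total number of shared samples
       (assumption: v depends on the profile only through |tau|).
    """
    K = len(m)
    payoff = 0.0
    for j in range(1, i + 1):
        # All profiles tau with tau_k = j (these are automatically nonzero).
        choices = [[j] if r == k else range(m[r] + 1) for r in range(K)]
        for tau in product(*choices):
            total = sum(tau)
            # |M_k(tau)|: agents other than k who do not share all their data.
            n_partial = sum(1 for r in range(K) if r != k and tau[r] != m[r])
            # The inner sum over T depends on T only through |T| = t.
            coeff = sum(
                (-1) ** t * comb(n_partial, t) * j / (total + t)
                for t in range(n_partial + 1)
            )
            # Marginal surplus of agent k's j-th sample at this profile.
            payoff += coeff * (v(total) - v(total - 1))
    return payoff


if __name__ == "__main__":
    m = [2, 1]  # hypothetical data endowments
    payoffs = [mcfl_shapley_linear(k, m[k], m, sqrt) for k in range(len(m))]
    print(payoffs, sum(payoffs), sqrt(sum(m)))
```

The enumeration visits every profile of the other agents, so its cost grows as $\prod_{r \neq k}(m_r + 1)$; it is intended for checking small cases, not for large coalitions.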

Shapley value is unique in that it satisfies all the desired properties for an allocation mechanism. An ideal mechanism should (1) prevent free-riding by a non-contributing agent; (2) ensure equal treatment of agents who contribute equally; (3) guarantee that the surplus is fully allocated without any waste; and (4) be consistent and scalable across different situations. These desired properties can be further translated into the following axioms.

Axiom 2 (Desired Axioms for MCFL Mechanisms)

Desired axioms for MCFL mechanisms include the following:

  A. Null player. A player that doesn’t add value gets nothing: for any player $k$ with action $\tau_k$, if

    $$v([\tau_1, \dots, \tau_k, \dots, \tau_K]) = v([\tau_1, \dots, 0, \dots, \tau_K]),$$

    then $\psi_{\tau_k, k}(v) = 0$.

  B. Symmetry. If $v([\tau_1, \dots, \tau_i, 0, \dots, \tau_K]) = v([\tau_1, \dots, 0, \tau_j, \dots, \tau_K])$ for $\tau_i = \tau_j$, then $\psi_{i,k}(v) = \psi_{j,k}(v)$ for all $k$.

  C. Efficiency (Budget balanced). $\sum_{k=1}^{K} \psi_{m_k, k}(v) = v(\boldsymbol{m})$.

  D. Additivity. For two characteristic functions $v$ and $u$, $\psi_{i,k}(v + u) = \psi_{i,k}(u) + \psi_{i,k}(v)$.

It is worth noting that the MCFL Shapley value is the only mechanism that satisfies Axiom 1 and Axiom 2 (Hsiao 2004). Hence, if a platform wants to guarantee the desired properties of an allocation mechanism and fairness in effort exertion, the MCFL Shapley value is the only choice among all possible allocation rules.

4.2 False-Name Manipulation

As we presented in the previous section, Shapley value is the only option if the platform requires the desirable properties stated in Axioms 1 and 2. However, when applied in the real world, a Shapley-value-based mechanism can be vulnerable to dishonest behaviors or manipulations by the participating agents. One such manipulation is false-name manipulation, in which an agent creates fake identities in the game and then splits her data among two or more identities. To be more specific, if player $k$ splits into $m$ identities, then the original sample $S_k$ is split into $m$ small sub-samples, each containing at least one data sample, and player $k$ pretends that the $m$ sub-samples come from $m$ different (fake) agents.

We first demonstrate that the allocation mechanism defined by Shapley values is exposed to the risk of false-name manipulation. For any agent $k$, suppose agent $k$ adopts false-name manipulation and splits into two fake identities, agents $k_1$ and $k_2$. In Theorem 4.1, we compare the payoff that the original agent receives with the total payoff that the two fake agents receive.

Theorem 4.1 (Vulnerability of MCFL Shapley under False-name Manipulation)

For any $T, T' \leq i$ such that $T + T' = i$, we have

  • (i) $\psi_{i,k}(v) < \psi_{T,k_1}(v) + \psi_{T',k_2}(v)$ if $v(\cdot)$ is a strictly concave function.

  • (ii) $\psi_{i,k}(v) = \psi_{T,k_1}(v) + \psi_{T',k_2}(v)$ if $v(\cdot)$ is a linear function.

In other words, with a concave value function, an agent tends to create a duplicate identity and split her original data set to obtain a higher payoff allocation. Moreover, for an agent $k$ with $m_k$ samples of data, the equilibrium participation profile under MCFL Shapley is $\tau_{\hat{k}} = 1,\ \hat{k} = 1, \dots, m_k$, where the agent fully participates in the coalition but splits all the data samples, with each identity contributing one sample.
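A small worked instance, computed from Definition 4.3, illustrates the theorem. The setting is an illustrative assumption rather than an example taken from the paper: two agents with $m_1 = 2$ and $m_2 = 1$, and a surplus $v$ that depends only on the total number of shared samples. Agent 1 either shares both samples under one identity, or poses as two identities $k_1$ and $k_2$ with one sample each ($T = T' = 1$).

```latex
% Unsplit: K = 2, m = (2, 1). The four terms correspond to the profiles
% tau = (1,0), (1,1), (2,0), (2,1), with bracket coefficients 1/2, 1/2, 1/3, 2/3.
\begin{align*}
\psi_{2,1}(v)
  &= \tfrac{1}{2}\,[v(1) - v(0)] + \tfrac{1}{2}\,[v(2) - v(1)]
   + \tfrac{1}{3}\,[v(2) - v(1)] + \tfrac{2}{3}\,[v(3) - v(2)] \\
  &= -\tfrac{1}{2}\,v(0) - \tfrac{1}{3}\,v(1) + \tfrac{1}{6}\,v(2) + \tfrac{2}{3}\,v(3).
\end{align*}
% Split: K = 3, m = (1, 1, 1). The three identities are symmetric, so each
% receives one third of v(3) - v(0); agent 1 controls two of them.
\begin{align*}
\psi_{1,k_1}(v) + \psi_{1,k_2}(v) &= \tfrac{2}{3}\,[v(3) - v(0)], \\
\psi_{1,k_1}(v) + \psi_{1,k_2}(v) - \psi_{2,1}(v)
  &= \tfrac{1}{6}\,\big[\,2v(1) - v(0) - v(2)\,\big].
\end{align*}
```

The gain from splitting is strictly positive when $v$ is strictly concave, since then $v(1) > \tfrac{1}{2}[v(0) + v(2)]$, and it is zero when $v$ is linear, in line with parts (i) and (ii).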

Normally, $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$ is a strictly increasing and concave function for many business or operations decisions. In Appendix 8, we provide a specific example in which $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$ is a strictly increasing and concave function, for pricing under uncertainty. Theorem 4.1 suggests that, while Shapley value satisfies all the desired properties and encourages full participation in providing all data samples, it inherently incentivizes agents to split data across fake identities. While this dishonest behavior does not impact the total coalition surplus $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$, which only depends on the total number of samples in the coalition, it significantly hurts the performance of the FL algorithm. In the next section, we further elaborate on how false-name manipulation hurts the learning process and may further decrease the overall system efficiency.

5 System Efficiency

In Section 4, we describe a mechanism based on Shapley value that incentivizes full participation of the agents. We also discuss the potential vulnerability of Shapley value to dishonest behaviors such as false-name manipulation. In this section, we elaborate on how false-name manipulation impacts the performance of FL algorithms. Our work is the first to study the impact of mechanism design on agents’ decisions to provide data, and how these decisions subsequently affect the performance of platform learning algorithms. We delve into how an efficient mechanism like Shapley, one that is budget-balanced and promotes complete data provision to enhance estimator quality, is vulnerable to false-name manipulation, which leads agents to participate while splitting their data. Such vulnerability could result in escalated communication costs within federated learning frameworks, presenting a complex challenge that intertwines mechanism design with operational efficiency.

This section is organized as follows. We first introduce the platform federated learning process and define the system efficiency, which combines both the agent allocation of MCFL and the communication cost of federated learning. We then analyze the system efficiency of the Shapley-value-based mechanism. Our findings emphasize that although the Shapley mechanism adheres to the desired axioms of mechanism design, it fails to guard against false-name manipulation, resulting in considerably higher training costs compared to other mechanisms. This is the pitfall of the Shapley mechanism.

5.1 Algorithm for Federated Learning

In this section, we first introduce the general approach of federated learning. We then apply federated learning to our collaborative learning setup and define the system efficiency under federated learning.

Federated learning is a decentralized and privacy-preserving machine learning framework, where multiple clients collaboratively train a model under a central platform without sharing raw data. The platform seeks to minimize a global loss function, which can be represented as the sum of the local loss functions $L_k$ of the agents $k = 1, \dots, K$:

$$\min_{\boldsymbol{\theta}} \ L(\boldsymbol{\theta}) = \sum_{k=1}^{K} L_k(\boldsymbol{\theta}) \tag{3}$$

A basic blueprint for designing federated learning (FL) training algorithms is the Federated Averaging algorithm (FedAvg) McMahan et al. (2017). While there are many variants of FedAvg Kairouz et al. (2021), in this section we consider a parallel gradient descent variant, also known as local gradient descent (local GD) Mangasarian (1995), Khaled et al. (2019), presented in Algorithm 1. We do not adopt local stochastic gradient descent (SGD) because the equilibrium participation profile may involve each agent identity providing only one data sample, which makes SGD infeasible under such conditions. Algorithm 1 is parameterized by the step size $\rho$, the initial point $\boldsymbol{\theta}^0$, the synchronization interval $H$, and the total number of iterations $T$, which are all platform decision variables. In local GD, each agent computes gradient steps on her own machine for an interval of $H$ iterations and then synchronizes with the platform. The platform averages the local results and broadcasts the average back to each agent. Define the set of platform decision variables as $\Phi = \{\rho, \boldsymbol{\theta}^0, T, H\}$.

Input: $t = 0$, $\Phi = \{\rho, \boldsymbol{\theta}^0, T, H\}$, $\boldsymbol{\theta}^0_k = \boldsymbol{\theta}^0$ for all $k$
Output: Training weights $\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}$ as the average of $\boldsymbol{\theta}^t$, $t \in \{H, 2H, \dots, T-1\}$
while $t < T$ do
      for $k = 1, \dots, K$ do
            if $t \in \{H, 2H, \dots, T-1\}$ then
                  $\boldsymbol{\theta}^{t+1} = \frac{1}{K}\sum_{i=1}^{K} \left(\boldsymbol{\theta}^t_i - \rho \nabla L_i(\boldsymbol{\theta}^t_i)\right)$, $\boldsymbol{\theta}^{t+1}_k = \boldsymbol{\theta}^{t+1}$ for all $k$;   // Synchronization
            else
                  $\boldsymbol{\theta}^{t+1}_k = \boldsymbol{\theta}^t_k - \rho \nabla L_k(\boldsymbol{\theta}^t_k)$;   // Local Training
      $t = t + 1$
Algorithm 1 Federated learning approach with Parallel Gradient Descent for solving (3) (e.g., Mangasarian (1995), Khaled et al. (2019))
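A minimal sketch of Algorithm 1 for a scalar parameter is given below, under two stated assumptions: synchronization happens at every multiple of $H$ below $T$, and the reported output is the average of the broadcast iterates. The function name `local_gd` and the quadratic local losses used in the toy check are hypothetical and do not come from the paper.

```python
def local_gd(grads, rho, theta0, T, H):
    """Local GD for a scalar parameter.

    grads[k](theta) returns the gradient of the local loss L_k at theta.
    Returns the list of broadcast averages, one per synchronization round.
    """
    K = len(grads)
    thetas = [theta0] * K          # theta_k^0 = theta^0 for all k
    synced = []
    for t in range(T):
        # Every identity takes one gradient step on its own local loss.
        stepped = [thetas[k] - rho * grads[k](thetas[k]) for k in range(K)]
        if t >= H and t % H == 0:
            # Synchronization: the platform averages and broadcasts.
            avg = sum(stepped) / K
            thetas = [avg] * K
            synced.append(avg)
        else:
            # Local training: keep the individually updated iterates.
            thetas = stepped
    return synced

# Toy check with hypothetical local losses L_k(theta) = 0.5 * (theta - a_k)^2,
# whose sum is minimized at the mean of the a_k.
targets = [0.0, 2.0]
grads = [lambda th, a=a: th - a for a in targets]
synced = local_gd(grads, rho=0.1, theta0=5.0, T=200, H=10)
n_sync = len(synced)               # number of communication rounds
theta_fl = sum(synced) / n_sync    # average of the synchronized iterates
```

Because the output averages all synchronized iterates, early rounds that are still far from the minimizer pull it away from the final iterate, which is one reason the number of rounds matters for the performance guarantee.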

In collaborative learning, we consider an example where the platform uses maximum likelihood estimation (MLE) to learn $\boldsymbol{\theta}$ through federated learning. Here, the loss function is the negative log-likelihood. Let $\boldsymbol{\tau} = [\tau_1, \dots, \tau_K]$ denote the agents’ participation profile, where each agent $k = 1, \dots, K$ provides $\{\mathbf{y}_i, i = 1, \dots, \tau_k\}$. The probability density function of $\mathbf{y}_i$ is given by $f(\mathbf{y}_i; \boldsymbol{\theta}^*)$, and the objective function is given by

$$L(\boldsymbol{\theta}) = -\frac{1}{|\boldsymbol{\tau}|} \sum_{k=1}^{K} \sum_{j=1}^{\tau_k} \log f(\mathbf{y}_j; \boldsymbol{\theta}). \tag{4}$$

In practice, $\boldsymbol{\theta}^*_{FL} = \arg\min_{\boldsymbol{\theta}} L(\boldsymbol{\theta})$ is not attained exactly because of optimization (training) error, and the platform trains $\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}$ that converges to $\boldsymbol{\theta}^*_{FL}$. The agents’ total surplus is thus given by

$$\sum_{k \in \mathcal{A}} \psi_{\tau_k, k}\left(v(|\boldsymbol{\tau}_{\mathcal{A}}|)\right) = \sum_{k \in \mathcal{A}} \psi_{\tau_k, k}\left(\pi(\mathbf{w}^*(\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}), \boldsymbol{\theta}^*)\right). \tag{5}$$

The cost of federated learning is incurred in the synchronization step of Algorithm 1. Let $N_{sync}(\delta_0, \Phi, \mathcal{M})$ denote the total number of communication rounds required to converge to the target performance guarantee. At each synchronization, the platform queries the coalition of agents for their local training results. Let the cost per query round be $c$. We define the system efficiency as

$$\Pi(\delta_0, \Phi, \mathcal{M}) = \sum_{k \in \mathcal{A}} \psi_{\tau_k, k}\left(\pi(\mathbf{w}^*(\hat{\boldsymbol{\theta}}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}), \boldsymbol{\theta}^*)\right) - c N_{sync}(\delta_0, \Phi, \mathcal{M}), \tag{6}$$

which is the agents’ total surplus minus the communication cost $cN_{sync}$. Note that $c$ is the cost incurred per synchronization round, in which the platform queries all agent identities in the coalition. If $c$ increased linearly with the number of agent identities, which amounts to assuming that the platform incurs a cost per communication with each identity, then fake identities would trivially increase the cost of federated learning. Here, we instead allow $c$ to be a constant that does not increase with the number of identities. In the next section, we show that even when the cost per synchronization round is constant, the total cost of federated learning still grows linearly with the number of fake identities created, which eventually leads to a pitfall of the Shapley value based mechanism.
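The definition in (6) and the two cost models discussed above can be sketched as follows. The function names, the surplus value of 10.0, and the unit costs are hypothetical; the synchronization counts 6 and 19 are the ones reported later in the newsvendor example.

```python
def per_round_cost(n_identities, base_cost, per_identity_cost=0.0):
    """Hypothetical cost model for one synchronization round: constant by
    default, linear in the number of identities if per_identity_cost > 0."""
    return base_cost + per_identity_cost * n_identities

def system_efficiency(total_surplus, n_sync, cost_per_round):
    """Equation (6): agents' total surplus minus the cost c * N_sync."""
    return total_surplus - cost_per_round * n_sync

surplus = 10.0  # hypothetical; unchanged by data splitting

# Constant cost per round: splitting hurts only through the larger N_sync.
c = per_round_cost(n_identities=2, base_cost=0.1)
eff_honest = system_efficiency(surplus, n_sync=6, cost_per_round=c)
eff_split = system_efficiency(surplus, n_sync=19, cost_per_round=c)

# Linear cost per round: splitting also raises the cost of every round.
c_lin_split = per_round_cost(n_identities=4, base_cost=0.1, per_identity_cost=0.05)
eff_split_linear = system_efficiency(surplus, n_sync=19, cost_per_round=c_lin_split)
```

The constant-cost case is the conservative one analyzed in the text: any loss in efficiency there comes from additional rounds alone.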

5.2 Analysis of System Efficiency

In Section 3, we show that the Shapley value based mechanism is the only mechanism that satisfies the desired axioms and leads to an efficient allocation with no surplus left on the table. In this section, we focus on how the mechanism impacts the platform's training process. Specifically, false-name participation through data splitting does not hurt the estimator quality, since data splitting does not change the total number of observations. It does, however, significantly increase the training cost needed to attain the performance guarantee, because it creates redundant communications and operations between the central platform and the agents' fake identities.

Under the following assumption, we can show that $N_{sync}$ strictly increases with the number of participating agents $K$.

Assumption 5.1 ($L$-smoothness, bounded gradient and strong convexity)

$L_k(\boldsymbol{\theta})$ is $L$-smooth for all $k$. $\|\nabla L_k(\boldsymbol{\theta})\| \leq \xi$ for all $\boldsymbol{\theta}$, $k$. $\boldsymbol{\theta}^*_{FL}$ is the unique minimizer of $L(\boldsymbol{\theta})$ in the interior of the parameter space $\Theta$, $H(\boldsymbol{\theta}) = \nabla^2 L(\boldsymbol{\theta})$ is smooth, and all eigenvalues of $H(\boldsymbol{\theta}^*_{FL})$ are strictly positive.

Assumption 5.1 guarantees that the gradient descent algorithm converges under reasonable choices of step size, and that the minimizer of the loss function converges to the ground truth $\boldsymbol{\theta}^*$. With Assumption 5.1, we are now ready to state the main result on system efficiency.

Theorem 5.2

Under the MCFL framework, the system efficiency of the Shapley value based mechanism is upper bounded by

$$\Pi(\delta_0, \Phi, \mathcal{M} = \textup{Shapley}) \leq v(|\mathbf{m}|) - c \left( \left( \frac{64 L \|\boldsymbol{\theta}_0 - \boldsymbol{\theta}^*\|^2}{\mu} + \frac{12 \sigma^2}{L \mu} \right)^{1/2} + \frac{\xi}{4L} \right)^{3} \left( \varepsilon(|\mathbf{m}|, \delta_0) \right)^{-3} \tag{7}$$

Specifically, with $\varepsilon(|\boldsymbol{\tau}_{\mathcal{A}}|, \delta_0) = \beta_1 \log\left(\frac{\beta_2}{\delta_0}\right) |\boldsymbol{\tau}_{\mathcal{A}}|^{-\alpha}$, the system efficiency $\Pi(\delta_0, \Phi, \mathcal{M})$ is bounded by

$$\Pi(\delta_0, \Phi, \mathcal{M}) \leq v(|\mathbf{m}|) - c \lambda |\mathbf{m}|^{3\alpha}, \tag{8}$$

with $\lambda$ being a constant.
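The step from (7) to (8) can be made explicit by substituting the stated form of $\varepsilon$ with $|\boldsymbol{\tau}_{\mathcal{A}}| = |\mathbf{m}|$, that is, under full participation. The shorthand $A$ and the resulting expression for $\lambda$ are not given in the text; they are one reading that is consistent with (7) and (8).

```latex
\begin{align*}
A &:= \left(\frac{64L\|\boldsymbol{\theta}_0-\boldsymbol{\theta}^*\|^2}{\mu}
      +\frac{12\sigma^2}{L\mu}\right)^{1/2}+\frac{\xi}{4L},\\
\bigl(\varepsilon(|\mathbf{m}|,\delta_0)\bigr)^{-3}
  &= \left(\beta_1\log\frac{\beta_2}{\delta_0}\right)^{-3}|\mathbf{m}|^{3\alpha},\\
c\,A^{3}\bigl(\varepsilon(|\mathbf{m}|,\delta_0)\bigr)^{-3}
  &= c\,\lambda\,|\mathbf{m}|^{3\alpha},
  \qquad
  \lambda := \frac{A^{3}}{\bigl(\beta_1\log(\beta_2/\delta_0)\bigr)^{3}}.
\end{align*}
```

Under this reading, $\lambda$ depends on the smoothness, curvature, and confidence parameters but not on $|\mathbf{m}|$, so the cost term scales as $|\mathbf{m}|^{3\alpha}$.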

In concentration bounds, $\alpha > 0$. For example, applying Hoeffding-style bounds to MLE with i.i.d. Bernoulli random variables with parameter $P(y_i = 1) = \theta$ gives $\alpha = \frac{1}{2}$, and the efficiency loss in (8) grows on the order of $|\mathbf{m}|^{3/2}$ as the number of observations $|\mathbf{m}|$ increases. Moreover, by the concavity of $v(|\mathbf{m}|)$, as long as $\alpha > \frac{1}{3}$, adding an extra observation hurts the system efficiency beyond a certain point under false-name manipulation, due to the increased communication cost.
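The trade-off in (8) between a concave surplus and a convex cost term can be seen numerically. The sketch below evaluates the right-hand side of (8) for a hypothetical surplus $v(m) = \log(1 + m)$ and a hypothetical value $c\lambda = 0.01$, with $\alpha = \frac{1}{2}$ as in the Bernoulli example; none of these numerical choices come from the paper.

```python
from math import log

def efficiency_bound(m, v, c, lam, alpha):
    """Right-hand side of (8): v(|m|) - c * lambda * |m|^(3 * alpha)."""
    return v(m) - c * lam * m ** (3 * alpha)

def surplus(m):
    """Hypothetical strictly increasing, concave surplus function."""
    return log(1 + m)

# Evaluate the bound for |m| = 1, ..., 100 observations.
values = {m: efficiency_bound(m, surplus, c=1.0, lam=0.01, alpha=0.5)
          for m in range(1, 101)}
best_m = max(values, key=values.get)
print(best_m, values[best_m])
```

With these inputs the bound peaks at an interior sample size and declines afterwards, which matches the statement that extra observations eventually hurt when $\alpha > \frac{1}{3}$.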

Theorem 5.2 underscores the importance of developing new mechanism designs that are more attuned to the practical challenges and operational realities of FL, moving beyond traditional Shapley value based approaches to ensure more effective and efficient collaborative learning. It is worth mentioning that there is no free lunch in designing a mechanism that satisfies all the desired properties while remaining robust to false-name manipulation and minimizing computation cost. As the previous result suggests, Shapley is the only mechanism that satisfies all the desired axioms; however, it is not robust enough to prevent data splitting. This implies that no mechanism satisfies the desired properties while also minimizing the communication cost in FL. In practice, the platform needs to carefully balance these trade-offs when designing the mechanism.

In the following subsection, we introduce a simple numerical example to demonstrate the harm of data splitting to the FL algorithm.

5.2.1 Numerical Example

The Newsvendor Problem

In this numerical example, we focus on the newsvendor problem presented in Example 3.1. Specifically, we assume the demand data held by each agent independently follows $d_i = \mathbf{x}_i^T \boldsymbol{\theta}^* + \epsilon_i$, and $\mathbf{y}_i := (\mathbf{x}_i, d_i)$, where $\mathbf{x}_i \in \mathbb{R}^p$ is the contextual information and $\epsilon_i$ follows the normal distribution with zero mean and variance $\sigma^2$. Here, $\boldsymbol{\theta}^* \in \mathbb{R}^p$ is the unknown parameter that we estimate through FL.
In the newsvendor problem formulation, $l_j(w, d_j) = h(w - d_j)^+ + b(d_j - w)^+$, where $(w - d_j)^+ = \max\{0, w - d_j\}$ and $(d_j - w)^+ = \max\{0, d_j - w\}$. The platform's objective is to obtain an estimator $\hat{\boldsymbol{\theta}}$ that minimizes the following L2-regularized newsvendor (NV) objective:

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \ L(\boldsymbol{\theta}) = \sum_{k=1}^{K} \sum_{\mathbf{y}_j \in S_k} l_j(\mathbf{x}_j^T \boldsymbol{\theta}, d_j) + \lambda \|\boldsymbol{\theta}\|_2^2. \tag{9}$$
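The pieces of objective (9) can be sketched in a few lines. The sketch assumes the order quantity is the linear rule $\mathbf{x}_j^T \boldsymbol{\theta}$ formed from the contextual features, which is an interpretation of the first argument of $l_j$; the function names and the subgradient helper are hypothetical additions, and the defaults $h = 0.1$, $b = 0.9$, $\lambda = 1$ match the values used in this example.

```python
def newsvendor_loss(w, d, h=0.1, b=0.9):
    """Overage cost h per unit ordered above demand, underage cost b per
    unit of unmet demand."""
    return h * max(0.0, w - d) + b * max(0.0, d - w)

def newsvendor_subgradient(w, d, h=0.1, b=0.9):
    """A subgradient of the loss with respect to the order quantity w."""
    if w > d:
        return h
    if w < d:
        return -b
    return 0.0  # any value in [-b, h] is valid at the kink w == d

def regularized_objective(theta, data, lam=1.0, h=0.1, b=0.9):
    """Objective (9) for a list of (x, d) pairs; x and theta are lists."""
    total = 0.0
    for x, d in data:
        # Assumed linear decision rule: order quantity w = x^T theta.
        w = sum(xi * ti for xi, ti in zip(x, theta))
        total += newsvendor_loss(w, d, h, b)
    return total + lam * sum(t * t for t in theta)

# One sample with feature x = [2.0] and demand d = 3.0, evaluated at theta = [1.0].
value = regularized_objective([1.0], [([2.0], 3.0)])
```

The loss is piecewise linear with a kink at $w = d$, so a gradient-based method such as Algorithm 1 has to work with subgradients here.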

In the following numerical example, we consider the case where $h = 0.1$, $b = 0.9$, and $\lambda = 1$. We further set $\sigma^2 = 2.25$ and let $\epsilon_i$ follow $N(0, \sigma^2)$. We let $x_j$ follow $N(0, \sigma_x^2)$ with $\sigma_x = 2$. We assume there are two agents, denoted by agent $k$ and agent $j$. Agent $k$ possesses samples $\mathbf{y}_1$ and $\mathbf{y}_2$, while agent $j$ possesses $\mathbf{y}_3$ and $\mathbf{y}_4$. Agent $k$ (or $j$) could potentially participate under identities $k_1$ (or $j_1$) and $k_2$ (or $j_2$), with each fake identity contributing one sample. Figure 1 compares the convergence and the number of synchronizations with and without data splitting.

Figure 1 shows that fake identities add noise to the convergence process during the epochs in which each agent independently performs local gradient descent without synchronization. In Figure 1(b), when agents split data, convergence is significantly slower than in Figure 1(a), where there are no fake identities, holding the number of synchronizations fixed. Hence, to reach the same performance guarantee as in Figure 1(a), the total number of synchronizations required increases from 6 to 19 in Figure 1(c), which more than triples the cost of communication.

(a) Performance of FL, no data splitting.
(b) Data splitting, fixing $N_{\text{sync}}$ the same as Fig. 1(a).
(c) Performance of FL under data splitting, increasing the number of synchronizations $N_{\text{sync}}$ such that the output $\hat{\theta}^{FL}$ is within $10^{-1}$ distance of the $\hat{\theta}^{FL}$ obtained in Fig. 1(a) after $T$ epochs.
Figure 1: Performance of FL under (potential) data splitting.

We further investigate the newsvendor cost. In Figure 2, we observe that the better estimator quality shown in Figure 1 directly leads to better performance and cost reduction in decision-making under uncertainty for the decision-aware agents. Without data splitting, the newsvendor loss quickly converges to the minimized loss for the decision-aware agents. However, when agents split data, Figure 2(b) zooms in on epochs 15 to 55 and shows that, similar to the estimator performance in Figure 1(b), the loss oscillates around the optimal loss, and within $T = 55$ epochs the platform cannot provide an estimator that satisfies the performance guarantee. Hence, the platform must increase the number of synchronizations from 6 to 19 in order to deliver the guaranteed surplus to the decision-aware agents, which substantially increases the communication cost.

(a) Performance of FL, no data splitting.
(b) Data splitting, fixing $N_{\text{sync}}$ the same as Fig. 2(a).
(c) Performance of FL under data splitting, increasing the number of synchronizations $N_{\text{sync}}$ such that the output $\hat{\theta}^{FL}$ is within $10^{-1}$ distance of the $\hat{\theta}^{FL}$ obtained in Fig. 2(a) after $T$ epochs.
Figure 2: Performance of FL under (potential) data splitting.

Portfolio Optimization

We consider a risk-averse portfolio optimization problem as an example of the operational decision among agents. Let the random vector $\boldsymbol{\xi} \in \mathbb{R}^d$ denote the random return; the agent makes investment decisions $\mathbf{w} \in \mathbb{R}^d$ and $w_0 \in \mathbb{R}$ to optimize the allocation of assets. The objective is formulated as $c(\mathbf{w}, w_0, \boldsymbol{\xi}) := \alpha \left( \sum_{l=1}^{d} w^l \xi^l - w_0 \right)^2 - \sum_{l=1}^{d} w^l \xi^l$, where $w^l$ and $\xi^l$ denote the $l$-th components of $\mathbf{w}$ and $\boldsymbol{\xi}$, respectively.
Moreover, we assume that $\xi_i = \mathbf{x}_i^T \boldsymbol{\theta}^* + N(0, \sigma)$, where $\mathbf{x}_i \in \mathbb{R}^{1 \times p}$ is fixed local feature data, and $\mathbf{y}_i := (\mathbf{x}_i, \xi_i)$. Here, $\boldsymbol{\theta}^* \in \mathbb{R}^{p \times 1}$ is the unknown parameter that we estimate through FL. In the MCFL framework, the agents obtain a maximum likelihood estimator $\hat{\boldsymbol{\theta}}$ by minimizing the mean squared error (MSE) loss function $l_{MSE}(\boldsymbol{\xi}, \boldsymbol{\xi}') := \|\boldsymbol{\xi} - \boldsymbol{\xi}'\|^2$ for any $\boldsymbol{\xi}$, $\boldsymbol{\xi}'$, through FL:

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} L(\boldsymbol{\theta}) = \sum_{k=1}^{K} \sum_{\mathbf{y}_j \in S_k} l_{MSE}(\mathbf{x}_j^T \boldsymbol{\theta}, \xi_j).$$

Figure 3 illustrates how data splitting hurts the performance of FL under portfolio optimization. In this numerical example, two agents may split their samples and participate as four agents under fake identities. The details of the numerical setup are as follows. There are two agents, denoted by agent $k$ and agent $j$. Agent $k$ possesses $m_k = 8$ samples of $[\mathbf{x}_k] \in R^{m_k \times 1}$ and $[\boldsymbol{\xi}_k] \in R^{m_k}$, while agent $j$ possesses $m_j = 8$ samples of $[\mathbf{x}_j] \in R^{m_j \times 1}$ and $[\boldsymbol{\xi}_j] \in R^{m_j}$. Observations of $x_k$ and $x_j$ are i.i.d. standard normal, $x_k, x_j \sim N(0, 1)$, and $\xi_k = x_k^T \theta^* + N(0,\sigma)$, with $\theta^* = 1$ and $\sigma = 0.01$. Furthermore, we set $\alpha = \frac{1}{2}$ in the objective function. The platform adopts Algorithm 1 to obtain $\hat{\theta}^{FL}_{|\boldsymbol{\tau}_{\mathcal{A}}|}$. The FL algorithm parameters are given by $\Phi = \{\rho = 0.1, \theta^0 = 2, T = 55, H = 10\}$. Agent $k$ (or $j$) could potentially participate under identities $k_1$ (or $j_1$) and $k_2$ (or $j_2$), with each fake identity contributing 4 samples.
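The setup above can be mimicked with a small simulation. The sketch below is not Algorithm 1; it assumes a generic local gradient descent with periodic unweighted averaging, reads $T$ as the total number of local gradient steps and $H$ as the number of local steps between synchronizations, and treats $\sigma$ as a standard deviation. All function names are hypothetical, and the portfolio objective with $\alpha$ is not modeled.

```python
import random

def local_gd_fl(shards, rho=0.1, theta0=2.0, T=55, H=10):
    """Local GD with periodic averaging (an assumed stand-in for Algorithm 1).

    Each identity takes gradient steps on its own mean squared error;
    every H steps, and once at the end, all local models are replaced
    by their unweighted average.
    """
    thetas = [theta0] * len(shards)
    for step in range(1, T + 1):
        for i, shard in enumerate(shards):
            grad = sum(2.0 * x * (x * thetas[i] - xi) for x, xi in shard) / len(shard)
            thetas[i] -= rho * grad
        if step % H == 0 or step == T:
            avg = sum(thetas) / len(thetas)
            thetas = [avg] * len(shards)
    return thetas[0]

def draw_samples(rng, m, theta_star=1.0, sigma=0.01):
    """Draw m pairs (x, xi) with x ~ N(0, 1) and xi = x * theta_star + noise."""
    samples = []
    for _ in range(m):
        x = rng.gauss(0.0, 1.0)
        samples.append((x, x * theta_star + rng.gauss(0.0, sigma)))
    return samples

rng = random.Random(0)
agent_k = draw_samples(rng, 8)
agent_j = draw_samples(rng, 8)

no_split = [agent_k, agent_j]
split = [agent_k[:4], agent_k[4:], agent_j[:4], agent_j[4:]]  # k1, k2, j1, j2

theta_no_split = local_gd_fl(no_split)          # H = 10, two agents
theta_split = local_gd_fl(split)                # H = 10, four identities
theta_sync_no_split = local_gd_fl(no_split, H=1)
theta_sync_split = local_gd_fl(split, H=1)
```

With `H=1` every step is synchronized, and for equal-sized shards the averaged update coincides with gradient descent on the pooled data, so the last two outputs agree up to rounding. With `H=10` the local models can drift between synchronizations, and how far the split and unsplit runs differ depends on the drawn sample; the sketch makes no claim about reproducing Figure 3.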

(a) Performance of FL, no data splitting.
(b) Data splitting, fixing $N_{\text{sync}}$ the same as Fig.2(a).
(c) Performance of FL under data splitting, increasing the number of synchronizations $N_{\text{sync}}$ such that the output $\hat{\theta}^{FL}$ is within $10^{-3}$ distance of the $\hat{\theta}^{FL}$ obtained in Fig.2(a) after $T$ epochs.
Figure 3: Performance of FL under (potential) data splitting.

Similarly, once an estimator $\hat{\boldsymbol{\theta}}$ is obtained, we examine how the quality of the estimator translates into the profits of decision-aware agents. Specifically, we evaluate the out-of-sample profit, measured by the objective function $c$. From Figure 4, we observe that, as in the newsvendor case, the quality of the estimator translates directly into the profits of decision-aware agents. Unlike the newsvendor problem, where the goal is to minimize loss, in portfolio optimization the platform tries to maximize profit, and data splitting directly leads to a potential decrease in the profits that the agents would gain.

(a) Performance of FL, no data splitting.
(b) Data splitting, fixing $N_{\text{sync}}$ the same as Fig.3(a).
(c) Performance of FL under data splitting, increasing the number of synchronizations $N_{\text{sync}}$ such that the output $\hat{\theta}^{FL}$ is within $10^{-3}$ distance of the $\hat{\theta}^{FL}$ obtained in Fig.3(a) after $T$ epochs.
Figure 4: Performance of FL under (potential) data splitting.

6 Conclusions

In conclusion, our study has shed light on the intricate dynamics of collaborative learning in multi-agent systems through the lens of Federated Learning (FL) technology and Shapley value-based mechanisms. By establishing a comprehensive framework, we have underscored the critical role of platform-facilitated collaboration among decision-aware agents and delved into the nuanced impacts of mechanism design on both decision quality and FL algorithm efficiency.

Our investigation reveals that while Shapley value based mechanisms ensure fair allocation and guarantee quality decisions among agents by encouraging full participation, they inadvertently introduce significant communication costs during the FL process due to the agents’ dishonest behavior of false-name manipulation, highlighting a crucial trade-off between decision quality and operational efficiency. This discovery not only addresses a gap in existing research but also opens new avenues for exploring mechanism design that balances decision quality with the practicalities of implementation in FL environments.

Moreover, our work stands as a pioneering effort to systematically explore the interplay between mechanism design and FL performance, offering valuable insights for both theoreticians and practitioners interested in optimizing collaborative learning settings. The identification of Shapley value mechanisms’ limitations further enriches the studies in collaborative learning, prompting a reevaluation of widely accepted practices and encouraging the development of more efficient, cost-effective solutions. Several promising future directions include investigating mechanisms beyond the Shapley value and analyzing this framework in more specific business or operations contexts, such as pricing and inventory management.


7 Supplementary materials for Theorem 1

In this part, we let

$$\mathcal{T}(\mathbf{m}_{[2:K]}) := \{\boldsymbol{\tau} : \boldsymbol{\tau}_k \in \{0,\dots,m_k\},\ \forall k = 2,\dots,K\}$$

which denotes the set of all possible profiles given $\mathbf{m}_{[2:K]}$.

We also define

$$\nabla v^{t}_{|\boldsymbol{\tau}|} = v(|\boldsymbol{\tau}| + t) - v(|\boldsymbol{\tau}| + t - 1),$$

and

$$c^{k}_{t}(\boldsymbol{\tau}) = \sum_{l=0}^{|M_k(\boldsymbol{\tau})|} C^{l}_{|M_k(\boldsymbol{\tau})|} (-1)^{l} \frac{1}{|\boldsymbol{\tau}| + l + t}.$$
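The two definitions above are easy to evaluate exactly. The following Python sketch is an illustration only: it treats $|\boldsymbol{\tau}|$ and $|M_k(\boldsymbol{\tau})|$ as plain non-negative integers (`size_tau` and `n_k`, names introduced here), and compares the alternating sum with the closed form $n!/\big(a(a+1)\cdots(a+n)\big)$, $a = |\boldsymbol{\tau}| + t$. That closed form is a standard binomial identity which the text does not state; it is included only as a sanity check.

```python
from fractions import Fraction
from math import comb, factorial

def c_coeff(size_tau, n_k, t):
    """c_t(tau) from the definition, with size_tau = |tau| and n_k = |M_k(tau)|."""
    return sum(
        Fraction((-1) ** l * comb(n_k, l), size_tau + l + t)
        for l in range(n_k + 1)
    )

def c_closed_form(size_tau, n_k, t):
    """n! / (a (a+1) ... (a+n)) with a = |tau| + t and n = |M_k(tau)|."""
    a = size_tau + t
    denom = 1
    for j in range(n_k + 1):
        denom *= a + j
    return Fraction(factorial(n_k), denom)

def marginal_value(v, size_tau, t):
    """nabla v^t_{|tau|} = v(|tau| + t) - v(|tau| + t - 1)."""
    return v(size_tau + t) - v(size_tau + t - 1)

# Example: |tau| = 2, |M_k(tau)| = 1, t = 1 gives 1/3 - 1/4.
example = c_coeff(2, 1, 1)
```

Under this identity the coefficients are strictly positive whenever $|\boldsymbol{\tau}| + t \ge 1$, which can help when reading the signs of the terms in the lemma below.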

For notational simplicity, when we analyze the incentive for data splitting for a specific agent $k$ in the proof, we let $k=1$ without loss of generality, omit the dependency on the agent index $k$, and write $c^{k}_{t}(\boldsymbol{\tau}) = c_{t}(\boldsymbol{\tau})$ in what follows. We first introduce the following lemma.

Lemma 7.1

For any $T$ and $T' \in \{1,\dots,i\}$, and any maximum data vector $\mathbf{m}_{[2:K]}$, we have

$$\psi_{T,k_1} = \sum_{t=1}^{T} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t}(\boldsymbol{\tau}) \nabla v^{t}_{|\boldsymbol{\tau}|} + \sum_{t=1}^{T} \left[ \sum_{t_1=1}^{T'} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t+t_1}(\boldsymbol{\tau}) \left( \nabla v^{t_1+t}_{|\boldsymbol{\tau}|} - \nabla v^{t_1+t-1}_{|\boldsymbol{\tau}|} \right) \right].$$

Similarly,

$$\psi_{T',k_2} = \sum_{t=1}^{T'} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t}(\boldsymbol{\tau}) \nabla v^{t}_{|\boldsymbol{\tau}|} + \sum_{t=1}^{T'} \left[ \sum_{t_1=1}^{T} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t+t_1}(\boldsymbol{\tau}) \left( \nabla v^{t_1+t}_{|\boldsymbol{\tau}|} - \nabla v^{t_1+t-1}_{|\boldsymbol{\tau}|} \right) \right].$$

Proof of Lemma 7.1

\begin{align*}
\psi_{T,k_1}
&= \sum_{t=1}^{T} \left[ \sum_{t_1=0}^{T'-1} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t \left( \sum_{l=0}^{|M_k(\boldsymbol{\tau})|+1} C^{l}_{|M_k(\boldsymbol{\tau})|+1} (-1)^{l} \frac{1}{|\boldsymbol{\tau}|+l+t+t_1} \right) \nabla v^{t_1+t}_{|\boldsymbol{\tau}|} \right. \\
&\quad \left. + \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t \left( \sum_{l=0}^{|M_k(\boldsymbol{\tau})|} C^{l}_{|M_k(\boldsymbol{\tau})|} (-1)^{l} \frac{1}{|\boldsymbol{\tau}|+l+t+T'} \right) \nabla v^{T'+t}_{|\boldsymbol{\tau}|} \right] \\
&= \sum_{t=1}^{T} \left[ \sum_{t_1=0}^{T'-1} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t \left( \sum_{l=0}^{|M_k(\boldsymbol{\tau})|} C^{l}_{|M_k(\boldsymbol{\tau})|+1} (-1)^{l} \frac{1}{|\boldsymbol{\tau}|+l+t+t_1} + (-1)^{|M_k(\boldsymbol{\tau})|+1} \frac{1}{|\boldsymbol{\tau}|+(|M_k(\boldsymbol{\tau})|+1)+t+t_1} \right) \right. \\
&\quad \left. \nabla v^{t_1+t}_{|\boldsymbol{\tau}|} + \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t+T'}(\boldsymbol{\tau}) \nabla v^{t+T'}_{|\boldsymbol{\tau}|} \right] \\
&= \sum_{t=1}^{T} \left[ \sum_{t_1=0}^{T'-1} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t \left( \sum_{l=0}^{|M_k(\boldsymbol{\tau})|} \left[ C^{l-1}_{|M_k(\boldsymbol{\tau})|} + C^{l}_{|M_k(\boldsymbol{\tau})|} \right] (-1)^{l} \frac{1}{|\boldsymbol{\tau}|+l+t+t_1} \right. \right. \\
&\quad \left. \left. + (-1)^{|M_k(\boldsymbol{\tau})|+1} \frac{1}{|\boldsymbol{\tau}|+(|M_k(\boldsymbol{\tau})|+1)+t+t_1} \right) \nabla v^{t_1+t}_{|\boldsymbol{\tau}|} + \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t+T'}(\boldsymbol{\tau}) \nabla v^{t+T'}_{|\boldsymbol{\tau}|} \right],
\end{align*}

where we use the recurrence relation of binomial coefficients, $C^{l}_{n+1} = C^{l-1}_{n} + C^{l}_{n}$. The previous expression equals

\begin{align*}
&= \sum_{t=1}^{T} \left[ \sum_{t_1=0}^{T'-1} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t+t_1}(\boldsymbol{\tau}) \nabla v^{t_1+t}_{|\boldsymbol{\tau}|} + \sum_{t_1=0}^{T'-1} \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t \left( \sum_{l=1}^{|M_k(\boldsymbol{\tau})|} C^{l-1}_{|M_k(\boldsymbol{\tau})|} (-1)^{l} \frac{1}{|\boldsymbol{\tau}|+l+t+t_1} \right. \right. \\
&\quad \left. \left. + (-1)^{|M_k(\boldsymbol{\tau})|+1} \frac{1}{|\boldsymbol{\tau}|+(|M_k(\boldsymbol{\tau})|+1)+t+t_1} \right) \nabla v^{t_1+t}_{|\boldsymbol{\tau}|} + \sum_{\boldsymbol{\tau} \in \mathcal{T}(\mathbf{m}_{[2:K]})} t\, c_{t+T'}(\boldsymbol{\tau}) \nabla v^{t+T'}_{|\boldsymbol{\tau}|} \right].
\end{align*}

This equality follows from the definition of $c_{t+t_{1}}(\BFtau)$. Shifting the summation index from $l=1$ to $l=0$, we have

\begin{align*}
&=\sum^{T}_{t=1}\left[\sum^{T^{\prime}-1}_{t_{1}=0}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+t_{1}}(\BFtau)\nabla v^{t_{1}+t}_{|\BFtau|}+\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+T^{\prime}}(\BFtau)\nabla v^{t+T^{\prime}}_{|\BFtau|}\right.\\
&\quad\left.-\sum^{T^{\prime}-1}_{t_{1}=0}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}t\left(\sum^{|M_{k}(\BFtau)|}_{l=0}C^{l}_{|M_{k}(\BFtau)|}\frac{(-1)^{l}}{|\BFtau|+l+1+t+t_{1}}\right)\nabla v^{t_{1}+t}_{|\BFtau|}\right]\\
&=\sum^{T}_{t=1}\left[\sum^{T^{\prime}-1}_{t_{1}=0}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+t_{1}}(\BFtau)\nabla v^{t_{1}+t}_{|\BFtau|}-\sum^{T^{\prime}-1}_{t_{1}=0}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+t_{1}+1}(\BFtau)\nabla v^{t_{1}+t}_{|\BFtau|}+\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+T^{\prime}}(\BFtau)\nabla v^{t+T^{\prime}}_{|\BFtau|}\right].
\end{align*}

Similarly, the previous equality follows from the definitions of $c_{t+T^{\prime}}(\BFtau)$ and $c_{t+t_{1}+1}(\BFtau)$. Shifting the summation index from $t_{1}=0$ to $t_{1}=1$, we obtain

\begin{align*}
\psi_{T,k_{1}}&=\sum^{T}_{t=1}\left[\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t}(\BFtau)\nabla v^{t}_{|\BFtau|}+\sum^{T^{\prime}}_{t_{1}=1}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+t_{1}}(\BFtau)\nabla v^{t_{1}+t}_{|\BFtau|}-\sum^{T^{\prime}}_{t_{1}=1}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+t_{1}}(\BFtau)\nabla v^{t_{1}+t-1}_{|\BFtau|}\right]\\
&=\sum^{T}_{t=1}\left[\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t}(\BFtau)\nabla v^{t}_{|\BFtau|}+\sum^{T^{\prime}}_{t_{1}=1}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+t_{1}}(\BFtau)\left(\nabla v^{t_{1}+t}_{|\BFtau|}-\nabla v^{t_{1}+t-1}_{|\BFtau|}\right)\right].
\end{align*}
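The regrouping that leads to this expression for $\psi_{T,k_{1}}$ is pure index bookkeeping, so it can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the sequences `a` and `b` are hypothetical stand-ins for $c_{i}(\BFtau)$ and $\nabla v^{i}_{|\BFtau|}$ at a fixed $\BFtau$, and the common factor $t$ and the sum over $\BFtau$ are dropped because they do not affect the identity.

```python
from fractions import Fraction

def bracket_before(a, b, t, T_prime):
    """Bracket before the index shift, for one fixed tau (factor t omitted)."""
    first = sum(a[t + t1] * b[t1 + t] for t1 in range(0, T_prime))
    second = sum(a[t + t1 + 1] * b[t1 + t] for t1 in range(0, T_prime))
    return first - second + a[t + T_prime] * b[t + T_prime]

def bracket_after(a, b, t, T_prime):
    """Bracket after shifting t1 = 0 to t1 = 1 and collecting differences."""
    diffs = sum(a[t + t1] * (b[t1 + t] - b[t1 + t - 1])
                for t1 in range(1, T_prime + 1))
    return a[t] * b[t] + diffs

# Arbitrary exact test sequences (assumptions, chosen only for illustration).
a = [Fraction(1, i + 1) for i in range(20)]
b = [Fraction(i * i + 3, 7) for i in range(20)]

ok = all(bracket_before(a, b, t, Tp) == bracket_after(a, b, t, Tp)
         for t in range(1, 6) for Tp in range(1, 6))
print(ok)
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point tolerance check.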

Similarly, we have

\[
\psi_{T^{\prime},k_{2}}=\sum^{T^{\prime}}_{t=1}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t}(\BFtau)\nabla v^{t}_{|\BFtau|}+\sum^{T^{\prime}}_{t=1}\left[\sum^{T}_{t_{1}=1}\sum_{\BFtau\in\mathcal{T}(\BFm_{[2:K]})}tc_{t+t_{1}}(\BFtau)\left(\nabla v^{t_{1}+t}_{|\BFtau|}-\nabla v^{t_{1}+t-1}_{|\BFtau|}\right)\right].
\]

We next introduce Lemma 7.2, which is used in the final proof.

Lemma 7.2

For any $K$ and $\BFm$,

\[
\sum_{\BFtau\in\mathcal{T}(\BFm)}c_{t}(\BFtau)=\frac{1}{t}.
\]

Proof of Lemma 7.2. We proceed by induction on $K$. For the base case $K=1$ with $\BFm_{1}=[m_{1}]$, we have

\begin{align*}
\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{1})}c_{t}(\BFtau)&=\sum^{m_{1}-1}_{j=0}\left[\sum^{1}_{l=0}C^{l}_{1}(-1)^{l}\frac{1}{j+l+t}\right]+(-1)^{0}\frac{1}{m_{1}+t}\\
&=\sum^{m_{1}-1}_{j=0}\left[\frac{1}{j+t}-\frac{1}{j+t+1}\right]+\frac{1}{m_{1}+t}=\frac{1}{t}.
\end{align*}
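The base case rests on a telescoping sum: the bracketed differences collapse to $\frac{1}{t}-\frac{1}{m_{1}+t}$, and the final term restores $\frac{1}{m_{1}+t}$. As an illustrative check only (the function name `base_case_sum` is an assumption introduced here), the closed form can be confirmed with exact rationals for small $m_{1}$ and $t$:

```python
from fractions import Fraction

def base_case_sum(m1, t):
    """Right-hand side of the K = 1 computation, evaluated exactly."""
    telescoping = sum(Fraction(1, j + t) - Fraction(1, j + t + 1)
                      for j in range(m1))
    return telescoping + Fraction(1, m1 + t)

# Compare against 1/t on a small grid of positive integers.
base_case_ok = all(base_case_sum(m1, t) == Fraction(1, t)
                   for m1 in range(1, 8) for t in range(1, 8))
print(base_case_ok)
```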

For the induction step, suppose that the claim holds for $\mathbf{m}_{K}=[m_{1},\dots,m_{K}]$, that is,

\[
\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}c_{t}(\BFtau)=\frac{1}{t}.
\]

Then, for $\mathbf{m}_{K+1}=[m_{1},\dots,m_{K},m_{K+1}]$, we have

\begin{align*}
\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K+1})}c_{t}(\BFtau)&=\sum^{m_{K+1}-1}_{j=0}\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\sum^{|M_{k}(\BFtau)|+1}_{l=0}C^{l}_{|M_{k}(\BFtau)|+1}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}\\
&\quad\quad+\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\sum^{|M_{k}(\BFtau)|}_{l=0}C^{l}_{|M_{k}(\BFtau)|}(-1)^{l}\frac{1}{|\BFtau|+l+(t+m_{K+1})}\\
&=\sum^{m_{K+1}-1}_{j=0}\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\sum^{|M_{k}(\BFtau)|+1}_{l=0}C^{l}_{|M_{k}(\BFtau)|+1}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}+\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}c_{t+m_{K+1}}(\BFtau)\\
&=\sum^{m_{K+1}-1}_{j=0}\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\sum^{|M_{k}(\BFtau)|+1}_{l=0}C^{l}_{|M_{k}(\BFtau)|+1}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}+\frac{1}{t+m_{K+1}},
\end{align*}

where the last equality holds by the induction hypothesis. Defining $C^{-1}_{M}=0$ for all $M$, we have

\begin{align*}
&\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\sum^{|M_{k}(\BFtau)|+1}_{l=0}C^{l}_{|M_{k}(\BFtau)|+1}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}\\
&=\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum^{|M_{k}(\BFtau)|}_{l=0}C^{l}_{|M_{k}(\BFtau)|+1}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}+(-1)^{|M_{k}(\BFtau)|+1}\frac{1}{|\BFtau|+(|M_{k}(\BFtau)|+1)+t+j}\right]\\
&=\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum^{|M_{k}(\BFtau)|}_{l=0}\left[C^{l-1}_{|M_{k}(\BFtau)|}+C^{l}_{|M_{k}(\BFtau)|}\right](-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}+(-1)^{|M_{k}(\BFtau)|+1}\frac{1}{|\BFtau|+(|M_{k}(\BFtau)|+1)+t+j}\right].
\end{align*}
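The step above peels off the top term $l=|M_{k}(\BFtau)|+1$ and splits each remaining coefficient as $C^{l}_{M+1}=C^{l-1}_{M}+C^{l}_{M}$, using the convention $C^{-1}_{M}=0$. A small sketch, under the assumption that `M` stands in for $|M_{k}(\BFtau)|$ and `n` for the positive integer $|\BFtau|+t+j$ (both names are introduced here for illustration), checks this per-$\BFtau$ identity exactly:

```python
from fractions import Fraction
from math import comb

def C(M, l):
    """Binomial coefficient C^l_M with the convention C^{-1}_M = 0."""
    return 0 if l < 0 else comb(M, l)

def unsplit_sum(M, n):
    """sum_{l=0}^{M+1} C^l_{M+1} (-1)^l / (n + l)."""
    return sum(C(M + 1, l) * Fraction((-1) ** l, n + l) for l in range(M + 2))

def split_sum(M, n):
    """Same quantity after isolating l = M + 1 and applying the recurrence."""
    body = sum((C(M, l - 1) + C(M, l)) * Fraction((-1) ** l, n + l)
               for l in range(M + 1))
    return body + Fraction((-1) ** (M + 1), n + M + 1)

split_ok = all(unsplit_sum(M, n) == split_sum(M, n)
               for M in range(0, 7) for n in range(1, 10))
print(split_ok)
```

The helper `C` is needed because `math.comb` rejects negative arguments, whereas the derivation sets $C^{-1}_{M}=0$.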

Here we again apply the recurrence relation of the binomial coefficients. Moreover, the last expression can be written as

\begin{align*}
&=\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum^{|M_{k}(\BFtau)|}_{l=1}C^{l-1}_{|M_{k}(\BFtau)|}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}+(-1)^{|M_{k}(\BFtau)|+1}\frac{1}{|\BFtau|+(|M_{k}(\BFtau)|+1)+t+j}\right]\\
&\quad+\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum^{|M_{k}(\BFtau)|}_{l=0}C^{l}_{|M_{k}(\BFtau)|}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}\right]\\
&=\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum^{|M_{k}(\BFtau)|}_{l=1}C^{l-1}_{|M_{k}(\BFtau)|}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}+(-1)^{|M_{k}(\BFtau)|+1}\frac{1}{|\BFtau|+(|M_{k}(\BFtau)|+1)+t+j}\right]\\
&\quad+\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}c_{t+j}(\BFtau)\\
&=\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum^{|M_{k}(\BFtau)|}_{l=1}C^{l-1}_{|M_{k}(\BFtau)|}(-1)^{l}\frac{1}{|\BFtau|+l+(t+j)}+(-1)^{|M_{k}(\BFtau)|+1}\frac{1}{|\BFtau|+(|M_{k}(\BFtau)|+1)+t+j}\right]+\frac{1}{t+j}\\
&=\sum_{\BFtau\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum^{|M_{k}(\BFtau)|-1}_{i^{\prime}=0}C^{i^{\prime}}_{|M_{k}(\BFtau)|}(-1)^{i^{\prime}+1}\frac{1}{|\BFtau|+(i^{\prime}+1)+(t+j)}+(-1)^{|M_{k}(\BFtau)|+1}\frac{1}{|\BFtau|+(|M_{k}(\BFtau)|+1)+t+j}\right]\\
&\quad+\frac{1}{t+j}.
\end{align*}

The above relations hold by the definition of $c_{t+j}(\boldsymbol{\tau})$ and the induction hypothesis. Further, factoring out the common coefficient $(-1)$, we have

\begin{align*}
&= (-1)\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum_{i'=0}^{|M_{k}(\boldsymbol{\tau})|-1} C^{i'}_{|M_{k}(\boldsymbol{\tau})|}(-1)^{i'}\frac{1}{|\boldsymbol{\tau}|+(i'+1)+(t+j)}+(-1)^{|M_{k}(\boldsymbol{\tau})|}\frac{1}{|\boldsymbol{\tau}|+(|M_{k}(\boldsymbol{\tau})|+1)+t+j}\right] \\
&\quad+\frac{1}{t+j} \\
&= (-1)\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{K})}\left[\sum_{i'=0}^{|M_{k}(\boldsymbol{\tau})|} C^{i'}_{|M_{k}(\boldsymbol{\tau})|}(-1)^{i'}\frac{1}{|\boldsymbol{\tau}|+i'+(t+j+1)}\right]+\frac{1}{t+j} \\
&= (-1)\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{K})}c_{t+j+1}(\boldsymbol{\tau})+\frac{1}{t+j}=\frac{1}{t+j}-\frac{1}{t+j+1}.
\end{align*}

Hence

\begin{align*}
\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{K+1})}c_{t}(\boldsymbol{\tau})=\sum_{j=0}^{m_{K+1}-1}\left[\frac{1}{t+j}-\frac{1}{t+j+1}\right]+\frac{1}{t+m_{K+1}}=\frac{1}{t}.
\end{align*}

This completes the proof. ∎
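The telescoping step at the end of this proof can be checked numerically. The following is a minimal sketch using exact rational arithmetic; the function name `telescoped_sum` and the tested ranges are illustrative assumptions, and the argument `m` stands in for $m_{K+1}$.

```python
from fractions import Fraction


def telescoped_sum(t: int, m: int) -> Fraction:
    """Evaluate sum_{j=0}^{m-1} [1/(t+j) - 1/(t+j+1)] + 1/(t+m) exactly."""
    total = sum(Fraction(1, t + j) - Fraction(1, t + j + 1) for j in range(m))
    return total + Fraction(1, t + m)


# The expression should collapse to 1/t for every positive t and m.
for t in range(1, 6):
    for m in range(1, 6):
        assert telescoped_sum(t, m) == Fraction(1, t)
```

Exact fractions are used instead of floats so that the equality test is not affected by rounding.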

With Lemma 7.2, we can show that the following Lemma 7.3 holds.

Lemma 7.3

For any $K, T, T'$, and any maximum data vector $\mathbf{m}_{[2:K]}$, we have

\begin{align*}
&\sum_{t=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}\left[t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|}-(t+T)\,c_{t+T}(\boldsymbol{\tau})\nabla v^{t+T}_{|\boldsymbol{\tau}|}\right] \\
&= \sum_{t=0}^{T-1}\sum_{t_{1}=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}\left[(t_{1}+t)\,c_{t_{1}+t}(\boldsymbol{\tau})\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}-(t_{1}+t+1)\,c_{t_{1}+t+1}(\boldsymbol{\tau})\nabla v^{t_{1}+t+1}_{|\boldsymbol{\tau}|}\right].
\end{align*}
Proof of Lemma 7.3. For any fixed $t_{1}\in[1,\dots,T']$ and $\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})$,

\begin{align*}
&t_{1}c_{t_{1}}(\boldsymbol{\tau})\nabla v^{t_{1}}_{|\boldsymbol{\tau}|}-(t_{1}+T)\,c_{t_{1}+T}(\boldsymbol{\tau})\nabla v^{t_{1}+T}_{|\boldsymbol{\tau}|} \\
&= \sum_{t'=t_{1}}^{t_{1}+T-1}\left[t'c_{t'}(\boldsymbol{\tau})\nabla v^{t'}_{|\boldsymbol{\tau}|}-(t'+1)\,c_{t'+1}(\boldsymbol{\tau})\nabla v^{t'+1}_{|\boldsymbol{\tau}|}\right] \\
&= \sum_{t=0}^{T-1}\left[(t_{1}+t)\,c_{t_{1}+t}(\boldsymbol{\tau})\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}-(t_{1}+t+1)\,c_{t_{1}+t+1}(\boldsymbol{\tau})\nabla v^{t_{1}+t+1}_{|\boldsymbol{\tau}|}\right].
\end{align*}

Summing both sides over $t_{1}=1,\dots,T'$ and $\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})$ yields the desired result. ∎
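The per-term identity used in this proof is a plain telescoping sum, so it holds for arbitrary sequences. The sketch below checks it on random integers for a fixed $\boldsymbol{\tau}$. The names `a`, `v`, `endpoint_difference`, and `telescoping_sum` are illustrative assumptions: `a[t]` stands in for $t\,c_{t}(\boldsymbol{\tau})$ and `v[t]` for $\nabla v^{t}_{|\boldsymbol{\tau}|}$.

```python
import random


def endpoint_difference(a, v, t1, T):
    """Left-hand side: a[t1] * v[t1] - a[t1 + T] * v[t1 + T]."""
    return a[t1] * v[t1] - a[t1 + T] * v[t1 + T]


def telescoping_sum(a, v, t1, T):
    """Right-hand side: sum over t = 0..T-1 of consecutive differences."""
    return sum(
        a[t1 + t] * v[t1 + t] - a[t1 + t + 1] * v[t1 + t + 1]
        for t in range(T)
    )


random.seed(0)
T, T_prime = 4, 3
# Arbitrary integer stand-ins; indices 1..T+T_prime are the ones actually used.
a = [random.randint(-9, 9) for _ in range(T + T_prime + 1)]
v = [random.randint(-9, 9) for _ in range(T + T_prime + 1)]

for t1 in range(1, T_prime + 1):
    assert endpoint_difference(a, v, t1, T) == telescoping_sum(a, v, t1, T)
```

Because the identity is purely algebraic, no property of $c_{t}$ or $v$ is needed for the check to pass.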

We introduce the final lemma before the main proof of the theorem.

Lemma 7.4

For any $x$, $n\in\mathbb{N}^{+}$, we have

\begin{align*}
\sum_{k=0}^{n}C_{n}^{k}\frac{(-1)^{k}}{x+k}=\frac{n!}{\prod_{k=0}^{n}(x+k)}.
\end{align*}
Proof of Lemma 7.4. We consider the partial fraction expansion of

\begin{align*}
H(x):=\frac{1}{\prod_{k=0}^{n}(x+k)}.
\end{align*}

Since $x=-n,\dots,0$ are simple poles of $H(x)$, there exists a decomposition

\begin{align*}
H(x)=\sum_{k=0}^{n}\frac{a_{k}}{x+k},
\end{align*}

where

\begin{align*}
a_{k} &= \lim_{x\rightarrow-k}(x+k)H(x) \\
&= \frac{1}{(-k)(-k+1)\cdots(-1)}\cdot\frac{1}{(1)\cdots(-k+n)} \\
&= \frac{(-1)^{k}}{k!\,(n-k)!}.
\end{align*}

Therefore, we have

\begin{align*}
n!\,H(x) &= \sum_{k=0}^{n}\frac{(-1)^{k}\,n!}{k!\,(n-k)!}\cdot\frac{1}{x+k} \\
&= \sum_{k=0}^{n}C_{n}^{k}\frac{(-1)^{k}}{x+k}.
\end{align*}
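The identity of Lemma 7.4 is easy to confirm for small positive integers. The following sketch compares both sides with exact fractions; the function names `alternating_sum` and `closed_form` and the tested ranges are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from math import comb, factorial


def alternating_sum(x: int, n: int) -> Fraction:
    """Left-hand side: sum_{k=0}^{n} C(n, k) * (-1)^k / (x + k)."""
    return sum(Fraction((-1) ** k * comb(n, k), x + k) for k in range(n + 1))


def closed_form(x: int, n: int) -> Fraction:
    """Right-hand side: n! / prod_{k=0}^{n} (x + k)."""
    denominator = 1
    for k in range(n + 1):
        denominator *= x + k
    return Fraction(factorial(n), denominator)


# Compare both sides on a small grid of positive integers.
for x in range(1, 7):
    for n in range(1, 7):
        assert alternating_sum(x, n) == closed_form(x, n)
```

For instance, with $x=2$ and $n=2$ both sides equal $1/12$.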

With all the above lemmas, we are now ready to prove the main theorem.

Proof of Theorem 4.1.

By Lemma 7.1, we have

\begin{align*}
\psi_{T,k_{1}}+\psi_{T',k_{2}} &= \sum_{t=1}^{T}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|}+\sum_{t=1}^{T}\left[\sum_{t_{1}=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t+t_{1}}(\boldsymbol{\tau})\left(\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}-\nabla v^{t_{1}+t-1}_{|\boldsymbol{\tau}|}\right)\right] \\
&\quad+\sum_{t=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|}+\sum_{t=1}^{T'}\left[\sum_{t_{1}=1}^{T}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t+t_{1}}(\boldsymbol{\tau})\left(\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}-\nabla v^{t_{1}+t-1}_{|\boldsymbol{\tau}|}\right)\right] \\
&= \sum_{t=1}^{T}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|}+\sum_{t=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|} \\
&\quad+\sum_{t=1}^{T}\left[\sum_{t_{1}=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}(t+t_{1})\,c_{t+t_{1}}(\boldsymbol{\tau})\left(\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}-\nabla v^{t_{1}+t-1}_{|\boldsymbol{\tau}|}\right)\right].
\end{align*}

Noting that

\begin{align*}
\psi_{T+T',k} &= \sum_{t=1}^{T+T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|} \\
&= \sum_{t=1}^{T}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|}+\sum_{t=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}(t+T)\,c_{t+T}(\boldsymbol{\tau})\nabla v^{t+T}_{|\boldsymbol{\tau}|}.
\end{align*}

Thus, we have

\begin{align*}
\psi_{T,k_{1}}+\psi_{T',k_{2}}-\psi_{T+T',k} &= \sum_{t=1}^{T}\left[\sum_{t_{1}=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}(t+t_{1})\,c_{t+t_{1}}(\boldsymbol{\tau})\left(\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}-\nabla v^{t_{1}+t-1}_{|\boldsymbol{\tau}|}\right)\right] \\
&\quad+\sum_{t=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}\left[t\,c_{t}(\boldsymbol{\tau})\nabla v^{t}_{|\boldsymbol{\tau}|}-(t+T)\,c_{t+T}(\boldsymbol{\tau})\nabla v^{t+T}_{|\boldsymbol{\tau}|}\right] \\
&= \sum_{t=0}^{T-1}\left[\sum_{t_{1}=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}(t+t_{1}+1)\,c_{t+t_{1}+1}(\boldsymbol{\tau})\left(\nabla v^{t_{1}+t+1}_{|\boldsymbol{\tau}|}-\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}\right)\right] \\
&\quad+\sum_{t=0}^{T-1}\sum_{t_{1}=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}\left[(t_{1}+t)\,c_{t_{1}+t}(\boldsymbol{\tau})\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}-(t_{1}+t+1)\,c_{t_{1}+t+1}(\boldsymbol{\tau})\nabla v^{t_{1}+t+1}_{|\boldsymbol{\tau}|}\right] \tag{10} \\
&= \sum_{t=0}^{T-1}\sum_{t_{1}=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\mathbf{m}_{[2:K]})}\left[(t_{1}+t)\,c_{t_{1}+t}(\boldsymbol{\tau})-(t_{1}+t+1)\,c_{t_{1}+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_{1}+t}_{|\boldsymbol{\tau}|}, \tag{11}
\end{align*}

where (10) holds according to Lemma 7.3.

We first prove (ii). If $v(\cdot)$ is a linear function, there exists a constant $v_0$ such that $\nabla v^{t}_{|\boldsymbol{\tau}|}=v_0$ for any $t$ and $\boldsymbol{\tau}$. Therefore, we have

$$
\begin{aligned}
(11) &= \sum_{t=0}^{T-1}\sum_{t_1=1}^{T'}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]v_0 \\
&= \sum_{t=0}^{T-1}\sum_{t_1=1}^{T'}\left[(t_1+t)\sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}c_{t_1+t+1}(\boldsymbol{\tau})\right]v_0 \\
&= \sum_{t=0}^{T-1}\sum_{t_1=1}^{T'}\left[(t_1+t)\frac{1}{t_1+t}-(t_1+t+1)\frac{1}{t_1+t+1}\right]v_0 \qquad (12) \\
&= 0,
\end{aligned}
$$

where (12) holds due to Lemma 7.2.

Now we prove (i). If $v(\cdot)$ is concave, then according to Lemma 7.4, we know that, for any $t$, $c_t(\boldsymbol{\tau})=\sum_{l=0}^{|M_k(\boldsymbol{\tau})|}C^{l}_{|M_k(\boldsymbol{\tau})|}(-1)^{l}\frac{1}{|\boldsymbol{\tau}|+l+t}=\frac{|M_k(\boldsymbol{\tau})|!}{\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k)}$. Therefore, we have

$$
\frac{t\,c_t(\boldsymbol{\tau})}{(t+1)\,c_{t+1}(\boldsymbol{\tau})}=\frac{t\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k+1)}{(t+1)\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k)}.
$$
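The closed form quoted from Lemma 7.4 and the ratio above can be checked with exact rational arithmetic. The following is only a sanity-check sketch, not part of the proof: the function names are hypothetical, `n` stands for $|\boldsymbol{\tau}|$, and `M` stands for $|M_k(\boldsymbol{\tau})|$, which is treated here as a plain non-negative integer.

```python
from fractions import Fraction
from math import comb, factorial


def c_alternating(n, M, t):
    """Alternating-sum form: sum_{l=0}^{M} C(M, l) (-1)^l / (n + l + t)."""
    return sum(Fraction((-1) ** l * comb(M, l), n + l + t) for l in range(M + 1))


def c_product(n, M, t):
    """Product form: M! / prod_{k=0}^{M} (n + t + k)."""
    denom = 1
    for k in range(M + 1):
        denom *= n + t + k
    return Fraction(factorial(M), denom)


def displayed_ratio(n, M, t):
    """Right-hand side of the ratio equation, built from the two products."""
    num, den = t, t + 1
    for k in range(M + 1):
        num *= n + t + k + 1
        den *= n + t + k
    return Fraction(num, den)


def ratio_from_c(n, M, t):
    """Left-hand side: t * c_t / ((t + 1) * c_{t+1}), using the product form."""
    return (t * c_product(n, M, t)) / ((t + 1) * c_product(n, M, t + 1))
```

Because everything is computed with `Fraction`, any disagreement between the two forms would show up as an exact inequality rather than a rounding artifact.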

Note that if $|\boldsymbol{\tau}|=0$, then

$$
\frac{t\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k+1)}{(t+1)\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k)}=\frac{\prod_{k=1}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k+1)}{\prod_{k=1}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k)}>1.
$$

Moreover, $\frac{t\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k+1)}{(t+1)\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(|\boldsymbol{\tau}|+t+k)}$ decreases as $|\boldsymbol{\tau}|$ grows larger. We let $\bar{|\tau|}_t\in\mathbb{R}$ denote the constant that satisfies

$$
\frac{t\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(\bar{|\tau|}_t+t+k+1)}{(t+1)\prod_{k=0}^{|M_k(\boldsymbol{\tau})|}(\bar{|\tau|}_t+t+k)}=1.
$$
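As an illustration of how this threshold behaves, note that the two products telescope, so the ratio reduces to $t(n+t+M+1)/((t+1)(n+t))$ with $n=|\boldsymbol{\tau}|$ and $M=|M_k(\boldsymbol{\tau})|$. Under the assumption (made here only for this sketch) that $M$ is a fixed integer that does not vary with $n$, setting the ratio to one gives $n=tM$. The helper names below are hypothetical.

```python
from fractions import Fraction


def ratio(n, M, t):
    """t * prod_{k=0}^{M}(n+t+k+1) / ((t+1) * prod_{k=0}^{M}(n+t+k)), exact."""
    num, den = Fraction(t), Fraction(t + 1)
    for k in range(M + 1):
        num *= n + t + k + 1
        den *= n + t + k
    return num / den


def threshold(M, t):
    """Candidate solution of ratio(n, M, t) == 1, assuming M is held fixed."""
    return Fraction(t * M)


# Example: with M = 3 and t = 2 the ratio crosses one at n = 6.
n_bar = threshold(3, 2)
below, at, above = ratio(n_bar - 1, 3, 2), ratio(n_bar, 3, 2), ratio(n_bar + 1, 3, 2)
```

In this sketch `below` exceeds one, `at` equals one, and `above` is smaller than one, which matches the monotonicity claim used to split the index set in the next step.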

Let $T^{+}:=\{t,t_1 \mid t=0,\dots,T-1;\ t_1=1,\dots,T';\ \max_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}|\boldsymbol{\tau}|<\bar{|\tau|}_{t+t_1}\}$ and let $T^{-}=\{t,t_1 \mid t=0,\dots,T-1;\ t_1=1,\dots,T'\}\setminus T^{+}$.

Therefore, we have

$$
\begin{aligned}
(11)={}&\sum_{t,t_1\in T^{+}}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{|\boldsymbol{\tau}|} \\
&+\sum_{t,t_1\in T^{-}}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{|\boldsymbol{\tau}|} \\
\geq{}&\sum_{t,t_1\in T^{-}}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{|\boldsymbol{\tau}|} \\
={}&\sum_{t,t_1\in T^{-}}\ \sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]}),\,|\boldsymbol{\tau}|\leq\bar{|\tau|}_{t+t_1}}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{|\boldsymbol{\tau}|} \\
&+\sum_{t,t_1\in T^{-}}\ \sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]}),\,|\boldsymbol{\tau}|>\bar{|\tau|}_{t+t_1}}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{|\boldsymbol{\tau}|} \\
>{}&\sum_{t,t_1\in T^{-}}\ \sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]}),\,|\boldsymbol{\tau}|\leq\bar{|\tau|}_{t+t_1}}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{\bar{|\tau|}_{t+t_1}} \\
&+\sum_{t,t_1\in T^{-}}\ \sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]}),\,|\boldsymbol{\tau}|>\bar{|\tau|}_{t+t_1}}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{\bar{|\tau|}_{t+t_1}} \\
={}&\sum_{t,t_1\in T^{-}}\sum_{\boldsymbol{\tau}\in\mathcal{T}(\boldsymbol{m}_{[2:K]})}\left[(t_1+t)\,c_{t_1+t}(\boldsymbol{\tau})-(t_1+t+1)\,c_{t_1+t+1}(\boldsymbol{\tau})\right]\nabla v^{t_1+t}_{\bar{|\tau|}_{t+t_1}}=0.
\end{aligned}
$$

It’s straightforward to show that $\psi_{i,k}(v)$ for any $i>0$. Hence, recursively applying Theorem 4.1 to each fake identity, the equilibrium participation profile for an agent $k$ with $m_k$ data samples is of the form $\tau_{\hat{k}}=1,\ \hat{k}=1,\dots,m_k$. In other words, the agent fully participates in the coalition but splits all of its data samples across $m_k$ fake identities, each of which contributes one sample.

8 Supplementary material for Section 3

In this section, we provide more details on the characteristic function $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$ viewed solely as a function of the sample size $|\boldsymbol{\tau}_{\mathcal{A}}|$. We first present an analytical form of $v(|\boldsymbol{\tau}_{\mathcal{A}}|)$ for a pricing example where the closed-form solution of $w^{*}$ is available. For the general case in which $w^{*}$ does not have a closed-form solution, we offer a comprehensive approach based on Lipschitz assumptions on the reward and decision functions.

Example 8.1 (Pricing under uncertainty)

Suppose the agents try to predict the unknown consumer willingness-to-pay $y$, with $y\sim F_{\theta^{*}}$ and $\theta^{*}$ an unknown distribution parameter. Specifically, we assume $F_{\theta^{*}}\sim\exp(\theta^{*})$. The agent’s decision is to set an optimal price that maximizes the expected surplus: $\max_{p}\ \mathbb{E}_{y\sim F_{\theta^{*}}}[I(y\geq p)\,p]=\max_{p}\ p\,(1-F_{\theta^{*}}(p))$. For $|\boldsymbol{\tau}_{\mathcal{A}}|$ i.i.d. data samples $\mathbf{y}_{\boldsymbol{\tau}_{\mathcal{A}}}$, with $y_j\sim F_{\theta^{*}}$, let $\hat{\theta}$ be the coalition sample mean and $\hat{F}$ be the predicted distribution of consumer willingness-to-pay under $\hat{\theta}$. Then we have

$$
p^{*}(\hat{F})\sim\frac{1}{|\boldsymbol{\tau}_{\mathcal{A}}|}\,\mathrm{Erlang}(|\boldsymbol{\tau}_{\mathcal{A}}|,\lambda)=\mathrm{Erlang}(|\boldsymbol{\tau}_{\mathcal{A}}|,|\boldsymbol{\tau}_{\mathcal{A}}|\lambda).
$$

Letting $X_{|\boldsymbol{\tau}_{\mathcal{A}}|,\lambda}\sim\mathrm{Erlang}(|\boldsymbol{\tau}_{\mathcal{A}}|,|\boldsymbol{\tau}_{\mathcal{A}}|\lambda)$, we have

$$
p^{*}(\hat{F})\left(1-F_{\theta^{*}}(p^{*}(\hat{F}))\right)\sim X_{|\boldsymbol{\tau}_{\mathcal{A}}|,\lambda}\exp\left(-\lambda X_{|\boldsymbol{\tau}_{\mathcal{A}}|,\lambda}\right).
$$

For simplicity, let $\theta^{*}=1$ (equivalently, $\lambda=1$ in the expressions above), and

$$
v(|\boldsymbol{\tau}_{\mathcal{A}}|)=\int_{0}^{\infty}x\exp(-\lambda x)\,\frac{(\lambda|\boldsymbol{\tau}_{\mathcal{A}}|)^{|\boldsymbol{\tau}_{\mathcal{A}}|}\,x^{|\boldsymbol{\tau}_{\mathcal{A}}|-1}\exp(-\lambda|\boldsymbol{\tau}_{\mathcal{A}}|x)}{(|\boldsymbol{\tau}_{\mathcal{A}}|-1)!}\,dx=\left(\frac{|\boldsymbol{\tau}_{\mathcal{A}}|}{|\boldsymbol{\tau}_{\mathcal{A}}|+1}\right)^{|\boldsymbol{\tau}_{\mathcal{A}}|+1},
$$

where v(|τ𝒜|)𝑣subscript𝜏𝒜v(|\BFtau_{\mathcal{A}}|)italic_v ( | italic_τ start_POSTSUBSCRIPT caligraphic_A end_POSTSUBSCRIPT | ) is a strictly increasing and concave function. It’s worth mentioning that in this example, we assume the platform could get an accurate sample mean, without considering the loss in performing FL.
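As a sanity check on the closed form above, the following minimal sketch compares it with a numerical evaluation of the integral. It assumes that setting $\theta^{*}=1$ corresponds to rate $\lambda=1$; for a general rate the same integral evaluates to the closed form divided by $\lambda$. The function names `v_closed_form` and `v_numeric`, the integration cutoff, and the grid size are illustrative choices, not part of the paper.

```python
import math


def v_closed_form(n: int) -> float:
    """Closed form (n / (n + 1)) ** (n + 1) for a coalition with n identities."""
    return (n / (n + 1)) ** (n + 1)


def v_numeric(n: int, lam: float = 1.0, upper: float = 40.0, steps: int = 20_000) -> float:
    """Simpson's-rule value of the integral defining v(n).

    The integrand is x * exp(-lam * x) times the Gamma(n, lam * n) density,
    i.e. the density of a sample mean of n exponential(lam) draws.
    `upper` and `steps` (even) are hypothetical numerical settings.
    """

    def integrand(x: float) -> float:
        if x == 0.0:
            return 0.0  # the factor x makes the integrand vanish at the origin
        log_density = (
            n * math.log(lam * n)
            + (n - 1) * math.log(x)
            - lam * n * x
            - math.lgamma(n)  # log((n - 1)!)
        )
        return x * math.exp(-lam * x + log_density)

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, v_closed_form(n), v_numeric(n))
```

Under these assumptions the two columns should agree to numerical precision, and the successive increments of `v_closed_form` shrink as `n` grows, which is consistent with the stated monotonicity and concavity.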

In general, a closed-form solution is often difficult to obtain. For such cases, we provide an expression for the characteristic function based on Lipschitz assumptions, which we state first:

Assumption 8.1

The Lipschitzness assumptions for the decision-aware objective are as follows.

  A. (Lipschitzness of $\boldsymbol{w}^{*}$) For any $\boldsymbol{\theta}_{1},\boldsymbol{\theta}_{2}\in\boldsymbol{\Theta}$, we have

    \[
    \|\boldsymbol{w}^{*}(\boldsymbol{\theta}_{1})-\boldsymbol{w}^{*}(\boldsymbol{\theta}_{2})\|\leq L_{w}\|\boldsymbol{\theta}_{1}-\boldsymbol{\theta}_{2}\|.
    \]

  B. (Lipschitzness of $r$) The reward is Lipschitz with respect to the decision $\boldsymbol{w}$:

    \[
    |r(\boldsymbol{w}_{1},\boldsymbol{y})-r(\boldsymbol{w}_{2},\boldsymbol{y})|\leq L_{r}\|\boldsymbol{w}_{1}-\boldsymbol{w}_{2}\|.
    \]

Assumption 8.1.A states that the optimal solution $\boldsymbol{w}^{*}(\cdot)$ is $L_{w}$-Lipschitz with respect to the parameter $\boldsymbol{\theta}$. It can be further justified when $F_{\theta}$ is a discrete distribution and $r$ is strongly concave with respect to $\boldsymbol{w}$ (Qi et al. 2021). Assumption 8.1.B is a common assumption that the reward function is Lipschitz with respect to the decision.

Under these assumptions, we have

\[
|\pi(\hat{\boldsymbol{\theta}}_{|\boldsymbol{\tau}_{\mathcal{A}}|})-z^{*}|\leq L_{r}L_{w}\|\hat{\boldsymbol{\theta}}_{|\boldsymbol{\tau}_{\mathcal{A}}|}-\boldsymbol{\theta}^{*}\|.
\]

Then for any $\boldsymbol{\tau}_{\mathcal{A}}$, we can represent

\[
z^{*}-v(|\boldsymbol{\tau}_{\mathcal{A}}|)\leq L_{r}L_{w}\|\hat{\boldsymbol{\theta}}_{|\boldsymbol{\tau}_{\mathcal{A}}|}-\boldsymbol{\theta}^{*}\|=L_{r,w}\|\hat{\boldsymbol{\theta}}_{|\boldsymbol{\tau}_{\mathcal{A}}|}-\boldsymbol{\theta}^{*}\|,
\]

where $L_{r,w}=L_{r}L_{w}$.
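The pointwise bound on $|\pi(\hat{\boldsymbol{\theta}})-z^{*}|$ used above can be unpacked by chaining the two parts of Assumption 8.1. The sketch below assumes that $\pi(\boldsymbol{\theta})=\mathbb{E}_{\boldsymbol{y}}[r(\boldsymbol{w}^{*}(\boldsymbol{\theta}),\boldsymbol{y})]$ and $z^{*}=\pi(\boldsymbol{\theta}^{*})$; these definitions are not restated in this section, so they should be read as assumptions of the illustration.

\[
\begin{aligned}
|\pi(\hat{\boldsymbol{\theta}})-z^{*}|
&=\bigl|\mathbb{E}_{\boldsymbol{y}}\bigl[r(\boldsymbol{w}^{*}(\hat{\boldsymbol{\theta}}),\boldsymbol{y})-r(\boldsymbol{w}^{*}(\boldsymbol{\theta}^{*}),\boldsymbol{y})\bigr]\bigr|\\
&\leq\mathbb{E}_{\boldsymbol{y}}\bigl|r(\boldsymbol{w}^{*}(\hat{\boldsymbol{\theta}}),\boldsymbol{y})-r(\boldsymbol{w}^{*}(\boldsymbol{\theta}^{*}),\boldsymbol{y})\bigr|\\
&\leq L_{r}\|\boldsymbol{w}^{*}(\hat{\boldsymbol{\theta}})-\boldsymbol{w}^{*}(\boldsymbol{\theta}^{*})\| &&\text{(Assumption 8.1.B)}\\
&\leq L_{r}L_{w}\|\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}^{*}\| &&\text{(Assumption 8.1.A)}.
\end{aligned}
\]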

9 Proof for Proposition 5.2

In this proof, we omit the dependency on $\boldsymbol{\tau}_{\mathcal{A}}$ in $\hat{\boldsymbol{\theta}}^{FL}_{\boldsymbol{\tau}_{\mathcal{A}}}$ for notational simplicity, and we use $\hat{\boldsymbol{\theta}}_{FL}$ to denote $\hat{\boldsymbol{\theta}}^{FL}_{\boldsymbol{\tau}_{\mathcal{A}}}$. Under the Shapley equilibrium, all agents provide data but split each sample into a separate agent identity. Hence $\boldsymbol{\tau}=[1,\dots,1]\in R^{|\boldsymbol{m}|\times 1}$. By Assumption 5.1, for some $\mu>0$ and all $\boldsymbol{x},\boldsymbol{y}$, $L(\cdot)$ satisfies

\[
L(\boldsymbol{y})\geq L(\boldsymbol{x})+\nabla L(\boldsymbol{x})^{T}(\boldsymbol{y}-\boldsymbol{x})+\frac{\mu}{2}\|\boldsymbol{y}-\boldsymbol{x}\|^{2}.
\]

Hence, for $\boldsymbol{\theta}^{*}_{FL}=\arg\min_{\boldsymbol{\theta}}L(\boldsymbol{\theta})$,

\[
L(\hat{\boldsymbol{\theta}}_{FL})-L(\boldsymbol{\theta}^{*}_{FL})\leq\epsilon\ \Rightarrow\ \|\boldsymbol{\theta}^{*}_{FL}-\hat{\boldsymbol{\theta}}_{FL}\|\leq\left(\frac{2\epsilon}{\mu}\right)^{1/2}. \tag{13}
\]
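The implication in (13) can be spelled out as follows. This is a sketch that reads the premise as a bound on the optimality gap $L(\hat{\boldsymbol{\theta}}_{FL})-L(\boldsymbol{\theta}^{*}_{FL})$ and assumes that $L$ is differentiable with $\nabla L(\boldsymbol{\theta}^{*}_{FL})=\boldsymbol{0}$ at the minimizer. Taking $\boldsymbol{x}=\boldsymbol{\theta}^{*}_{FL}$ and $\boldsymbol{y}=\hat{\boldsymbol{\theta}}_{FL}$ in the strong convexity inequality gives

\[
\frac{\mu}{2}\|\hat{\boldsymbol{\theta}}_{FL}-\boldsymbol{\theta}^{*}_{FL}\|^{2}\leq L(\hat{\boldsymbol{\theta}}_{FL})-L(\boldsymbol{\theta}^{*}_{FL})\leq\epsilon
\quad\Longrightarrow\quad
\|\hat{\boldsymbol{\theta}}_{FL}-\boldsymbol{\theta}^{*}_{FL}\|\leq\left(\frac{2\epsilon}{\mu}\right)^{1/2}.
\]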

We now consider a sequence of auxiliary training results $\hat{\boldsymbol{\theta}}^{a}_{FL}=\frac{1}{T}\sum_{t=0}^{T-1}\frac{1}{|\boldsymbol{m}|}\sum_{k=1}^{|\boldsymbol{m}|}\boldsymbol{\theta}^{t}_{k}$ (which is not actually computed in the algorithm, but is useful for the analysis). Let $\hat{\boldsymbol{\theta}}_{FL}=\frac{1}{T}\sum_{t=0}^{T-1}\boldsymbol{\theta}^{t}$, with $\boldsymbol{\theta}^{t}=\boldsymbol{\theta}^{t-1}$ if $t\notin\{H,2H,\dots,T-1\}$. Then

\[
\begin{aligned}
\|\hat{\boldsymbol{\theta}}^{a}_{FL}-\hat{\boldsymbol{\theta}}_{FL}\|
&=\Bigl\|\frac{1}{T}\sum_{t=0}^{T-1}\Bigl(\frac{1}{|\boldsymbol{m}|}\sum_{k=1}^{|\boldsymbol{m}|}\boldsymbol{\theta}^{t}_{k}-\boldsymbol{\theta}^{t}\Bigr)\Bigr\|\\
&\leq\frac{1}{T}\sum_{t=0}^{T-1}\Bigl\|\frac{1}{|\boldsymbol{m}|}\sum_{k=1}^{|\boldsymbol{m}|}\boldsymbol{\theta}^{t}_{k}-\boldsymbol{\theta}^{t}\Bigr\|\\
&\leq\frac{1}{T}\sum_{t^{\prime}\in\{H,2H,\dots,T-1\}}\sum_{j=1}^{H-1}\Bigl\|\frac{1}{|\boldsymbol{m}|}\sum_{k=1}^{|\boldsymbol{m}|}\boldsymbol{\theta}^{t^{\prime}+j}_{k}-\boldsymbol{\theta}^{t^{\prime}}\Bigr\|\\
&=\frac{1}{T}\sum_{t^{\prime}\in\{H,2H,\dots,T-1\}}\sum_{j=1}^{H-1}\Bigl\|\frac{1}{|\boldsymbol{m}|}\sum_{k=1}^{|\boldsymbol{m}|}\bigl(\boldsymbol{\theta}^{t^{\prime}+j-1}_{k}-\rho\nabla L_{k}(\boldsymbol{\theta}^{t^{\prime}+j-1}_{k})\bigr)-\boldsymbol{\theta}^{t^{\prime}}\Bigr\|\\
&\leq\frac{1}{T}\,\frac{T}{H}\sum_{j=1}^{H-1}j\rho\xi\leq\frac{\rho H\xi}{2}.
\end{aligned}
\tag{14}
\]
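The drift bound in (14) can be illustrated with a small simulation of local gradient descent with periodic averaging. The sketch below is not the paper's algorithm: it assumes that $\xi$ is a uniform bound on the local gradient norms, and it uses a hypothetical scalar pseudo-Huber loss $L_{k}(\theta)=\sqrt{1+(\theta-a_{k})^{2}}$ whose gradient magnitude is below $\xi=1$. The names `pseudo_huber_grad` and `average_drift`, the targets $a_{k}$, and all numerical values are illustrative.

```python
import math


def pseudo_huber_grad(theta: float, target: float) -> float:
    """Gradient of sqrt(1 + (theta - target)^2); its magnitude is always below 1."""
    d = theta - target
    return d / math.sqrt(1.0 + d * d)


def average_drift(targets, rho: float, H: int, rounds: int) -> float:
    """Time-averaged gap between the mean of the local iterates and the last synced model.

    Each round starts from the synced model, runs H local gradient steps per
    agent (one agent per entry of `targets`), then averages the local models.
    """
    theta_sync = 0.0
    total, count = 0.0, 0
    for _ in range(rounds):
        local = [theta_sync] * len(targets)
        for _step in range(H):
            mean_local = sum(local) / len(local)
            total += abs(mean_local - theta_sync)  # drift after _step local steps
            count += 1
            local = [th - rho * pseudo_huber_grad(th, a) for th, a in zip(local, targets)]
        theta_sync = sum(local) / len(local)  # synchronization (averaging) step
    return total / count


if __name__ == "__main__":
    targets = [-3.0, -1.0, 0.5, 2.0, 4.0]
    rho, H, xi = 0.05, 8, 1.0
    print("average drift:", average_drift(targets, rho, H, rounds=50))
    print("bound rho*H*xi/2:", rho * H * xi / 2)
```

After $j$ local steps each agent has moved at most $j\rho\xi$ from the synced model, so the average over a window of $H$ steps is at most $\rho\xi(H-1)/2\leq\rho H\xi/2$, which is the quantity the simulation compares against.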

Define $\sigma^{2}=\frac{1}{|\boldsymbol{m}|}\sum_{k=1}^{|\boldsymbol{m}|}\|\nabla L_{k}(\boldsymbol{\theta}^{*}_{FL})\|^{2}$, and let $L$ be the smoothness parameter such that

\[
0\leq L_{k}(\boldsymbol{x})-L_{k}(\boldsymbol{y})-\langle\nabla L_{k}(\boldsymbol{y}),\boldsymbol{x}-\boldsymbol{y}\rangle\leq\frac{L}{2}\|\boldsymbol{x}-\boldsymbol{y}\|^{2}.
\]

Following Corollary 1 of Khaled et al. (2019), under the data-splitting equilibrium and for large $|\boldsymbol{m}|$ of the same order as $T$, we set $H=T^{1/4}|\boldsymbol{m}|^{-3/4}$ so that the total number of communications has the same order of dependence on $T$ and $|\boldsymbol{m}|$. With the optimized step size $\rho=\frac{\sqrt{|\boldsymbol{m}|}}{4L\sqrt{T}}$, this leads to

\[
L(\hat{\boldsymbol{\theta}}^{a}_{FL})-L(\boldsymbol{\theta}^{*}_{FL})\leq\left(8L\|\boldsymbol{\theta}_{0}-\boldsymbol{\theta}^{*}\|^{2}+\frac{3\sigma^{2}}{2L}\right)\frac{1}{\sqrt{T|\boldsymbol{m}|}},
\]

which, by equation (13), implies

\[
\|\boldsymbol{\theta}^{*}_{FL}-\hat{\boldsymbol{\theta}}^{a}_{FL}\|\leq\left(\frac{16L\|\boldsymbol{\theta}_{0}-\boldsymbol{\theta}^{*}\|^{2}}{\mu}+\frac{3\sigma^{2}}{L\mu}\right)^{1/2}(T|\boldsymbol{m}|)^{-1/4}.
\]

Moreover, by equation (14),

\[
\|\hat{\boldsymbol{\theta}}^{a}_{FL}-\hat{\boldsymbol{\theta}}_{FL}\|\leq\frac{\xi}{8L}(T|\boldsymbol{m}|)^{-1/4}.
\]
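The constant $\xi/(8L)$ comes from substituting the stated choices of $\rho$ and $H$ into the right-hand side $\rho H\xi/2$ of (14). As a worked step:

\[
\rho H=\frac{\sqrt{|\boldsymbol{m}|}}{4L\sqrt{T}}\cdot T^{1/4}|\boldsymbol{m}|^{-3/4}=\frac{1}{4L}\,T^{-1/4}|\boldsymbol{m}|^{-1/4}
\quad\Longrightarrow\quad
\frac{\rho H\xi}{2}=\frac{\xi}{8L}(T|\boldsymbol{m}|)^{-1/4}.
\]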

Hence,

\[
\|\boldsymbol{\theta}^{*}_{FL}-\hat{\boldsymbol{\theta}}_{FL}\|\leq\left(\left(\frac{16L\|\boldsymbol{\theta}_{0}-\boldsymbol{\theta}^{*}\|^{2}}{\mu}+\frac{3\sigma^{2}}{L\mu}\right)^{1/2}+\frac{\xi}{8L}\right)(T|\boldsymbol{m}|)^{-1/4}.
\]

We want to guarantee $P(\|\hat{\boldsymbol{\theta}}_{FL}-\boldsymbol{\theta}^{*}\|\geq\varepsilon(|\boldsymbol{m}|,\delta_{0}))\leq\delta_{0}$. Suppose the MLE optimal estimator $\boldsymbol{\theta}^{*}_{FL}$ satisfies $P\bigl(\|\boldsymbol{\theta}^{*}_{FL}-\boldsymbol{\theta}^{*}\|\geq\frac{\varepsilon(|\boldsymbol{m}|,\delta_{0})}{2}\bigr)\leq p_{0}$ with $p_{0}\leq\delta_{0}$. Then, by the triangle inequality, a sufficient condition is

\[
\|\hat{\boldsymbol{\theta}}_{FL}-\hat{\boldsymbol{\theta}}\|\leq\left(\left(\frac{16L\|\boldsymbol{\theta}_{0}-\boldsymbol{\theta}^{*}\|^{2}}{\mu}+\frac{3\sigma^{2}}{L\mu}\right)^{1/2}+\frac{\xi}{8L}\right)(T|\boldsymbol{m}|)^{-1/4}\leq\frac{\varepsilon(|\boldsymbol{m}|,\delta_{0})}{2}.
\]

Hence, $N_{sync}=\frac{T}{H}=(T|\boldsymbol{m}|)^{3/4}$ should satisfy

\[
N_{sync}=(T|\boldsymbol{m}|)^{3/4}\geq\left(\left(\frac{64L\|\boldsymbol{\theta}_{0}-\boldsymbol{\theta}^{*}\|^{2}}{\mu}+\frac{12\sigma^{2}}{L\mu}\right)^{1/2}+\frac{\xi}{4L}\right)^{3}\bigl(\varepsilon(|\boldsymbol{m}|,\delta_{0})\bigr)^{-3}. \tag{15}
\]
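To make the scaling in (15) concrete, the following minimal sketch evaluates its right-hand side for given constants. The function name `min_sync_rounds`, its argument names, and the numerical values are hypothetical; `init_dist2` stands for $\|\boldsymbol{\theta}_{0}-\boldsymbol{\theta}^{*}\|^{2}$ and `eps` for $\varepsilon(|\boldsymbol{m}|,\delta_{0})$, both of which would come from the problem at hand.

```python
import math


def min_sync_rounds(L: float, mu: float, sigma2: float, xi: float,
                    init_dist2: float, eps: float) -> float:
    """Right-hand side of (15): a lower bound on N_sync = (T * |m|) ** (3 / 4)."""
    constant = math.sqrt(64.0 * L * init_dist2 / mu + 12.0 * sigma2 / (L * mu)) + xi / (4.0 * L)
    return (constant / eps) ** 3


if __name__ == "__main__":
    # Illustrative constants only; none of these values come from the paper.
    base = min_sync_rounds(L=2.0, mu=0.5, sigma2=1.0, xi=1.0, init_dist2=4.0, eps=0.1)
    halved = min_sync_rounds(L=2.0, mu=0.5, sigma2=1.0, xi=1.0, init_dist2=4.0, eps=0.05)
    print(base, halved, halved / base)
```

Because the bound scales as $\varepsilon^{-3}$, halving the target accuracy multiplies the required number of synchronizations by eight in this sketch.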